Exercise "epsilon"

  1. Maximum/minimum representable integers.
    1. The maximum representable integer is the largest integer i for which i+1>i holds true.

      Using a while loop, determine your maximum integer and compare it with int.MaxValue.

      Hint: something like

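      using static System.Console;  // at the top of the file; lets you write Write(...) instead of Console.Write(...)
      int i=1;
      while(i+1>i) {i++;}           // stops once i+1 wraps around to int.MinValue
      Write("my max int = {0}\n",i);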
      
      It can take some seconds to calculate.
    2. The minimum representable integer is the most negative integer i for which i-1<i holds true.

      Using a while loop, determine your minimum integer and compare it with int.MinValue.
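A minimal sketch of both parts of exercise 1. It assumes C# 9 top-level statements (.NET 5 or later) instead of a class with a Main method, and the names imax and imin are illustrative. It relies on C# integer arithmetic being unchecked by default; in a checked context the loops would throw an OverflowException instead of stopping.

```csharp
// Sketch assuming C# 9+ top-level statements (.NET 5 or later).
using static System.Console;

// Maximum: count upward until imax+1 wraps around to int.MinValue.
// This relies on the default (unchecked) integer arithmetic of C#.
int imax = 1;
while (imax + 1 > imax) { imax++; }
Write("my max int = {0}, int.MaxValue = {1}\n", imax, int.MaxValue);

// Minimum: count downward until imin-1 wraps around to int.MaxValue.
int imin = -1;
while (imin - 1 < imin) { imin--; }
Write("my min int = {0}, int.MinValue = {1}\n", imin, int.MinValue);
```

Each loop runs about two billion times, so expect it to take a few seconds.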

  2. The machine epsilon is the difference between 1.0 and the next representable floating-point number. Using a while loop, calculate the machine epsilon for the types float and double. Hint:
    double x=1; while(1+x!=1){x/=2;} x*=2;                 // the final x*=2 undoes the last halving
    float y=1F; while((float)(1F+y) != 1F){y/=2F;} y*=2F;  // the (float) cast forces rounding to single precision
    
    There seem to be no predefined constants for these numbers in C# (tell me if you find them); note that double.Epsilon is not the machine epsilon but the smallest positive double, about 4.9e-324. However, an IEEE 754 64-bit floating-point number reserves 1 bit for the sign and 11 bits for the exponent, leaving 52 bits for the fraction, so the machine epsilon for double should be System.Math.Pow(2,-52) ≈ 2.22e-16.

    For single precision (1 sign bit, 8 exponent bits, 23 fraction bits) the machine epsilon should be System.Math.Pow(2,-23) ≈ 1.19e-7.

    Check this.
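A sketch of exercise 2 under the same assumption (C# 9 top-level statements). Besides the loops from the hint, it cross-checks the results with System.Math.BitIncrement and System.MathF.BitIncrement, which return the next representable value above their argument; these methods exist only in .NET Core 3.0 and later, so they are not available on .NET Framework.

```csharp
// Sketch assuming C# 9+ top-level statements and .NET Core 3.0+ (for BitIncrement).
using static System.Console;

// double: halve x until 1+x rounds back to 1, then undo the last halving.
double x = 1;
while (1 + x != 1) { x /= 2; }
x *= 2;

// float: the (float) cast forces the sum to be rounded to single precision.
float y = 1F;
while ((float)(1F + y) != 1F) { y /= 2F; }
y *= 2F;

Write("double: eps = {0}, 2^-52 = {1}\n", x, System.Math.Pow(2, -52));
Write("float:  eps = {0}, 2^-23 = {1}\n", y, System.Math.Pow(2, -23));

// Cross-check: distance from 1 to the next representable number.
Write("Math.BitIncrement(1.0) - 1.0 = {0}\n", System.Math.BitIncrement(1.0) - 1.0);
Write("MathF.BitIncrement(1F) - 1F = {0}\n", System.MathF.BitIncrement(1F) - 1F);
```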

  3. Harmonic sum.
    1. Define int max=int.MaxValue/2; (or, say, int.MaxValue/3 if the execution takes longer than you are willing to wait) and, using iteration statements, calculate the following harmonic sum,
      float float_sum_up = 1f + 1f/2 + 1f/3 + ... + 1f/max;
      
      and the seemingly identical harmonic sum, summed in the opposite order,
      float float_sum_down = 1f/max + 1f/(max-1) + 1f/(max-2) + ...  +1f;
      
      using the float type, and compare the two sums.

      Something like

      int max=int.MaxValue/3;
      
      float float_sum_up=1F;
      for(int i=2;i<=max;i++)float_sum_up+=1F/i;  // i<=max, so that the last term 1F/max is included
      Write("float_sum_up={0}\n",float_sum_up);
      
      float float_sum_down=1F/max;
      for(int i=max-1;i>0;i--)float_sum_down+=1F/i;
      Write("float_sum_down={0}\n",float_sum_down);
      
    2. Explain the difference.
    3. Does this sum converge as a function of max?
    4. Now calculate the sums sum_up_double and sum_down_double using the double type. Explain the result.
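One possible sketch of exercise 3, again assuming C# 9 top-level statements. It uses the smaller choice max = int.MaxValue/3 and starts every sum from zero, so each loop covers exactly the terms 1/1 ... 1/max. The comments hint at the explanation the exercise asks for; compare them with your own reasoning. The four loops together take several seconds.

```csharp
// Sketch assuming C# 9+ top-level statements; max = int.MaxValue/3 as suggested.
using static System.Console;

int max = int.MaxValue / 3;

// float, large terms first: once the running sum is large, terms 1/i smaller
// than half the spacing between adjacent floats near the sum are rounded away.
float float_sum_up = 0F;
for (int i = 1; i <= max; i++) float_sum_up += 1F / i;

// float, small terms first: the small terms are added while the sum is still
// small, so much less of their contribution is lost.
float float_sum_down = 0F;
for (int i = max; i >= 1; i--) float_sum_down += 1F / i;

// The same two sums in double precision.
double sum_up_double = 0;
for (int i = 1; i <= max; i++) sum_up_double += 1.0 / i;
double sum_down_double = 0;
for (int i = max; i >= 1; i--) sum_down_double += 1.0 / i;

Write("float_sum_up={0}\nfloat_sum_down={1}\n", float_sum_up, float_sum_down);
Write("sum_up_double={0}\nsum_down_double={1}\n", sum_up_double, sum_down_double);

// For comparison: the harmonic sum grows like ln(max) + 0.5772... (Euler's constant).
Write("ln(max)+gamma={0}\n", System.Math.Log(max) + 0.5772156649015329);
```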
  4. Write a function with the signature
    bool approx(double a, double b, double tau=1e-9, double epsilon=1e-9)
    
    that returns true if the numbers 'a' and 'b' are equal with absolute precision 'tau', \[|a-b|\lt\tau\] or are equal with relative precision 'epsilon', \[\frac{|a-b|}{|a|+|b|}\lt\frac{\epsilon}{2}\] and returns false otherwise.

    The function must be placed in a separate .cs file, compiled separately into a library (.dll), and then linked to the final executable by referencing that library when compiling the main program.
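One way the approx function might look. To keep the example self-contained it is written as a static local function inside C# 9 top-level statements; for the exercise it would instead be a public static method of a class in its own file. With the Mono compiler, for instance, that file could be built with mcs -target:library -out:approx.dll approx.cs and the main program with mcs -reference:approx.dll main.cs (the file names are hypothetical).

```csharp
// Sketch assuming C# 9+ top-level statements; in the exercise approx belongs
// in a separate .cs file as a public static method of a class.
using static System.Console;

// Returns true if a and b agree within absolute precision tau, or within
// relative precision epsilon as defined in the exercise; false otherwise.
static bool approx(double a, double b, double tau = 1e-9, double epsilon = 1e-9)
{
    double diff = System.Math.Abs(a - b);
    // Absolute test first; for tau > 0 this also handles a == b == 0,
    // where the relative test below would compute 0/0 = NaN.
    if (diff < tau) return true;
    if (diff / (System.Math.Abs(a) + System.Math.Abs(b)) < epsilon / 2) return true;
    return false;
}

Write("{0}\n", approx(1.0, 1.0 + 1e-12)); // True: |a-b| < tau
Write("{0}\n", approx(1e10, 1e10 + 1));   // True: relative difference about 5e-11
Write("{0}\n", approx(1.0, 1.1));         // False
```

The absolute test catches numbers close to zero, where a relative comparison is meaningless; the relative test catches large numbers whose difference exceeds tau only because of their magnitude.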