
Exercise "machine epsilon"

Tasks

  1. Maximum/minimum representable integers.
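
    The task text does not spell out a method, so the following is only one plausible reading, sketched under that assumption: the maximum representable int is the largest i for which i+1>i still holds, and the minimum is the smallest i for which i-1<i still holds. The class and method names are hypothetical, and each loop runs about two billion iterations, so expect it to take a few seconds.

    ```csharp
    using System;

    public static class IntLimits {
        // Largest int i for which i+1 > i still holds.
        // "unchecked" makes the wrap-around on overflow explicit.
        public static int FindMax() {
            int i = 1;
            unchecked { while (i + 1 > i) { i++; } }
            return i;
        }

        // Smallest int i for which i-1 < i still holds.
        public static int FindMin() {
            int i = -1;
            unchecked { while (i - 1 < i) { i--; } }
            return i;
        }

        public static void Main() {
            Console.WriteLine($"my max int = {FindMax()}, int.MaxValue = {int.MaxValue}");
            Console.WriteLine($"my min int = {FindMin()}, int.MinValue = {int.MinValue}");
        }
    }
    ```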

  2. The machine epsilon is the difference between 1.0 and the next representable floating-point number. Using a while loop, calculate the machine epsilon for the types float and double. Something like
    double x=1; while(1+x!=1){x/=2;} x*=2;
    float y=1F; while((float)(1F+y) != 1F){y/=2F;} y*=2F;
    
    There seem to be no predefined values for these numbers in C# (I couldn't find any, in any case); note that double.Epsilon is the smallest positive subnormal double, not the machine epsilon. However, in an IEEE 64-bit floating-point number (double), where 1 bit is reserved for the sign and 11 bits for the exponent, there are 52 bits remaining for the fraction, therefore the double machine epsilon should be System.Math.Pow(2,-52). For single precision (float), with 23 fraction bits, the machine epsilon should be System.Math.Pow(2,-23). Check this.
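
    The check could be sketched as follows. The two loops are the ones from the task, wrapped into methods; the class and method names are hypothetical, and round-to-nearest IEEE arithmetic is assumed.

    ```csharp
    using System;

    public static class MachineEpsilon {
        // Halve x until adding it to 1 no longer changes the result,
        // then step back by one factor of 2.
        public static double DoubleEps() {
            double x = 1;
            while (1 + x != 1) { x /= 2; }
            return x * 2;
        }

        // Same loop for float; the cast forces the sum to single precision.
        public static float FloatEps() {
            float y = 1F;
            while ((float)(1F + y) != 1F) { y /= 2F; }
            return y * 2F;
        }

        public static void Main() {
            Console.WriteLine($"double: loop = {DoubleEps():e}, 2^-52 = {Math.Pow(2, -52):e}");
            Console.WriteLine($"float:  loop = {FloatEps():e}, 2^-23 = {(float)Math.Pow(2, -23):e}");
        }
    }
    ```

    If the two numbers on each line agree, the bit-count argument above holds on your machine.
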
  3. Suppose tiny=epsilon/2. Calculate the two sums,

    sumA=1+tiny+tiny+...+tiny;
    sumB=tiny+tiny+...+tiny+1;
    
    which should seemingly be the same, and print out the values sumA-1 and sumB-1. Something like
    int n=(int)1e6;
    double epsilon=Pow(2,-52);
    double tiny=epsilon/2;
    double sumA=0,sumB=0;
    
    sumA+=1; for(int i=0;i<n;i++){sumA+=tiny;}
    for(int i=0;i<n;i++){sumB+=tiny;} sumB+=1;
    
    WriteLine($"sumA-1 = {sumA-1:e} should be {n*tiny:e}");
    WriteLine($"sumB-1 = {sumB-1:e} should be {n*tiny:e}");
    
    Explain why there is a difference.
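
    As a hint for the explanation, here is a minimal sketch that looks at a single step of each sum instead of a million of them. The class and method names are hypothetical, and round-to-nearest IEEE double arithmetic is assumed.

    ```csharp
    using System;

    public static class TinySums {
        // tiny = epsilon/2, with epsilon = 2^-52 as in the task.
        public static readonly double Tiny = Math.Pow(2, -52) / 2;

        // One step of sumA: is 1+tiny rounded back to 1?
        public static bool OnePlusTinyIsOne() {
            double s = 1;
            s += Tiny;
            return s == 1;
        }

        // One step of sumB: does tiny+tiny actually grow?
        public static bool TinyPlusTinyGrows() {
            double s = Tiny;
            s += Tiny;
            return s > Tiny;
        }

        public static void Main() {
            Console.WriteLine($"1+tiny == 1 ?      => {OnePlusTinyIsOne()}");
            Console.WriteLine($"tiny+tiny > tiny ? => {TinyPlusTinyGrows()}");
        }
    }
    ```

    Compare the two answers with the order of the additions in sumA and sumB.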

  4. The equality operator "==" works well on integer types but is not very useful on floating-point types. Most decimal fractions have no exact binary floating-point representation and must be rounded when stored. Because of this rounding, comparing two doubles with the "==" operator can produce an unexpected result. For example, in this code
    double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
    double d2 = 8*0.1;
    	
    both doubles "d1" and "d2" should be equal to 0.8, and the "==" operator should then produce "true". However, try
    WriteLine($"d1={d1:e15}");
    WriteLine($"d2={d2:e15}");
    WriteLine($"d1==d2 ? => {d1==d2}");
    	
    and see that this is not the case (not on my machine, at least). That is because the decimal number 0.1 cannot be represented exactly as a binary fraction with the 52 fraction bits of a double, so rounding errors accumulate differently in the two expressions.

    For this reason, one needs a more elaborate comparison. Two doubles can only be meaningfully compared to within a given absolute and/or relative precision, where the precision values depend on the task at hand and generally must be supplied by the user.

    Therefore, implement a function with the signature

    bool approx(double a, double b, double acc=1e-9, double eps=1e-9)
    
    that returns "true" if the numbers 'a' and 'b' are equal either with absolute precision "acc",
    |a-b| < acc
    
    or with relative precision "eps",
    |a-b|/Max(|a|,|b|) < eps
    
    and returns "false" otherwise. Something like
    public static bool approx
    (double a, double b, double acc=1e-9, double eps=1e-9){
    	if(Abs(b-a) < acc) return true;
    	else if(Abs(b-a) < Max(Abs(a),Abs(b))*eps) return true;
    	else return false;
    }
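
    A possible way to try the function out, as a sketch: the approx body is the one above, while the wrapping class name and the test values in Main are hypothetical choices for illustration.

    ```csharp
    using System;
    using static System.Math;

    public static class Compare {
        public static bool approx
        (double a, double b, double acc = 1e-9, double eps = 1e-9) {
            if (Abs(b - a) < acc) return true;                       // absolute precision
            if (Abs(b - a) < Max(Abs(a), Abs(b)) * eps) return true; // relative precision
            return false;
        }

        public static void Main() {
            double d1 = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1;
            double d2 = 8 * 0.1;
            Console.WriteLine($"d1==d2 ?        => {d1 == d2}");
            Console.WriteLine($"approx(d1,d2) ? => {approx(d1, d2)}");
            // Large numbers: the absolute test fails, the relative test decides.
            Console.WriteLine($"approx(1e20,1.0000000001e20) ? => {approx(1e20, 1.0000000001e20)}");
            Console.WriteLine($"approx(1,1.1) ? => {approx(1, 1.1)}");
        }
    }
    ```

    The default values of acc and eps are only a starting point; as noted above, suitable precisions depend on the task at hand.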