Add distributed summation routines
Floating-point addition is not associative, so a parallel sum can give different results depending on the order in which partial sums are combined. That order changes with the MPI implementation (its reduction algorithm) and with the decomposition of the summed array.
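A minimal illustration of the effect. Any IEEE 754 double-precision arithmetic behaves this way, and Python floats are used here only for brevity. The two groupings stand for two different reduction trees over the same three values:

```python
# Same three operands, grouped as two different reduction trees might group them.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds to 0.6000000000000001
right = a + (b + c)  # rounds to 0.6

print(left, right, left == right)  # 0.6000000000000001 0.6 False
```

Each intermediate sum is rounded to the nearest double, so the final result depends on which intermediates are formed.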
A method that gives identical results for any decomposition should therefore be implemented. Three alternatives are currently being evaluated:
- summation of double-double precision values (fast, but with only limited guarantees against cancellation)
- summation of word-size integers obtained by scaling the FP values to a pre-established exponent (slow on platforms where FP-to-integer conversion is expensive; does not address cancellation or overflow)
- summation of bignum integers obtained by converting the FP values (slow, but handles both overflow of intermediates and cancellation across the full exponent range)
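The second alternative can be sketched as follows. This is an illustration under stated assumptions, not ScalES-PPM code. The pre-established exponent is represented by a hypothetical constant `FRAC_BITS`, the number of fractional bits kept. Each value is rounded to the nearest multiple of 2^-`FRAC_BITS`. 64-bit limits are checked explicitly because Python integers never overflow on their own. The names `to_fixed`, `local_sum` and `global_sum` are illustrative only. `global_sum` merely stands in for an integer reduction across processes, e.g. `MPI_Allreduce` with `MPI_SUM` on `MPI_INT64_T`.

```python
import math

# Hypothetical parameters: FRAC_BITS encodes the pre-established exponent;
# int64 limits are emulated because Python ints are arbitrary precision.
FRAC_BITS = 40
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_fixed(x: float) -> int:
    """Scale x by 2**FRAC_BITS and round to the nearest integer (ties to even)."""
    n = round(math.ldexp(x, FRAC_BITS))  # ldexp/round raise OverflowError on inf
    if not INT64_MIN <= n <= INT64_MAX:
        raise OverflowError(f"{x!r} does not fit at exponent -{FRAC_BITS}")
    return n


def local_sum(values) -> int:
    """Partial sum on one process; integer addition is exact and associative."""
    acc = 0
    for v in values:
        acc += to_fixed(v)
        if not INT64_MIN <= acc <= INT64_MAX:
            raise OverflowError("partial sum overflows int64")
    return acc


def global_sum(partials) -> float:
    """Stand-in for the cross-process reduction, then one final conversion."""
    total = sum(partials)
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError("global sum overflows int64")
    return math.ldexp(total, -FRAC_BITS)


# Two different decompositions of the same array over two "processes".
data = [0.1, 0.2, 0.3, 1e3, -1e3, 2.5e-4]
split_a = global_sum([local_sum(data[:2]), local_sum(data[2:])])
rev = data[::-1]
split_b = global_sum([local_sum(rev[:5]), local_sum(rev[5:])])
print(split_a == split_b)  # True: same integers, same exact total
```

Both decompositions produce bit-identical results because every element is converted to the same integer regardless of where it is summed, and the integer total is exact. The result is reproducible but not more accurate than the chosen exponent allows: digits below 2^-`FRAC_BITS` are rounded away per element. A value too large for that exponent raises `OverflowError` here, which reflects the overflow limitation noted for this alternative.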
The first alternative is already implemented in a prototype and only needs to be ported to ScalES-PPM.