author | Rich Felker <dalias@aerifal.cx> | 2012-03-20 00:51:32 -0400 |
---|---|---|
committer | Rich Felker <dalias@aerifal.cx> | 2012-03-20 00:51:32 -0400 |
commit | baa43bca0a051e8deb0d6a9a8882ceeea5c27249 (patch) | |
tree | f5fe7ae916d9039adfe82217716e2aafd08702fb /src/prng/__rand48_step.c | |
parent | 7513d3ecabb998e2c8c4cb9ed5de48c4b64a166b (diff) | |
optimize scalbn family
the fscale instruction is slow everywhere, probably because it involves a costly and unnecessary integer truncation operation that ends up being a no-op in common usages. instead, construct a floating-point scale value with integer arithmetic and simply multiply by it, when possible.

for float and double, this is always possible by going to the next-larger type. we use some cheap but effective saturating arithmetic tricks to make sure even very large-magnitude exponents fit.

for long double, if the scaling exponent is too large to fit in the exponent of a long double value, we simply fall back to the expensive fscale method.

on atom cpu, these changes speed up scalbn by over 30%. (min rdtsc timing dropped from 110 cycles to 70 cycles.)
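the idea for the float case can be sketched in portable C. this is an illustration only, not the code from this commit (which targets the x87 unit directly): the name `scalbnf_sketch` is hypothetical, the clamp bounds are one possible way to saturate the exponent, and the sketch assumes IEEE 754 binary32 `float` and binary64 `double`.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical sketch, not musl's implementation.
 * computes x * 2^n for float by building the scale factor 2^n
 * as a double with integer arithmetic, then multiplying. */
static float scalbnf_sketch(float x, int n)
{
	uint64_t bits;
	double scale;

	/* saturate n to the normal exponent range of double.
	 * any float result is already 0 or infinity well inside
	 * this range, so clamping does not change the answer. */
	if (n > 1023) n = 1023;
	if (n < -1022) n = -1022;

	/* biased exponent in bits 52..62, zero mantissa: exactly 2^n */
	bits = (uint64_t)(n + 1023) << 52;
	memcpy(&scale, &bits, sizeof scale);

	/* a 24-bit float mantissa times a power of two is exact in
	 * double whenever it matters; the only rounding that affects
	 * the result is the final conversion back to float. */
	return (float)((double)x * scale);
}
```

the same trick does not carry over to long double in this form, since there is no wider type to hold an out-of-range scale factor; that is the case where the commit message says the fscale path is kept.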
Diffstat (limited to 'src/prng/__rand48_step.c')
0 files changed, 0 insertions, 0 deletions