|
from https://github.com/ARM-software/optimized-routines,
commit 04884bd04eac4b251da4026900010ea7d8850edc
In expf TOINT_INTRINSICS is kept, but is unused, it would require support
for __builtin_round and __builtin_lround as single instruction.
code size change: +94 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x
expf latency: 34.19 ns/call 18.77 ns/call 1.82x
exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x
exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x
-O3:
expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x
expf latency: 34.44 ns/call 18.99 ns/call 1.81x
exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x
exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
|