diff options
author | Wilco Dijkstra <wdijkstr@arm.com> | 2018-02-12 10:42:42 +0000 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2018-02-12 10:47:09 +0000 |
commit | c3d466cba1692708a19c6ff829d0386c83a0c6e5 (patch) | |
tree | d01ce6103dc25d3b662898c3429b8b103b8d3155 /sysdeps/generic | |
parent | 7bb087bd7bfe3616c4c0974a3f7352b593353ea5 (diff) | |
download | glibc-c3d466cba1692708a19c6ff829d0386c83a0c6e5.tar.gz glibc-c3d466cba1692708a19c6ff829d0386c83a0c6e5.tar.xz glibc-c3d466cba1692708a19c6ff829d0386c83a0c6e5.zip |
Remove slow paths from pow
Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
Diffstat (limited to 'sysdeps/generic')
-rw-r--r-- | sysdeps/generic/math_private.h | 4 |
1 files changed, 1 insertions, 3 deletions
diff --git a/sysdeps/generic/math_private.h b/sysdeps/generic/math_private.h index 0a35cb39fb..07c97fa5e3 100644 --- a/sysdeps/generic/math_private.h +++ b/sysdeps/generic/math_private.h @@ -250,20 +250,18 @@ fabsf128 (_Float128 x) /* Prototypes for functions of the IBM Accurate Mathematical Library. */ -extern double __exp1 (double __x, double __xx, double __error); +extern double __exp1 (double __x, double __xx); extern double __sin (double __x); extern double __cos (double __x); extern int __branred (double __x, double *__a, double *__aa); extern void __doasin (double __x, double __dx, double __v[]); extern void __dubsin (double __x, double __dx, double __v[]); extern void __dubcos (double __x, double __dx, double __v[]); -extern double __halfulp (double __x, double __y); extern double __sin32 (double __x, double __res, double __res1); extern double __cos32 (double __x, double __res, double __res1); extern double __mpsin (double __x, double __dx, bool __range_reduce); extern double __mpcos (double __x, double __dx, bool __range_reduce); extern double __slowexp (double __x); -extern double __slowpow (double __x, double __y, double __z); extern void __docos (double __x, double __dx, double __v[]); #ifndef math_opt_barrier |