diff options
author | Wilco Dijkstra <wdijkstr@arm.com> | 2018-08-10 17:31:30 +0100 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2018-08-10 17:34:39 +0100 |
commit | ea5c662c623125ad39fc43d5b9224150256407a9 (patch) | |
tree | d4adba7c05e215900b95d9278ace33cd7554272c /sysdeps/ieee754/flt-32/s_sincosf_data.c | |
parent | 43cfdf8f4811c63bca8a2185d82a6d92977a125b (diff) | |
download | glibc-ea5c662c623125ad39fc43d5b9224150256407a9.tar.gz glibc-ea5c662c623125ad39fc43d5b9224150256407a9.tar.xz glibc-ea5c662c623125ad39fc43d5b9224150256407a9.zip |
Improve performance of sincosf
This patch is a complete rewrite of sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP. The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases. The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions. The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary. sincosf throughput gains on Cortex-A72: * |x| < 0x1p-12 : 1.6x * |x| < M_PI_4 : 1.7x * |x| < 2 * M_PI: 1.5x * |x| < 120.0 : 1.8x * |x| < Inf : 2.3x * math/Makefile: Add s_sincosf_data.c. * sysdeps/ia64/fpu/s_sincosf_data.c: New file. * sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function. (sincosf_poly): Likewise. (reduce_small): Likewise. (reduce_large): Likewise. * sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data. * sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file. * sysdeps/x86_64/fpu/s_sincosf_data.c: New file.
Diffstat (limited to 'sysdeps/ieee754/flt-32/s_sincosf_data.c')
-rw-r--r-- | sysdeps/ieee754/flt-32/s_sincosf_data.c | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/sysdeps/ieee754/flt-32/s_sincosf_data.c b/sysdeps/ieee754/flt-32/s_sincosf_data.c new file mode 100644 index 0000000000..21fc2b60f9 --- /dev/null +++ b/sysdeps/ieee754/flt-32/s_sincosf_data.c @@ -0,0 +1,74 @@ +/* Compute sine and cosine of argument. + Copyright (C) 2018 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdint.h> +#include <math.h> +#include "math_config.h" +#include "s_sincosf.h" + +/* The constants and polynomials for sine and cosine. The 2nd entry + computes -cos (x) rather than cos (x) to get negation for free. */ +const sincos_t __sincosf_table[2] = +{ + { + { 1.0, -1.0, -1.0, 1.0 }, +#if TOINT_INTRINSICS + 0x1.45F306DC9C883p-1, +#else + 0x1.45F306DC9C883p+23, +#endif + 0x1.921FB54442D18p0, + 0x1p0, + -0x1.ffffffd0c621cp-2, + 0x1.55553e1068f19p-5, + -0x1.6c087e89a359dp-10, + 0x1.99343027bf8c3p-16, + -0x1.555545995a603p-3, + 0x1.1107605230bc4p-7, + -0x1.994eb3774cf24p-13 + }, + { + { 1.0, -1.0, -1.0, 1.0 }, +#if TOINT_INTRINSICS + 0x1.45F306DC9C883p-1, +#else + 0x1.45F306DC9C883p+23, +#endif + 0x1.921FB54442D18p0, + -0x1p0, + 0x1.ffffffd0c621cp-2, + -0x1.55553e1068f19p-5, + 0x1.6c087e89a359dp-10, + -0x1.99343027bf8c3p-16, + -0x1.555545995a603p-3, + 0x1.1107605230bc4p-7, + -0x1.994eb3774cf24p-13 + } +}; + +/* Table with 4/PI to 192 bit precision. To avoid unaligned accesses + only 8 new bits are added per entry, making the table 4 times larger. */ +const uint32_t __inv_pio4[24] = +{ + 0xa2, 0xa2f9, 0xa2f983, 0xa2f9836e, + 0xf9836e4e, 0x836e4e44, 0x6e4e4415, 0x4e441529, + 0x441529fc, 0x1529fc27, 0x29fc2757, 0xfc2757d1, + 0x2757d1f5, 0x57d1f534, 0xd1f534dd, 0xf534ddc0, + 0x34ddc0db, 0xddc0db62, 0xc0db6295, 0xdb629599, + 0x6295993c, 0x95993c43, 0x993c4390, 0x3c439041 +}; |