about summary refs log tree commit diff
path: root/sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c
Commit message (Collapse)AuthorAgeFilesLines
* x86-64: Vectorize sincosf_poly and update s_sincosf-fma.cH.J. Lu2018-12-261-270/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized sincosf_poly. Add x86 sincosf_poly.h to vectorize sincosf_poly. On Broadwell, bench-sincosf shows: Before After Improvement max 160.273 114.198 40% min 6.25 5.625 11% mean 13.0325 10.6462 22% Vectorized sincosf_poly shows Before After Improvement max 138.653 114.198 21% min 5.004 5.625 -11% mean 11.5934 10.6462 9% Tested on x86-64 and i686 as well as with build-many-glibcs.py. * sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>. (sincos_t, sincosf_poly, sinf_poly): Moved to ... * sysdeps/ieee754/flt-32/sincosf_poly.h: Here. New file. * sysdeps/x86/fpu/s_sincosf_data.c: New file. * sysdeps/x86/fpu/sincosf_poly.h: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include <sysdeps/ieee754/flt-32/s_sincosf.c>.
* Improve performance of sinf and cosfWilco Dijkstra2018-08-141-1/+32
| | | | | | | | | | | | | | | | | | | | | | | | | The second patch improves performance of sinf and cosf using the same algorithms and polynomials. The returned values are identical to sincosf for the same input. ULP definitions for AArch64 and x64 are updated. sinf/cosf througput gains on Cortex-A72: * |x| < 0x1p-12 : 1.2x * |x| < M_PI_4 : 1.8x * |x| < 2 * M_PI: 1.7x * |x| < 120.0 : 2.3x * |x| < Inf : 3.0x * NEWS: Mention sinf, cosf, sincosf. * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of constants rather than including generic sincosf.h. * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove. * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove. (reduced_cos): Remove. (sinf_poly): New function. * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
* x86-64: Add sincosf with vector FMAH.J. Lu2018-01-081-0/+240
Since the x86-64 assembly version of sincosf is higly optimized with vector instructions, there isn't much room for improvement. However s_sincosf.c written in C with vector math and intrinsics can be optimized by GCC with FMA. On Skylake, bench-sincosf reports performance improvement: Assembly FMA improvement max 104.042 101.008 3% min 9.426 8.586 10% mean 20.6209 18.2238 13% * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_sincosf-sse2 and s_sincosf-fma. (CFLAGS-s_sincosf-fma.c): New. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise. * sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if __sincosf is defined.