path: root/sysdeps/x86_64/fpu/multiarch
Commit message | Author | Age | Files | Lines
* Improve performance of sinf and cosf | Wilco Dijkstra | 2018-08-14 | 1 | -1/+32

  The second patch improves performance of sinf and cosf using the same
  algorithms and polynomials.  The returned values are identical to sincosf
  for the same input.  ULP definitions for AArch64 and x64 are updated.

  sinf/cosf throughput gains on Cortex-A72:

    * |x| < 0x1p-12  : 1.2x
    * |x| < M_PI_4   : 1.8x
    * |x| < 2 * M_PI : 1.7x
    * |x| < 120.0    : 2.3x
    * |x| < Inf      : 3.0x

    * NEWS: Mention sinf, cosf, sincosf.
    * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
    * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
    * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
      constants rather than including generic sincosf.h.
    * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
    * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
    * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
      (reduced_cos): Remove.
      (sinf_poly): New function.
    * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
* x86: Don't include <init-arch.h> in assembly codes | H.J. Lu | 2018-08-03 | 2 | -2/+0

  There is no need to include <init-arch.h> in assembly codes since all
  x86 IFUNC selector functions are written in C.  Tested on i686 and
  x86-64.  There is no code change in libc.so, ld.so and libmvec.so.

    * sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include <init-arch.h>.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise.
    * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
* Remove mplog and mpexp | Wilco Dijkstra | 2018-02-15 | 10 | -75/+3

  Remove the now unused mplog and mpexp files.

    * math/Makefile: Remove mpexp.c and mplog.c.
    * sysdeps/i386/fpu/mpexp.c: Delete file.
    * sysdeps/i386/fpu/mplog.c: Likewise.
    * sysdeps/ia64/fpu/mpexp.c: Likewise.
    * sysdeps/ia64/fpu/mplog.c: Likewise.
    * sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
    * sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
    * sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
    * sysdeps/ieee754/dbl-64/mplog.c: Likewise.
    * sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
    * sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
    * sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
    * sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
    * sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
* Remove slow paths from exp | Szabolcs Nagy | 2018-02-12 | 7 | -36/+3

  Remove the __slowexp code, so exp is no longer correctly rounded.  The
  result is computed to about 70 bits precision so the worst case ulp
  error is about 0.500007 in nearest rounding mode.

    * manual/probes.texi: Remove slowexp probes.
    * math/Makefile: Remove slowexp.
    * sysdeps/generic/math_private.h (__slowexp): Remove.
    * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
      document error bounds.
    * sysdeps/i386/fpu/slowexp.c: Remove.
    * sysdeps/ia64/fpu/slowexp.c: Remove.
    * sysdeps/ieee754/dbl-64/slowexp.c: Remove.
    * sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
    * sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
    * sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
    * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
    * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
    * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
    * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
    * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
    * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
    * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
* Remove slow paths from pow | Wilco Dijkstra | 2018-02-12 | 7 | -40/+4

  Remove the slow paths from pow.  Like several other double precision
  math functions, pow is exactly rounded.  This is not required of math
  functions and causes major overheads, as it requires multiple fallbacks
  using higher precision arithmetic if a result is close to 0.5ULP.
  Ridiculous slowdowns of up to 100000x have been reported when the
  highest precision path triggers.

  All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
  The worst case error is ~0.506ULP.  A simple test over a few hundred
  million values shows pow is 10% faster on average.  This fixes
  BZ #13932.

    [BZ #13932]
    * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
    * benchtests/pow-inputs: Update comment for slow path cases.
    * manual/probes.texi (slowpow_p10): Delete removed probe.
      (slowpow_p10): Likewise.
    * math/Makefile: Remove halfulp.c and slowpow.c.
    * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
    * sysdeps/generic/math_private.h (__exp1): Remove error argument.
      (__halfulp): Remove.
      (__slowpow): Remove.
    * sysdeps/i386/fpu/halfulp.c: Delete file.
    * sysdeps/i386/fpu/slowpow.c: Likewise.
    * sysdeps/ia64/fpu/halfulp.c: Likewise.
    * sysdeps/ia64/fpu/slowpow.c: Likewise.
    * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
      improve comments and add error analysis.
    * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
      (power1): Remove function.
      (log1): Remove error argument, add error analysis.
      (my_log2): Remove function.
    * sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
    * sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
    * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
    * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
    * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
    * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
    * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
      slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
    * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
    * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
    * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
    * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
* x86-64: Add sincosf with vector FMA | H.J. Lu | 2018-01-08 | 4 | -2/+273

  Since the x86-64 assembly version of sincosf is highly optimized with
  vector instructions, there isn't much room for improvement.  However,
  s_sincosf.c written in C with vector math and intrinsics can be
  optimized by GCC with FMA.  On Skylake, bench-sincosf reports
  performance improvement:

            Assembly    FMA        improvement
    max     104.042     101.008     3%
    min       9.426       8.586    10%
    mean     20.6209     18.2238   13%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      s_sincosf-sse2 and s_sincosf-fma.
      (CFLAGS-s_sincosf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
    * sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if __sincosf is
      defined.
* Update copyright dates with scripts/update-copyrights. | Joseph Myers | 2018-01-01 | 152 | -152/+152

    * All files with FSF copyright notices: Update copyright dates using
      scripts/update-copyrights.
    * locale/programs/charmap-kw.h: Regenerated.
    * locale/programs/locfile-kw.h: Likewise.
* Revert exp reimplementation (causes test failures). | Joseph Myers | 2017-12-19 | 7 | -3/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
* Improve __ieee754_exp() performance by greater than 5x on sparc/x86. | Patrick McGehearty | 2017-12-19 | 7 | -36/+3

  These changes will be active for all platforms that don't provide their
  own exp() routines.  They will also be active for ieee754 versions of
  ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and erf.

  Performance gains are typically around 5x when measured on Sparc s7 for
  common values between exp(1) and exp(40).

  Using the glibc perf tests on sparc:

            sparc (nsec)      x86 (nsec)
            old      new      old      new
    max     17629    395      5173     144
    min       399     54        15      13
    mean     5317    200      1349      23

  The extreme max times for the old (ieee754) exp are due to the
  multiprecision computation in the old algorithm when the true value is
  very near 0.5 ulp away from a value representable in double precision.
  The new algorithm does not take special measures for those cases.  The
  current glibc exp perf tests overrepresent those values.  Informal
  testing suggests approximately one in 200 cases might invoke the high
  cost computation.  The performance advantage of the new algorithm for
  other values is still large but not as large as indicated by the chart
  above.

  Glibc correctness tests for exp() and expf() were run.  Within the test
  suite, 3 input values were found to cause 1 bit differences (ulp) when
  "FE_TONEAREST" rounding mode is set.  No differences in exp() were seen
  for the tested values for the other rounding modes.  Typical example:

    exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
      new code:  2.31973271630014299393707e-01  0x1.db14cd799387ap-3
      old code:  2.31973271630014271638132e-01  0x1.db14cd7993879p-3
      exp     =  2.31973271630014285508337 (high precision)
      Old delta: off by 0.49 ulp
      New delta: off by 0.51 ulp

  In addition, because ieee754_exp() is used by other routines, cexp()
  showed test results with very small imaginary input values where the
  imaginary portion of the result was off by 3 ulp when in upward rounding
  mode, but not in the other rounding modes.  For x86, tgamma showed a few
  values where the ulp increased to 6 (max ulp for tgamma is 5).  Sparc
  tgamma did not show these failures.  I presume the tgamma differences
  are due to compiler optimization differences within the gamma function.
  The gamma function is known to be difficult to compute accurately.

    * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
      <errno.h>.  Include "eexp.tbl".
      (half): New constant.
      (one): Likewise.
      (__ieee754_exp): Rewrite.
      (__slowexp): Remove prototype.
    * sysdeps/ieee754/dbl-64/eexp.tbl: New file.
    * sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
    * sysdeps/i386/fpu/slowexp.c: Likewise.
    * sysdeps/ia64/fpu/slowexp.c: Likewise.
    * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
    * sysdeps/generic/math_private.h (__slowexp): Remove prototype.
    * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
      comment.
    * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
      (CPPFLAGS-slowexp.c): Remove variable.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove
      slowexp-fma, slowexp-fma4 and slowexp-avx.
      (CFLAGS-slowexp-fma.c): Remove variable.
      (CFLAGS-slowexp-fma4.c): Likewise.
      (CFLAGS-slowexp-avx.c): Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define
      as macro.
    * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
    * math/Makefile (type-double-routines): Remove slowexp.
    * manual/probes.texi (slowexp_p6): Remove.
      (slowexp_p32): Likewise.
* x86-64: Add cosf with FMA | H.J. Lu | 2017-12-12 | 4 | -2/+35

  On Skylake, bench-cosf reports performance improvement:

            Before     After     Improvement
    max     135.362    94.552    43%
    min       8.532     7.688    11%
    mean     17.1446   11.8128   45%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      s_cosf-sse2 and s_cosf-fma.
      (CFLAGS-s_cosf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86-64: Add sinf with FMA | H.J. Lu | 2017-12-07 | 4 | -1/+36

  On Skylake, bench-sinf reports performance improvement:

            Before     After      Improvement
    max     153.996    100.094    54%
    min       8.546      6.852    25%
    mean     18.1223    11.802    54%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      s_sinf-sse2 and s_sinf-fma.
      (CFLAGS-s_sinf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/s_sinf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/s_sinf-sse2.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_sinf.c: Likewise.
* Use libm_alias_float for x86_64. | Joseph Myers | 2017-11-29 | 11 | -11/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the preparation for additional _FloatN / _FloatNx function aliases, this patch makes x86_64 libm function implementations use libm_alias_float to define function aliases, or libm_alias_float_other where the main name is defined with versioned_symbol. Tested with the glibc testsuite for x86_64, and tested with build-many-glibcs.py for all its x86_64 configurations that installed stripped shared libraries are unchanged by the patch. * sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_expf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_log2f.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_logf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/e_powf.c: Include <libm-alias-float.h>. (exp2f): Define using libm_alias_float, or libm_alias_float_other if [SHARED]. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Include <libm-alias-float.h>. (ceilf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Include <libm-alias-float.h>. (floorf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Include <libm-alias-float.h>. (fmaf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Include <libm-alias-float.h>. (nearbyintf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Include <libm-alias-float.h>. (rintf): Define using libm_alias_float. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Include <libm-alias-float.h>. 
(truncf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_copysignf.S: Include <libm-alias-float.h>. (copysignf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_cosf.S: Include <libm-alias-float.h>. (cosf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fabsf.c: Include <libm-alias-float.h>. (fabsf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fmaxf.S: Include <libm-alias-float.h>. (fmaxf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_fminf.S: Include <libm-alias-float.h>. (fminf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_llrintf.S: Include <libm-alias-float.h>. (llrintf): Define using libm_alias_float. [!__ILP32__] (lrintf): Likewise. * sysdeps/x86_64/fpu/s_sincosf.S: Include <libm-alias-float.h>. (sincosf): Define using libm_alias_float. * sysdeps/x86_64/fpu/s_sinf.S: Include <libm-alias-float.h>. (sinf): Define using libm_alias_float. * sysdeps/x86_64/x32/fpu/s_lrintf.S: Include <libm-alias-float.h>. (lrintf): Define using libm_alias_float.
* Use libm_alias_double for x86_64. | Joseph Myers | 2017-11-29 | 9 | -10/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the preparation for additional _FloatN / _FloatNx function aliases, this patch makes x86_64 libm function implementations use libm_alias_double to define function aliases. Tested with the glibc testsuite for x86_64, and tested with build-many-glibcs.py for all its x86_64 configurations that installed stripped shared libraries are unchanged by the patch. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Include <libm-alias-double.h>. (atan): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Include <libm-alias-double.h>. (ceil): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Include <libm-alias-double.h>. (floor): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_fma.c: Include <libm-alias-double.h>. (fma): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Include <libm-alias-double.h>. (nearbyint): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Include <libm-alias-double.h>. (rint): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Include <libm-alias-double.h>. (sin): Define using libm_alias_double. (cos): Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Include <libm-alias-double.h>. (tan): Define using libm_alias_double. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Include <libm-alias-double.h>. (trunc): Define using libm_alias_double. * sysdeps/x86_64/fpu/s_copysign.S: Include <libm-alias-double.h>. (copysign): Define using libm_alias_double. * sysdeps/x86_64/fpu/s_fabs.c: Include <libm-alias-double.h>. (fabs): Define using libm_alias_double. * sysdeps/x86_64/fpu/s_fmax.S: Include <libm-alias-double.h>. (fmax): Define using libm_alias_double. * sysdeps/x86_64/fpu/s_fmin.S: Include <libm-alias-double.h>. (fmin): Define using libm_alias_double. * sysdeps/x86_64/fpu/s_llrint.S: Include <libm-alias-double.h>. 
(llrint): Define using libm_alias_double. [!__ILP32__] (lrint): Likewise. * sysdeps/x86_64/x32/fpu/s_lrint.S: Include <libm-alias-double.h>. (lrint): Define using libm_alias_double.
* Replace "if if " with "if " in comments | H.J. Lu | 2017-10-25 | 3 | -3/+3

    * include/alloc_buffer.h: Replace "if if " with "if " in comments.
    * sysdeps/mips/memcpy.S: Likewise.
    * sysdeps/mips/memset.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
* x86-64: Add powf with FMA | H.J. Lu | 2017-10-22 | 3 | -1/+49

  For workload-spec2017.wrf, on Skylake, it improves performance by:

                             Before     After      Improvement
    reciprocal-throughput    35.4713    27.3842    29%
    latency                  82.4537    66.3175    24%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      e_powf-fma.
      (CFLAGS-e_powf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/e_powf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/e_powf.c: Likewise.
* x86-64: Add log2f with FMA | H.J. Lu | 2017-10-22 | 3 | -1/+45

  For workload-spec2017.wrf, on Skylake, it improves performance by:

                             Before     After      Improvement
    reciprocal-throughput    16.5937    14.0789    17%
    latency                  41.7755    35.3586    18%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      e_log2f-fma.
      (CFLAGS-e_log2f-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/e_log2f-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/e_log2f.c: Likewise.
* x86-64: Add logf with FMA | H.J. Lu | 2017-10-22 | 3 | -1/+45

  For workload-spec2017.wrf, on Skylake, it improves performance by:

                             Before     After      Improvement
    reciprocal-throughput    16.1534    13.8874    16%
    latency                  41.9642    34.3072    22%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      e_logf-fma.
      (CFLAGS-e_logf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/e_logf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/e_logf.c: Likewise.
* x86-64: Add exp2f with FMA | H.J. Lu | 2017-10-22 | 3 | -1/+42

  For workload-spec2017.wrf, on Skylake, it improves performance by:

                             Before     After      Improvement
    reciprocal-throughput    13.0291    11.2225    16%
    latency                  44.5154    37.5766    18%

    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      e_exp2f-fma.
      (CFLAGS-e_exp2f-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/e_exp2f-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Likewise.
* x86-64: Replace assembly versions of e_expf with generic e_expf.c | H.J. Lu | 2017-10-22 | 4 | -189/+25

  This patch replaces x86-64 assembly versions of e_expf with generic
  e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
  performance by:

                             Before     After      Improvement
    reciprocal-throughput    36.039     20.7749    73%
    latency                  58.8096    40.8715    43%

  On Skylake, it improves performance by:

                             Before     After      Improvement
    reciprocal-throughput    18.4436    11.1693    65%
    latency                  47.5162    37.5411    26%

    * sysdeps/x86_64/fpu/e_expf.S: Removed.
    * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
    * sysdeps/x86_64/fpu/w_expf.c: Likewise.
    * sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic e_expf.c.
    * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c): New.
    * sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
    * sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
      Renamed to ...
      (__redirect_expf): This.
      (SYMBOL_NAME): Changed to expf.
      (__ieee754_expf): Renamed to ...
      (__expf): This.
      (__GI___expf): This.
      (__ieee754_expf): Add strong_alias.
      (__expf_finite): Likewise.
      (__expf): New.
      Include <sysdeps/ieee754/flt-32/e_expf.c>.
* Make dbl-64 atan and tan into weak aliases. | Joseph Myers | 2017-10-02 | 8 | -10/+12

  This patch converts the dbl-64 implementations of atan and tan into
  weak aliases of __atan and __tan, in preparation for making them use
  libm_alias_double.  Consequent changes are made to the x86_64 multiarch
  versions wrapping round them (with the dbl-64 functions, like other such
  functions, being made not to define their aliases at all if __atan or
  __tan are defined as macros by an including file).

  Tested for x86_64, and with build-many-glibcs.py.

    * sysdeps/ieee754/dbl-64/s_atan.c (atan): Rename to __atan and define
      as weak alias of __atan.  Do not define any aliases if [__atan].
      [NO_LONG_DOUBLE] (__atanl): Define as strong alias of __atan.
      [NO_LONG_DOUBLE] (atanl): Define as weak alias of __atanl.
    * sysdeps/ieee754/dbl-64/s_tan.c (tan): Rename to __tan and define as
      weak alias of __tan.  Do not define any aliases if [__tan].
      [NO_LONG_DOUBLE] (__tanl): Define as strong alias of __tan.
      [NO_LONG_DOUBLE] (tanl): Define as weak alias of __tanl.
    * sysdeps/x86_64/fpu/multiarch/s_atan-avx.c (atan): Rename to __atan.
    * sysdeps/x86_64/fpu/multiarch/s_atan-fma.c (atan): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_atan-fma4.c (atan): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_atan.c (atan): Rename to __atan and
      define as weak alias of __atan.
    * sysdeps/x86_64/fpu/multiarch/s_tan-avx.c (tan): Rename to __tan.
    * sysdeps/x86_64/fpu/multiarch/s_tan-fma.c (tan): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_tan-fma4.c (tan): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_tan.c (tan): Rename to __tan and
      define as weak alias of __tan.
* Add SSE4.1 trunc, truncf (bug 20142). | Joseph Myers | 2017-09-20 | 7 | -2/+116

  This patch adds SSE4.1 versions of trunc and truncf, using the roundsd /
  roundss instructions, similar to the versions of ceil, floor, rint and
  nearbyint functions we already have.  In my testing with the glibc
  benchtests these are about 30% faster than the C versions for double,
  20% faster for float.

  Tested for x86_64.

    [BZ #20142]
    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
    * sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
    * sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
* x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967] | H.J. Lu | 2017-09-12 | 1 | -5/+8

  AVX512 functions in mathvec are used on machines with AVX512.  An AVX2
  wrapper is also provided and it can be used when the AVX512 version
  isn't profitable.  MathVec_Prefer_No_AVX512 is added to cpu-features.
  If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in the
  GLIBC_TUNABLES environment variable, the AVX2 wrapper will be used.

  Tested on x86-64 machines with and without AVX512.  Also verified
  glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.

    [BZ #21967]
    * sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512): New.
      (index_arch_MathVec_Prefer_No_AVX512): Likewise.
    * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Handle
      MathVec_Prefer_No_AVX512.
    * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
      (IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512 is
      set.
* x86_64 __redirect_ieee754_expf: Change double to float | H.J. Lu | 2017-08-28 | 1 | -1/+1

  __redirect_ieee754_expf has type float, not double.

    * sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
      Change double to float.
* x86_64: Replace AVX512F .byte sequences with instructions | H.J. Lu | 2017-08-23 | 2 | -74/+8

  Since binutils 2.25 or later is required to build glibc, we can replace
  AVX512F .byte sequences with AVX512F instructions.

  Tested on x86-64 and x32.  There are no code differences in libmvec.so
  and libmvec.a.

    * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Replace AVX512F .byte
      sequences with AVX512F instructions.
    * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise.
* x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ #21966] | H.J. Lu | 2017-08-18 | 1 | -1/+2

  Since the AVX2 version of mathvec functions uses FMA, it can only be
  used when FMA is usable.

    [BZ #21966]
    * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h (IFUNC_SELECTOR):
      Don't use the AVX2 version if FMA isn't usable.
* x86-64: Optimize e_expf with FMA [BZ #21912] | H.J. Lu | 2017-08-16 | 4 | -0/+245

  FMA optimized e_expf improves performance by more than 50% on Skylake.

    [BZ #21912]
    * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add
      e_expf-fma.
    * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file.
    * sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.
* x86-64: Add FMA multiarch functions to libm | H.J. Lu | 2017-08-07 | 32 | -105/+493
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds multiarch functions optimized with -mfma -mavx2 to libm. e_pow-fma.c is compiled with $(config-cflags-nofma) due to PR 19003. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add e_exp-fma, e_log-fma, e_pow-fma, s_atan-fma, e_asin-fma, e_atan2-fma, s_sin-fma, s_tan-fma, mplog-fma, mpa-fma, slowexp-fma, slowpow-fma, sincos32-fma, doasin-fma, dosincos-fma, halfulp-fma, mpexp-fma, mpatan2-fma, mpatan-fma, mpsqrt-fma, and mptan-fma. (CFLAGS-doasin-fma.c): New. (CFLAGS-dosincos-fma.c): Likewise. (CFLAGS-e_asin-fma.c): Likewise. (CFLAGS-e_atan2-fma.c): Likewise. (CFLAGS-e_exp-fma.c): Likewise. (CFLAGS-e_log-fma.c): Likewise. (CFLAGS-e_pow-fma.c): Likewise. (CFLAGS-halfulp-fma.c): Likewise. (CFLAGS-mpa-fma.c): Likewise. (CFLAGS-mpatan-fma.c): Likewise. (CFLAGS-mpatan2-fma.c): Likewise. (CFLAGS-mpexp-fma.c): Likewise. (CFLAGS-mplog-fma.c): Likewise. (CFLAGS-mpsqrt-fma.c): Likewise. (CFLAGS-mptan-fma.c): Likewise. (CFLAGS-s_atan-fma.c): Likewise. (CFLAGS-sincos32-fma.c): Likewise. (CFLAGS-slowexp-fma.c): Likewise. (CFLAGS-slowpow-fma.c): Likewise. (CFLAGS-s_sin-fma.c): Likewise. (CFLAGS-s_tan-fma.c): Likewise. * sysdeps/x86_64/fpu/multiarch/doasin-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/dosincos-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_asin-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_atan2-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-avx-fma4.h: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-fma4.h: Likewise. * sysdeps/x86_64/fpu/multiarch/mpa-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpatan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpatan2-fma.c: Likewise. 
* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpsqrt-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mptan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/sincos32-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_asin.c: Rewrite. * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
* x86-64: Implement libmathvec IFUNC selectors in C | H.J. Lu | 2017-08-04 | 76 | -619/+1181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines) Add svml_d_cos2_core-sse2, svml_d_cos4_core-sse, svml_d_cos8_core-avx2, svml_d_exp2_core-sse2, svml_d_exp4_core-sse, svml_d_exp8_core-avx2, svml_d_log2_core-sse2, svml_d_log4_core-sse, svml_d_log8_core-avx2, svml_d_pow2_core-sse2, svml_d_pow4_core-sse, svml_d_pow8_core-avx2 svml_d_sin2_core-sse2, svml_d_sin4_core-sse, svml_d_sin8_core-avx2, svml_d_sincos2_core-sse2, svml_d_sincos4_core-sse, svml_d_sincos8_core-avx2, svml_s_cosf16_core-avx2, svml_s_cosf4_core-sse2, svml_s_cosf8_core-sse, svml_s_expf16_core-avx2, svml_s_expf4_core-sse2, svml_s_expf8_core-sse, svml_s_logf16_core-avx2, svml_s_logf4_core-sse2, svml_s_logf8_core-sse, svml_s_powf16_core-avx2, svml_s_powf4_core-sse2, svml_s_powf8_core-sse, svml_s_sincosf16_core-avx2, svml_s_sincosf4_core-sse2, svml_s_sincosf8_core-sse, svml_s_sinf16_core-avx2, svml_s_sinf4_core-sse2 and svml_s_sinf8_core-sse. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h: New file. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-sse4_1.h: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.c: Likewise. 
* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core-sse2.S: This. 
Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core-sse.S: This. 
Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN8v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.S: Renamed to ... 
* sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16vv_powf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4vv_powf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8vv_powf): Removed. 
* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_sinf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_sinf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_sinf): Removed.
* x86-64: Implement libm IFUNC selectors in C
    Author: H.J. Lu; Date: 2017-08-04; Files: 18; Lines: -120/+287
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_ceil-sse4_1, s_ceilf-sse4_1, s_floor-sse4_1, s_floorf-sse4_1, s_nearbyint-sse4_1, s_nearbyintf-sse4_1, s_rint-sse4_1 and s_rintf-sse4_1. * sysdeps/x86_64/fpu/multiarch/ifunc-sse4_1.h: New file. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_ceil-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__ceil): Removed. * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_ceilf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__ceilf): Removed. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_floor-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__floor): Removed. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_floorf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__floorf): Removed. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_nearbyint-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__nearbyint): Removed. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_nearbyintf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. 
(__nearbyintf): Removed. * sysdeps/x86_64/fpu/multiarch/s_rint.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_rint-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__rint): Removed. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_rintf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__rintf): Removed.
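The selector idea in the commit above (choose an SSE4.1 routine when the CPU has it, otherwise fall back to the generic one) can be sketched in plain C. This is only an illustration under stated assumptions: `my_ceil`, `select_ceil`, `ceil_generic` and `ceil_sse41` are hypothetical names, the GCC/Clang builtin `__builtin_cpu_supports` stands in for glibc's internal CPU-feature checks, and the choice is made lazily on first call rather than through the real IFUNC relocation mechanism.

```c
#include <stddef.h>

/* Signature shared by every variant of the routine.  */
typedef double (*ceil_fn) (double);

/* Portable fallback (hypothetical): round toward +Inf via integer
   truncation.  It does not preserve the sign of a zero result.  */
static double
ceil_generic (double x)
{
  if (x != x || x >= 9.0e15 || x <= -9.0e15)
    return x;			/* NaN, Inf, or already integral.  */
  long long i = (long long) x;	/* Truncates toward zero.  */
  double t = (double) i;
  return t < x ? t + 1.0 : t;
}

#if defined (__x86_64__) && defined (__GNUC__)
/* SSE4.1 variant (hypothetical): immediate 10 selects round-up (2)
   and sets bit 3 (8) so that "inexact" is not raised.  */
static double
ceil_sse41 (double x)
{
  double r;
  __asm__ ("roundsd $10, %1, %0" : "=x" (r) : "x" (x));
  return r;
}
#endif

/* The selector: return the best implementation for this CPU.  */
ceil_fn
select_ceil (void)
{
#if defined (__x86_64__) && defined (__GNUC__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse4.1"))
    return ceil_sse41;
#endif
  return ceil_generic;
}

/* Public entry point: resolve once, then call through the pointer.  */
double
my_ceil (double x)
{
  static ceil_fn impl;
  if (impl == NULL)
    impl = select_ceil ();
  return impl (x);
}
```

Callers only ever see `my_ceil`; which variant runs is decided once per process. With a real IFUNC the dynamic linker performs that resolution, so the per-call pointer check shown here disappears.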
* Update copyright dates with scripts/update-copyrights.
    Author: Joseph Myers; Date: 2017-01-01; Files: 82; Lines: -82/+82
* x86_64: Call finite scalar versions in vectorized log, pow, exp (bz #20033).
    Author: Andrew Senkevich; Date: 2016-08-02; Files: 18; Lines: -48/+48

    Vector math functions require -ffast-math, which sets
    -ffinite-math-only, so the finite scalar versions must be called
    (vector functions call the scalar versions in some cases).

    Since the finite version of pow() returns qNaN instead of 1.0 for
    several inputs, those inputs are excluded from the tests of vector
    math functions.

    [BZ #20033]
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core_sse4.S: Call finite version.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_exp2_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_log2_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_pow2_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_expf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_logf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_powf4_core.S: Likewise.
    * math/libm-test.inc (pow_test_data): Exclude tests for qNaN in power zero.
* Require binutils 2.24 to build x86-64 glibc [BZ #20139]
    Author: H.J. Lu; Date: 2016-07-01; Files: 12; Lines: -24/+24

    If the assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx
    is used for lazy binding to save the first 8 vector registers, but
    it saves only the lower 256 bits of each register. When it is called
    on an AVX512 platform, the upper 256 bits of the ZMM registers are
    clobbered, so parameters passed in ZMM registers will be wrong the
    first time the function is called.

    This patch requires binutils 2.24, whose assembler can store and
    load ZMM registers, to build x86-64 glibc. Since the mathvec library
    needs assembler support for AVX512DQ, we disable mathvec if the
    assembler doesn't support AVX512DQ.

    [BZ #20139]
    * config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ...
      (HAVE_AVX512DQ_ASM_SUPPORT): This.
    * sysdeps/x86_64/configure.ac: Require assembler from binutils 2.24 or above.
      (HAVE_AVX512_ASM_SUPPORT): Removed.
      (HAVE_AVX512DQ_ASM_SUPPORT): New.
    * sysdeps/x86_64/configure: Regenerated.
    * sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT check unconditional.
    * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise.
    * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
    * sysdeps/x86_64/multiarch/memset.S: Likewise.
    * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise.
* Fixed wrong vector sincos/sincosf ABI to have it compatible with
    Author: Andrew Senkevich; Date: 2016-07-01; Files: 6; Lines: -14/+862

    current vector function declaration "#pragma omp declare simd
    notinbranch", according to which vector sincos should take a vector
    of pointers for its second and third parameters. It is fixed by
    implementing it as a wrapper around the version whose second and
    third parameters are pointers.

    [BZ #20024]
    * sysdeps/x86/fpu/test-math-vector-sincos.h: New.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI of this implementation of vector function.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper for testing vector sincos with fixed ABI.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
    * sysdeps/x86_64/fpu/Makefile: Added new tests.
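The wrapper approach described in the sincos ABI fix above can be sketched in scalar C. This is a rough illustration, not the actual assembly: `VLEN`, `sincos_kernel_linear` and `sincos_vvv` are hypothetical names, plain arrays stand in for vector registers, and the kernel uses crude two-term polynomials instead of real sin/cos so the sketch needs no math library.

```c
#define VLEN 4

/* Stand-in for the pre-existing kernel (hypothetical): results go to
   two contiguous buffers, i.e. the second and third parameters are
   single pointers.  The polynomials are crude placeholders for
   sin and cos.  */
static void
sincos_kernel_linear (const double x[VLEN], double *s, double *c)
{
  for (int i = 0; i < VLEN; i++)
    {
      s[i] = x[i] - x[i] * x[i] * x[i] / 6.0;
      c[i] = 1.0 - x[i] * x[i] / 2.0;
    }
}

/* Wrapper with the fixed ABI shape: one destination pointer per lane
   for each of the two outputs.  It calls the linear kernel into
   temporaries and then scatters lane i to sp[i] and cp[i].  */
void
sincos_vvv (const double x[VLEN], double *const sp[VLEN],
	    double *const cp[VLEN])
{
  double s[VLEN], c[VLEN];
  sincos_kernel_linear (x, s, c);
  for (int i = 0; i < VLEN; i++)
    {
      *sp[i] = s[i];
      *cp[i] = c[i];
    }
}
```

The point of the per-lane pointers is that the destinations need not be contiguous or in order, which is what a compiler-generated call under "#pragma omp declare simd notinbranch" may pass.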
* Do not raise "inexact" from x86_64 SSE4.1 ceil, floor (bug 15479).
    Author: Joseph Myers; Date: 2016-05-24; Files: 4; Lines: -4/+4

    Continuing fixes for ceil and floor functions not to raise the
    "inexact" exception, this patch fixes the x86_64 SSE4.1 versions.
    The roundss / roundsd instructions take an immediate operand that
    determines the rounding mode and whether to raise "inexact"; this
    just needs bit 3 set to disable "inexact", which this patch does.

    Remark: we don't have an SSE4.1 version of trunc / truncf (using
    this instruction with operand 11); I'd expect one to make sense, but
    of course it should be benchmarked against the existing C code.
    I'll file a bug in Bugzilla for the lack of such a version.

    Tested for x86_64.

    [BZ #15479]
    * sysdeps/x86_64/fpu/multiarch/s_ceil.S (__ceil_sse41): Set bit 3 of immediate operand to rounding instruction.
    * sysdeps/x86_64/fpu/multiarch/s_ceilf.S (__ceilf_sse41): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_floor.S (__floor_sse41): Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf_sse41): Likewise.
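The immediate operand mentioned in the commit above can be worked out with a little bit arithmetic. The sketch below is illustrative only: the enumerator names and `round_imm` are hypothetical, though the bit values follow the roundss / roundsd encoding (bits 1:0 select the rounding mode, bit 2 defers to MXCSR, bit 3 suppresses "inexact") and match the `_MM_FROUND_*` constants in `<smmintrin.h>`.

```c
/* Fields of the roundss / roundsd immediate (names are hypothetical).  */
enum
{
  ROUND_NEAREST = 0x0,		/* Round to nearest even.  */
  ROUND_DOWN = 0x1,		/* Toward -Inf: floor.  */
  ROUND_UP = 0x2,		/* Toward +Inf: ceil.  */
  ROUND_TOWARD_ZERO = 0x3,	/* Truncate: trunc.  */
  ROUND_USE_MXCSR = 0x4,	/* Use the current MXCSR rounding mode.  */
  ROUND_NO_INEXACT = 0x8	/* Bit 3: do not raise "inexact".  */
};

/* Build the immediate for a given mode, optionally setting bit 3.  */
unsigned int
round_imm (unsigned int mode, int suppress_inexact)
{
  return (mode & 0x7) | (suppress_inexact ? ROUND_NO_INEXACT : 0);
}
```

With bit 3 set, ceil uses 10 and floor uses 9, and truncation uses 11, which is the operand the remark about trunc / truncf refers to.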
* Use JUMPTARGET in x86-64 mathvec
    Author: H.J. Lu; Date: 2016-03-16; Files: 36; Lines: -112/+112

    When PLT may be used, JUMPTARGET should be used instead of calling
    the function directly.

    * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S (_ZGVbN2v_cos_sse4): Use JUMPTARGET to call cos.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S (_ZGVdN4v_cos_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S (_ZGVdN4v_cos): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core_sse4.S (_ZGVbN2v_exp_sse4): Use JUMPTARGET to call exp.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core_avx2.S (_ZGVdN4v_exp_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S (_ZGVdN4v_exp): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core_sse4.S (_ZGVbN2v_log_sse4): Use JUMPTARGET to call log.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core_avx2.S (_ZGVdN4v_log_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S (_ZGVdN4v_log): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core_sse4.S (_ZGVbN2vv_pow_sse4): Use JUMPTARGET to call pow.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core_avx2.S (_ZGVdN4vv_pow_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S (_ZGVdN4vv_pow): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S (_ZGVbN2v_sin_sse4): Use JUMPTARGET to call sin.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S (_ZGVdN4v_sin_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S (_ZGVdN4v_sin): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S (_ZGVbN2vvv_sincos_sse4): Use JUMPTARGET to call sin and cos.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S (_ZGVdN4vvv_sincos_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S (_ZGVdN4vvv_sincos): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S (_ZGVdN8v_cosf): Use JUMPTARGET to call cosf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S (_ZGVbN4v_cosf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S (_ZGVdN8v_cosf_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S (_ZGVdN8v_expf): Use JUMPTARGET to call expf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S (_ZGVbN4v_expf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S (_ZGVdN8v_expf_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S (_ZGVdN8v_logf): Use JUMPTARGET to call logf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core_sse4.S (_ZGVbN4v_logf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core_avx2.S (_ZGVdN8v_logf_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S (_ZGVdN8vv_powf): Use JUMPTARGET to call powf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core_sse4.S (_ZGVbN4vv_powf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core_avx2.S (_ZGVdN8vv_powf_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S (_ZGVdN8vv_powf): Use JUMPTARGET to call sinf and cosf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S (_ZGVbN4vvv_sincosf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S (_ZGVdN8vvv_sincosf_avx2): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S (_ZGVdN8v_sinf): Use JUMPTARGET to call sinf.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S (_ZGVbN4v_sinf_sse4): Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S (_ZGVdN8v_sinf_avx2): Likewise.
    * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h (WRAPPER_IMPL_SSE2): Use JUMPTARGET to call callee.
      (WRAPPER_IMPL_SSE2_ff): Likewise.
      (WRAPPER_IMPL_SSE2_fFF): Likewise.
      (WRAPPER_IMPL_AVX): Likewise.
      (WRAPPER_IMPL_AVX_ff): Likewise.
      (WRAPPER_IMPL_AVX_fFF): Likewise.
      (WRAPPER_IMPL_AVX512): Likewise.
      (WRAPPER_IMPL_AVX512_ff): Likewise.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h (WRAPPER_IMPL_SSE2): Likewise.
      (WRAPPER_IMPL_SSE2_ff): Likewise.
      (WRAPPER_IMPL_SSE2_fFF): Likewise.
      (WRAPPER_IMPL_AVX): Likewise.
      (WRAPPER_IMPL_AVX_ff): Likewise.
      (WRAPPER_IMPL_AVX_fFF): Likewise.
      (WRAPPER_IMPL_AVX512): Likewise.
      (WRAPPER_IMPL_AVX512_ff): Likewise.
      (WRAPPER_IMPL_AVX512_fFF): Likewise.
* Update copyright dates with scripts/update-copyrights.
    Author: Joseph Myers; Date: 2016-01-04; Files: 82; Lines: -82/+82
* Remove configure tests for FMA4 support.
    Author: Joseph Myers; Date: 2015-10-09; Files: 11; Lines: -64/+9

    GCC added support for -mfma4 in version 4.5. Thus the configure
    tests for this support are obsolete, and this patch removes them.

    Tested for x86_64 and x86 (testsuite, and that installed stripped
    shared libraries are unchanged by this patch).

    * sysdeps/i386/configure.ac (libc_cv_cc_fma4): Remove configure test.
    * sysdeps/i386/configure: Regenerated.
    * sysdeps/x86_64/configure.ac (libc_cv_cc_fma4): Remove configure test.
    * sysdeps/x86_64/configure: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile [$(have-mfma4) = yes]: Make code unconditional.
    * sysdeps/x86_64/fpu/multiarch/e_asin.c [HAVE_FMA4_SUPPORT]: Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_atan2.c [HAVE_FMA4_SUPPORT]: Likewise.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/e_exp.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/e_log.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/e_pow.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
    * sysdeps/x86_64/fpu/multiarch/s_atan.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/s_sin.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * sysdeps/x86_64/fpu/multiarch/s_tan.c [HAVE_FMA4_SUPPORT]: Make code unconditional.
      [!HAVE_FMA4_SUPPORT]: Remove conditional code.
    * config.h.in (HAVE_FMA4_SUPPORT): Remove #undef.
* Remove configure tests for AVX support.
    Author: Joseph Myers; Date: 2015-10-08; Files: 9; Lines: -94/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC added support for -mavx and -msse2avx in version 4.4. Thus the configure tests for this support are obsolete, and this patch removes them. Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by this patch). * sysdeps/i386/configure.ac (libc_cv_cc_avx): Remove configure test. (libc_cv_cc_sse2avx): Likewise. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/i686/multiarch/Makefile [$(subdir)$(config-cflags-avx) = mathyes]: Change conditional to [$(subdir) = math]. * sysdeps/i386/i686/multiarch/s_fma-fma.c [HAVE_AVX_SUPPORT]: Make code unconditional. * sysdeps/i386/i686/multiarch/s_fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/i386/i686/multiarch/s_fmaf-fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/i386/i686/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/configure.ac (libc_cv_cc_avx): Remove configure test. (libc_cv_cc_sse2avx): Likewise. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/Makefile [$(config-cflags-avx) = yes]: Make code unconditional. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile) [HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT]: Make code unconditional. (_dl_runtime_profile) [!(HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT)]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/Makefile [$(config-cflags-sse2avx) = yes]: Make code unconditional. * sysdeps/x86_64/fpu/multiarch/e_atan2.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]: Likewise. 
* sysdeps/x86_64/fpu/multiarch/s_sin.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/multiarch/strcmp.S [HAVE_AVX_SUPPORT]: Likewise. * config.h.in (HAVE_AVX_SUPPORT): Remove #undef. (HAVE_SSE2AVX_SUPPORT): Likewise.
* Fix x86_64 fma4 pow inappropriate contraction (bug 19003).
    Author: Joseph Myers; Date: 2015-09-24; Files: 1; Lines: -1/+1

    The x86_64 fma4 version of pow fails to disable contraction of
    operations other than those explicitly intended to use fma
    instructions, so resulting in large ulps errors on processors with
    fma4 instructions, as in bug 18104 (165ulp for the test added for
    that bug; error originally reported by "blaaa" on #glibc). This
    patch adds $(config-cflags-nofma) for e_pow-fma4.c, corresponding to
    the use for e_pow.c in sysdeps/ieee754/dbl-64/Makefile.

    Tested for x86_64 on a processor with fma4.

    [BZ #19003]
    * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma4.c): Add $(config-cflags-nofma).
* Remove incorrect register mov in floorf/nearbyint on x86_64
    Author: Siddhesh Poyarekar; Date: 2015-08-14; Files: 2; Lines: -2/+0

    The change in 0b5395f052ee09cd7e3d219af4e805c38058afb5 replaced
    calls to __get_cpu_features@plt followed by a mov from rax to rdx,
    with a single macro LOAD_RTLD_GLOBAL_RO_RDX. It is pretty clear that
    there was a typo in s_floorf and __nearbyint due to which the (now
    incorrect) mov was not removed. This patch removes that mov.

    * sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf): Remove unnecessary movq.
    * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S (__nearbyint): Likewise.
* Update libmvec multiarch functions for <cpu-features.h>
    Author: H.J. Lu; Date: 2015-08-13; Files: 36; Lines: -190/+118
    This patch updates libmvec multiarch functions to use the newly defined
    HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
    <cpu-features.h>.

    * math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))): Remove
      $(objpfx)init-arch.o.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove init-arch.
    * sysdeps/x86_64/fpu/math-tests-arch.h (avx_usable): Removed.
      (INIT_ARCH_EXT): Defined as empty.
      (CHECK_ARCH_EXT): Replace HAS_XXX with HAS_ARCH_FEATURE (XXX).
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Remove
      __init_cpu_features call. Replace HAS_XXX with
      HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: Likewise.
* Update x86_64 multiarch functions for <cpu-features.h>  H.J. Lu  2015-08-13  18  -59/+69
    This patch updates x86_64 multiarch functions to use the newly defined
    HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
    <cpu-features.h>.

    * sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
      HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use LOAD_RTLD_GLOBAL_RO_RDX
      and HAS_CPU_FEATURE (SSE4_1).
    * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_rint.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
    * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
    * sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
    * sysdeps/x86_64/multiarch/strstr.c: Likewise.
    * sysdeps/x86_64/multiarch/memmove.c: Likewise.
    * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
    * sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
    * sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features call.
      Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with
      HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
    * sysdeps/x86_64/multiarch/memcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/memset.S: Likewise.
    * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/strcat.S: Likewise.
    * sysdeps/x86_64/multiarch/strchr.S: Likewise.
    * sysdeps/x86_64/multiarch/strcmp.S: Likewise.
    * sysdeps/x86_64/multiarch/strcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/strcspn.S: Likewise.
    * sysdeps/x86_64/multiarch/strspn.S: Likewise.
    * sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
    * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
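For readers unfamiliar with the pattern, the runtime selection that these feature checks drive can be sketched outside glibc roughly as follows. This is only an illustration, not glibc's code: it assumes GCC or Clang on x86 and uses the compiler builtins `__builtin_cpu_init` and `__builtin_cpu_supports` in place of glibc's internal HAS_CPU_FEATURE macro. The names `ceil_generic`, `ceil_sse41`, `cpu_has_sse41` and `select_ceil` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*ceil_fn) (double);

/* Portable fallback.  Only valid while x fits in a long long.  */
static double
ceil_generic (double x)
{
  long long i = (long long) x;          /* Truncates toward zero.  */
  return (double) (i + (x > (double) i));
}

/* Stand-in for an SSE4.1 variant.  A real one would use a rounding
   instruction; here it simply reuses the fallback (assumption).  */
static double
ceil_sse41 (double x)
{
  return ceil_generic (x);
}

/* Query the CPU at run time.  On non-x86 targets report "no".  */
static int
cpu_has_sse41 (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1") != 0;
#else
  return 0;
#endif
}

/* Pick the best available implementation once; callers cache it.  */
static ceil_fn
select_ceil (void)
{
  ceil_fn impl = cpu_has_sse41 () ? ceil_sse41 : ceil_generic;
  assert (impl != NULL);
  return impl;
}
```

A caller would store the result of `select_ceil ()` in a function pointer and call through it. An IFUNC resolver does the same selection, but the dynamic linker performs it when the symbol is bound.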
* Fixed several libmvec bugs found during testing on KNL hardware.  Andrew Senkevich  2015-07-24  13  -60/+61
    Fixed the AVX512 IFUNC implementations, the wrappers to AVX2 versions,
    and the KNL expf implementation.

    * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Fixed AVX512 IFUNC.
    * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Fixed wrappers to AVX2.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Fixed KNL
      implementation.
* Combination of data tables for x86_64 vector functions sinf, cosf and sincosf.  Andrew Senkevich  2015-06-24  9  -21/+21
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Fixed files list.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S: Renamed
      variable and included header.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_trig_data.S: New file.
    * sysdeps/x86_64/fpu/svml_s_trig_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_s_cosf_data.S: Removed file.
    * sysdeps/x86_64/fpu/svml_s_cosf_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sinf_data.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sinf_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: Likewise.
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: Likewise.
* Combination of data tables for x86_64 vector functions sin, cos and sincos.  Andrew Senkevich  2015-06-23  9  -61/+61
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Fixed files list.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S: Renamed
      variable and included header.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_trig_data.S: New file.
    * sysdeps/x86_64/fpu/svml_d_trig_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_d_cos2_core.S: Removed unneeded include.
    * sysdeps/x86_64/fpu/svml_d_cos4_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_cos8_core.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_cos_data.S: Removed file.
    * sysdeps/x86_64/fpu/svml_d_cos_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sin_data.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sin_data.h: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos_data.S: Likewise.
    * sysdeps/x86_64/fpu/svml_d_sincos_data.h: Likewise.
* Vector sincosf for x86_64 and tests.  Andrew Senkevich  2015-06-18  7  -1/+1130
    Here is an implementation of vectorized sincosf containing SSE, AVX,
    AVX2 and AVX512 versions, according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * NEWS: Mention addition of x86_64 vector sincosf.
    * math/test-float-vlen16.h: Added wrapper for sincosf tests.
    * math/test-float-vlen4.h: Likewise.
    * math/test-float-vlen8.h: Likewise.
    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
      build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
    * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3 argument wrappers.
    * sysdeps/x86_64/fpu/test-float-vlen16.c: Vector sincosf tests.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
* Vector sincos for x86_64 and tests.  Andrew Senkevich  2015-06-18  7  -1/+1301
    Here is an implementation of vectorized sincos containing SSE, AVX,
    AVX2 and AVX512 versions, according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * NEWS: Mention addition of x86_64 vector sincos.
    * bits/libm-simd-decl-stubs.h: Added stubs for sincos.
    * math/math.h (__MATHDECL_VEC): New macro.
    * math/bits/mathcalls.h: Added sincos declaration with __MATHDECL_VEC.
    * math/gen-libm-have-vector-test.sh: Added generation of sincos wrapper
      declaration under condition.
    * math/test-vec-loop.h (TEST_VEC_LOOP): Refactored.
    * math/test-double-vlen2.h: Added wrapper for sincos tests, reflected
      TEST_VEC_LOOP change.
    * math/test-double-vlen4.h: Likewise.
    * math/test-double-vlen8.h: Likewise.
    * math/test-float-vlen16.h: Reflected TEST_VEC_LOOP change.
    * math/test-float-vlen4.h: Likewise.
    * math/test-float-vlen8.h: Likewise.
    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added sincos SIMD declaration.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
      build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos_data.S: New file.
    * sysdeps/x86_64/fpu/svml_d_sincos_data.h: New file.
    * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Added wrappers for sincos.
    * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Vector sincos tests.
    * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise.
* Vector powf for x86_64 and tests.  Andrew Senkevich  2015-06-18  7  -1/+1502
    Here is an implementation of vectorized powf containing SSE, AVX,
    AVX2 and AVX512 versions, according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm
      redirections for powf.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
      build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 2 argument wrappers.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core_sse4.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core_avx2.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf16_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf4_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf8_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf8_core_avx.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf_data.S: New file.
    * sysdeps/x86_64/fpu/svml_s_powf_data.h: New file.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector powf tests.
    * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
    * math/test-float-vlen16.h: Fixed 2 argument macro.
    * math/test-float-vlen4.h: Likewise.
    * math/test-float-vlen8.h: Likewise.
    * NEWS: Mention addition of x86_64 vector powf.
* Vector pow for x86_64 and tests.  Andrew Senkevich  2015-06-17  7  -1/+1677
    Here is an implementation of vectorized pow containing SSE, AVX,
    AVX2 and AVX512 versions, according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * bits/libm-simd-decl-stubs.h: Added stubs for pow.
    * math/bits/mathcalls.h: Added pow declaration with __MATHCALL_VEC.
    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm
      redirections for pow.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
      build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Added 2 argument wrappers.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core_sse4.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core_avx2.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow2_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow4_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow4_core_avx.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow8_core.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow_data.S: New file.
    * sysdeps/x86_64/fpu/svml_d_pow_data.h: New file.
    * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector pow test.
    * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise.
    * NEWS: Mention addition of x86_64 vector pow.
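The vector lengths in these file names (pow2, pow4, pow8) match the exported symbol names that the Vector ABI prescribes. As a rough sketch, assuming the unmasked form `_ZGV<isa>N<vlen><params>_<name>` with one `v` per vector argument and the ISA letters `b` (SSE), `c` (AVX), `d` (AVX2) and `e` (AVX512), a name can be built like this. `vector_symbol` is a hypothetical helper for illustration and is not part of glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build an unmasked vector-variant symbol name into BUF.
   ISA is 'b' (SSE), 'c' (AVX), 'd' (AVX2) or 'e' (AVX512); VLEN is the
   number of lanes; NARGS is the number of vector arguments.
   Returns 0 on success, -1 if the name does not fit.  */
static int
vector_symbol (char *buf, size_t size, char isa, unsigned int vlen,
               unsigned int nargs, const char *name)
{
  char params[8];

  assert (buf != NULL && name != NULL);
  if (nargs >= sizeof params)
    return -1;

  /* One 'v' per vector argument, e.g. "vv" for pow (x, y).  */
  memset (params, 'v', nargs);
  params[nargs] = '\0';

  int n = snprintf (buf, size, "_ZGV%cN%u%s_%s", isa, vlen, params, name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Under these assumptions, the SSE variant of pow with two double lanes comes out as `_ZGVbN2vv_pow`, and the AVX512 variant with eight lanes as `_ZGVeN8vv_pow`.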
* Vector expf for x86_64 and tests.  Andrew Senkevich  2015-06-17  7  -1/+978
    Here is an implementation of vectorized expf containing SSE, AVX,
    AVX2 and AVX512 versions, according to the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm
      redirections for expf.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
      build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: New file.
    * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf16_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf4_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf8_core.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf_data.S: New file.
    * sysdeps/x86_64/fpu/svml_s_expf_data.h: New file.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector expf tests.
    * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
    * NEWS: Mention addition of x86_64 vector expf.