about summary refs log tree commit diff
path: root/sysdeps/x86_64
Commit message (Collapse)AuthorAgeFilesLines
* Don't disable SSE in x86-64 ld.soH.J. Lu2015-08-261-0/+4
| | | | | | | | | | | | | | | | Since x86-64 ld.so preserves vector registers now, we can use SSE in x86-64 ld.so. We should run tst-ld-sse-use.sh only on i386. * sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os, tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ... * sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os, tests-special, $(objpfx)tst-ld-sse-use.out): Here. Update comments. * sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add -mno-mmx for $(all-rtld-routines). * sysdeps/x86/tst-ld-sse-use.sh: Moved to ... * sysdeps/i386/tst-ld-sse-use.sh: Here. Replace x86-64 with i386.
* Use SSE2 optimized strcmp in x86-64 ld.soH.J. Lu2015-08-251-253/+216
| | | | | | | Since ld.so preserves vector registers now, we can use the same SSE2 optimized strcmp in x86-64 libc and ld.so. * sysdeps/x86_64/strcmp.S: Remove "#if !IS_IN (libc)".
* Replace %xmm[8-12] with %xmm[0-4]H.J. Lu2015-08-251-47/+47
| | | | | | | Since ld.so preserves vector registers now, we can use %xmm[0-4] to avoid the REX prefix. * sysdeps/x86_64/strlen.S: Replace %xmm[8-12] with %xmm[0-4].
* Remove x86-64 rtld-xxx.c and rtld-xxx.SH.J. Lu2015-08-256-464/+0
| | | | | | | | | | | | Since ld.so preserves vector registers now, we can use the regular, non-ifunc string and memory functions in ld.so. * sysdeps/x86_64/rtld-memcmp.c: Removed. * sysdeps/x86_64/rtld-memset.S: Likewise. * sysdeps/x86_64/rtld-strchr.S: Likewise. * sysdeps/x86_64/rtld-strlen.S: Likewise. * sysdeps/x86_64/multiarch/rtld-memcmp.c: Likewise. * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
* Replace %xmm8 with %xmm0H.J. Lu2015-08-251-26/+26
| | | | | | | Since ld.so preserves vector registers now, we can use %xmm0 to avoid the REX prefix. * sysdeps/x86_64/memset.S: Replace %xmm8 with %xmm0.
* Fix strcpy_chk and stpcpy_chk performance.Ondřej Bílka2015-08-252-211/+0
| | | | | | | | | | | | | | | Hi, as I wrote in previous patches a performance of checked strcpy and stpcpy is terrible as these don't use sse2 and are around four times slower that strcpy and stpcpy now. As this bug shows that these functions are not performance sensitive I decided just to improve generic implementation instead for easier maintainance. * debug/strcpy_chk.c: Improve performance. * debug/stpcpy_chk.c: Likewise. * sysdeps/x86_64/strcpy_chk.S: Remove. * sysdeps/x86_64/stpcpy_chk.S: Remove.
* Save and restore vector registers in x86-64 ld.soH.J. Lu2015-08-258-501/+472
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve and _dl_runtime_profile, which save and restore the first 8 vector registers used for parameter passing. elf_machine_runtime_setup selects the proper _dl_runtime_resolve or _dl_runtime_profile based on _dl_x86_cpu_features. It avoids race condition caused by FOREIGN_CALL macros, which are only used for x86-64. Performance impact of saving and restoring 8 vector registers are negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when ld.so is optimized with SSE2. [BZ #15128] * sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add ifuncmain8. (modules-names): Add ifuncmod8. ($(objpfx)ifuncmain8): New rule. * sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and <cpuid.h>. (elf_machine_runtime_setup): Use _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512, _dl_runtime_profile_sse, _dl_runtime_profile_avx, or _dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE. * sysdeps/x86_64/dl-trampoline.S: Rewrite. * sysdeps/x86_64/dl-trampoline.h: Likewise. * sysdeps/x86_64/ifuncmain8.c: New file. * sysdeps/x86_64/ifuncmod8.c: Likewise. * sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE): Removed. * sysdeps/x86_64/nptl/tls.h (__128bits): Removed. (tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1. Change rtld_savespace_sse to __glibc_unused2. (RTLD_CHECK_FOREIGN_CALL): Removed. (RTLD_ENABLE_FOREIGN_CALL): Likewise. (RTLD_PREPARE_FOREIGN_CALL): Likewise. (RTLD_FINALIZE_FOREIGN_CALL): Likewise.
* Remove the unused IFUNC filesH.J. Lu2015-08-202-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysdeps/i386/i686/multiarch/strcasestr-c.c became unused after commit 1818483b15d22016b0eae41d37ee91cc87b37510 Author: Andreas Schwab <schwab@suse.de> Date: Wed Dec 18 11:53:27 2013 +1000 Remove use of SSE4.2 functions for strstr on i686 which contains -sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c +sysdep_routines += strcspn-c strpbrk-c strspn-c sysdeps/x86_64/multiarch/strcasestr.c became useless after t 584b18eb4df61ccd447db2dfe8c8a7901f8c8598 Author: Ondřej Bílka <neleai@seznam.cz> Date: Sat Dec 14 19:33:56 2013 +0100 Add strstr with unaligned loads. Fixes bug 12100. which changes sysdeps/x86_64/multiarch/strcasestr.c to libc_ifunc (__strcasestr, __strcasestr_sse2); This patch removes these file. * i386/i686/multiarch/strcasestr-c.c: Removed. * x86_64/multiarch/strcasestr.c: Likewise. * x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove strcasestr.
* Move x86_64 init-arch.h to sysdeps/x86/init-arch.hH.J. Lu2015-08-202-23/+1
| | | | | | | | | | | | Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h which can be used for both i386 and x86_64. * sysdeps/i386/i686/multiarch/init-arch.h: Removed. * sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise. * sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead of "multiarch/init-arch.h". * sysdeps/x86_64/multiarch/init-arch.h: Renamed to ... * sysdeps/x86/init-arch.h: This.
* Fix dynamic linker issue with bind-nowPetar Jovanovic2015-08-193-0/+38
| | | | | | | | | | | | | | Fix the bind-now case when DT_REL and DT_JMPREL sections are separate and there is a gap between them. [BZ #14341] * elf/dynamic-link.h (elf_machine_lazy_rel): Properly handle the case when there is a gap between DT_REL and DT_JMPREL sections. * sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc. (LDFLAGS-tst-split-dynreloc): New. (tst-split-dynreloc-ENV): Likewise. * sysdeps/x86_64/tst-split-dynreloc.c: New file. * sysdeps/x86_64/tst-split-dynreloc.lds: Likewise.
* Fix BZ #18084 -- backtrace (..., 0) dumps core on x86.Paul Pluzhnikov2015-08-151-2/+5
| | | | Other architectures also had bugs, or did unnecessary work.
* Regenerated sysdeps/x86_64/fpu/libm-test-ulps with AVX2.Paul Pluzhnikov2015-08-141-1/+1
|
* Remove incorrect register mov in floorf/nearbyint on x86_64Siddhesh Poyarekar2015-08-142-2/+0
| | | | | | | | | | | | | The change in 0b5395f052ee09cd7e3d219af4e805c38058afb5 replaced calls to __get_cpu_features@plt followed by a mov from rax to rdx, with a single macro LOAD_RTLD_GLOBAL_RO_RDX. It is pretty clear that there was a typo in s_floorf and __nearbyint due to which the (now incorrect) mov was not removed. This patch removes that mov. * sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf): Remove unnecessary movq. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S (__nearbyint): Likewise.
* Add more random libm-test inputs.Joseph Myers2015-08-131-62/+64
| | | | | | | | | | | | | | | | This patch adds more test inputs to various libm functions found through random generation to have larger ulps errors than previously listed in libm-test-ulp, on at least one of x86_64 and x86. Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acos, acosh, asin, asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc, exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh and tgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Update libmvec multiarch functions for <cpu-features.h>H.J. Lu2015-08-1338-227/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates libmvec multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from <cpu-features.h>. * math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))): Remove $(objpfx)init-arch.o. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove init-arch. * sysdeps/x86_64/fpu/math-tests-arch.h (avx_usable): Removed. (INIT_ARCH_EXT): Defined as empty. (CHECK_ARCH_EXT): Replace HAS_XXX with HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Remove __init_cpu_features call. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: Likewise.
* Update x86_64 multiarch functions for <cpu-features.h>H.J. Lu2015-08-1339-233/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates x86_64 multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from <cpu-features.h>. * sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1). * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise. * sysdeps/x86_64/multiarch/strstr.c: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/test-multiarch.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy.S: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* Add _dl_x86_cpu_features to rtld_globalH.J. Lu2015-08-1310-546/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors. * sysdeps/i386/dl-machine.h: Include <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New. * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed. * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch. * sysdeps/i386/i686/multiarch/Versions: Removed. * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>. * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>. * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym. [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features. [$(subdir) == elf] (tests): Add tst-get-cpu-features. [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static. * sysdeps/x86/Versions: New file. * sysdeps/x86/cpu-features-offsets.sym: Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/dl-get-cpu-features.c: Likewise. * sysdeps/x86/libc-start.c: Likewise. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86/tst-get-cpu-features-static.c: Likewise. * sysdeps/x86/tst-get-cpu-features.c: Likewise. * sysdeps/x86_64/dl-procinfo.c: Likewise. * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it. (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features). (is_amd): Likewise. (max_cpuid): Likewise. (intel_check_word): Likewise. (__cache_sysconf): Don't call __init_cpu_features. (__x86_preferred_memory_instruction): Removed. (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features). * sysdeps/x86_64/dl-machine.h: <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>. * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch. * sysdeps/x86_64/multiarch/Versions: Removed. * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise. * sysdeps/x86_64/multiarch/init-arch.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
* Add more tests of various libm functions.Joseph Myers2015-08-111-12/+12
| | | | | | | | | | | | | | This patch adds more tests of various libm functions found through random test generation to give increased ulps on 32-bit x86. Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acosh, asin, asinh, atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10, expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Align stack to 16 bytes when calling __errno_locationH.J. Lu2015-08-053-0/+18
| | | | | | | | | | We should align stack to 16 bytes when calling __errno_location. [BZ #18661] * sysdeps/x86_64/fpu/s_cosf.S (__cosf): Align stack to 16 bytes when calling __errno_location. * sysdeps/x86_64/fpu/s_sincosf.S (__sincosf): Likewise. * sysdeps/x86_64/fpu/s_sinf.S (__sinf): Likewise.
* Compile {memcpy,strcmp}-sse2-unaligned.S only for libcH.J. Lu2015-08-052-0/+8
| | | | | | | | {memcpy,strcmp}-sse2-unaligned.S aren't needed in ld.so. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Compile only for libc. * sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Likewise.
* Prevent runtime fail of SSE vector math tests on non SSE4.1 machine.Andrew Senkevich2015-07-301-2/+0
| | | | | | | | [BZ #18740] * sysdeps/x86_64/fpu/Makefile (double-vlen2-arch-ext-cflags, float-vlen4-arch-ext-cflags): Removed. * math/Makefile (CFLAGS-test-double-vlen2-wrappers.c, CFLAGS-test-float-vlen4-wrappers.c): Likewise.
* Extend local PLT reference checkH.J. Lu2015-07-291-0/+19
| | | | | | | | | | | | | | On x86, linker in binutils 2.26 and newer consolidates R_*_JUMP_SLOT with R_*_GLOB_DAT relocation against the same symbol. This patch extends local PLT reference check to support alternate relocations. [BZ #18078] * scripts/check-localplt.awk: Support alternate relocations. * scripts/localplt.awk: Also check relocations in DT_RELA/DT_REL sections. * sysdeps/unix/sysv/linux/i386/localplt.data: Mark free and malloc entries with + REL R_386_GLOB_DAT. * sysdeps/x86_64/localplt.data: New file.
* Added runtime check for AVX vector math tests.Andrew Senkevich2015-07-293-1/+27
| | | | | | | [BZ #18731] * sysdeps/x86_64/fpu/math-tests-arch.h: Added AVX runtime check. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
* Fixed several libmvec bugs found during testing on KNL hardware.Andrew Senkevich2015-07-2415-223/+201
| | | | | | | | | | | | | | | | | | | | | | AVX512 IFUNC implementations, implementations of wrappers to AVX2 versions and KNL expf implementation fixed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Fixed AVX512 IFUNC. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Fixed wrappers to AVX2. * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Fixed KNL implementation.
* Modify several tests to use test-skeleton.cArjun Shankar2015-07-151-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These tests were skipped by the use-test-skeleton conversion done in commit 29955b5d because they were reused in other tests via the #include directive, and so deemed worth an inspection before they were modified. This has now been done. ChangeLog: 2015-07-09 Arjun Shankar <arjun.is@lostca.se> * elf/tst-leaks1.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * localedata/tst-langinfo.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-fpucw.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-tgmath.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-tgmath2.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * setjmp/tst-setjmp.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * stdio-common/tst-sscanf.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * sysdeps/x86_64/tst-audit6.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c.
* Improve bndmov encoding with zero displacementH.J. Lu2015-07-091-0/+8
| | | | | | | | | | If x86-64 assembler doesn't support MPX, we encode bndmov instruction by hand. When displacement is zero, assembler generates shorter encoding. This patch improves bndmov encoding with zero displacement so that ld.so is identical when using assemblers with and without MPX support. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve): Improve bndmov encoding with zero displacement.
* Preserve bound registers for pointer pass/returnIgor Zamyatin2015-07-092-20/+25
| | | | | | | | | | | | | | | | | | | | | | | We need to save/restore bound registers and add a BND prefix before branches in _dl_runtime_profile so that bound registers for pointer pass and return are preserved when LD_AUDIT is used. [BZ #18134] * sysdeps/i386/configure.ac: Set HAVE_MPX_SUPPORT. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New. (_dl_runtime_profile): Save and restore Intel MPX return bound registers when calling _dl_call_pltexit. Add PRESERVE_BND_REGS_PREFIX before return. * sysdeps/i386/link-defines.sym (LRV_BND0_OFFSET): New. (LRV_BND1_OFFSET): Likewise. * sysdeps/x86/bits/link.h (La_i86_retval): Add lrv_bnd0 and lrv_bnd1. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix typo in bndmov encoding. * sysdeps/x86_64/dl-trampoline.h: Properly save and restore Intel MPX bound registers. Add PRESERVE_BND_REGS_PREFIX before branch instructions to preserve bounds.
* Add la_symbind32 to x86-64 audit testsH.J. Lu2015-07-076-0/+60
| | | | | | | | | | | | la_symbind32 is used for x32 in x86-64 audit tests. We should define both la_symbind32 and la_symbind64 in x86-64 audit tests. * sysdeps/x86_64/tst-auditmod10b.c (la_symbind32): New. * sysdeps/x86_64/tst-auditmod4b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod5b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod6b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod6c.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod7b.c (la_symbind32): Likewise.
* Clean up BUSY_WAIT_NOP and atomic_delay.Torvald Riegel2015-06-301-1/+1
| | | | | | | | | This patch combines BUSY_WAIT_NOP and atomic_delay into a new atomic_spin_nop function and adjusts all clients. The new function is put into atomic.h because what is best done in a spin loop is architecture-specific, and atomics must be used for spinning. The function name is meant to tell users that this has no effect on synchronization semantics but is a performance aid for spinning.
* Improve tgamma accuracy (bug 18613).Joseph Myers2015-06-291-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In non-default rounding modes, tgamma can be slightly less accurate than permitted by glibc's accuracy goals. Part of the problem is error accumulation, addressed in this patch by setting round-to-nearest for internal computations. However, there was also a bug in the code dealing with computing pow (x + n, x + n) where x + n is not exactly representable, providing another source of error even in round-to-nearest mode; it was necessary to address both bugs to get errors for all testcases within glibc's accuracy goals. Given this second fix, accuracy in round-to-nearest mode is also improved (hence regeneration of ulps for tgamma should be from scratch - truncate libm-test-ulps or at least remove existing tgamma entries - so that the expected ulps can be reduced). Some additional complications also arose. Certain tgamma tests should strictly, according to IEEE semantics, overflow or not depending on the rounding mode; this is beyond the scope of glibc's accuracy goals for any function without exactly-determined results, but gen-auto-libm-tests doesn't handle being lax there as it does for underflow. (libm-test.inc also doesn't handle being lax about whether the result in cases very close to the overflow threshold is infinity or a finite value close to overflow, but that doesn't cause problems in this case though I've seen it cause problems with random test generation for some functions.) Thus, spurious-overflow markings, with a comment, are added to auto-libm-test-in (no bug in Bugzilla because the issue is with the testsuite, not a user-visible bug in glibc). And on x86, after the patch I saw ERANGE issues as previously reported by Carlos (see my commentary in <https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which needed addressing by ensuring excess range and precision were eliminated at various points if FLT_EVAL_METHOD != 0. I also noticed and fixed a cosmetic issue where 1.0f was used in long double functions and should have been 1.0L. This completes the move of all functions to testing in all rounding modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to remove the workaround for some functions not using ALL_RM_TEST. Tested for x86_64, x86, mips64 and powerpc. [BZ #18613] * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gamma_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammaf_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * math/libm-test.inc (tgamma_test_data): Remove one test. Moved to auto-libm-test-in. (tgamma_test): Use ALL_RM_TEST. * math/auto-libm-test-in: Add one test of tgamma. Mark some other tests of tgamma with spurious-overflow. * math/auto-libm-test-out: Regenerated. * math/gen-libm-have-vector-test.sh: Do not check for START. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Use round-to-nearest internally in jn, test with ALL_RM_TEST (bug 18602).Joseph Myers2015-06-251-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some existing jn tests, if run in non-default rounding modes, produce errors above those accepted in glibc, which causes problems for moving tests of jn to use ALL_RM_TEST. This patch makes jn set rounding to-nearest internally, as was done for yn some time ago, then computes the appropriate underflowing value for results that underflowed to zero in to-nearest, and moves the tests to ALL_RM_TEST. It does nothing about the general inaccuracy of Bessel function implementations in glibc, though it should make jn more accurate on average in non-default rounding modes through reduced error accumulation. The recomputation of results that underflowed to zero should as a side-effect fix some cases of bug 16559, where jn just used an exact zero, but that is *not* the goal of this patch and other cases of that bug remain unfixed. (Most of the changes in the patch are reindentation to add new scopes for SET_RESTORE_ROUND*.) Tested for x86_64, x86, powerpc and mips64. [BZ #16559] [BZ #18602] * sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set round-to-nearest internally then recompute results that underflowed to zero in the original rounding mode. * sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise * math/libm-test.inc (jn_test): Use ALL_RM_TEST. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Refactor libm tests.Joseph Myers2015-06-248-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the libm tests using libm-test.inc to reduce the level of duplicate definitions. New headers are created for the definitions shared by tests for a particular type; by tests of inline functions; by tests of non-inline functions; by scalar tests; and by vector tests. The unused MATHCONST macro is removed. A new macro VEC_LEN is added to the vector headers to allow the macros defining wrappers for vector functions to be defined once, instead of six times each (differing only in vector length) as before. There is still scope for further refactoring, but this seems a useful start. Tested for x86_64. * math/test-double.h: New file. * math/test-float.h: Likewise. * math/test-ldouble.h: Likewise. * math/test-math-inline.h: Likewise. * math/test-math-no-inline.h: Likewise. * math/test-math-scalar.h: Likewise. * math/test-math-vector.h: Likewise. * math/test-vec-loop.h: Remove file. Contents moved into test-math-vector.h. * math/libm-test.inc (MATHCONST): Do not document macro. * math/test-double.c: Include test-double.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-float.c: Include test-float.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-idouble.c: Include test-double.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ifloat.c: Include test-float.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ildoubl.c: Include test-ldouble.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_LDOUBLE): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ldouble.c: Include test-ldouble.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_LDOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-double-vlen2.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-double-vlen4.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-double-vlen8.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen4.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen8.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen16.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Do not include test-vec-loop.h. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
* Fix cexp, ccos, ccosh, csin, csinh spurious underflows (bug 18594).Joseph Myers2015-06-241-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cexp, ccos, ccosh, csin and csinh have spurious underflows in cases where they compute sin of the smallest normal, that produces an underflow exception (depending on which sin implementation is in use) but the final result does not underflow. ctan and ctanh may also have such underflows, or they may be latent (the issue there is that e.g. ctan (DBL_MIN) should, rounded upwards, be the next double value above DBL_MIN, which under glibc's accuracy goals may not have an underflow exception, but the intermediate computation of sin (DBL_MIN) would legitimately underflow on before-rounding architectures). This patch fixes all those functions so they use plain comparisons (> DBL_MIN etc.) instead of comparing the result of fpclassify with FP_SUBNORMAL (in all these cases, we already know the number being compared is finite). Note that in the case of csin / csinf / csinl, there is no need for fabs calls in the comparison because the real part has already been reduced to its absolute value. As the patch fixes the failures that previously obstructed moving tests of cexp to use ALL_RM_TEST, those tests are moved to ALL_RM_TEST by the patch (two functions remain yet to be converted). Tested for x86_64 and x86 and ulps updated accordingly. [BZ #18594] * math/s_ccosh.c (__ccosh): Compare with least normal value instead of comparing class with FP_SUBNORMAL. * math/s_ccoshf.c (__ccoshf): Likewise. * math/s_ccoshl.c (__ccoshl): Likewise. * math/s_cexp.c (__cexp): Likewise. * math/s_cexpf.c (__cexpf): Likewise. * math/s_cexpl.c (__cexpl): Likewise. * math/s_csin.c (__csin): Likewise. * math/s_csinf.c (__csinf): Likewise. * math/s_csinh.c (__csinh): Likewise. * math/s_csinhf.c (__csinhf): Likewise. * math/s_csinhl.c (__csinhl): Likewise. * math/s_csinl.c (__csinl): Likewise. * math/s_ctan.c (__ctan): Likewise. * math/s_ctanf.c (__ctanf): Likewise. * math/s_ctanh.c (__ctanh): Likewise. * math/s_ctanhf.c (__ctanhf): Likewise. * math/s_ctanhl.c (__ctanhl): Likewise. * math/s_ctanl.c (__ctanl): Likewise. * math/auto-libm-test-in: Add more tests of ccos, ccosh, cexp, csin, csinh, ctan and ctanh. * math/auto-libm-test-out: Regenerated. * math/libm-test.inc (cexp_test): Use ALL_RM_TEST. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Fix csin, csinh overflow in directed rounding modes (bug 18593).Joseph Myers2015-06-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | csin and csinh can produce bad results when overflowing in directed rounding modes, because a multiplication that can overflow is followed by a possible negation. This patch fixes this by negating one of the arguments of the multiplication before the multiplication instead of negating the result. The new tests for this issue are added to auto-libm-test-in, starting use of that file for csin and csinh. The issue was found in the course of moving existing tests for csin and csinh (existing tests, by being enabled in more cases than previously, showed the issue for float and double but not for long double); that move will now be done separately. Tested for x86_64 and x86 and ulps updated accordingly. [BZ #18593] * math/s_csin.c (__csin): Negate before rather than after possibly overflowing multiplication. * math/s_csinf.c (__csinf): Likewise. * math/s_csinh.c (__csinh): Likewise. * math/s_csinhf.c (__csinhf): Likewise. * math/s_csinhl.c (__csinhl): Likewise. * math/s_csinl.c (__csinl): Likewise. * math/auto-libm-test-in: Add some tests of csin and csinh. * math/auto-libm-test-out: Regenerated. * math/libm-test.inc (csin_test_data): Use AUTO_TESTS_c_c. (csinh_test_data): Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Combination of data tables for x86_64 vector functions sinf, cosf and sincosf.Andrew Senkevich2015-06-2418-3586/+197
| | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/Makefile (libmvec-support): Fixed files list. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S: Renamed variable and included header. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/svml_s_trig_data.S: New file. * sysdeps/x86_64/fpu/svml_s_trig_data.h: Likewise. * sysdeps/x86_64/fpu/svml_s_cosf_data.S: Removed file. * sysdeps/x86_64/fpu/svml_s_cosf_data.h: Likewise. * sysdeps/x86_64/fpu/svml_s_sinf_data.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sinf_data.h: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: Likewise.
* Fix atomic_full_barrier on x86 and x86_64.Torvald Riegel2015-06-231-0/+7
| | | | | | | | | This fixes BZ #17403 by defining atomic_full_barrier, atomic_read_barrier, and atomic_write_barrier on x86 and x86_64. A full barrier is implemented through an atomic idempotent modification to the stack and not through using mfence because the latter can supposedly be somewhat slower due to having to provide stronger guarantees wrt. self-modifying code, for example.
* Combination of data tables for x86_64 vector functions sin, cos and sincos.Andrew Senkevich2015-06-2320-439/+173
| | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/Makefile (libmvec-support): Fixed files list. * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S: Renamed variable and included header. * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/svml_d_trig_data.S: New file. * sysdeps/x86_64/fpu/svml_d_trig_data.h: Likewise. * sysdeps/x86_64/fpu/svml_d_cos2_core.S: Removed unneeded include. * sysdeps/x86_64/fpu/svml_d_cos4_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_cos8_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_cos_data.S: Removed file. * sysdeps/x86_64/fpu/svml_d_cos_data.h: Likewise. * sysdeps/x86_64/fpu/svml_d_sin_data.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sin_data.h: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos_data.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos_data.h: Likewise.
* Fix x86_64 / x86 expm1l (-min_subnorm) result sign (bug 18569).Joseph Myers2015-06-211-0/+6
| | | | | | | | | | | | | | | | | | | | In the x86 / x86_64 implementations of expm1l, when expm1l's result should underflow to 0 (argument minus the least subnormal, in some rounding modes), it can be a zero of the wrong sign. This patch fixes this by returning the argument with underflow forced in that case (this is a 1ulp error relative to the correctly rounded result of -0, which is OK in terms of the documented accuracy goals, whereas a result with the wrong sign never is). Tested for x86_64 and x86. [BZ #18569] * sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Force underflow and return argument in case of subnormal argument. * sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Likewise. * math/auto-libm-test-in: Add more tests of expm1. * math/auto-libm-test-out: Regenerated.
* Fix x86 / x86_64 expl, exp10l missing underflows (bug 16361).Joseph Myers2015-06-211-1/+14
| | | | | | | | | | | | | | | | | | | | | Similar to various other bugs in this area, the x86 and x86_64 implementations of expl / exp10l can fail to produce underflow exceptions when the unscaled result has trailing 0 bits so the scaling down to subnormal precision is exact. This patch fixes this by forcing the exception in the case of tiny results. Tested for x86_64 and x86. [BZ #16361] * sysdeps/i386/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object. [!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for tiny results. * sysdeps/x86_64/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object. [!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for tiny results. * math/auto-libm-test-in: Add more tests of exp and exp10. Do not mark underflow exceptions as possibly missing for bug 16361. * math/auto-libm-test-out: Regenerated.
* Vector sincosf for x86_64 and tests.Andrew Senkevich2015-06-1825-41/+2614
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized sincosf containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * NEWS: Mention addition of x86_64 vector sincosf. * math/test-float-vlen16.h: Added wrapper for sincosf tests. * math/test-float-vlen4.h: Likewise. * math/test-float-vlen8.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file. * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file. * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3 argument wrappers. * sysdeps/x86_64/fpu/test-float-vlen16.c: : Vector sincosf tests. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
* Vector sincos for x86_64 and tests.Andrew Senkevich2015-06-1825-1/+1770
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized sincos containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * NEWS: Mention addition of x86_64 vector sincos. * bits/libm-simd-decl-stubs.h: Added stubs for sincos. * math/math.h (__MATHDECL_VEC): New macro. * math/bits/mathcalls.h: Added sincos declaration with __MATHDECL_VEC. * math/gen-libm-have-vector-test.sh: Added generation of sincos wrapper declaration under condition. * math/test-vec-loop.h (TEST_VEC_LOOP): Refactored. * math/test-double-vlen2.h: Added wrapper for sincos tests, reflected TEST_VEC_LOOP change. * math/test-double-vlen4.h: Likewise. * math/test-double-vlen8.h: Likewise. * math/test-float-vlen16.h: Reflected TEST_VEC_LOOP change. * math/test-float-vlen4.h: Likewise. * math/test-float-vlen8.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added sincos SIMD declaration. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos_data.S: New file. * sysdeps/x86_64/fpu/svml_d_sincos_data.h: New file. * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Added wrappers for sincos. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Vector sincos tests. * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise.
* Vector powf for x86_64 and tests.Andrew Senkevich2015-06-1825-2/+5586
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized powf containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for powf. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 2 argument wrappers. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core_avx2.S: New file. * sysdeps/x86_64/fpu/svml_s_powf16_core.S: New file. * sysdeps/x86_64/fpu/svml_s_powf4_core.S: New file. * sysdeps/x86_64/fpu/svml_s_powf8_core.S: New file. * sysdeps/x86_64/fpu/svml_s_powf8_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_s_powf_data.S: New file. * sysdeps/x86_64/fpu/svml_s_powf_data.h: New file. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector powf tests. * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise. * math/test-float-vlen16.h: Fixed 2 argument macro. * math/test-float-vlen4.h: Likewise. * math/test-float-vlen8.h: Likewise. * NEWS: Mention addition of x86_64 vector powf.
* Vector pow for x86_64 and tests.Andrew Senkevich2015-06-1725-2/+6886
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized pow containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * bits/libm-simd-decl-stubs.h: Added stubs for pow. * math/bits/mathcalls.h: Added pow declaration with __MATHCALL_VEC. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for pow. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Added 2 argument wrappers. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core_avx2.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: New file. * sysdeps/x86_64/fpu/svml_d_pow2_core.S: New file. * sysdeps/x86_64/fpu/svml_d_pow4_core.S: New file. * sysdeps/x86_64/fpu/svml_d_pow4_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_d_pow8_core.S: New file. * sysdeps/x86_64/fpu/svml_d_pow_data.S: New file. * sysdeps/x86_64/fpu/svml_d_pow_data.h: New file. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector pow test. * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector pow.
* Vector expf for x86_64 and tests.Andrew Senkevich2015-06-1724-1/+1214
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized expf containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for expf. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S: New file. * sysdeps/x86_64/fpu/svml_s_expf16_core.S: New file. * sysdeps/x86_64/fpu/svml_s_expf4_core.S: New file. * sysdeps/x86_64/fpu/svml_s_expf8_core.S: New file. * sysdeps/x86_64/fpu/svml_s_expf8_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_s_expf_data.S: New file. * sysdeps/x86_64/fpu/svml_s_expf_data.h: New file. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector expf tests. * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector expf.
* Vector exp for x86_64 and tests.Andrew Senkevich2015-06-1724-2/+2281
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized exp containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * bits/libm-simd-decl-stubs.h: Added stubs for exp. * math/bits/mathcalls.h: Added exp declaration with __MATHCALL_VEC. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for exp. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core_avx2.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: New file. * sysdeps/x86_64/fpu/svml_d_exp2_core.S: New file. * sysdeps/x86_64/fpu/svml_d_exp4_core.S: New file. * sysdeps/x86_64/fpu/svml_d_exp4_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_d_exp8_core.S: New file. * sysdeps/x86_64/fpu/svml_d_exp_data.S: New file. * sysdeps/x86_64/fpu/svml_d_exp_data.h: New file. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector exp test. * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector exp.
* Vector logf for x86_64 and tests.Andrew Senkevich2015-06-1724-2/+1191
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized logf containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for logf. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core_avx2.S: New file. * sysdeps/x86_64/fpu/svml_s_logf16_core.S: New file. * sysdeps/x86_64/fpu/svml_s_logf4_core.S: New file. * sysdeps/x86_64/fpu/svml_s_logf8_core.S: New file. * sysdeps/x86_64/fpu/svml_s_logf8_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_s_logf_data.S: New file. * sysdeps/x86_64/fpu/svml_s_logf_data.h: New file. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector logf tests. * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector logf.
* Vector log for x86_64 and tests.Andrew Senkevich2015-06-1724-0/+2871
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized log containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * bits/libm-simd-decl-stubs.h: Added stubs for log. * math/bits/mathcalls.h: Added log declaration with __MATHCALL_VEC. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration and asm redirections for log. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core_avx2.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: New file. * sysdeps/x86_64/fpu/svml_d_log2_core.S: New file. * sysdeps/x86_64/fpu/svml_d_log4_core.S: New file. * sysdeps/x86_64/fpu/svml_d_log4_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_d_log8_core.S: New file. * sysdeps/x86_64/fpu/svml_d_log_data.S: New file. * sysdeps/x86_64/fpu/svml_d_log_data.h: New file. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector log test. * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector log.
* Vector sinf for x86_64 and tests.Andrew Senkevich2015-06-1524-1/+2339
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized sinf containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added. * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for sinf. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf16_core.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf4_core.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf8_core.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf_data.S: New file. * sysdeps/x86_64/fpu/svml_s_sinf_data.h: New file. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector sinf tests. * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector sinf.
* Vector sin for x86_64 and tests.Andrew Senkevich2015-06-1124-3/+1290
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is implementation of vectorized sin containing SSE, AVX, AVX2 and AVX512 versions according to Vector ABI <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>. * bits/libm-simd-decl-stubs.h: Added stubs for sin. * math/bits/mathcalls.h: Added sin declaration with __MATHCALL_VEC. * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added. * sysdeps/x86/fpu/bits/math-vector.h: SIMD declaration for sin. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files. * sysdeps/x86_64/fpu/Versions: New versions added. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added build of SSE, AVX2 and AVX512 IFUNC versions. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: New file. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: New file. * sysdeps/x86_64/fpu/svml_d_sin2_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sin4_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sin4_core_avx.S: New file. * sysdeps/x86_64/fpu/svml_d_sin8_core.S: New file. * sysdeps/x86_64/fpu/svml_d_sin_data.S: New file. * sysdeps/x86_64/fpu/svml_d_sin_data.h: New file. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector sin test. * sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise. * NEWS: Mention addition of x86_64 vector sin.
* More strict check of AVX512 support in assembler.Andrew Senkevich2015-06-112-0/+2
| | | | | | | | Binutils 2.24 doesn't support some AVX512 instructions with ZMM registers, so we need add more strict check. * configure.ac: Added more strict check. * configure: Regenerated.