about summary refs log tree commit diff
path: root/sysdeps/x86_64
Commit message (Collapse)AuthorAgeFilesLines
* x86-64: Add memcmp family functions with 256-bit EVEXH.J. Lu2021-03-295-4/+467
| | | | | | | Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit.
* x86-64: Add memset family functions with 256-bit EVEXH.J. Lu2021-03-296-14/+90
| | | | | | | Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
* x86-64: Add memmove family functions with 256-bit EVEXH.J. Lu2021-03-295-11/+104
| | | | | | Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.
* x86-64: Add strcpy family functions with 256-bit EVEXH.J. Lu2021-03-299-3/+1339
| | | | | | Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
* x86-64: Add ifunc-avx2.h functions with 256-bit EVEXH.J. Lu2021-03-2924-17/+2995
| | | | | | | | | | Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set.
* nptl: Remove MULTI_PAGE_ALIASING [BZ #23554]H.J. Lu2021-03-191-1/+0
| | | | | MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on Pentium 4. It is no longer needed for processors after Pentium 4.
* math: Remove mpa files [BZ #15267]Wilco Dijkstra2021-03-1118-183/+3
| | | | | | | Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* math: Remove slow paths from asin and acos [BZ #15267]Wilco Dijkstra2021-03-111-1/+3
| | | | | | | | | | | This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* Add inputs that generate larger error boundsPaul Zimmermann2021-02-271-24/+26
| | | | (Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
* Reduce the statically linked startup code [BZ #23323]Florian Weimer2021-02-251-8/+4
| | | | | | | | | | | | | | | | | | | It turns out the startup code in csu/elf-init.c has a perfect pair of ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A New Method to Bypass 64-bit Linux ASLR"). These functions are not needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY are already processed by the dynamic linker. However, the dynamic linker skipped the main program for some reason. For maximum backwards compatibility, this is not changed, and instead, the main map is consulted from __libc_start_main if the init function argument is a NULL pointer. For statically linked binaries, the old approach based on linker symbols is still used because there is nothing else available. A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because new binaries running on an old libc would not run their ELF constructors, leading to difficult-to-debug issues.
* x86: Use x86/nptl/pthreaddef.hH.J. Lu2021-02-221-47/+0
| | | | | | | 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>
* x86-64: Refactor and improve performance of strchr-avx2.Snoah2021-02-082-113/+113
| | | | | | | | No bug. Just seemed the performance could be improved a bit. Observed and expected behavior are unchanged. Optimized body of main loop. Updated page cross logic and optimized accordingly. Made a few minor instruction selection modifications. No regressions in test suite. Both test-strchrnul and test-strchr passed.
* x86: Adding an upper bound for Enhanced REP MOVSB.Sajan Karumanchi2021-02-021-2/+5
| | | | | | | | | | | | | | | In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
* configure: Check for static PIE supportSzabolcs Nagy2021-01-212-0/+6
| | | | | | | | | | | | | | Add SUPPORT_STATIC_PIE that targets can define if they support static PIE. This requires PI_STATIC_AND_HIDDEN support and various linker features as described in commit 9d7a3741c9e59eba87fb3ca6b9f979befce07826 Add --enable-static-pie configure option to build static PIE [BZ #19574] Currently defined on x86_64, i386 and aarch64 where static PIE is known to work. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Remove the C preprocessor magicH.J. Lu2021-01-212-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
* libmvec: Add extra-test-objs to test-extrasH.J. Lu2021-01-191-0/+8
| | | | | | | Add extra-test-objs to test-extras so that they are compiled with -DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix x86 build with --enable-tunable=noAdhemerval Zanella2021-01-141-0/+1
| | | | Checked on x86_64-linux-gnu.
* x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717]H.J. Lu2021-01-072-52/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with GNU_PROPERTY_X86_ISA_1_V[234] marker: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13 Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by commit b0ab06937385e0ae25cebf1991787d64f439bf12 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 30 06:49:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker and commit 32930e4edbc06bc6f10c435dbcc63131715df678 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 9 05:05:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the micro-architecture ISA level required to execute the binary. The marker must be added by programmers explicitly in one of 3 ways: 1. Pass -mneeded to GCC. 2. Add the marker in the linker inputs as this patch does. 3. Pass -z x86-64-v[234] to the linker. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker support to ld.so if binutils 2.32 or newer is used to build glibc: 1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h. 2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker to abi-note.o based on the ISA level used to compile abi-note.o, assuming that the same ISA level is used to compile the whole glibc. 3. Add isa_1 to cpu_features to record the supported x86 ISA level. 4. Rename _dl_process_cet_property_note to _dl_process_property_note and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection. 5. Update _rtld_main_check and _dl_open_check to check loaded objects with the incompatible ISA level. 6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails on lesser platforms. 7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c. Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and x86-64-v4 machines. Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine and got: [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1 ./elf/tst-isa-level-1: CPU ISA level is lower than required [hjl@gnu-cfl-2 build-x86_64-linux]$
* Remove dbl-64/wordsize-64 (part 2)Wilco Dijkstra2021-01-071-1/+0
| | | | | | | | Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Check IFUNC definition in unrelocated executable [BZ #20019]H.J. Lu2021-01-041-5/+11
| | | | | | Calling an IFUNC function defined in unrelocated executable also leads to segfault. Issue a fatal error message when calling IFUNC function defined in the unrelocated executable from a shared library.
* x86-64: Avoid rep movsb with short distance [BZ #27130]H.J. Lu2021-01-041-0/+21
| | | | | | | | | | | | | | | | | When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-02515-515/+515
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* x86 long double: Support pseudo numbers in isnanlSiddhesh Poyarekar2020-12-241-1/+0
| | | | | | | This syncs up isnanl behaviour with gcc. Also move the isnanl implementation to sysdeps/x86 and remove the sysdeps/x86_64 version. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86 long double: Support pseudo numbers in fpclassifylSiddhesh Poyarekar2020-12-241-2/+0
| | | | | | | | Also move sysdeps/i386/fpu/s_fpclassifyl.c to sysdeps/x86/fpu/s_fpclassifyl.c and remove sysdeps/x86_64/fpu/s_fpclassifyl.c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64)Paul Zimmermann2020-12-211-11/+13
|
* ieee754: Remove unused __sin32 and __cos32Anssi Hannula2020-12-184-8/+0
| | | | | The __sin32 and __cos32 functions were only used in the now removed slow path of asin and acos.
* x86_64: Add glibc-hwcaps supportFlorian Weimer2020-12-043-0/+181
| | | | | | | | The subdirectories match those in the x86-64 psABI: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004)Jakub Jelinek2020-12-031-1/+6
| | | | | | | | | | | The previous definition of THREAD_SELF did not tell the compiler that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK case in _dl_lookup_symbol_x. As a result, ld.so could try to use the TCB before it was initialized. As the comment in tls.h explains, asm volatile is undesirable here. Using the __seg_fs (or __seg_gs) namespace does not interfere with optimization, and expresses that THREAD_SELF is potentially trapping.
* nptl: Move stack list variables into _rtld_globalFlorian Weimer2020-11-161-2/+0
| | | | | | | | | Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Remove tls.h inclusion from internal errno.hAdhemerval Zanella2020-11-131-2/+8
| | | | | | | | | | | | The tls.h inclusion is not really required and limits possible definition on more arch specific headers. This is a cleanup to allow inline functions on sysdep.h, more specifically on i386 and ia64 which requires to access some tls definitions its own. No semantic changes expected, checked with a build against all affected ABIs.
* x86: Remove UP macro. Define LOCK_PREFIX unconditionally.Florian Weimer2020-11-132-14/+2
| | | | | | | The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* hurd: keep only required PLTs in ld.soSamuel Thibault2020-11-111-4/+0
| | | | | | | | | | | | | | | | | | | | | We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so. See Roland's comment in https://sourceware.org/bugzilla/show_bug.cgi?id=15605 "in the Hurd it's crucial that calls like __mmap be the libc ones instead of the rtld-local ones after the bootstrap phase, when the dynamic linker is being used for dlopen and the like." We used to just avoid all hidden use in the rtld ; this commit switches to keeping only those that should use PLT calls, i.e. essentially those defined in sysdeps/mach/hurd/dl-sysdep.c: __assert_fail __assert_perror_fail __*stat64 _exit This fixes a few startup issues, notably the call to __tunable_get_val that is made before PLTs are set up.
* x86: Initialize CPU info via IFUNC relocation [BZ 26203]H.J. Lu2020-10-161-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by static executable, DL_PLATFORM_INIT is never called. Also x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation. Key points: 1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. Linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations. 2. There are no IFUNC dependencies in ld.so before dynamic relocation have been performed, 3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT in dynamic executable and by IFUNC relocation in dlopen in static executable. 4. The x86 cache info in libc.o is initialized by IFUNC relocation. 5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called. Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.
* aarch64: enforce >=64K guard size [BZ #26691]Szabolcs Nagy2020-10-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364. Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K). This patch enforces at least 64K guard on aarch64 unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on linux. On other targets the patch has no effect. If the stack probe interval is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can be defined to get large enough stack guard on libc allocated stacks. The patch does not affect threads with user allocated stacks. Fixes bug 26691.
* x86: Use one ldbl2mpn.c file for both i386 and x86_64Florian Weimer2020-09-221-1/+0
|
* x86: Install <sys/platform/x86.h> [BZ #26124]H.J. Lu2020-09-113-101/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Install <sys/platform/x86.h> so that programmers can do #if __has_include(<sys/platform/x86.h>) #include <sys/platform/x86.h> #endif ... if (CPU_FEATURE_USABLE (SSE2)) ... if (CPU_FEATURE_USABLE (AVX2)) ... <sys/platform/x86.h> exports only: enum { COMMON_CPUID_INDEX_1 = 0, COMMON_CPUID_INDEX_7, COMMON_CPUID_INDEX_80000001, COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007, COMMON_CPUID_INDEX_80000008, COMMON_CPUID_INDEX_7_ECX_1, /* Keep the following line at the end. */ COMMON_CPUID_INDEX_MAX }; struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; }; /* Get a pointer to the CPU features structure. */ extern const struct cpu_features *__x86_get_cpu_features (unsigned int max) __attribute__ ((const)); Since all feature checks are done through macros, programs compiled with a newer <sys/platform/x86.h> are compatible with the older glibc binaries as long as the layout of struct cpu_features is identical. The features array can be expanded with backward binary compatibility for both .o and .so files. When COMMON_CPUID_INDEX_MAX is increased to support new processor features, __x86_get_cpu_features in the older glibc binaries returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the new processor feature. No new symbol version is neeeded. Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided. HAS_CPU_FEATURE can be used to identify processor features. Note: Although GCC has __builtin_cpu_supports, it only supports a subset of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE. It doesn't support HAS_CPU_FEATURE.
* x86-64: Fix FMA4 detection in ifunc [BZ #26534]Ondřej Hošek2020-09-021-1/+1
| | | | | | A typo in commit 107e6a3c2212ba7a3a4ec7cae8d82d73f7c95d0b causes the FMA4 code path to be taken on systems that support FMA, even if they do not support FMA4. Fix this to detect FMA4.
* math: Update x86_64 ulpsAdhemerval Zanella2020-08-081-3/+3
| | | | From new j0 test.
* Update x86-64 libm-test-ulpsAndreas K. Hüttel2020-07-251-2/+2
| | | | | | x86_64 Intel(R) Core(TM) i5-8265U gcc (Gentoo 10.1.0-r2 p3) 10.1.0 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Support usable check for all CPU featuresH.J. Lu2020-07-1328-187/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.
* x86: Remove the unused __x86_prefetchwH.J. Lu2020-07-111-1/+1
| | | | | | | | | | | | | Since commit c867597bff2562180a18da4b8dba89d24e8b65c4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 8 13:57:50 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove removed the only usage of __x86_prefetchw, we can remove the unused __x86_prefetchw.
* x86: Add thresholds for "rep movsb/stosb" to tunablesH.J. Lu2020-07-062-26/+2
| | | | | | | | | | Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables to update thresholds for "rep movsb" and "rep stosb" at run-time. Note that the user specified threshold for "rep movsb" smaller than the minimum threshold will be ignored. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i386: Use builtin sqrtlAdhemerval Zanella2020-06-221-27/+0
| | | | Checked on i686-linux-gnu.
* x86_64: Use builtin sqrt{f,l}Adhemerval Zanella2020-06-224-65/+31
| | | | Checked on x86_64-linux-gnu.
* Fix avx2 strncmp offset compare condition check [BZ #25933]Sunil K Pandey2020-06-171-0/+15
| | | | | | | | | strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size.
* x86_64: Use %xmmN with vpxor to clear a vector registerH.J. Lu2020-06-172-3/+3
| | | | | Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.
* dl-runtime: reloc_{offset,index} now functions arch overide'ableVineet Gupta2020-06-052-9/+35
| | | | | | | | | | | | | | | The existing macros are fragile and expect local variables with a certain name. Fix this by defining them as functions with default implementation in a new header dl-runtime.h which arches can override if need be. This came up during ARC port review, hence the need for argument pltgot in reloc_index() which is not needed by existing ports. This patch potentially only affects hppa/x86 ports, build tested for both those configs and a few more. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Linux: Remove remnants of the getcpu cacheFlorian Weimer2020-05-162-2/+1
| | | | | | The getcpu cache was removed from the kernel in Linux 2.6.24. glibc support from the sched_getcpu implementation was removed in commit dd26c44403582fdf10d663170f947dfe4b3207a0 ("Consolidate sched_getcpu").
* x86-64: Use RDX_LP on __x86_shared_non_temporal_threshold [BZ #25966]H.J. Lu2020-05-091-3/+3
| | | | | | | | | Since __x86_shared_non_temporal_threshold is defined as long int __x86_shared_non_temporal_threshold; and long int is 4 bytes for x32, use RDX_LP to compare against __x86_shared_non_temporal_threshold in assembly code.
* Remove unused floating-point configuration from gmp-impl.h.Joseph Myers2020-04-281-2/+0
| | | | | | | | | | | | | | | This patch removes the IEEE_DOUBLE_BIG_ENDIAN and IEEE_DOUBLE_MIXED_ENDIAN macros from gmp-impl.h and gmp-mparam.h, and the ieee_double_extract union from gmp-impl.h. The macros were used only in defining the union, which was used nowhere in glibc. As GMP's gmp-impl.h is over 5000 lines, the file in glibc is so far from the GMP version that it doesn't seem to make sense to keep things there that are not relevant in glibc. (I expect there is plenty more in the header after this patch that is also not relevant in glibc and can be cleaned up later.) Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by this patch.