path: sysdeps/x86/cpu-features.c
* Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit (H.J. Lu, 2017-02-17; 1 file, -1/+1)
  * sysdeps/x86/cpu-features.c (init_cpu_features): Use index_cpu_RTM
  and reg_RTM to clear the bit_cpu_RTM bit.
* Update copyright dates with scripts/update-copyrights (Joseph Myers, 2017-01-01; 1 file, -1/+1)
* Disable TSX on some Haswell processors (Andrew Senkevich, 2016-12-19; 1 file, -6/+23)
  This patch disables Intel TSX on some Haswell processors to avoid TSX
  on kernels that weren't updated with the latest microcode package
  (which disables the broken feature by default). A sketch of the check
  follows the ChangeLog entries below.

  * sysdeps/x86/cpu-features.c (get_common_indeces): Add stepping
  identification.
  (init_cpu_features): Add handling of Haswell.
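  A minimal sketch of the stepping-based check. The affected Haswell
  model numbers (0x3c, 0x3f, 0x45, 0x46) and the Xeon E7 v3 stepping
  cutoff are assumptions for illustration; the bit-clearing statement
  at the end is also the shape of the index_cpu_RTM/reg_RTM fix in the
  2017-02-17 commit above:

      /* Sketch: clear the cached RTM bit (bit 11 of CPUID.7.0:EBX) on
         Haswell parts whose microcode may leave TSX broken.  Model
         numbers and the stepping cutoff are illustrative assumptions.  */
      static void
      maybe_disable_tsx (unsigned int model, unsigned int stepping,
                         unsigned int *cpuid_7_ebx)
      {
        switch (model)
          {
          case 0x3f:
            /* Assumed: Xeon E7 v3 with stepping >= 4 has working TSX.  */
            if (stepping >= 4)
              break;
            /* Fall through.  */
          case 0x3c:
          case 0x45:
          case 0x46:
            /* With the cached bit cleared, glibc never selects
               TSX-based code paths on these parts.  */
            *cpuid_7_ebx &= ~(1u << 11);
            break;
          }
      }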
* Bug 20689: Fix FMA and AVX2 detection on Intel (Carlos O'Donell, 2016-10-17; 1 file, -10/+14)
  In the Intel Architecture Instruction Set Extensions Programming
  Reference, the recommended way to test for FMA in section '2.2.1
  Detection of FMA' is:

  "Application Software must identify that hardware supports AVX as
  explained in ... after that it must also detect support for FMA..."

  We don't do that in glibc. We use the OSXSAVE bit to detect that
  XGETBV is available, and after that we check for AVX and FMA
  orthogonally. It is conceivable that you could have the AVX bit clear
  and the FMA bit in an undefined state. This commit fixes FMA and AVX2
  detection to depend on usable AVX, as the recommended Intel sequence
  requires (see the sketch below).

  v1: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00241.html
  v2: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00265.html
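  A minimal sketch of the recommended ordering, assuming the
  architectural bit positions (leaf 1 ECX: OSXSAVE = bit 27, AVX =
  bit 28, FMA = bit 12; leaf 7 EBX: AVX2 = bit 5; XCR0 bits 1 and 2 =
  XMM/YMM state). FMA and AVX2 are examined only once AVX itself is
  known to be usable:

      #include <cpuid.h>

      struct usable { int avx, fma, avx2; };

      static struct usable
      detect_avx_family (void)
      {
        struct usable u = { 0, 0, 0 };
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
          return u;
        if (!(ecx & (1u << 27)))      /* OSXSAVE: XGETBV is usable.  */
          return u;

        unsigned int xcr0_lo, xcr0_hi;
        __asm__ ("xgetbv" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
        if ((xcr0_lo & 0x6) != 0x6)   /* OS saves XMM and YMM state.  */
          return u;

        if (!(ecx & (1u << 28)))      /* Only now is the AVX bit meaningful. */
          return u;
        u.avx = 1;

        /* FMA and AVX2 are trusted only because AVX is usable.  */
        u.fma = (ecx & (1u << 12)) != 0;
        if (__get_cpuid_max (0, 0) >= 7)
          {
            unsigned int eax7, ebx7, ecx7, edx7;
            __cpuid_count (7, 0, eax7, ebx7, ecx7, edx7);
            u.avx2 = (ebx7 & (1u << 5)) != 0;
          }
        return u;
      }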
* X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508] (H.J. Lu, 2016-09-06; 1 file, -0/+14)
  There is a transition penalty when SSE instructions are mixed with
  256-bit AVX or 512-bit AVX512 load instructions. Since
  _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore
  256-bit YMM/512-bit ZMM registers, the penalty is incurred when SSE
  instructions are used with lazy binding on AVX and AVX512 processors.

  To avoid the SSE transition penalty, if only the lower 128 bits of
  the first 8 vector registers are non-zero, we can preserve
  %xmm0 - %xmm7 with zero upper bits. On AVX and AVX512 processors that
  support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to
  check whether the upper 128 bits of the YMM registers or the upper
  256 bits of the ZMM registers are zero (see the sketch below). We can
  then restore only the non-zero portion of the vector registers with
  AVX/AVX512 load instructions, which zero-extend the upper bits of the
  vector registers.

  This patch adds _dl_runtime_resolve_sse_vex, which saves and restores
  XMM registers with 128-bit AVX store/load instructions. It is used to
  preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
  _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are
  added and used on AVX/AVX512 processors supporting XGETBV with
  ECX == 1, so that we store and load only the non-zero portion of the
  vector registers. This avoids the SSE transition penalty caused by
  _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the
  lower 128 bits of the vector registers are used.

  _dl_runtime_resolve_avx_slow is added and used for AVX processors
  that don't support XGETBV with ECX == 1. Since there is no SSE
  transition penalty on AVX512 processors that don't support XGETBV
  with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided.

  [BZ #20495]
  [BZ #20508]
  * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
  processors, set Use_dl_runtime_resolve_slow and set
  Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1.
  * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New.
  (bit_arch_Use_dl_runtime_resolve_slow): Likewise.
  (index_arch_Use_dl_runtime_resolve_opt): Likewise.
  (index_arch_Use_dl_runtime_resolve_slow): Likewise.
  * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
  _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if
  Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if
  Use_dl_runtime_resolve_slow is set.
  * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
  (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
  (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
  * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New.
  (_dl_runtime_resolve_opt): Likewise.
  (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.
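  A minimal sketch of the upper-state probe, assuming the documented
  encoding (support is flagged by CPUID.(EAX=0xD, ECX=1):EAX bit 2, and
  XGETBV with ECX == 1 returns the XINUSE bitmap, where bit 2 covers
  the upper YMM halves and bits 7:5 the AVX-512 state):

      #include <cpuid.h>

      /* Returns 1 if the upper YMM/ZMM state is known to be all zero,
         so the cheap 128-bit save/restore path would be safe; 0 if the
         state is in use or cannot be queried.  */
      static int
      upper_vector_state_is_zero (void)
      {
        unsigned int eax, ebx, ecx, edx;

        if (__get_cpuid_max (0, 0) < 0xd)
          return 0;

        /* CPUID leaf 0xD, sub-leaf 1: EAX bit 2 = XGETBV with ECX == 1.  */
        __cpuid_count (0xd, 1, eax, ebx, ecx, edx);
        if (!(eax & (1u << 2)))
          return 0;                 /* Must take the _slow path.  */

        /* XINUSE: bit 2 = upper YMM halves, bits 7:5 = AVX-512 state.  */
        __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (1));
        return (eax & 0xe4) == 0;
      }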
* Check FMA after COMMON_CPUID_INDEX_80000001 (H.J. Lu, 2016-06-07; 1 file, -4/+9)
  Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4
  requires AVX, determine whether FMA4 is usable only after
  COMMON_CPUID_INDEX_80000001 is available and AVX is known to be
  usable (see the sketch below).

  [BZ #20195]
  * sysdeps/x86/cpu-features.c (get_common_indeces): Move FMA4 check to ...
  (init_cpu_features): Here.
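  A minimal sketch, assuming FMA4 is reported in ECX bit 16 of leaf
  0x80000001 and that avx_usable was computed earlier as in the AVX
  detection sketch above:

      #include <cpuid.h>

      static int
      fma4_usable (int avx_usable)
      {
        unsigned int eax, ebx, ecx, edx;

        /* FMA4 is only usable if AVX itself is usable, and its bit
           lives in the extended leaf, which must be queried after the
           basic leaves.  */
        if (!avx_usable)
          return 0;
        if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
          return 0;
        return (ecx & (1u << 16)) != 0;
      }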
* Detect Intel Goldmont and Airmont processors (H.J. Lu, 2016-04-15; 1 file, -0/+8)
  Updated with the model numbers of Goldmont and Airmont processors
  from the Intel 64 and IA-32 Architectures Software Developer's
  Manual, Volume 3, Revision 058 (see the sketch below).

  * sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel
  Goldmont and Airmont processors.
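  A minimal sketch of the model dispatch, assuming model 0x4c is
  Airmont and 0x5c/0x5f are Goldmont (assumptions taken from the SDM
  revision cited above); the predicate is an illustrative stand-in for
  the Silvermont-style tuning defaults:

      /* Sketch: extend the family == 6 model switch with the new Atom
         model numbers.  Model values are assumptions from the SDM.  */
      static int
      wants_silvermont_tuning (unsigned int model)
      {
        switch (model)
          {
          case 0x37:   /* Silvermont.  */
          case 0x4c:   /* Airmont: low-power Silvermont successor.  */
          case 0x5c:   /* Goldmont.  */
          case 0x5f:   /* Goldmont (server).  */
            return 1;  /* Use the Silvermont tuning defaults.  */
          default:
            return 0;
          }
      }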
* Remove Fast_Copy_Backward from Intel Core processors (H.J. Lu, 2016-04-01; 1 file, -5/+1)
  Intel Core i3, i5 and i7 processors have fast unaligned copy, and the
  copy-backward flag is ignored there. Remove Fast_Copy_Backward from
  Intel Core processors to avoid confusion.

  * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
  bit_arch_Fast_Copy_Backward for Intel Core processors.
* [x86] Add a feature bit: Fast_Unaligned_Copy (H.J. Lu, 2016-03-28; 1 file, -1/+13)
  On AMD processors, memcpy optimized with unaligned SSE load is slower
  than memcpy optimized with aligned SSSE3, while other string
  functions are faster with unaligned SSE load. A feature bit,
  Fast_Unaligned_Copy, is added to select memcpy optimized with
  unaligned SSE load (see the sketch below).

  [BZ #19583]
  * sysdeps/x86/cpu-features.c (init_cpu_features): Set
  Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors.
  Set Fast_Copy_Backward for AMD Excavator processors.
  * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New.
  (index_arch_Fast_Unaligned_Copy): Likewise.
  * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
  Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
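  A minimal sketch of how the new bit splits the two decisions; the bit
  positions and the selector are illustrative stand-ins (the real
  values live in cpu-features.h and the real selection in memcpy.S):

      /* Two independent tuning bits: one for unaligned loads in
         general, one specifically for memcpy.  Positions illustrative.  */
      #define bit_arch_Fast_Unaligned_Load (1u << 4)
      #define bit_arch_Fast_Unaligned_Copy (1u << 5)

      static void *
      select_memcpy (unsigned int feature_word,
                     void *memcpy_sse2_unaligned, void *memcpy_ssse3)
      {
        /* memcpy keys off the copy-specific bit, so AMD processors
           can keep Fast_Unaligned_Load for strcmp etc. without
           pulling memcpy onto the unaligned-SSE path that is slower
           for them.  */
        if (feature_word & bit_arch_Fast_Unaligned_Copy)
          return memcpy_sse2_unaligned;
        return memcpy_ssse3;
      }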
* Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors (H.J. Lu, 2016-03-22; 1 file, -72/+80)
  Since only Intel processors with AVX2 have fast unaligned load, we
  should set index_arch_AVX_Fast_Unaligned_Load only for Intel
  processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into
  get_common_indeces and call get_common_indeces for other processors.
  Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading
  GLRO(dl_x86_cpu_features) in cpu-features.c (see the sketch below).

  [BZ #19583]
  * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline.
  Check family before setting family, model and extended_model. Set
  AVX, AVX2, AVX512, FMA and FMA4 usable bits here.
  (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE
  with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set
  index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable
  AVX2. Call get_common_indeces for other processors with
  family == NULL.
  * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
  (CPU_FEATURES_ARCH_P): Likewise.
  (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
  (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
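  A minimal sketch of the two predicate macros, assuming the struct
  layout implied by the ChangeLog (an array of cached CPUID registers
  plus a separate array of derived feature words). Taking the pointer
  explicitly is what lets cpu-features.c test bits on the structure it
  is still filling in, without loading the global:

      struct cpuid_registers { unsigned int eax, ebx, ecx, edx; };

      struct cpu_features
      {
        struct cpuid_registers cpuid[3];   /* Cached CPUID leaves.  */
        unsigned int feature[1];           /* Derived "arch" bits.  */
      };

      /* Test a raw CPUID bit / a derived feature bit on an explicit
         pointer, so init code can use a local pointer instead of
         GLRO(dl_x86_cpu_features).  */
      #define CPU_FEATURES_CPU_P(ptr, name) \
        (((ptr)->cpuid[index_cpu_##name].reg_##name \
          & bit_cpu_##name) != 0)
      #define CPU_FEATURES_ARCH_P(ptr, name) \
        (((ptr)->feature[index_arch_##name] & bit_arch_##name) != 0)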
* Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h (H.J. Lu, 2016-03-10; 1 file, -38/+42)
  index_* and bit_* macros are used to access the cpuid and feature
  arrays of struct cpu_features. It is very easy to use bits and
  indices of the cpuid array on the feature array, especially in
  assembly code. For example, sysdeps/i386/i686/multiarch/bcopy.S has

  HAS_CPU_FEATURE (Fast_Rep_String)

  which should be

  HAS_ARCH_FEATURE (Fast_Rep_String)

  We rename index_* and bit_* to index_cpu_*/index_arch_* and
  bit_cpu_*/bit_arch_* so that such errors are caught at build time
  (see the sketch below).

  [BZ #19762]
  * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS):
  Add _arch_ to index_*/bit_*.
  * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
  * sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
  (bit_arch_*): This for feature array.
  (bit_*): Renamed to ...
  (bit_cpu_*): This for cpu array.
  (index_*): Renamed to ...
  (index_arch_*): This for feature array.
  (index_*): Renamed to ...
  (index_cpu_*): This for cpu array.
  [__ASSEMBLER__] (HAS_FEATURE): Add and use field.
  [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE.
  [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE.
  [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
  bit_##name with index_cpu_##name and bit_cpu_##name.
  [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
  bit_##name with index_arch_##name and bit_arch_##name.
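  Building on the CPU_FEATURES_CPU_P sketch above, a minimal
  illustration (with made-up bit values) of why the prefixed names turn
  category mix-ups into build errors: each accessor pastes its own
  prefix onto the name, so passing an arch-feature name to the cpu
  accessor references an identifier that was never defined:

      /* CPUID bits get _cpu_ names...  */
      #define index_cpu_SSE2 0
      #define reg_SSE2 edx
      #define bit_cpu_SSE2 (1u << 26)

      /* ...derived tuning bits get _arch_ names.  */
      #define index_arch_Fast_Rep_String 0
      #define bit_arch_Fast_Rep_String (1u << 0)

      #define HAS_CPU_FEATURE(name) \
        CPU_FEATURES_CPU_P (__get_cpu_features (), name)
      #define HAS_ARCH_FEATURE(name) \
        CPU_FEATURES_ARCH_P (__get_cpu_features (), name)

      /* HAS_ARCH_FEATURE (Fast_Rep_String)  -> expands and compiles.
         HAS_CPU_FEATURE (Fast_Rep_String)   -> needs the undefined
         index_cpu_Fast_Rep_String: the build fails at this line.  */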
* Set index_Fast_Unaligned_Load for Excavator family CPUs (Amit Pawar, 2016-01-14; 1 file, -0/+8)
  GLIBC benchtest test cases show that SSE2_Unaligned-based
  implementations perform faster than SSE2-based implementations for
  the routines strcmp, strcat, strncat, stpcpy, stpncpy, strcpy,
  strncpy and strstr. The index_Fast_Unaligned_Load flag is set for
  Excavator family 0x15 CPUs, which makes the SSE2_Unaligned-based
  implementations the default for these routines (see the sketch
  below).

  [BZ #19467]
  * sysdeps/x86/cpu-features.c (init_cpu_features): Set
  index_Fast_Unaligned_Load flag for Excavator family CPUs.
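  A minimal sketch, assuming family holds the combined
  (extended + base) value and that the Excavator parts occupy models
  0x60-0x7f of family 0x15; both the model range and the bit value are
  assumptions for illustration:

      #define bit_arch_Fast_Unaligned_Load (1u << 4)  /* illustrative */

      static unsigned int
      amd_tuning_bits (unsigned int family, unsigned int model)
      {
        unsigned int feature_word = 0;
        /* Excavator: assumed to be models 0x60-0x7f of family 0x15.
           Prefer the SSE2_Unaligned string routines there.  */
        if (family == 0x15 && model >= 0x60 && model <= 0x7f)
          feature_word |= bit_arch_Fast_Unaligned_Load;
        return feature_word;
      }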
* Update copyright dates with scripts/update-copyrights (Joseph Myers, 2016-01-04; 1 file, -1/+1)
* Added memset optimized with AVX512 for KNL hardware (Andrew Senkevich, 2015-12-19; 1 file, -0/+2)
  It shows improvement of up to 28% over AVX2 memset (performance
  results attached at
  <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>). A
  sketch of the selection follows the ChangeLog entries below.

  * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
  * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
  * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
  * sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
  * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
  * sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
  index_Prefer_No_VZEROUPPER): New.
  * sysdeps/x86/cpu-features.c (init_cpu_features): Set the
  Prefer_No_VZEROUPPER for Knights Landing.
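  A minimal sketch of the IFUNC-style dispatch implied by the
  ChangeLog, with illustrative names and flags (the real resolver is
  the assembly branch in memset.S): on Knights Landing, VZEROUPPER is
  expensive, so the preference bit steers selection to a variant that
  never executes it:

      #include <stddef.h>

      extern void *memset_avx512_no_vzeroupper (void *, int, size_t);
      extern void *memset_avx2 (void *, int, size_t);
      extern void *memset_sse2 (void *, int, size_t);

      typedef void *(*memset_fn) (void *, int, size_t);

      /* Illustrative selector: flags are stand-ins for the cached
         feature bits.  */
      static memset_fn
      select_memset (int avx512_usable, int avx2_usable,
                     int prefer_no_vzeroupper)
      {
        if (avx512_usable && prefer_no_vzeroupper)
          return memset_avx512_no_vzeroupper;
        if (avx2_usable)
          return memset_avx2;
        return memset_sse2;
      }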
* Enable Silvermont optimizations for Knights Landing (H.J. Lu, 2015-12-15; 1 file, -0/+3)
  The Knights Landing processor is based on Silvermont. This patch
  enables Silvermont optimizations for Knights Landing.

  * sysdeps/x86/cpu-features.c (init_cpu_features): Enable Silvermont
  optimizations for Knights Landing.
* Update family and model detection for AMD CPUs (H.J. Lu, 2015-11-30; 1 file, -12/+15)
  AMD CPUs use a similar encoding scheme for extended family and model
  to Intel CPUs, as shown in:

  http://support.amd.com/TechDocs/25481.pdf

  This patch updates get_common_indeces to get the family and model for
  both Intel and AMD CPUs when family == 0x0f (see the sketch below).

  [BZ #19214]
  * sysdeps/x86/cpu-features.c (get_common_indeces): Add an argument to
  return the extended model. Update family and model with extended
  family and model when family == 0x0f.
  (init_cpu_features): Updated.
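  A minimal sketch of the shared decode, using the documented
  CPUID.1:EAX field layout (stepping 3:0, model 7:4, family 11:8,
  extended model 19:16, extended family 27:20). The family == 0x0f
  case now applies to AMD as well as Intel; the vendor check for the
  Intel-only family 0x06 case is omitted for brevity:

      #include <cpuid.h>

      static void
      get_family_model (unsigned int *family, unsigned int *model,
                        unsigned int *stepping)
      {
        unsigned int eax, ebx, ecx, edx;
        __cpuid (1, eax, ebx, ecx, edx);

        *stepping = eax & 0x0f;
        *model    = (eax >> 4) & 0x0f;
        *family   = (eax >> 8) & 0x0f;

        if (*family == 0x0f)
          {
            /* Extended family and model kick in for family 0x0f on
               both Intel and AMD.  */
            *family += (eax >> 20) & 0xff;
            *model  += ((eax >> 16) & 0x0f) << 4;
          }
        else if (*family == 0x06)
          /* Intel also extends the model for family 6.  */
          *model += ((eax >> 16) & 0x0f) << 4;
      }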
* Detect and select i586/i686 implementation at run-time (H.J. Lu, 2015-08-27; 1 file, -0/+8)
  We detect i586 and i686 features at run time by checking the CX8 and
  CMOV CPUID feature bits. We can use this information to select the
  best implementation in ix86 multiarch.

  HAS_I586/HAS_I686 is true if i586/i686 instructions are available on
  the processor. Due to the reordering and the other nifty extensions
  in i686, it is not really good to use heavily i586-optimized code on
  an i686; it is better to use i486 code if it isn't an i586.

  USE_I586/USE_I686 is true if the i586/i686 implementation should be
  used for the processor. USE_I586 is true only if i686 instructions
  aren't available. If i686 instructions are available, we always
  choose the i686 or i486 implementation, in that order, and we never
  choose the i586 implementation for i686-class processors. A sketch of
  the run-time checks follows the ChangeLog entries below.

  * sysdeps/i386/init-arch.h: New file.
  * sysdeps/i386/i586/init-arch.h: Likewise.
  * sysdeps/i386/i686/init-arch.h: Likewise.
  * sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_I586 bit if
  CX8 is available. Set bit_I686 bit if CMOV is available.
  * sysdeps/x86/cpu-features.h (bit_I586): New.
  (bit_I686): Likewise.
  (bit_CX8): Likewise.
  (bit_CMOV): Likewise.
  (index_CX8): Likewise.
  (index_CMOV): Likewise.
  (index_I586): Likewise.
  (index_I686): Likewise.
  (reg_CX8): Likewise.
  (reg_CMOV): Likewise.
  (HAS_I586): Defined as HAS_ARCH_FEATURE (I586) if i586 isn't
  available at compile-time.
  (HAS_I686): Defined as HAS_ARCH_FEATURE (I686) if i686 isn't
  available at compile-time.
  * sysdeps/x86/init-arch.h (USE_I586): New macro.
  (USE_I686): Likewise.
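  A minimal sketch of the run-time checks, using the documented
  CPUID.1:EDX bits (CX8 = bit 8, CMOV = bit 15) and the selection rule
  described above; the struct is an illustrative stand-in for the
  cached feature bits:

      #include <cpuid.h>

      struct ix86_caps { int has_i586, has_i686, use_i586; };

      static struct ix86_caps
      detect_ix86 (void)
      {
        struct ix86_caps c = { 0, 0, 0 };
        unsigned int eax, ebx, ecx, edx;

        if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
          {
            c.has_i586 = (edx & (1u << 8)) != 0;    /* CX8: cmpxchg8b. */
            c.has_i686 = (edx & (1u << 15)) != 0;   /* CMOV.  */
          }
        /* Prefer the i686 (or i486) implementation on i686-class
           CPUs; use i586 code only when i686 instructions are
           absent.  */
        c.use_i586 = c.has_i586 && !c.has_i686;
        return c;
      }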
* Define HAS_CPUID/HAS_I586/HAS_I686 from -march= (H.J. Lu, 2015-08-18; 1 file, -2/+2)
  The cpuid, i586 and i686 instructions are available if the processor
  specified by -march= supports them. We can use this information to
  determine at compile time whether those instructions can be used
  safely (see the sketch below).

  * sysdeps/x86/cpu-features.c (init_cpu_features): Check whether cpuid
  is available only if HAS_CPUID is 0.
  * sysdeps/x86/cpu-features.h (HAS_CPUID): New.
  (HAS_I586): Likewise.
  (HAS_I686): Likewise.
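  A minimal sketch of the compile-time derivation, with an abbreviated
  list of -march= macros (the real header enumerates more CPUs):

      /* If -march= guarantees the instruction set, hard-code the
         answer; otherwise fall back to the run-time feature check.
         The macro list here is abbreviated.  */
      #if defined __i586__ || defined __i686__ || defined __x86_64__
      # define HAS_CPUID 1
      #else
      # define HAS_CPUID 0
      #endif

      #if defined __i686__ || defined __x86_64__
      # define HAS_I686 1
      #else
      # define HAS_I686 HAS_ARCH_FEATURE (I686)  /* decided at run time */
      #endif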
* Check if cpuid is available in init_cpu_features (H.J. Lu, 2015-08-13; 1 file, -0/+12)
  Since not all i486 processors support cpuid, we call __get_cpuid_max
  to check whether cpuid is available before using it when not
  compiling for i586, i686 or x86-64 (see the sketch below).

  * sysdeps/x86/cpu-features.c (init_cpu_features): Call
  __get_cpuid_max if not compiling for i586, i686 nor x86-64.
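  A minimal sketch of the guard, using __get_cpuid_max from GCC's
  <cpuid.h> (it returns the highest supported CPUID leaf, or 0 when the
  CPUID instruction itself is unavailable, as on some i486 parts); the
  HAS_CPUID fallback ties this to the compile-time sketch above:

      #include <cpuid.h>

      #ifndef HAS_CPUID
      # define HAS_CPUID 0   /* -march= gave no guarantee.  */
      #endif

      static void
      init_features_sketch (void)
      {
      #if !HAS_CPUID
        if (__get_cpuid_max (0, 0) == 0)
          return;   /* No cpuid: leave every feature bit clear.  */
      #endif
        /* ... safe to execute cpuid from here on ... */
      }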
* Add _dl_x86_cpu_features to rtld_global (H.J. Lu, 2015-08-13; 1 file, -0/+202)
  This patch adds _dl_x86_cpu_features to rtld_global in the x86 ld.so
  and initializes it early, before __libc_start_main is called, so that
  cpu_features is always available when it is used and we can avoid
  calling __init_cpu_features in IFUNC selectors. A mock of the pattern
  follows the ChangeLog entries below.

  * sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
  (dl_platform_init): Call init_cpu_features.
  * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
  * sysdeps/i386/i686/cacheinfo.c
  (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
  * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
  * sysdeps/i386/i686/multiarch/Versions: Removed.
  * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
  Removed.
  * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
  * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines):
  Remove init-arch.
  * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
  <sysdeps/x86_64/dl-procinfo.c> instead of
  <sysdeps/generic/dl-procinfo.c>.
  * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add
  cpu-features-offsets.sym and rtld-global-offsets.sym.
  [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
  [$(subdir) == elf] (tests): Add tst-get-cpu-features.
  [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static.
  * sysdeps/x86/Versions: New file.
  * sysdeps/x86/cpu-features-offsets.sym: Likewise.
  * sysdeps/x86/cpu-features.c: Likewise.
  * sysdeps/x86/cpu-features.h: Likewise.
  * sysdeps/x86/dl-get-cpu-features.c: Likewise.
  * sysdeps/x86/libc-start.c: Likewise.
  * sysdeps/x86/rtld-global-offsets.sym: Likewise.
  * sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
  * sysdeps/x86/tst-get-cpu-features.c: Likewise.
  * sysdeps/x86_64/dl-procinfo.c: Likewise.
  * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume
  USE_MULTIARCH is defined and don't check it.
  (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
  (is_amd): Likewise.
  (max_cpuid): Likewise.
  (intel_check_word): Likewise.
  (__cache_sysconf): Don't call __init_cpu_features.
  (__x86_preferred_memory_instruction): Removed.
  (init_cacheinfo): Don't call __init_cpu_features. Replace
  __cpu_features with GLRO(dl_x86_cpu_features).
  * sysdeps/x86_64/dl-machine.h: Include <cpu-features.c>.
  (dl_platform_init): Call init_cpu_features.
  * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
  * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
  * sysdeps/x86_64/multiarch/Versions: Removed.
  * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
  * sysdeps/x86_64/multiarch/init-arch.c: Likewise.
  * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed.
  * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
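  A minimal, self-contained mock of the pattern this commit
  establishes: run CPUID once during loader startup, cache the result
  in a global, and have every later query (including IFUNC selectors)
  read the cache instead of re-running detection. The struct and names
  are stand-ins for the real rtld_global field:

      #include <cpuid.h>

      /* Stand-in for _dl_x86_cpu_features in rtld_global.  */
      static struct { unsigned int leaf1_ecx, leaf1_edx;
                      int initialized; } cpu_features_cache;

      /* Called once, early in startup (the real code runs from
         dl_platform_init before any IFUNC resolver).  */
      static void
      init_cpu_features_sketch (void)
      {
        unsigned int eax, ebx;
        cpu_features_cache.initialized
          = __get_cpuid (1, &eax, &ebx,
                         &cpu_features_cache.leaf1_ecx,
                         &cpu_features_cache.leaf1_edx);
      }

      /* An IFUNC selector can now be branch-only: no cpuid
         instruction, no lazy initialization.  */
      static int
      has_sse2 (void)
      {
        return (cpu_features_cache.leaf1_edx & (1u << 26)) != 0;
      }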