about summary refs log tree commit diff
path: root/sysdeps/x86/cpu-features.c
Commit message (Collapse)AuthorAgeFilesLines
* [x86] Add a feature bit: Fast_Unaligned_Copy hjl/pr19583H.J. Lu2016-03-231-1/+13
| | | | | | | | | | | | | | | | | | | On AMD processors, memcpy optimized with unaligned SSE load is slower than emcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
* Set index_arch_AVX_Fast_Unaligned_Load only for Intel processorsH.J. Lu2016-03-221-72/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
* Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.hH.J. Lu2016-03-101-38/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | index_* and bit_* macros are used to access cpuid and feature arrays o struct cpu_features. It is very easy to use bits and indices of cpuid array on feature array, especially in assembly codes. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String) which should be HAS_ARCH_FEATURE (Fast_Rep_String) We change index_* and bit_* to index_cpu_*/index_arch_* and bit_cpu_*/bit_arch_* so that we can catch such error at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_*): Renamed to ... (bit_arch_*): This for feature array. (bit_*): Renamed to ... (bit_cpu_*): This for cpu array. (index_*): Renamed to ... (index_arch_*): This for feature array. (index_*): Renamed to ... (index_cpu_*): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE. [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name.
* Set index_Fast_Unaligned_Load for Excavator family CPUsAmit Pawar2016-01-141-0/+8
| | | | | | | | | | | | | GLIBC benchtest testcases shows SSE2_Unaligned based implementations are performing faster compare to SSE2 based implementations for routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family 0x15h CPU's. This makes SSE2_Unaligned based implementations as default for these routines. [BZ #19467] * sysdeps/x86/cpu-features.c (init_cpu_features): Set index_Fast_Unaligned_Load flag for Excavator family CPUs.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2016-01-041-1/+1
|
* Added memset optimized with AVX512 for KNL hardware.Andrew Senkevich2015-12-191-0/+2
| | | | | | | | | | | | | | | It shows improvement up to 28% over AVX2 memset (performance results attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>). * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests. * sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER, index_Prefer_No_VZEROUPPER): New. * sysdeps/x86/cpu-features.c (init_cpu_features): Set the Prefer_No_VZEROUPPER for Knights Landing.
* Enable Silvermont optimizations for Knights LandingH.J. Lu2015-12-151-0/+3
| | | | | | | | Knights Landing processor is based on Silvermont. This patch enables Silvermont optimizations for Knights Landing. * sysdeps/x86/cpu-features.c (init_cpu_features): Enable Silvermont optimizations for Knights Landing.
* Update family and model detection for AMD CPUsH.J. Lu2015-11-301-12/+15
| | | | | | | | | | | | | | | | AMD CPUs uses the similar encoding scheme for extended family and model as Intel CPUs as shown in: http://support.amd.com/TechDocs/25481.pdf This patch updates get_common_indeces to get family and model for both Intel and AMD CPUs when family == 0x0f. [BZ #19214] * sysdeps/x86/cpu-features.c (get_common_indeces): Add an argument to return extended model. Update family and model with extended family and model when family == 0x0f. (init_cpu_features): Updated.
* Detect and select i586/i686 implementation at run-time fedora/masterH.J. Lu2015-08-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We detect i586 and i686 features at run-time by checking CX8 and CMOV CPUID features bits. We can use these information to select the best implementation in ix86 multiarch. HAS_I586/HAS_I686 is true if i586/i686 instructions are available on the processor. Due to the reordering and the other nifty extensions in i686, it is not really good to use heavily i586 optimized code on an i686. It's better to use i486 code if it isn't an i586. USE_I586/USE_I686 is true if i586/i686 implementation should be used for the processor. USE_I586 is true only if i686 instructions aren't available. If i686 instructions are available, we always choose i686 or i486 implementation, in that order, and we never choose i586 implementation for i686-class processors. * sysdeps/i386/init-arch.h: New file. * sysdeps/i386/i586/init-arch.h: Likewise. * sysdeps/i386/i686/init-arch.h: Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_I586 bit if CX8 is available. Set bit_I686 bit if CMOV is available. * sysdeps/x86/cpu-features.h (bit_I586): New. (bit_I686): Likewise. (bit_CX8): Likewise. (bit_CMOV): Likewise. (index_CX8): Likewise. (index_CMOV): Likewise. (index_I586): Likewise. (index_I686): Likewise. (reg_CX8): Likewise. (reg_CMOV): Likewise. (HAS_I586): Defined as HAS_ARCH_FEATURE (I586) if i586 isn't available at compile-time. (HAS_I686): Defined as HAS_ARCH_FEATURE (I686) if i686 isn't available at compile-time. * sysdeps/x86/init-arch.h (USE_I586): New macro. (USE_I686): Likewise.
* Define HAS_CPUID/HAS_I586/HAS_I686 from -march=H.J. Lu2015-08-181-2/+2
| | | | | | | | | | | | cpuid, i586 and i686 instructions are available if the processor specified by -march= supports them. We can use this information to determine whether those instructions can be used safely. * sysdeps/x86/cpu-features.c (init_cpu_features): Check whether cpuid is available only if HAS_CPUID is 0. * sysdeps/x86/cpu-features.h (HAS_CPUID): New. (HAS_I586): Likewise. (HAS_I686): Likewise.
* Check if cpuid is available in init_cpu_featuresH.J. Lu2015-08-131-0/+12
| | | | | | | | | Since not all i486 processors support cpuid, we call __get_cpuid_max to check if cpuid is available before using it if not compiling for i586, i686 nor x86-64. * sysdeps/x86/cpu-features.c (init_cpu_features): Call __get_cpuid_max if not compiling for i586, i686 nor x86-64.
* Add _dl_x86_cpu_features to rtld_globalH.J. Lu2015-08-131-0/+202
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors. * sysdeps/i386/dl-machine.h: Include <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New. * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed. * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch. * sysdeps/i386/i686/multiarch/Versions: Removed. * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>. * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>. * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym. [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features. [$(subdir) == elf] (tests): Add tst-get-cpu-features. [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static. * sysdeps/x86/Versions: New file. * sysdeps/x86/cpu-features-offsets.sym: Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/dl-get-cpu-features.c: Likewise. * sysdeps/x86/libc-start.c: Likewise. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86/tst-get-cpu-features-static.c: Likewise. * sysdeps/x86/tst-get-cpu-features.c: Likewise. * sysdeps/x86_64/dl-procinfo.c: Likewise. * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it. (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features). (is_amd): Likewise. (max_cpuid): Likewise. (intel_check_word): Likewise. (__cache_sysconf): Don't call __init_cpu_features. (__x86_preferred_memory_instruction): Removed. (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features). * sysdeps/x86_64/dl-machine.h: <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>. * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch. * sysdeps/x86_64/multiarch/Versions: Removed. * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise. * sysdeps/x86_64/multiarch/init-arch.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.