about summary refs log tree commit diff
path: root/sysdeps/x86/cpu-tunables.c
Commit message (Collapse)AuthorAgeFilesLines
* Remove --enable-tunables configure optionAdhemerval Zanella Netto2023-03-291-24/+21
| | | | | | | | | | | | And make always supported. The configure option was added on glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elisition tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* x86: Fix bug about glibc.cpu.hwcaps.caiyinyu2023-03-071-3/+3
| | | | | | | | | | | | | | | | | | | Recorded in [BZ #30183]: 1. export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512 2. Add _dl_printf("p -- %s\n", p); just before switch(nl) in sysdeps/x86/cpu-tunables.c 3. compiled and run ./testrun.sh /usr/bin/ls you will get: p -- -AVX512 p -- LC_ADDRESS=en_US.UTF-8 p -- LC_NUMERIC=C ... The function, TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp), checks far more than it should and it should stop at end of "-AVX512".
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-061-1/+1
|
* x86: Add support for building {w}memcmp{eq} with explicit ISA levelNoah Goldstein2022-07-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2022-01-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: *** 912-#endif remote: *** 913: remote: *** 914- remote: *** error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]H.J. Lu2021-12-101-7/+0
| | | | | | | | | | Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* x86-64: Remove Prefer_AVX2_STRCMPH.J. Lu2021-11-011-2/+0
| | | | | | | | | Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.
* x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMPH.J. Lu2021-03-291-0/+2
| | | | | | | | | 1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered by VZEROUPPER inside a transactionally executing RTM region. 2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
* x86: Properly disable XSAVE related features [BZ #27605]H.J. Lu2021-03-291-0/+1
| | | | | | | 1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE. 2. Disable all features which depend on XSAVE: a. If OSXSAVE is disabled by glibc tunables. Or b. If both XSAVE and XSAVEC aren't usable.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-021-1/+1
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* x86: Support usable check for all CPU featuresH.J. Lu2020-07-131-106/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.
* x86: Update CPU feature detection [BZ #26149]H.J. Lu2020-06-221-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to pointer to the usable arary so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.
* x86: Move CET control to _dl_x86_feature_control [BZ #25887]H.J. Lu2020-05-181-25/+6
| | | | | | | | | | 1. Include <dl-procruntime.c> to get architecture specific initializer in rtld_global. 2. Change _dl_x86_feature_1[2] to _dl_x86_feature_1. 3. Add _dl_x86_feature_control after _dl_x86_feature_1, which is a struct of 2 bitfields for IBT and SHSTK control This fixes [BZ #25887].
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-011-1/+1
|
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* Rename the glibc.tune namespace to glibc.cpuSiddhesh Poyarekar2018-08-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The glibc.tune namespace is vaguely named since it is a 'tunable', so give it a more specific name that describes what it refers to. Rename the tunable namespace to 'cpu' to more accurately reflect what it encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since glibc.cpu.cpu is weird. * NEWS: Mention the change. * elf/dl-tunables.list: Rename tune namespace to cpu. * sysdeps/powerpc/dl-tunables.list: Likewise. * sysdeps/x86/dl-tunables.list: Likewise. * sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to cpu.name. * elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust. * elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise. * manual/README.tunables: Likewise. * manual/tunables.texi: Likewise. * sysdeps/powerpc/cpu-features.c: Likewise. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/cpu-tunables.c: Likewise. * sysdeps/x86_64/Makefile: Likewise. * sysdeps/x86/dl-cet.c: Likewise. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Support IBT and SHSTK in Intel CET [BZ #21598]H.J. Lu2018-07-161-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel Control-flow Enforcement Technology (CET) instructions: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en forcement-technology-preview.pdf includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK). GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to indicate that all executable sections are compatible with IBT when ENDBR instruction starts each valid target where an indirect branch instruction can land. Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on output only if it is set on all relocatable inputs. On an IBT capable processor, the following steps should be taken: 1. When loading an executable without an interpreter, enable IBT and lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable. 2. When loading an executable with an interpreter, enable IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter. a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable, disable IBT. b. Lock IBT. 3. If IBT is enabled, when loading a shared object without GNU_PROPERTY_X86_FEATURE_1_IBT: a. If legacy interwork is allowed, then mark all pages in executable PT_LOAD segments in legacy code page bitmap. Failure of legacy code page bitmap allocation causes an error. b. If legacy interwork isn't allowed, it causes an error. GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to indicate that all executable sections are compatible with SHSTK where return address popped from shadow stack always matches return address popped from normal stack. Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK on output only if it is set on all relocatable inputs. On a SHSTK capable processor, the following steps should be taken: 1. When loading an executable without an interpreter, enable SHSTK if GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable. 2. When loading an executable with an interpreter, enable SHSTK if GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter. a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable or any shared objects loaded via the DT_NEEDED tag, disable SHSTK. b. Otherwise lock SHSTK. 3. After SHSTK is enabled, it is an error to load a shared object without GNU_PROPERTY_X86_FEATURE_1_SHSTK. To enable CET support in glibc, --enable-cet is required to configure glibc. When CET is enabled, both compiler and assembler must support CET. Otherwise, it is a configure-time error. To support CET run-time control, 1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate if IBT or SHSTK are enabled at run-time. It should be initialized by init_cpu_features. 2. For dynamic executables: a. A l_cet field is added to struct link_map to indicate if IBT or SHSTK is enabled in an ELF module. _dl_process_pt_note or _rtld_process_pt_note is called to process PT_NOTE segment for GNU program property and set l_cet. b. _dl_open_check is added to check IBT and SHSTK compatibilty when dlopening a shared object. 3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if SHSTK is enabled. CET run-time control can be changed via GLIBC_TUNABLES with $ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off] $ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off] 1. permissive: SHSTK is disabled when dlopening a legacy ELF module. 2. on: IBT or SHSTK are always enabled, regardless if there are IBT or SHSTK bits in GNU program property. 3. off: IBT or SHSTK are always disabled, regardless if there are IBT or SHSTK bits in GNU program property. <cet.h> from CET-enabled GCC is automatically included by assembly codes to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK to GNU program property. _CET_ENDBR is added at the entrance of all assembly functions whose address may be taken. _CET_NOTRACK is used to insert NOTRACK prefix with indirect jump table to support IBT. It is defined as notrack when _CET_NOTRACK is defined in <cet.h>. [BZ #21598] * configure.ac: Add --enable-cet. * configure: Regenerated. * elf/Makefille (all-built-dso): Add a comment. * elf/dl-load.c (filebuf): Moved before "dynamic-link.h". Include <dl-prop.h>. (_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE segment. * elf/dl-open.c: Include <dl-prop.h>. (dl_open_worker): Call _dl_open_check. * elf/rtld.c: Include <dl-prop.h>. (dl_main): Call _rtld_process_pt_note on PT_NOTE segment. Call _rtld_main_check. * sysdeps/generic/dl-prop.h: New file. * sysdeps/i386/dl-cet.c: Likewise. * sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise. * sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise. * sysdeps/x86/cet-tunables.h: Likewise. * sysdeps/x86/check-cet.awk: Likewise. * sysdeps/x86/configure: Likewise. * sysdeps/x86/configure.ac: Likewise. * sysdeps/x86/dl-cet.c: Likewise. * sysdeps/x86/dl-procruntime.c: Likewise. * sysdeps/x86/dl-prop.h: Likewise. * sysdeps/x86/libc-start.h: Likewise. * sysdeps/x86/link_map.h: Likewise. * sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add _CET_ENDBR. (_dl_runtime_profile): Likewise. (_dl_runtime_resolve_shstk): New. (_dl_runtime_profile_shstk): Likewise. * sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet if CET is enabled. (CFLAGS-.o): Add -fcf-protection if CET is enabled. (CFLAGS-.os): Likewise. (CFLAGS-.op): Likewise. (CFLAGS-.oS): Likewise. (asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET is enabled. (tests-special): Add $(objpfx)check-cet.out. (cet-built-dso): New. (+$(cet-built-dso:=.note)): Likewise. (common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note). ($(objpfx)check-cet.out): New. (generated): Add check-cet.out. * sysdeps/x86/cpu-features.c: Include <dl-cet.h> and <cet-tunables.h>. (TUNABLE_CALLBACK (set_x86_ibt)): New prototype. (TUNABLE_CALLBACK (set_x86_shstk)): Likewise. (init_cpu_features): Call get_cet_status to check CET status and update dl_x86_feature_1 with CET status. Call TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK (set_x86_shstk). Disable and lock CET in libc.a. * sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>. (TUNABLE_CALLBACK (set_x86_ibt)): New function. (TUNABLE_CALLBACK (set_x86_shstk)): Likewise. * sysdeps/x86/sysdep.h (_CET_NOTRACK): New. (_CET_ENDBR): Define if not defined. (ENTRY): Add _CET_ENDBR. * sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and x86_shstk. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add _CET_ENDBR. (_dl_runtime_profile): Likewise.
* x86-64: Check Prefer_FSRM in ifunc-memmove.hH.J. Lu2018-05-211-0/+2
| | | | | | | | | | | | | Although the REP MOVSB implementations of memmove, memcpy and mempcpy aren't used by the current processors, this patch adds Prefer_FSRM check in ifunc-memmove.h so that they can be used in the future. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New. (index_arch_Prefer_FSRM): Likewise. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Also check Prefer_FSRM. * sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR): Also return OPTIMIZE (erms) for Prefer_FSRM.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]H.J. Lu2017-10-201-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector, mask and bound registers. It simplifies _dl_runtime_resolve and supports different calling conventions. ld.so code size is reduced by more than 1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles than saving and restoring vector and bound registers individually. Latency for _dl_runtime_resolve to lookup the function, foo, from one shared library plus libc.so: Before After Change Westmere (SSE)/fxsave 345 866 151% IvyBridge (AVX)/xsave 420 643 53% Haswell (AVX)/xsave 713 1252 75% Skylake (AVX+MPX)/xsavec 559 719 28% Skylake (AVX512+MPX)/xsavec 145 272 87% Ryzen (AVX)/xsavec 280 553 97% This is the worst case where portion of time spent for saving and restoring registers is bigger than majority of cases. With smaller _dl_runtime_resolve code size, overall performance impact is negligible. On IvyBridge, differences in build and test time of binutils with lazy binding GCC and binutils are noises. On Westmere, differences in bootstrap and "makc check" time of GCC 7 with lazy binding GCC and binutils are also noises. [BZ #21265] * sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET): New. * sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>. (get_common_indeces): Set xsave_state_size, xsave_state_full_size and bit_arch_XSAVEC_Usable if needed. (init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow and bit_arch_Use_dl_runtime_resolve_opt. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): Removed. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (bit_arch_Prefer_No_AVX512): Updated. (bit_arch_MathVec_Prefer_No_AVX512): Likewise. (bit_arch_XSAVEC_Usable): New. (STATE_SAVE_OFFSET): Likewise. (STATE_SAVE_MASK): Likewise. [__ASSEMBLER__]: Include <cpu-features-offsets.h>. (cpu_features): Add xsave_state_size and xsave_state_full_size. (index_arch_Use_dl_runtime_resolve_opt): Removed. (index_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_XSAVEC_Usable): New. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow. * sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables is enabled. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, _dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt, _dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec. * sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed. (DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT instead of VEC_SIZE. (REGISTER_SAVE_BND0): Removed. (REGISTER_SAVE_BND1): Likewise. (REGISTER_SAVE_BND3): Likewise. (REGISTER_SAVE_RAX): Always defined to 0. (VMOV): Removed. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_slow): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_avx512): Likewise. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (USE_FXSAVE): New. (_dl_runtime_resolve_fxsave): Likewise. (USE_XSAVE): Likewise. (_dl_runtime_resolve_xsave): Likewise. (USE_XSAVEC): Likewise. (_dl_runtime_resolve_xsavec): Likewise. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512): Removed. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (_dl_runtime_resolve_fxsave): New. (_dl_runtime_resolve_xsave): Likewise. (_dl_runtime_resolve_xsavec): Likewise.
* x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967]H.J. Lu2017-09-121-0/+7
| | | | | | | | | | | | | | | | | | | | | AVX512 functions in mathvec are used on machines with AVX512. An AVX2 wrapper is also provided and it can be used when the AVX512 version isn't profitable. MathVec_Prefer_No_AVX512 is addded to cpu-features. If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES environment variable, the AVX2 wrapper will be used. Tested on x86-64 machines with and without AVX512. Also verified glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine. [BZ #21967] * sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512): New. (index_arch_MathVec_Prefer_No_AVX512): Likewise. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Handle MathVec_Prefer_No_AVX512. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h (IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512 is set.
* x86: Add IBT/SHSTK bits to cpu-featuresH.J. Lu2017-08-141-0/+2
| | | | | | | | | | | | | | | | Add IBT/SHSTK bits to cpu-features for Shadow Stack in Intel Control-flow Enforcement Technology (CET) instructions: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf * sysdeps/x86/cpu-features.h (bit_cpu_BIT): New. (bit_cpu_SHSTK): Likewise. (index_cpu_IBT): Likewise. (index_cpu_SHSTK): Likewise. (reg_IBT): Likewise. (reg_SHSTK): Likewise. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Handle index_cpu_IBT and index_cpu_SHSTK.
* x86: Rename glibc.tune.ifunc to glibc.tune.hwcapsH.J. Lu2017-06-211-18/+11
| | | | | | | | | | | | | | | | | | | | | | | | Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to sysdeps/x86/dl-tunables.list since it is x86 specicifc. Also change type of data_cache_size, data_cache_size and non_temporal_threshold to unsigned long int to match size_t. Remove usage DEFAULT_STRLEN from cpu-tunables.c. * elf/dl-tunables.list (glibc.tune.ifunc): Removed. * sysdeps/x86/dl-tunables.list (glibc.tune.hwcaps): New. Remove security_level on all fields. * manual/tunables.texi: Replace ifunc with hwcaps. * sysdeps/x86/cpu-features.c (TUNABLE_CALLBACK (set_ifunc)): Renamed to .. (TUNABLE_CALLBACK (set_hwcaps)): This. (init_cpu_features): Updated. * sysdeps/x86/cpu-features.h (cpu_features): Change type of data_cache_size, data_cache_size and non_temporal_threshold to unsigned long int. * sysdeps/x86/cpu-tunables.c (DEFAULT_STRLEN): Removed. (TUNABLE_CALLBACK (set_ifunc)): Renamed to ... (TUNABLE_CALLBACK (set_hwcaps)): This. Update comments. Don't use DEFAULT_STRLEN.
* tunables: Add IFUNC selection and cache sizesH.J. Lu2017-06-201-0/+330
The current IFUNC selection is based on microbenchmarks in glibc. It should give the best performance for most workloads. But other choices may have better performance for a particular workload or on the hardware which wasn't available at the selection was made. The environment variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature yyy and zzz, where the feature name is case-sensitive and has to match the ones in cpu-features.h. It can be used by glibc developers to override the IFUNC selection to tune for a new processor or improve performance for a particular workload. It isn't intended for normal end users. NOTE: the IFUNC selection may change over time. Please check all multiarch implementations when experimenting. Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER is provided to set threshold to use non temporal store to NUMBER, GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER to set data cache size, GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER to set shared cache size. * elf/dl-tunables.list (tune): Add ifunc, x86_non_temporal_threshold, x86_data_cache_size and x86_shared_cache_size. * manual/tunables.texi: Document glibc.tune.ifunc, glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size and glibc.tune.x86_non_temporal_threshold. * sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file. * sysdeps/x86/cpu-tunables.c: Likewise. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and get data cache size, shared cache size and non temporal threshold from cpu_features. * sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE): New. [HAVE_TUNABLES] Include <unistd.h>. [HAVE_TUNABLES] Include <elf/dl-tunables.h>. [HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise. [HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set IFUNC selection, data cache size, shared cache size and non temporal threshold. * sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size, shared_cache_size and non_temporal_threshold.