about summary refs log tree commit diff
path: root/sysdeps/x86
Commit message (Collapse)AuthorAgeFilesLines
* x86: Properly disable XSAVE related features [BZ #27605]H.J. Lu2021-03-292-0/+56
| | | | | | | 1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE. 2. Disable all features which depend on XSAVE: a. If OSXSAVE is disabled by glibc tunables. Or b. If both XSAVE and XSAVEC aren't usable.
* elf: Fix not compiling ifunc tests that need gcc ifunc supportSamuel Thibault2021-03-241-0/+2
|
* Build get-cpuid-feature-leaf.c without stack-protector [BZ #27555]Siddhesh Poyarekar2021-03-152-0/+4
| | | | | | | | | | | | | | | __x86_get_cpuid_feature_leaf is called during early startup, before the stack check guard is initialized and is hence not safe to build with stack-protector. Additionally, IFUNC resolvers for static tst-ifunc-isa tests get called too early for stack protector to be useful, so fix them to disable stack protector for the resolver functions. This fixes all failures seen with --enable-stack-protector=all configuration. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Handle _SC_LEVEL1_ICACHE_LINESIZE [BZ #27444]H.J. Lu2021-03-157-0/+79
| | | | | | | | | | | | | | | | | commit 2d651eb9265d1366d7b9e881bfddd46db9c1ecc4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Sep 18 07:55:14 2020 -0700 x86: Move x86 processor cache info to cpu_features missed _SC_LEVEL1_ICACHE_LINESIZE. 1. Add level1_icache_linesize to struct cpu_features. 2. Initialize level1_icache_linesize by calling handle_intel, handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE. 3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Set minimum x86-64 level marker [BZ #27318]H.J. Lu2021-03-063-11/+58
| | | | | | | | | | | | | | | | | | | Since the full ISA set used in an ELF binary is unknown to compiler, an x86-64 ISA level marker indicates the minimum, not maximum, ISA set required to run such an ELF binary. We never guarantee a library with an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64 ISA level v3, like AVX VNNI. We check the x86-64 ISA level marker for the minimum ISA set. Since -march=sandybridge enables only some ISAs in x86-64 ISA level v3, we should set the needed ISA marker to v2. Otherwise, libc is compiled with -march=sandybridge will fail to run on Sandy Bridge: $ ./elf/ld.so ./libc.so ./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3 Set the minimum, instead of maximum, x86-64 ISA level marker should have no impact on the glibc-hwcaps directory assignment logic in ldconfig nor ld.so.
* x86: Add CPU-specific diagnostics to ld.so --list-diagnosticsFlorian Weimer2021-03-022-0/+120
|
* x86: Automate generation of PREFERRED_FEATURE_INDEX_1 bitfieldFlorian Weimer2021-03-022-34/+51
| | | | | Use a .def file to define the bitfield layout, so that it is possible to iterate over field members using the preprocessor.
* x86: Use x86/nptl/pthreaddef.hH.J. Lu2021-02-221-0/+49
| | | | | | | 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>
* x86: Remove unused variables for raw cache sizes from cacheinfo.hFlorian Weimer2021-02-221-12/+0
|
* <bits/platform/x86.h>: Correct x86_cpu_TBMH.J. Lu2021-02-221-1/+1
| | | | x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.
* x86: Remove the extra space between "# endif"H.J. Lu2021-02-121-1/+1
| | | | | | | | | | Remove the extra space between "# endif" left over from commit f380868f6dcfdeae8d449d556298d9c41012ed8d Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Dec 24 15:43:34 2020 -0800 Remove _ISOMAC check from <cpu-features.h>
* x86: Use SIZE_MAX instead of (long int)-1 for tunable range valueSiddhesh Poyarekar2021-02-101-5/+5
| | | | | The tunable types are SIZE_T, so set the ranges to the correct maximum value, i.e. SIZE_MAX.
* tunables: Simplify TUNABLE_SET interfaceSiddhesh Poyarekar2021-02-101-9/+6
| | | | | | | | | | | | | | | The TUNABLE_SET interface took a primitive C type argument, which resulted in inconsistent type conversions internally due to incorrect dereferencing of types, especialy on 32-bit architectures. This change simplifies the TUNABLE setting logic along with the interfaces. Now all numeric tunable values are stored as signed numbers in tunable_num_t, which is intmax_t. All calls to set tunables cast the input value to its primitive type and then to tunable_num_t for storage. This relies on gcc-specific (although I suspect other compilers woul also do the same) unsigned to signed integer conversion semantics, i.e. the bit pattern is conserved. The reverse conversion is guaranteed by the standard.
* x86: Add PTWRITE feature detection [BZ #27346]H.J. Lu2021-02-079-5/+44
| | | | | | | 1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature in EBX of CPUID leaf 0x14 with ECX == 0. 2. Add PTWRITE detection to CPU feature tests. 3. Add 2 static CPU feature tests.
* x86: Adding an upper bound for Enhanced REP MOVSB.Sajan Karumanchi2021-02-023-1/+20
| | | | | | | | | | | | | | | In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
* sysconf: Add _SC_MINSIGSTKSZ/_SC_SIGSTKSZ [BZ #20305]H.J. Lu2021-02-012-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack space required in order to gurantee successful, non-nested handling of a single signal whose handler is an empty function, and _SC_SIGSTKSZ which is the suggested minimum number of bytes of stack space required for a signal stack. If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns MINSIGSTKSZ. On Linux/x86 with XSAVE, the signal frame used by kernel is composed of the following areas and laid out as: ------------------------------ | alignment padding | ------------------------------ | xsave buffer | ------------------------------ | fsave header (32-bit only) | ------------------------------ | siginfo + ucontext | ------------------------------ Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave header (32-bit only) + size of siginfo and ucontext + alignment padding. If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ are redefined as /* Default stack size for a signal handler: sysconf (SC_SIGSTKSZ). */ # undef SIGSTKSZ # define SIGSTKSZ sysconf (_SC_SIGSTKSZ) /* Minimum stack size for a signal handler: SIGSTKSZ. */ # undef MINSIGSTKSZ # define MINSIGSTKSZ SIGSTKSZ Compilation will fail if the source assumes constant MINSIGSTKSZ or SIGSTKSZ. The reason for not simply increasing the kernel's MINSIGSTKSZ #define (apart from the fact that it is rarely used, due to glibc's shadowing definitions) was that userspace binaries will have baked in the old value of the constant and may be making assumptions about it. For example, the type (char [MINSIGSTKSZ]) changes if this #define changes. This could be a problem if an newly built library tries to memcpy() or dump such an object defined by and old binary. Bounds-checking and the stack sizes passed to things like sigaltstack() and makecontext() could similarly go wrong.
* x86: Properly set usable CET feature bits [BZ #26625]H.J. Lu2021-01-2910-13/+120
| | | | | | | | | | | | | | | | | | | | | | | | commit 94cd37ebb293321115a36a422b091fdb72d2fb08 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Sep 16 05:27:32 2020 -0700 x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625] broke GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK since it can no longer disable IBT nor SHSTK. Handle IBT and SHSTK with: 1. Revert commit 94cd37ebb293321115a36a422b091fdb72d2fb08. 2. Clears the usable CET feature bits if kernel doesn't support CET. 3. Add GLIBC_TUNABLES tests without dlopen. 4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches _get_ssp. 5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled with GLIBC_TUNABLES. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Fix misplaced constAndreas Schwab2021-01-252-2/+2
| | | | | Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer target types.
* x86: Properly match CPU features in /proc/cpuinfo [BZ #27222]H.J. Lu2021-01-221-13/+30
| | | | | | | | | | | | | | Search " YYY " and " YYY\n", instead of "YYY", to avoid matching "XXXYYYZZZ" with "YYY". Update /proc/cpuinfo CPU feature names: /proc/cpuinfo glibc ------------------------------------------------ avx512vbmi AVX512_VBMI dts DS pni SSE3 tsc_deadline_timer TSC_DEADLINE
* x86: Check ifunc resolver with CPU_FEATURE_USABLE [BZ #27072]H.J. Lu2021-01-216-0/+184
| | | | | | | | Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and static executables to verify that CPUID features are initialized early in static PIE. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Use hidden visibility for early static PIE codeSzabolcs Nagy2021-01-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Extern symbol access in position independent code usually involves GOT indirection which needs RELATIVE reloc in a static linked PIE. (On some targets this is avoided e.g. because the linker can relax a GOT access to a pc-relative access, but this is not generally true.) Code that runs before static PIE self relocation must avoid relying on dynamic relocations which can be ensured by using hidden visibility. However we cannot just make all symbols hidden: On i386, all calls to IFUNC functions must go through PLT and calls to hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT may not be set up for local calls to hidden IFUNC functions. This patch aims to make symbol references hidden in code that is used before and by _dl_relocate_static_pie when building a static PIE libc. Note: for an object that is used in the startup code, its references and definition may not have consistent visibility: it is only forced hidden in the startup code. This is needed for fixing bug 27072. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Remove the C preprocessor magicH.J. Lu2021-01-2112-832/+1153
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
* x86: Move x86 processor cache info to cpu_featuresH.J. Lu2021-01-145-412/+551
| | | | | | | | | | | 1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so. 2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS. 3. Move x86 cache info initialization to dl-cacheinfo.h and initialize x86 cache info in init_cpu_features (). 4. Put x86 cache info for libc in cacheinfo.h, which is included in libc-start.c in libc.a and is included in cacheinfo.c in libc.so. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix x86 build with --enable-tunable=noAdhemerval Zanella2021-01-141-0/+1
| | | | Checked on x86_64-linux-gnu.
* ldconfig/x86: Store ISA level in cache and aux cacheH.J. Lu2021-01-131-0/+31
| | | | | | | | | | | Store ISA level in the portion of the unused upper 32 bits of the hwcaps field in cache and the unused pad field in aux cache. ISA level is stored and checked only for shared objects in glibc-hwcaps subdirectories. The shared objects in the default directories aren't checked since there are no fallbacks for these shared objects. Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with --disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
* x86: Set header.feature_1 in TCB for always-on CET [BZ #27177]H.J. Lu2021-01-133-1/+11
| | | | | Update dl_cet_check() to set header.feature_1 in TCB when both IBT and SHSTK are always on.
* x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717]H.J. Lu2021-01-0717-41/+607
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with GNU_PROPERTY_X86_ISA_1_V[234] marker: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13 Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by commit b0ab06937385e0ae25cebf1991787d64f439bf12 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 30 06:49:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker and commit 32930e4edbc06bc6f10c435dbcc63131715df678 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 9 05:05:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the micro-architecture ISA level required to execute the binary. The marker must be added by programmers explicitly in one of 3 ways: 1. Pass -mneeded to GCC. 2. Add the marker in the linker inputs as this patch does. 3. Pass -z x86-64-v[234] to the linker. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker support to ld.so if binutils 2.32 or newer is used to build glibc: 1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h. 2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker to abi-note.o based on the ISA level used to compile abi-note.o, assuming that the same ISA level is used to compile the whole glibc. 3. Add isa_1 to cpu_features to record the supported x86 ISA level. 4. Rename _dl_process_cet_property_note to _dl_process_property_note and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection. 5. Update _rtld_main_check and _dl_open_check to check loaded objects with the incompatible ISA level. 6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails on lesser platforms. 7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c. Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and x86-64-v4 machines. Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine and got: [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1 ./elf/tst-isa-level-1: CPU ISA level is lower than required [hjl@gnu-cfl-2 build-x86_64-linux]$
* Drop nan-pseudo-number.h usage from testsSiddhesh Poyarekar2021-01-041-1/+0
| | | | | | | | | | Make the tests use TEST_COND_intel96 to decide on whether to build the unnormal tests instead of the macro in nan-pseudo-number.h and then drop the header inclusion. This unbreaks test runs on all architectures that do not have ldbl-96. Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used anywhere.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-0281-81/+81
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* x86 long double: Consider pseudo numbers as signalingSiddhesh Poyarekar2020-12-301-0/+30
| | | | | | | Add support to treat pseudo-numbers specially and implement x86 version to consider all of them as signaling. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Remove _ISOMAC check from <cpu-features.h>H.J. Lu2020-12-241-81/+75
| | | | | Remove _ISOMAC check from <cpu-features.h> since it isn't an installer header file.
* x86: Remove the duplicated CPU_FEATURE_CPU_PH.J. Lu2020-12-241-2/+0
| | | | | CPU_FEATURE_CPU_P is defined in sysdeps/x86/sys/platform/x86.h. Remove the duplicated CPU_FEATURE_CPU_P in sysdeps/x86/include/cpu-features.h.
* Partially revert 681900d29683722b1cb0a8e565a0585846ec5a61Siddhesh Poyarekar2020-12-242-12/+1
| | | | | | | | | Do not attempt to fix the significand top bit in long double input received in printf. The code should never reach here because isnan should now detect unnormals as NaN. This is already a NOP for glibc since it uses the gcc __builtin_isnan, which detects unnormals as NaN. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* x86 long double: Support pseudo numbers in isnanlSiddhesh Poyarekar2020-12-241-0/+45
| | | | | | | This syncs up isnanl behaviour with gcc. Also move the isnanl implementation to sysdeps/x86 and remove the sysdeps/x86_64 version. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86 long double: Support pseudo numbers in fpclassifylSiddhesh Poyarekar2020-12-241-0/+46
| | | | | | | | Also move sysdeps/i386/fpu/s_fpclassifyl.c to sysdeps/x86/fpu/s_fpclassifyl.c and remove sysdeps/x86_64/fpu/s_fpclassifyl.c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Add Intel LAM supportH.J. Lu2020-12-222-0/+4
| | | | | | | | Add Intel Linear Address Masking (LAM) support to <sys/platform/x86.h>. HAS_CPU_FEATURE (LAM) can be used to detect if LAM is enabled in CPU. LAM modifies the checking that is applied to 64-bit linear addresses, allowing software to use of the untranslated address bits for metadata.
* x86: Remove the default REP MOVSB threshold tunable value [BZ #27061]H.J. Lu2020-12-141-2/+4
| | | | | | | | | | | Since we can't tell if the tunable value is set by user or not: https://sourceware.org/bugzilla/show_bug.cgi?id=27069 remove the default REP MOVSB threshold tunable value so that the correct default value will be set correctly by init_cacheinfo (). Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Pass the fd to note processingSzabolcs Nagy2020-12-111-3/+3
| | | | | | | | | | | | | | To handle GNU property notes on aarch64 some segments need to be mmaped again, so the fd of the loaded ELF module is needed. When the fd is not available (kernel loaded modules), then -1 is passed. The fd is passed to both _dl_process_pt_gnu_property and _dl_process_pt_note for consistency. Target specific note processing functions are updated accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Adjust tst-cpu-features-supports.c for GCC 11H.J. Lu2020-12-041-5/+10
| | | | | | Check HAS_CPU_FEATURE instead of CPU_FEATURE_USABLE for FSGSBASE, IBT, LM, SHSTK and XSAVES since FSGSBASE requires kernel support, IBT/SHSTK/LM require OS support and XSAVES is supervisor-mode only.
* x86: Set RDRAND usable if CPU supports RDRANDH.J. Lu2020-12-041-0/+1
| | | | Set RDRAND usable if CPU supports RDRAND.
* x86: Remove UP macro. Define LOCK_PREFIX unconditionally.Florian Weimer2020-11-131-7/+1
| | | | | | | The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Restore processing of cache size tunables in init_cacheinfoFlorian Weimer2020-10-281-8/+4
| | | | | Fixes and partially reverts commit 59803e81f96b479c17f583b31eac44b5 ("x86: Optimizing memcpy for AMD Zen architecture.").
* x86: Optimizing memcpy for AMD Zen architecture.Sajan Karumanchi2020-10-281-6/+26
| | | | | | | | | | | | | | Modifying the shareable cache '__x86_shared_cache_size', which is a factor in computing the non-temporal threshold parameter '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen architectures. In the existing implementation, the shareable cache is computed as 'L3 per thread, L2 per core'. Recomputing this shareable cache as 'L3 per CCX(Core-Complex)' has brought in performance gains. As per the large bench variant results, this patch also addresses the regression problem on AMD Zen architectures. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
* x86: Initialize CPU info via IFUNC relocation [BZ 26203]H.J. Lu2020-10-167-857/+943
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by static executable, DL_PLATFORM_INIT is never called. Also x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation. Key points: 1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. Linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations. 2. There are no IFUNC dependencies in ld.so before dynamic relocation have been performed, 3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT in dynamic executable and by IFUNC relocation in dlopen in static executable. 4. The x86 cache info in libc.o is initialized by IFUNC relocation. 5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called. Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.
* <sys/platform/x86.h>: Add FSRCS/FSRS/FZLRM supportH.J. Lu2020-10-093-0/+18
| | | | | Add Fast Short REP CMP and SCA (FSRCS), Fast Short REP STO (FSRS) and Fast Zero-Length REP MOV (FZLRM) support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add Intel HRESET supportH.J. Lu2020-10-092-0/+4
| | | | Add Intel HRESET support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add AVX-VNNI supportH.J. Lu2020-10-093-0/+7
| | | | Add AVX-VNNI support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add AVX512_FP16 supportH.J. Lu2020-10-093-3/+7
| | | | Add AVX512_FP16 support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add Intel UINTR supportH.J. Lu2020-10-092-3/+4
| | | | Add Intel UINTR support to <sys/platform/x86.h>.
* Reversing calculation of __x86_shared_non_temporal_thresholdPatrick McGehearty2020-09-281-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __x86_shared_non_temporal_threshold determines when memcpy on x86 uses non_temporal stores to avoid pushing other data out of the last level cache. This patch proposes to revert the calculation change made by H.J. Lu's patch of June 2, 2017. H.J. Lu's patch selected a threshold suitable for a single thread getting maximum performance. It was tuned using the single threaded large memcpy micro benchmark on an 8 core processor. The last change changes the threshold from using 3/4 of one thread's share of the cache to using 3/4 of the entire cache of a multi-threaded system before switching to non-temporal stores. Multi-threaded systems with more than a few threads are server-class and typically have many active threads. If one thread consumes 3/4 of the available cache for all threads, it will cause other active threads to have data removed from the cache. Two examples show the range of the effect. John McCalpin's widely parallel Stream benchmark, which runs in parallel and fetches data sequentially, saw a 20% slowdown with this patch on an internal system test of 128 threads. This regression was discovered when comparing OL8 performance to OL7. An example that compares normal stores to non-temporal stores may be found at https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test shows performance loss of 400 to 500% due to a failure to use nontemporal stores. These performance losses are most likely to occur when the system load is heaviest and good performance is critical. The tunable x86_non_temporal_threshold can be used to override the default for the knowledgable user who really wants maximum cache allocation to a single thread in a multi-threaded system. The manual entry for the tunable has been expanded to provide more information about its purpose. modified: sysdeps/x86/cacheinfo.c modified: manual/tunables.texi