mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message	Author	Age	Files	Lines
*	x86: Enable non-temporal memset for Hygon processors	Feifei Wang	2024-08-26	2	-3/+8
	This patch uses 'Avoid_Non_Temporal_Memset' flag to access the non-temporal memset implementation for hygon processors.
	Test Results (x86_memset_non_temporal_threshold = 8MB; each value is new performance time / old performance time):
	size	hygon1	hygon2	hygon3
	1MB	0.994	1	1
	4MB	0.996	1	0.990
	8MB	0.670	1.312	0.737
	16MB	0.343	0.822	0.390
	32MB	0.355	0.830	0.401
	For hygon arch with this patch, non-temporal stores can improve performance by 20% - 65%.
	Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
	Reviewed-by: Jing Li <lijing@hygon.cn>
	Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Add cache information support for Hygon processors	Feifei Wang	2024-08-26	1	-0/+60
	Add a Hygon branch in the dl_init_cacheinfo function to initialize cache size variables for Hygon processors. Also add the handle_hygon() function to get cache information. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Add new architecture type for Hygon processors	Feifei Wang	2024-08-26	2	-3/+17
	Add a new architecture type arch_kind_hygon to split the Hygon branch from AMD. This allows Hygon processors to use settings that suit their own characteristics. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Add `Avoid_STOSB` tunable to allow NT memset without ERMS	Noah Goldstein	2024-08-15	5	-7/+40
	The goal of this flag is to allow targets which don't prefer/have ERMS to still access the non-temporal memset implementation. There are 4 cases for tuning memset:
	1) `Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores
	2) `Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal/non-temporal stores. Non-temporal path goes through `rep stosb` path. We accomplish this by setting `x86_rep_stosb_threshold` to `x86_memset_non_temporal_threshold`.
	3) `!Avoid_STOSB && Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb`
	4) `!Avoid_STOSB && !Avoid_Non_Temporal_Memset` - Memset with temporal stores/`rep stosb`/non-temporal stores.
	Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
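The four cases above can be sketched as a small selection function. This is only an illustration: `struct demo_memset_tuning`, `demo_select_memset_tuning` and the default values passed in are hypothetical names, not glibc internals, and using `SIZE_MAX` as the "never take this path" value for `rep stosb` in case 1 is an assumption (the log only states it for the non-temporal threshold).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two thresholds discussed above.  */
struct demo_memset_tuning
{
  size_t rep_stosb_threshold;     /* sizes at or above this use `rep stosb` */
  size_t non_temporal_threshold;  /* sizes at or above this use NT stores */
};

/* Map the two flags onto thresholds, following the four listed cases.
   default_stosb and default_nt are caller-supplied illustrative values.  */
static struct demo_memset_tuning
demo_select_memset_tuning (bool avoid_stosb, bool avoid_non_temporal,
                           size_t default_stosb, size_t default_nt)
{
  struct demo_memset_tuning t;

  /* Cases 1 and 3: non-temporal stores are never selected.  */
  t.non_temporal_threshold = avoid_non_temporal ? SIZE_MAX : default_nt;

  if (avoid_stosb)
    /* Case 2: the NT path is reached through the `rep stosb` path, so
       both thresholds coincide.  In case 1 both end up at SIZE_MAX
       (assumption), leaving only temporal stores.  */
    t.rep_stosb_threshold = t.non_temporal_threshold;
  else
    /* Cases 3 and 4: `rep stosb` keeps its own threshold.  */
    t.rep_stosb_threshold = default_stosb;

  return t;
}
```

For example, calling `demo_select_memset_tuning (true, false, 2048, 8u << 20)` yields equal thresholds, which corresponds to case 2.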
*	x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path	Noah Goldstein	2024-08-15	2	-8/+23
\| \| \| \| \| \| \| \| \| \|	This is just a refactor and there should be no behavioral change from this commit. The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob for controlling whether we use non-temporal memset rather than having extra logic based on vendor. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Tunables may incorrectly set Prefer_PMINUB_for_stringop (bug 32047)	Florian Weimer	2024-08-02	1	-0/+1
\| \| \| \| \| \| \|	Fixes commit 5bcf6265f215326d14dfacdce8532792c2c7f8f8 ("x86: Disable non-temporal memset on Skylake Server"). Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Add missing switch/case fall-through markers to init_cpu_features	Florian Weimer	2024-08-02	1	-0/+2
\| \| \| \| \| \| \|	The commits introducing these fall-throughs intended them to happen. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Disable non-temporal memset on Skylake Server	Noah Goldstein	2024-07-16	5	-12/+26
	The original commit enabling non-temporal memset on Skylake Server had erroneous benchmarks (actually done on ICX). Further benchmarks indicate non-temporal stores may in fact be a regression on Skylake Server. This commit may be over-cautious in some cases, but should avoid any regressions for 2.40. Tested using qemu on all x86_64 cpu arch supported by both qemu + GLIBC. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Set default non_temporal_threshold for Zhaoxin processors	MayShao-oc	2024-06-30	2	-2/+5
	Currently, 'non_temporal_threshold' is set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, so this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors	MayShao-oc	2024-06-30	1	-16/+35
	Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable AVX_Fast_Unaligned_Load. Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use the sse2_unaligned versions of memset, strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	elf: Remove HWCAP_IMPORTANT	Stefan Liebler	2024-06-18	1	-13/+0
\| \| \| \| \| \| \|	Remove the definitions of HWCAP_IMPORTANT after removal of LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask. There HWCAP_IMPORTANT was used as default value. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove _DL_PLATFORMS_COUNT	Stefan Liebler	2024-06-18	3	-12/+4
\| \| \| \| \| \| \| \| \|	Remove the definitions of _DL_PLATFORMS_COUNT as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Note: On x86, we can also get rid of the definitions HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove _DL_FIRST_PLATFORM	Stefan Liebler	2024-06-18	1	-3/+0
	Remove the definitions of _DL_FIRST_PLATFORM as those were only used in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform(). Both were removed. Note: Removed on every architecture except powerpc, where _dl_string_platform() is still used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove _DL_HWCAP_PLATFORM	Stefan Liebler	2024-06-18	1	-3/+0
\| \| \| \| \| \|	Remove the definitions of _DL_HWCAP_PLATFORM as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove platform strings in dl-procinfo.c	Stefan Liebler	2024-06-18	1	-16/+0
\| \| \| \| \| \|	Remove the platform strings in dl-procinfo.c where also the implementation of _dl_string_platform() was removed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove _dl_string_platform	Stefan Liebler	2024-06-18	1	-15/+0
	Except for powerpc, where the returned integer is stored in the tcb, and the diagnostics output, there is no user anymore. Thus this patch removes the diagnostics output and _dl_string_platform for all other platforms. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	x86: Remove HWCAP_START and HWCAP_COUNT	Stefan Liebler	2024-06-18	1	-6/+0
\| \| \| \| \| \| \| \| \| \|	Both defines are not used anymore. Those were only used for _dl_string_hwcap(), which itself was removed with commit ab40f20364f4a417a63dd51fdd943742070bfe96 "elf: Remove _dl_string_hwcap" Just clean up. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Convert to autoconf 2.72 (vanilla release, no distribution patches)	Andreas K. Hüttel	2024-06-17	1	-12/+16
\| \| \| \| \| \| \|	As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
*	x86: Fix value for `x86_memset_non_temporal_threshold` when it is undesirable	Noah Goldstein	2024-06-14	1	-3/+3
	When we don't want to use non-temporal stores for memset, we set `x86_memset_non_temporal_threshold` to SIZE_MAX. The current code, however, was using `maximum_non_temporal_threshold` as the upper bound, which is `SIZE_MAX >> 4`, so we ended up with a value of `0`. Fix is to just use `SIZE_MAX` as the upper bound when setting the tunable. Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
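One plausible way to picture this bug is a bounds-checked setter that silently ignores out-of-range values. The sketch below assumes that behaviour; `demo_set_with_bounds` is a hypothetical helper and is not glibc's tunable API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store VALUE in *SLOT only when LO <= VALUE <= HI.
   Out-of-range values are ignored and *SLOT keeps its previous content
   (assumed behaviour, for illustration only).  */
static void
demo_set_with_bounds (size_t *slot, size_t value, size_t lo, size_t hi)
{
  if (value >= lo && value <= hi)
    *slot = value;
}

/* Return the threshold that results from requesting SIZE_MAX under the
   given upper bound, starting from a zero-initialized slot.  */
static size_t
demo_disable_non_temporal (size_t upper_bound)
{
  size_t threshold = 0;
  demo_set_with_bounds (&threshold, SIZE_MAX, 0, upper_bound);
  return threshold;
}
```

Under this assumption, `demo_disable_non_temporal (SIZE_MAX >> 4)` leaves the threshold at `0`, while `demo_disable_non_temporal (SIZE_MAX)` stores `SIZE_MAX` as intended.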
*	x86: Properly set x86 minimum ISA level [BZ #31883]	H.J. Lu	2024-06-12	3	-3/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL defined as (__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4) Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 is defined. There are no changes in config.h nor in config.make on x86-64. On i386, -march=x86-64-v2 with GCC generates #define MINIMUM_X86_ISA_LEVEL 2 in config.h and have-x86-isa-level = 2 in config.make. This fixes BZ #31883. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
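A minimal sketch of how a level can be expressed as a sum of per-level flags follows. The `DEMO_` macros are hypothetical stand-ins, and the feature checks are deliberately simplified; the real `__X86_ISA_V*` definitions test more compiler predefines than shown here.

```c
/* Each DEMO_ISA_Vn is 1 only if the previous level also holds, so the
   flags are cumulative and their sum equals the highest level met.
   The feature tests are a simplified assumption.  */
#if defined __x86_64__ || defined __SSE2__
# define DEMO_ISA_V1 1
#else
# define DEMO_ISA_V1 0
#endif

#if DEMO_ISA_V1 && defined __SSE4_2__ && defined __POPCNT__
# define DEMO_ISA_V2 1
#else
# define DEMO_ISA_V2 0
#endif

#if DEMO_ISA_V2 && defined __AVX2__ && defined __FMA__
# define DEMO_ISA_V3 1
#else
# define DEMO_ISA_V3 0
#endif

#if DEMO_ISA_V3 && defined __AVX512F__ && defined __AVX512VL__
# define DEMO_ISA_V4 1
#else
# define DEMO_ISA_V4 0
#endif

/* Same shape as the expression quoted in the commit message.  */
#define DEMO_MINIMUM_ISA_LEVEL \
  (DEMO_ISA_V1 + DEMO_ISA_V2 + DEMO_ISA_V3 + DEMO_ISA_V4)

/* Expose the compile-time value to callers.  */
static int
demo_minimum_isa_level (void)
{
  return DEMO_MINIMUM_ISA_LEVEL;
}
```

The C compiler folds the sum into a single integer, but a shell script or makefile that sees the unevaluated expression text cannot compare it as a number, which is presumably why the configure step has to compute the value itself.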
*	x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ #31867]	H.J. Lu	2024-06-11	2	-4/+12
\| \| \| \| \| \| \| \| \| \| \|	On i386, set the default minimum ISA level to 0, not 1 (baseline which includes SSE2). There are no changes in config.h nor in config.make on x86-64. This fixes BZ #31867. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Tested-by: Ian Jordan <immoloism@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	x86: Enable non-temporal memset tunable for AMD	Joe Damato	2024-06-10	1	-4/+4
	In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for memset") a tunable threshold for enabling non-temporal memset was added, but only for Intel hardware. Since that commit, new benchmark results suggest that non-temporal memset is beneficial on AMD as well, so allow this tunable to be set for AMD. See: https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing which has been updated to include data using different strategies for large memset on AMD Zen2, Zen3, and Zen4. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Add seperate non-temporal tunable for memset	Noah Goldstein	2024-05-30	5	-2/+31
	The tuning for non-temporal stores for memset vs memcpy is not always the same. This includes both the exact value and whether non-temporal stores are profitable at all for a given arch. This patch adds `x86_memset_non_temporal_threshold`. Currently we disable non-temporal stores for non-Intel vendors as the only benchmarks showing its benefit have been on Intel hardware. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	i386: Disable Intel Xeon Phi tests for GCC 15 and above (BZ 31782)	Sunil K Pandey	2024-05-27	1	-1/+7
	This patch disables Intel Xeon Phi tests for GCC 15 and above. GCC 15 removed Intel Xeon Phi ISA support.
	commit e1a7e2c54d52d0ba374735e285b617af44841ace
	Author: Haochen Jiang <haochen.jiang@intel.com>
	Date: Mon May 20 10:43:44 2024 +0800
	    i386: Remove Xeon Phi ISA support
	Fixes BZ 31782.
	Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	support: Add envp argument to support_capture_subprogram	Adhemerval Zanella	2024-05-07	1	-1/+1
\| \| \| \| \|	So tests can specify a list of environment variables. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	x86: In ld.so, diagnose missing APX support in APX-only builds	Florian Weimer	2024-04-25	1	-0/+5
\| \| \| \| \| \| \| \| \|	At this point, this is mainly a tool for testing the early ld.so CPU compatibility diagnostics: GCC uses the new instructions in most functions, so it's easy to spot if some of the early code is not built correctly. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Define MINIMUM_X86_ISA_LEVEL in config.h [BZ #31676]	H.J. Lu	2024-04-24	3	-1/+15
	Define MINIMUM_X86_ISA_LEVEL at configure time to avoid
	/usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features':
	…/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave'
	/usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
	/usr/bin/ld: final link failed: bad value
	collect2: error: ld returned 1 exit status
	when glibc is built with -march=x86-64-v3 and configured with --with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to print an error message on unsupported CPUs:
	Fatal glibc error: CPU does not support x86-64-v3
	This fixes BZ #31676.
	Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
*	login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)	Florian Weimer	2024-04-19	1	-3/+2
\| \| \| \| \| \| \| \| \|	These structs describe file formats under /var/log, and should not depend on the definition of _TIME_BITS. This is achieved by defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that support 32-bit time_t values (where __time_t is 32 bits). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	login: Check default sizes of structs utmp, utmpx, lastlog	Florian Weimer	2024-04-19	1	-0/+2
\| \| \| \| \| \| \| \|	The default <utmp-size.h> is for ports with a 64-bit time_t. Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1 need to override it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	x86: Add generic CPUID data dumper to ld.so --list-diagnostics	Florian Weimer	2024-04-08	1	-0/+384
\| \| \| \| \| \| \| \| \| \| \|	This is surprisingly difficult to implement if the goal is to produce reasonably sized output. With the current approaches to output compression (suppressing zeros and repeated results between CPUs, folding ranges of identical subleaves, dealing with the %ecx reflection issue), the output is less than 600 KiB even for systems with 256 logical CPUs. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603)	Adhemerval Zanella	2024-04-04	1	-0/+25
	The implementations of the trunc functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	math: x86 floor traps when FE_INEXACT is enabled (BZ 31601)	Adhemerval Zanella	2024-04-04	1	-0/+25
	The implementations of the floor functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600)	Adhemerval Zanella	2024-04-04	2	-0/+61
	The implementations of the ceil functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86-64: Allocate state buffer space for RDI, RSI and RBX	H.J. Lu	2024-03-18	2	-11/+60
	_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack. After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to STATE_SAVE_OFFSET(%rsp).
	+==================+<- stack frame start aligned at 8 or 16 bytes
	|                  |<- RDI saved in the red zone
	|                  |<- RSI saved in the red zone
	|                  |<- RBX saved in the red zone
	|                  |<- paddings for stack realignment of 64 bytes
	|------------------|<- xsave buffer end aligned at 64 bytes
	|                  |<-
	|                  |<-
	|                  |<-
	|------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
	|                  |<- 8-byte padding for 64-byte alignment
	|                  |<- 8-byte padding for 64-byte alignment
	|                  |<- R11
	|                  |<- R10
	|                  |<- R9
	|                  |<- R8
	|                  |<- RDX
	|                  |<- RCX
	+==================+<- RSP aligned at 64 bytes
	Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI and RBX are saved onto stack without adjusting stack pointer first, using the red-zone. This fixes BZ #31501.
	Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
*	x86-64: Simplify minimum ISA check ifdef conditional with if	Sunil K Pandey	2024-03-03	1	-11/+8
\| \| \| \| \| \| \| \| \|	Replace minimum ISA check ifdef conditional with if. Since MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants, compiler will perform constant folding optimization, getting same results. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
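The `#if`-to-`if` change can be illustrated with a small sketch. The `DEMO_` macros, their values and `demo_pick_save_method` are hypothetical; the snippet does not reproduce the glibc code.

```c
/* Illustrative compile-time constants (assumed values).  */
#define DEMO_MINIMUM_ISA_LEVEL 2
#define DEMO_AVX_ISA_LEVEL 3

/* Pick a register save method.  The second test compares two integer
   constants, so an optimizing compiler can fold it and drop the
   untaken branch, much like the preprocessor form would.  */
static const char *
demo_pick_save_method (int cpu_has_xsave)
{
  if (cpu_has_xsave)
    return "xsave";

  /* Preprocessor form this replaces:
       #if DEMO_MINIMUM_ISA_LEVEL < DEMO_AVX_ISA_LEVEL
         return "fxsave";
       #endif  */
  if (DEMO_MINIMUM_ISA_LEVEL < DEMO_AVX_ISA_LEVEL)
    return "fxsave";

  return "unsupported";
}
```

One difference from the preprocessor form is worth keeping in mind: with a plain `if`, the untaken branch is still parsed and type-checked, so every identifier it mentions must be declared.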
*	x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers	H.J. Lu	2024-02-29	4	-5/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	_dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
*	x86: Don't check XFD against /proc/cpuinfo	H.J. Lu	2024-02-28	1	-0/+3
\| \| \| \| \|	Since /proc/cpuinfo doesn't report XFD, don't check it against /proc/cpuinfo.
*	x86-64: Don't use SSE resolvers for ISA level 3 or above	H.J. Lu	2024-02-28	1	-6/+11
	When glibc is built with ISA level 3 or above enabled, SSE resolvers aren't available and glibc fails to build:
	ld: .../elf/librtld.os: in function `init_cpu_features':
	.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
	ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
	/usr/local/bin/ld: final link failed: bad value
	For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor _dl_tlsdesc_dynamic_fxsave. This fixes BZ #31429.
	Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers	H.J. Lu	2024-02-28	6	-3/+110
	Compiler generates the following instruction sequence for GNU2 dynamic TLS access:
		leaq tls_var@TLSDESC(%rip), %rax
		call tls_var@TLSCALL(%rax)
	or
		leal tls_var@TLSDESC(%ebx), %eax
		call tls_var@TLSCALL(%eax)
	CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS and RAX/EAX, are unchanged after CALL. When _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow path. __tls_get_addr is a normal function which doesn't preserve any caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer caller-saved registers, but didn't preserve any other caller-saved registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all caller-saved registers. This fixes BZ #31372.
	Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) to optimize elf_machine_runtime_setup.
	Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Change ENQCMD test to CHECK_FEATURE_PRESENT	H.J. Lu	2024-02-27	1	-1/+1
\| \| \| \| \| \|	Since ENQCMD is mainly used in kernel, change the ENQCMD test to CHECK_FEATURE_PRESENT. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch	Sunil K Pandey	2024-02-25	2	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When glibc is built with ISA level 3 or higher by default, the resulting glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by default. When glibc is built with ISA level 2 enabled by default, only keep SSE4.1 variant. Fixes BZ 31335. NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind doesn't support AVX512 instructions: https://bugs.kde.org/show_bug.cgi?id=383010 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86-64: Save APX registers in ld.so trampoline	H.J. Lu	2024-02-25	1	-6/+46
\| \| \| \| \| \| \| \| \|	Add APX registers to STATE_SAVE_MASK so that APX registers are saved in ld.so trampoline. This fixes BZ #31371. Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will be used by i386 _dl_tlsdesc_dynamic. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	Apply the Makefile sorting fix	H.J. Lu	2024-02-15	1	-3/+3
\| \| \| \|	Apply the Makefile sorting fix generated by sort-makefile-lines.py.
*	x86: Do not prefer ERMS for memset on Zen3+	Adhemerval Zanella	2024-02-13	1	-0/+5
\| \| \| \| \| \| \| \|	For AMD Zen3+ architecture, the performance of the vectorized loop is slightly better than ERMS. Checked on x86_64-linux-gnu on Zen3. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)	Adhemerval Zanella	2024-02-13	1	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The REP MOVSB usage on memcpy/memmove does not show much performance improvement on Zen3/Zen4 cores compared to the vectorized loops. Also, as from BZ 30994, if the source is aligned and the destination is not the performance can be 20x slower. The performance difference is noticeable with small buffer sizes, closer to the lower bounds limits when memcpy/memmove starts to use ERMS. The performance of REP MOVSB is similar to vectorized instruction on the size limit (the L2 cache). Also, there is no drawback to multiple cores sharing the cache. Checked on x86_64-linux-gnu on Zen3. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	Refer to C23 in place of C2X in glibc	Joseph Myers	2024-02-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	WG14 decided to use the name C23 as the informal name of the next revision of the C standard (notwithstanding the publication date in 2024). Update references to C2X in glibc to use the C23 name. This is intended to update everything except where it involves renaming files (the changes involving renaming tests are intended to be done separately). In the case of the _ISOC2X_SOURCE feature test macro - the only user-visible interface involved - support for that macro is kept for backwards compatibility, while adding _ISOC23_SOURCE. Tested for x86_64.
*	x86-64: Check if mprotect works before rewriting PLT	H.J. Lu	2024-01-15	1	-1/+7
	Systemd execution environment configuration may prohibit changing a memory mapping to become executable:
	MemoryDenyWriteExecute=
	    Takes a boolean argument. If set, attempts to create memory mappings that are writable and executable at the same time, or to change existing memory mappings to become executable, or mapping shared memory segments as executable, are prohibited.
	When it is set, systemd service stops working if PLT rewrite is enabled. Check if mprotect works before rewriting PLT. This fixes BZ #31230. This also works with SELinux when deny_execmem is on.
	Reviewed-by: Carlos O'Donell <carlos@redhat.com>
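The idea of probing `mprotect` before relying on it can be sketched as follows. This is not the glibc implementation: `demo_can_make_executable` is a hypothetical helper that maps a scratch page and tries to turn it executable, which is one simple way to detect a policy such as the one quoted above.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return true if a writable anonymous page can be switched to
   read+execute with mprotect, false if any step fails.  */
static bool
demo_can_make_executable (void)
{
  long page = sysconf (_SC_PAGESIZE);
  if (page <= 0)
    return false;

  /* Scratch mapping that starts out writable but not executable.  */
  void *p = mmap (NULL, (size_t) page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return false;

  /* This is the transition a restrictive policy would reject.  */
  bool ok = mprotect (p, (size_t) page, PROT_READ | PROT_EXEC) == 0;

  munmap (p, (size_t) page);
  return ok;
}
```

A caller would skip the optional rewrite when the probe returns false, instead of failing later in the middle of the operation.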
*	x86-64/cet: Make CET feature check specific to Linux/x86	H.J. Lu	2024-01-11	5	-33/+35
\| \| \| \| \| \|	CET feature bits in TCB, which are Linux specific, are used to check if CET features are active. Move CET feature check to Linux/x86 directory. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	i386: Remove CET support bits	H.J. Lu	2024-01-10	5	-135/+3
	1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk.
	2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64 features-offsets.sym.
	3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only for x86-64 and also used for PLT rewrite.
	4. Add x86-64 ldsodefs.h to include feature-control.h.
	5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only.
	6. Move x86 dl-procruntime.c to x86-64.
	Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	x86-64/cet: Move check-cet.awk to x86_64	H.J. Lu	2024-01-10	1	-53/+0
\| \| \| \|	Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>