about summary refs log tree commit diff
path: root/sysdeps/x86/cpu-features-offsets.sym
Commit message (Collapse)AuthorAgeFilesLines
* x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registersH.J. Lu2024-02-291-0/+1
| | | | | | | | | | | | | | | | _dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
* i386: Remove CET support bitsH.J. Lu2024-01-101-2/+0
| | | | | | | | | | | | 1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk. 2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64 features-offsets.sym. 3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only for x86-64 and also used for PLT rewrite. 4. Add x86-64 ldsodefs.h to include feature-control.h. 5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only. 6. Move x86 dl-procruntime.c to x86-64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86/cet: Enable shadow stack during startupH.J. Lu2024-01-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared library are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.
* x86: Cleanup cpu-features-offsets.symH.J. Lu2018-08-031-19/+1
| | | | | | | | | | | | | | | | | | | | | | | Remove the unused macros. There is no code changes in libc.so nor ld.so on i686 and x86-64. * sysdeps/x86/cpu-features-offsets.sym (rtld_global_ro_offsetof): Removed. (CPU_FEATURES_SIZE): Likewise. (CPUID_OFFSET): Likewise. (CPUID_SIZE): Likewise. (CPUID_EAX_OFFSET): Likewise. (CPUID_EBX_OFFSET): Likewise. (CPUID_ECX_OFFSET): Likewise. (CPUID_EDX_OFFSET): Likewise. (FAMILY_OFFSET): Likewise. (MODEL_OFFSET): Likewise. (FEATURE_OFFSET): Likewise. (FEATURE_SIZ): Likewise. (COMMON_CPUID_INDEX_1): Likewise. (COMMON_CPUID_INDEX_7): Likewise. (FEATURE_INDEX_1): Likewise. (RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET): Updated.
* x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]H.J. Lu2017-10-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector, mask and bound registers. It simplifies _dl_runtime_resolve and supports different calling conventions. ld.so code size is reduced by more than 1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles than saving and restoring vector and bound registers individually. Latency for _dl_runtime_resolve to lookup the function, foo, from one shared library plus libc.so: Before After Change Westmere (SSE)/fxsave 345 866 151% IvyBridge (AVX)/xsave 420 643 53% Haswell (AVX)/xsave 713 1252 75% Skylake (AVX+MPX)/xsavec 559 719 28% Skylake (AVX512+MPX)/xsavec 145 272 87% Ryzen (AVX)/xsavec 280 553 97% This is the worst case where portion of time spent for saving and restoring registers is bigger than majority of cases. With smaller _dl_runtime_resolve code size, overall performance impact is negligible. On IvyBridge, differences in build and test time of binutils with lazy binding GCC and binutils are noises. On Westmere, differences in bootstrap and "makc check" time of GCC 7 with lazy binding GCC and binutils are also noises. [BZ #21265] * sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET): New. * sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>. (get_common_indeces): Set xsave_state_size, xsave_state_full_size and bit_arch_XSAVEC_Usable if needed. (init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow and bit_arch_Use_dl_runtime_resolve_opt. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): Removed. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (bit_arch_Prefer_No_AVX512): Updated. (bit_arch_MathVec_Prefer_No_AVX512): Likewise. (bit_arch_XSAVEC_Usable): New. (STATE_SAVE_OFFSET): Likewise. (STATE_SAVE_MASK): Likewise. [__ASSEMBLER__]: Include <cpu-features-offsets.h>. (cpu_features): Add xsave_state_size and xsave_state_full_size. (index_arch_Use_dl_runtime_resolve_opt): Removed. (index_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_XSAVEC_Usable): New. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow. * sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables is enabled. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, _dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt, _dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec. * sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed. (DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT instead of VEC_SIZE. (REGISTER_SAVE_BND0): Removed. (REGISTER_SAVE_BND1): Likewise. (REGISTER_SAVE_BND3): Likewise. (REGISTER_SAVE_RAX): Always defined to 0. (VMOV): Removed. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_slow): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_avx512): Likewise. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (USE_FXSAVE): New. (_dl_runtime_resolve_fxsave): Likewise. (USE_XSAVE): Likewise. (_dl_runtime_resolve_xsave): Likewise. (USE_XSAVEC): Likewise. (_dl_runtime_resolve_xsavec): Likewise. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512): Removed. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (_dl_runtime_resolve_fxsave): New. (_dl_runtime_resolve_xsave): Likewise. (_dl_runtime_resolve_xsavec): Likewise.
* Remove x86 ifunc-defines.sym and rtld-global-offsets.symH.J. Lu2016-05-111-0/+16
| | | | | | | | | | | | | | | | | | | | Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym. Remove x86 ifunc-defines.sym and rtld-global-offsets.sym. No code changes on i686 and x86-64. * sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers): Remove ifunc-defines.sym. * sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers): Likewise. * sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise. * sysdeps/x86/Makefile (gen-as-const-headers): Remove rtld-global-offsets.sym. * sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ... * sysdeps/x86/cpu-features-offsets.sym: This. * sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h> instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
* Add _dl_x86_cpu_features to rtld_globalH.J. Lu2015-08-131-0/+7
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors. * sysdeps/i386/dl-machine.h: Include <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New. * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed. * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch. * sysdeps/i386/i686/multiarch/Versions: Removed. * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>. * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>. * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym. [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features. [$(subdir) == elf] (tests): Add tst-get-cpu-features. [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static. * sysdeps/x86/Versions: New file. * sysdeps/x86/cpu-features-offsets.sym: Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/dl-get-cpu-features.c: Likewise. * sysdeps/x86/libc-start.c: Likewise. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86/tst-get-cpu-features-static.c: Likewise. * sysdeps/x86/tst-get-cpu-features.c: Likewise. * sysdeps/x86_64/dl-procinfo.c: Likewise. * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it. (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features). (is_amd): Likewise. (max_cpuid): Likewise. (intel_check_word): Likewise. (__cache_sysconf): Don't call __init_cpu_features. (__x86_preferred_memory_instruction): Removed. (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features). * sysdeps/x86_64/dl-machine.h: <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>. * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch. * sysdeps/x86_64/multiarch/Versions: Removed. * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise. * sysdeps/x86_64/multiarch/init-arch.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.