path: root/sysdeps
Commit message (Author, Age, Files, Lines)
* Update copyright dates with scripts/update-copyrights (Paul Eggert, 2024-01-01, 7411 files, -7412/+7411)
|
* x86/cet: Run some CET tests with shadow stack (H.J. Lu, 2024-01-01, 4 files, -0/+17)
| When CET is disabled by default, run some CET tests with shadow stack enabled using
|
|   $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK
* x86/cet: Don't set CET active by default (H.J. Lu, 2024-01-01, 2 files, -2/+15)
| Not all CET enabled applications and libraries have been properly tested in CET enabled environments. Some CET enabled applications or libraries will crash or misbehave when CET is enabled. Don't set CET active by default so that all applications and libraries will run normally regardless of whether CET is active or not. Shadow stack can be enabled by
|
|   $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK
|
| at run-time if shadow stack can be enabled by kernel.
|
| NB: This commit can be reverted if it is OK to enable CET by default for all applications and libraries.
* x86/cet: Check feature_1 in TCB for active IBT and SHSTK (H.J. Lu, 2024-01-01, 3 files, -1/+35)
| Initially, IBT and SHSTK are marked as active when the CPU supports them and CET is enabled in glibc. They can be disabled early by tunables before relocation. Since GLRO(dl_x86_cpu_features) becomes read-only after relocation, we can't update GLRO(dl_x86_cpu_features) to mark IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB to decide if IBT and SHSTK are active.
* x86/cet: Enable shadow stack during startup (H.J. Lu, 2024-01-01, 10 files, -146/+175)
| Previously, CET was enabled by the kernel before passing control to user space, and the startup code had to disable CET if applications or shared libraries weren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared libraries are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return.
|
| 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled.
| 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns.
| 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.
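The status check mentioned in item 1 can be sketched from user space. This is a minimal illustration, not the glibc startup code: the function name `shstk_is_active` is hypothetical, and the numeric values of `ARCH_SHSTK_STATUS` and `ARCH_SHSTK_SHSTK` are assumed to match the Linux 6.6 `<asm/prctl.h>` header. It only queries the state and never enables shadow stack, since enabling it in a function that returns would underflow the shadow stack as described above.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed values from the Linux 6.6 uapi header <asm/prctl.h>;
   defined here in case the installed headers predate them.  */
#ifndef ARCH_SHSTK_STATUS
# define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
# define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif

/* Hypothetical helper: returns 1 if shadow stack is active in the
   calling thread, 0 if it is not, and -1 if the state cannot be
   queried (non-x86-64 target, or a kernel without this interface).  */
int
shstk_is_active (void)
{
#if defined (__x86_64__) && defined (__linux__) && defined (SYS_arch_prctl)
  unsigned long long features = 0;
  if (syscall (SYS_arch_prctl, ARCH_SHSTK_STATUS, &features) != 0)
    return -1;
  return (features & ARCH_SHSTK_SHSTK) != 0;
#else
  return -1;
#endif
}
```

Running a program that calls this helper with and without GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK is one way to observe the effect of the startup change on a capable system.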
* elf: Always provide _dl_get_dl_main_map in libc.a (H.J. Lu, 2024-01-01, 1 file, -4/+3)
| | | | | Always provide _dl_get_dl_main_map in libc.a. It will be used by x86 to process PT_GNU_PROPERTY segment.
* x86/cet: Sync with Linux kernel 6.6 shadow stack interface (H.J. Lu, 2024-01-01, 15 files, -133/+173)
| Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, the i386 shadow stack code is unchanged and CET shouldn't be enabled for i386.
|
| 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used.
| 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked.
| 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX.
| 4. Rewrite the CET control functions with the current kernel shadow stack interface.
|
| Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.
* RISC-V: Add support for dl_runtime_profile (BZ #31151) (Aurelien Jarno, 2023-12-30, 4 files, -1/+225)
| | | | | | | | | | | | | Code is mostly inspired from the LoongArch one, which has a similar ABI, with minor changes to support riscv32 and register differences. This fixes elf/tst-sprof-basic. This also fixes elf/tst-audit1, elf/tst-audit2 and elf/tst-audit8 with recent binutils snapshots when --enable-bind-now is used. Resolves: BZ #31151 Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
* x86-64: Fix the tcb field load for x32 [BZ #31185] (H.J. Lu, 2023-12-22, 1 file, -2/+2)
| _dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer via the tcb field in TCB:
|
| _dl_tlsdesc_undefweak:
|         _CET_ENDBR
|         movq    8(%rax), %rax
|         subq    %fs:0, %rax
|         ret
|
| _dl_tlsdesc_dynamic:
|         ...
|         subq    %fs:0, %rax
|         movq    -8(%rsp), %rdi
|         ret
|
| Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location, not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since _dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic returns void *, RAX_LP is appropriate here for x32 and x86-64. This fixes BZ #31185.
* x86-64: Fix the dtv field load for x32 [BZ #31184] (H.J. Lu, 2023-12-22, 1 file, -1/+1)
| On x32, I got
|
| FAIL: elf/tst-tlsgap
|
| $ gdb elf/tst-tlsgap
| ...
| open tst-tlsgap-mod1.so
|
| Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
| [Switching to LWP 2268754]
| _dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
| 108             movq    (%rsi), %rax
| (gdb) p/x $rsi
| $4 = 0xf7dbf9005655fb18
| (gdb)
|
| This is caused by
|
| _dl_tlsdesc_dynamic:
|         _CET_ENDBR
|         /* Preserve call-clobbered registers that we modify.
|            We need two scratch regs anyway.  */
|         movq    %rsi, -16(%rsp)
|         movq    %fs:DTV_OFFSET, %rsi
|
| Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit location, not 64-bit. Load the dtv field to RSI_LP instead of rsi. This fixes BZ #31184.
* x86/cet: Don't disable CET if not single threaded (H.J. Lu, 2023-12-20, 1 file, -2/+9)
| | | | | | | | In permissive mode, don't disable IBT nor SHSTK when dlopening a legacy shared library if not single threaded since IBT and SHSTK may be still enabled in other threads. Other threads with IBT or SHSTK enabled will crash when calling functions in the legacy shared library. Instead, an error will be issued.
* x86: Modularize sysdeps/x86/dl-cet.c (H.J. Lu, 2023-12-20, 1 file, -176/+280)
| Improve readability and make maintenance easier for dl-feature.c by modularizing sysdeps/x86/dl-cet.c:
|
| 1. Support processors with:
|    a. Only IBT. Or
|    b. Only SHSTK. Or
|    c. Both IBT and SHSTK.
| 2. Lock CET features only if IBT or SHSTK are enabled and are not enabled permissively.
* x86/cet: Update tst-cet-vfork-1 (H.J. Lu, 2023-12-20, 1 file, -26/+17)
| | | | | Change tst-cet-vfork-1.c to verify that vfork child return triggers SIGSEGV due to shadow stack mismatch.
* aarch64: Add SIMD attributes to math functions with vector versions (Joe Ramsay, 2023-12-20, 2 files, -0/+113)
| | | | | | | Added annotations for autovec by GCC and GFortran - this enables GCC >= 9 to autovectorise math calls at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add half-width versions of AdvSIMD f32 libmvec routines (Joe Ramsay, 2023-12-20, 19 files, -14/+123)
| | | | | | | | | | | Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version, however this is expected to be rare at Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Fix elf: Do not duplicate the GLIBC_TUNABLES string (H.J. Lu, 2023-12-19, 1 file, -1/+1)
| commit 2a969b53c0b02fed7e43473a92f219d737fd217a
| Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
| Date:   Wed Dec 6 10:24:01 2023 -0300
|
|     elf: Do not duplicate the GLIBC_TUNABLES string
|
| has
|
| @@ -38,7 +39,7 @@
|     which isn't available.  */
|  #define CHECK_GLIBC_IFUNC_PREFERRED_OFF(f, cpu_features, name, len) \
|    _Static_assert (sizeof (#name) - 1 == len, #name " != " #len); \
| -  if (memcmp (f, #name, len) == 0) \
| +  if (tunable_str_comma_strcmp_cte (&f, #name) == 0) \
|      { \
|        cpu_features->preferred[index_arch_##name] \
|          &= ~bit_arch_##name; \
| @@ -46,12 +47,11 @@
|
| Fix it by removing "== 0" after tunable_str_comma_strcmp_cte.
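The fix suggests that the new helper reports a match as a true value, whereas memcmp reports equality as 0, so a leftover "== 0" inverts the test. The sketch below illustrates that mismatch with a hypothetical helper `name_matches`; it is not glibc's tunable_str_comma_strcmp_cte, whose exact signature is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when the LEN bytes at S spell exactly NAME.  */
bool
name_matches (const char *s, size_t len, const char *name)
{
  return strlen (name) == len && memcmp (s, name, len) == 0;
}

/* Buggy pattern: keeps the "== 0" from the old memcmp test, so the
   branch is taken exactly when the names do NOT match.  */
bool
selected_buggy (const char *s, size_t len, const char *name)
{
  return name_matches (s, len, name) == 0;
}

/* Fixed pattern: use the boolean result directly.  */
bool
selected_fixed (const char *s, size_t len, const char *name)
{
  return name_matches (s, len, name);
}
```

With the buggy form, every name except the requested one would be selected, which is consistent with a one-line fix that only drops the comparison.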
* Fix elf: Do not duplicate the GLIBC_TUNABLES string (H.J. Lu, 2023-12-19, 1 file, -5/+5)
| Fix issues in sysdeps/x86/tst-hwcap-tunables.c added by
|
| Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
| Date:   Wed Dec 6 10:24:01 2023 -0300
|
|     elf: Do not duplicate the GLIBC_TUNABLES string
|
| 1. -AVX,-AVX2,-AVX512F should be used to disable AVX, AVX2 and AVX512.
| 2. AVX512 IFUNC functions check AVX512VL. -AVX512VL should be added to disable these functions.
|
| This fixed:
|
| FAIL: elf/tst-hwcap-tunables
| ...
| [0] Spawned test for -Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,-SSSE3,-Fast_Unaligned_Load,-ERMS,-AVX_Fast_Unaligned_Load
| error: subprocess failed: tst-tunables
| error: unexpected output from subprocess
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| [1] Spawned test for ,-,-Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,,-SSSE3,-Fast_Unaligned_Load,,-,-ERMS,-AVX_Fast_Unaligned_Load,-,
| error: subprocess failed: tst-tunables
| error: unexpected output from subprocess
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure
|    left: 1 (0x1); from: impls[i].usable
|   right: 0 (0x0); from: false
| error: 2 test failures
|
| on Intel Tiger Lake.
* hppa: Fix undefined behaviour in feclearexcept (BZ 30983) (Bruno Haible, 2023-12-19, 1 file, -1/+1)
| The expression
|
|   (excepts & FE_ALL_EXCEPT) << 27
|
| produces a signed integer overflow when 'excepts' is specified as FE_INVALID (= 0x10), because
| - excepts is of type 'int',
| - FE_ALL_EXCEPT is of type 'int',
| - thus (excepts & FE_ALL_EXCEPT) is (int) 0x10,
| - 'int' is 32 bits wide.
|
| The patched code produces the same instruction sequence as previously.
|
| Reviewed-by: Carlos O'Donell <carlos@redhat.com>
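The overflow described above can be avoided by converting to an unsigned type before shifting. The sketch below shows that general technique and is not necessarily the exact patch; the `DEMO_` constants are illustrative stand-ins (0x10 is the FE_INVALID value quoted in the message, and the 0x1f mask is an assumption).

```c
#include <stdint.h>

/* Illustrative stand-ins, not the real <fenv.h> macros.  */
#define DEMO_FE_INVALID    0x10
#define DEMO_FE_ALL_EXCEPT 0x1f

/* Hypothetical helper: move the exception bits to the top of a 32-bit
   status word.  Converting to an unsigned type before the shift makes
   0x10 << 27 a well-defined 0x80000000 instead of a signed overflow.  */
uint32_t
flags_to_status_bits (int excepts)
{
  return ((uint32_t) (excepts & DEMO_FE_ALL_EXCEPT)) << 27;
}
```

Because the bit pattern is identical either way on a 32-bit two's complement target, a compiler can emit the same shift instruction, which matches the remark that the generated code is unchanged.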
* alpha: Fix fesetexceptflag (BZ 30998) (Bruno Haible, 2023-12-19, 1 file, -1/+1)
| The previous implementation cleared some exception flags that are outside the EXCEPTS argument. This fixes math/test-fexcept on qemu-user.
|
| Reviewed-by: Carlos O'Donell <carlos@redhat.com>
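The intended semantics can be stated as a pure bit operation: only the bits named in EXCEPTS are taken from the saved flags, and every other bit keeps its current value. The function below is a minimal model of that rule with a hypothetical name; it is not the alpha implementation and does not touch any real floating-point status register.

```c
/* Model of fesetexceptflag on a plain integer status word: bits in
   EXCEPTS are copied from SAVED, all other bits of CURRENT are kept.  */
unsigned long
set_except_flags (unsigned long current, unsigned long saved,
                  unsigned long excepts)
{
  return (current & ~excepts) | (saved & excepts);
}
```

A bug of the kind described here corresponds to clearing with a mask wider than `excepts`, which would drop flags the caller never asked to change.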
* riscv: Fix feenvupdate with FE_DFL_ENV (BZ 31022) (Adhemerval Zanella, 2023-12-19, 1 file, -5/+3)
| libc_feupdateenv_riscv should check for FE_DFL_ENV, similar to libc_fesetenv_riscv.
|
| Also extend test-fenv.c to test feupdateenv.
|
| Checked on riscv under qemu-system.
|
| Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Do not raises floating-point exception traps on fesetexceptflag (BZ 30990) (Bruno Haible, 2023-12-19, 2 files, -31/+56)
| | | | | | | | | | | | | | | | | | | According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. When we need to clear a flag, we need to do so in both units, due to the way fetestexcept is implemented. When we need to set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i686: Do not raise exception traps on fesetexcept (BZ 30989) (Adhemerval Zanella, 2023-12-19, 3 files, -22/+76)
| | | | | | | | | | | | | | | | According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. To set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* powerpc: Do not raise exception traps for fesetexcept/fesetexceptflag (BZ 30988) (Adhemerval Zanella, 2023-12-19, 2 files, -1/+13)
| According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument).
|
| This is a side-effect of how we implement the GNU extension feenableexcept, where feenableexcept/fesetenv/fesetmode/feupdateenv might issue prctl (PR_SET_FPEXC, PR_FP_EXC_PRECISE) depending on the argument. And on PR_FP_EXC_PRECISE, setting a floating-point exception flag triggers a trap.
|
| To make both functions follow C23, fesetexcept and fesetexceptflag now fail if the argument may trigger a trap.
|
| The math tests now check for a value different from 0, instead of bailing out as unsupported for EXCEPTION_SET_FORCES_TRAP.
|
| Checked on powerpc64le-linux-gnu.
|
| Reviewed-by: Carlos O'Donell <carlos@redhat.com>
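The new failure rule can be modelled on plain integers. This is only a sketch of the stated behaviour for fesetexcept, assuming a status word of sticky flags and a mask of exceptions whose traps are enabled; the name `model_fesetexcept` and both parameters are hypothetical and no real FPSCR access is involved.

```c
/* Model: *STATUS holds the sticky exception flags, ENABLED holds the
   exceptions whose traps are enabled.  Refuse to set any flag that
   could trigger a trap; otherwise set the flags and report success.  */
int
model_fesetexcept (unsigned int *status, unsigned int enabled,
                   unsigned int excepts)
{
  if ((enabled & excepts) != 0)
    return -1;          /* Setting these flags might trap: fail.  */
  *status |= excepts;
  return 0;
}
```

A caller therefore has to check for a nonzero return, which is what the adjusted math tests do.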
* elf: Do not duplicate the GLIBC_TUNABLES string (Adhemerval Zanella, 2023-12-19, 8 files, -227/+424)
| | | | | | | | | | | | | | | | | | | | | The tunable parsing duplicates the tunable environment variable so it null-terminates each one since it simplifies the later parsing. It has the drawback of adding another point of failure (__minimal_malloc failing), and the memory copy requires tuning the compiler to avoid mem operations calls. The parsing now tracks the tunable start and its size. The dl-tunable-parse.h adds helper functions to help parsing, like a strcmp that also checks for size and an iterator for suboptions that are comma-separated (used on hwcap parsing by x86, powerpc, and s390x). Since the environment variable is allocated on the stack by the kernel, it is safe to keep the references to the suboptions for later parsing of string tunables (as done by set_hwcaps by multiple architectures). Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
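The idea of tracking a start pointer and a size instead of copying and null-terminating can be sketched as follows. All names here (`struct span`, `next_suboption`, `span_equals`, `count_suboptions`) are hypothetical and are not the helpers from dl-tunable-parse.h; the code only illustrates iterating comma-separated suboptions in place.

```c
#include <stddef.h>
#include <string.h>

/* A suboption is described by where it starts and how long it is;
   the underlying string is never copied or modified.  */
struct span
{
  const char *start;
  size_t len;
};

/* Store the next comma-separated item of [*CURSOR, END) in OUT and
   advance *CURSOR.  Returns 0 once all items have been produced.
   Empty items (as in "a,,b") are returned with len == 0.  */
int
next_suboption (const char **cursor, const char *end, struct span *out)
{
  const char *p = *cursor;
  if (p == NULL)
    return 0;
  const char *comma = memchr (p, ',', (size_t) (end - p));
  const char *item_end = comma != NULL ? comma : end;
  out->start = p;
  out->len = (size_t) (item_end - p);
  *cursor = comma != NULL ? comma + 1 : NULL;
  return 1;
}

/* A string comparison that also checks the size, so "-AVX" does not
   match the longer item "-AVX2".  */
int
span_equals (const struct span *s, const char *name)
{
  return strlen (name) == s->len && memcmp (s->start, name, s->len) == 0;
}

/* Count the non-empty suboptions in STR.  */
size_t
count_suboptions (const char *str)
{
  const char *cursor = str;
  const char *end = str + strlen (str);
  struct span item;
  size_t n = 0;
  while (next_suboption (&cursor, end, &item))
    if (item.len > 0)
      n++;
  return n;
}
```

Because each span points into the original string, references to suboptions stay valid for as long as that string does, which is the property the commit relies on for the kernel-provided environment.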
* Do not build sparc32 libgcc functions into static libc (Joseph Myers, 2023-12-19, 1 file, -0/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since GCC commit f31a019d1161ec78846473da743aedf49cca8c27 "Emit funcall external declarations only if actually used.", the glibc testsuite has failed to build for 32-bit SPARC with GCC mainline. /scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: /scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/32/libgcc.a(_divsi3.o): in function `.div': /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/sparc/lb1spc.S:138: multiple definition of `.div'; /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/libc.a(sdiv.o):/scratch/jmyers/glibc-bot/src/glibc/gnulib/../sysdeps/sparc/sparc32/sparcv9/sdiv.S:13: first defined here /scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: disabling relaxation; it will not work with multiple definitions collect2: error: ld returned 1 exit status make[3]: *** [../Rules:298: /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/nptl/tst-cancel24-static] Error 1 https://sourceware.org/pipermail/libc-testresults/2023q4/012154.html I'm not sure of the exact sequence of undefined references that cause first the glibc object file defining .div and then the libgcc object file defining both .div and .udiv to be pulled in (which must have been perturbed by that GCC change in a way that introduced the build failure), but I think the failure illustrates that it's inherently fragile for glibc to define symbols in separate object files that libgcc defines in the same object file - and indeed for glibc to redefine libgcc symbols at all, since the division into object files shouldn't really be part of the interface between libgcc and libc. 
These symbols appear to be in libc only for compatibility, maybe one of the cases where they were accidentally exported from shared libc in glibc 2.0 before the introduction of symbol versioning and so programs started expecting shared libc to provide them. Thus, there is no need to have them in static libc. Add this set of libgcc functions to shared-only-routines so they are no longer provided in static libc. (No change is made regarding .mul - dotmul source file - since unlike the other symbols in this grouping, it doesn't actually appear to be a libgcc symbol, at least in current GCC.) Tested with build-many-glibcs.py for sparcv9-linux-gnu with GCC mainline.
* x86/cet: Check CPU_FEATURE_ACTIVE in permissive mode (H.J. Lu, 2023-12-19, 2 files, -0/+6)
| | | | Verify that CPU_FEATURE_ACTIVE works properly in permissive mode.
* x86/cet: Check legacy shadow stack code in .init_array section (H.J. Lu, 2023-12-19, 11 files, -0/+330)
| | | | | | Verify that legacy shadow stack code in .init_array section in application and shared library, which are marked as shadow stack enabled, will trigger segfault.
* x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK (H.J. Lu, 2023-12-19, 3 files, -0/+28)
| | | | | Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow stack properly.
* x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabled (H.J. Lu, 2023-12-19, 3 files, -0/+9)
| | | | | Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is disabled.
* x86/cet: Check legacy shadow stack applications (H.J. Lu, 2023-12-19, 6 files, -0/+130)
| | | | | Add tests to verify that legacy shadow stack applications run properly when shadow stack is enabled in Linux kernel.
* s390: Set psw addr field in getcontext and friends. (Stefan Liebler, 2023-12-19, 6 files, -0/+34)
| | | | | | | | | | | | | | | | | | | | So far if the ucontext structure was obtained by getcontext and co, the return address was stored in general purpose register 14 as it is defined as return address in the ABI. In contrast, the context passed to a signal handler contains the address in psw.addr field. If somebody e.g. wants to dump the address of the context, the origin needs to be known. Now this patch adjusts getcontext and friends and stores the return address also in psw.addr field. Note that setcontext isn't adjusted and it is not supported to pass a ucontext structure from signal-handler to setcontext. We are not able to restore all registers and branching to psw.addr without clobbering one register.
* x86: Unifies 'strlen-evex' and 'strlen-evex512' implementations. (Matthew Sterrett, 2023-12-18, 5 files, -472/+439)
| This commit uses a common implementation 'strlen-evex-base.S' for both 'strlen-evex' and 'strlen-evex512'.
|
| The motivation is to reduce the number of implementations to maintain. This incidentally gives a small performance improvement.
|
| All tests pass on x86.
|
| Benchmarks were taken on SKX.
| https://www.intel.com/content/www/us/en/products/sku/123613/intel-core-i97900x-xseries-processor-13-75m-cache-up-to-4-30-ghz/specifications.html
|
| Geometric mean for strlen-evex512 over all benchmarks (N=10) was (new/old) 0.939
| Geometric mean for wcslen-evex512 over all benchmarks (N=10) was (new/old) 0.965
|
| Code Size Changes:
| strlen-evex512.S : +24 bytes
| wcslen-evex512.S : +54 bytes
|
| Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86/cet: Don't assume that SHSTK implies IBT (H.J. Lu, 2023-12-18, 3 files, -11/+11)
| | | | | | | Since shadow stack (SHSTK) is enabled in the Linux kernel without enabling indirect branch tracking (IBT), don't assume that SHSTK implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.
* x86/cet: Check user_shstk in /proc/cpuinfo (H.J. Lu, 2023-12-17, 1 file, -1/+1)
| | | | | Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as user_shstk, instead of shstk.
* powerpc: Add space for HWCAP3/HWCAP4 in the TCB for future Power. (Manjunath Matti, 2023-12-15, 7 files, -1/+26)
| | | | | | | | | | | | | | This patch reserves space for HWCAP3/HWCAP4 in the TCB of powerpc. These hardware capabilities bits will be used by future Power architectures. Versioned symbol '__parse_hwcap_3_4_and_convert_at_platform' advertises the availability of the new HWCAP3/HWCAP4 data in the TCB. This is an ABI change for GLIBC 2.39. Suggested-by: Peter Bergner <bergner@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* powerpc: Fix performance issues of strcmp power10 (Amrita H S, 2023-12-15, 1 file, -66/+95)
| The current implementation of strcmp for power10 has performance regressions for multiple combinations of small sizes and alignments.
|
| Most of these performance issues are fixed by this patch. The compare loop is unrolled and page crossings in the unrolled loop are handled.
|
| Thanks to Paul E. Murphy for helping in fixing the performance issues.
|
| Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
| Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
| Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* powerpc : Add optimized memchr for POWER10 (MAHESH BODAPATI, 2023-12-14, 5 files, -10/+367)
| | | | | | Optimized memchr for POWER10 based on existing rawmemchr and strlen. Reordering instructions and loop unrolling helped in getting better performance. Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* x86: Check PT_GNU_PROPERTY early (H.J. Lu, 2023-12-11, 1 file, -40/+80)
| The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries with the PT_GNU_PROPERTY segment, we can check it to avoid scanning the PT_NOTE segment.
|
| Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* sysdeps/x86/Makefile: Split and sort tests (H.J. Lu, 2023-12-11, 1 file, -32/+78)
| | | | Put each test on a separate line and sort tests.
* powerpc: Optimized strcmp for power10 (Amrita H S, 2023-12-07, 5 files, -1/+240)
| This patch is based on __strcmp_power9 and __strlen_power10.
|
| Improvements from __strcmp_power9:
|
| 1. Uses new POWER10 instructions
|    - This code uses lxvp to decrease contention on load by loading 32 bytes per instruction.
| 2. Performance implication
|    - This version has around 30% better performance on average.
|    - Performance regression is seen for specific combinations of sizes and alignments. Some of them are also observed without the changes, while the rest may be induced by the patch.
|
| Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
| Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* elf: Ignore LD_BIND_NOW and LD_BIND_NOT for setuid binaries (Adhemerval Zanella, 2023-12-05, 1 file, -0/+2)
| This prevents any environment variable from changing the semantics of setuid binaries.
|
| Checked on x86_64-linux-gnu.
|
| Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* elf: Ignore loader debug env vars for setuid (Adhemerval Zanella, 2023-12-05, 1 file, -0/+2)
| The loader already ignores LD_DEBUG, LD_DEBUG_OUTPUT, and LD_TRACE_LOADED_OBJECTS. Both LD_WARN and LD_VERBOSE are similar to LD_DEBUG, in the sense that they enable additional checks and debug information, so it makes sense to disable them.
|
| Also add both LD_VERBOSE and LD_WARN to the filtered environment variables for setuid binaries.
|
| Checked on x86_64-linux-gnu.
|
| Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* aarch64: correct CFI in rawmemchr (bug 31113) (Andreas Schwab, 2023-12-05, 1 file, -1/+1)
| | | | | | | The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive.
* math: Add new exp10 implementation (Joe Ramsay, 2023-12-04, 3 files, -24/+135)
| New implementation is based on the existing exp/exp2, with different reduction constants and polynomial. Worst-case error in round-to-nearest is 0.513 ULP.
|
| The exp/exp2 shared table is reused for exp10 - .rodata size of e_exp_data increases by 64 bytes.
|
| As for exp/exp2, targets with single-instruction rounding/conversion intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the necessary code to their math_private.h.
|
| Improvements on Neoverse V1 compared to current GLIBC master:
| exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]
| exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]
|
| Tested on:
| aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
| x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction)
|
| Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: fix tested ifunc variants (Szabolcs Nagy, 2023-12-04, 1 file, -3/+3)
| | | | | Don't test a64fx string functions when BTI is enabled since they are not BTI compatible.
* hurd: [!__USE_MISC] Do not #undef BSD macros in ioctls (Samuel Thibault, 2023-12-02, 1 file, -0/+2)
| | | | | When e.g. including termios.h first and then sys/ioctl.h, without e.g. _BSD_SOURCE, the latter would #undef e.g. ECHO, without defining it.
* linux: Make fdopendir fail with O_PATH (BZ 30373) (Adhemerval Zanella, 2023-11-30, 3 files, -1/+56)
| It is not strictly required by POSIX, since O_PATH is a Linux extension, but it is QoI to fail early instead of at readdir. Also the check is free, since fdopendir already checks if the file descriptor is opened for read.
|
| Checked on x86_64-linux-gnu.
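The behaviour can be probed from an application with the sketch below. The function name `fdopendir_rejects_o_path` is hypothetical; the result depends on the installed glibc, so no particular outcome is assumed for a valid directory.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fdopendir rejects an O_PATH descriptor up front,
   0 if it accepts it (the failure would then surface at readdir),
   and -1 if PATH could not be opened or O_PATH is unavailable.  */
int
fdopendir_rejects_o_path (const char *path)
{
#ifdef O_PATH
  int fd = open (path, O_PATH | O_DIRECTORY);
  if (fd < 0)
    return -1;
  DIR *dir = fdopendir (fd);
  if (dir == NULL)
    {
      close (fd);       /* fdopendir failed, so fd is still ours.  */
      return 1;
    }
  closedir (dir);       /* Also closes fd.  */
  return 0;
#else
  (void) path;
  return -1;
#endif
}
```

Calling it on "." with a glibc that contains this change is expected to take the early-failure path, while older versions would hand back a stream that cannot be read.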
* Avoid padding in _init and _fini. [BZ #31042] (Stefan Liebler, 2023-11-30, 2 files, -3/+1)
| The linker just concatenates the .init and .fini sections, which results in the complete _init and _fini functions. If needed, the linker adds padding bytes due to an alignment. GNU ld adds NOPs, which is fine. But e.g. mold adds traps, which results in broken _init and _fini functions.
|
| Thus this patch removes the alignment in .init and .fini sections in crtn.S files.
|
| We keep the 4 byte function alignment in crti.S files. As the assembler now also outputs the start of _init and _fini functions as multiples of 4 bytes, it perhaps has to fill it. Although GNU as is using NOPs here, to be sure, we just keep the alignment with 0x07 (=NOPs) at the end of crti.S.
|
| In order to avoid an obvious NOP slide in _fini, this patch also uses an lg instead of lgr instruction. Then the emitted instructions need a multiple of 4 bytes.
* aarch64: Improve special-case handling in AdvSIMD double-precision libmvec routines (Joe Ramsay, 2023-11-29, 1 file, -1/+7)
| Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.
* x86: Only align destination to 1x VEC_SIZE in memset 4x loop (Noah Goldstein, 2023-11-28, 1 file, -1/+1)
| Current code aligns to 2x VEC_SIZE. Aligning to 2x has no effect on performance other than potentially resulting in an additional iteration of the loop. 1x maintains aligned stores (the only reason to align in this case) and doesn't incur any unnecessary loop iterations.
|
| Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>