about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* x86/cet: Check legacy shadow stack code in .init_array sectionH.J. Lu2023-12-1911-0/+330
| | | | | | Verify that legacy shadow stack code in .init_array section in application and shared library, which are marked as shadow stack enabled, will trigger segfault.
* x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTKH.J. Lu2023-12-193-0/+28
| | | | | Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow stack properly.
* x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabledH.J. Lu2023-12-193-0/+9
| | | | | Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is disabled.
* x86/cet: Check legacy shadow stack applicationsH.J. Lu2023-12-196-0/+130
| | | | | Add tests to verify that legacy shadow stack applications run properly when shadow stack is enabled in Linux kernel.
* s390: Set psw addr field in getcontext and friends.Stefan Liebler2023-12-196-0/+34
| | | | | | | | | | | | | | | | | | | | So far if the ucontext structure was obtained by getcontext and co, the return address was stored in general purpose register 14 as it is defined as return address in the ABI. In contrast, the context passed to a signal handler contains the address in psw.addr field. If somebody e.g. wants to dump the address of the context, the origin needs to be known. Now this patch adjusts getcontext and friends and stores the return address also in psw.addr field. Note that setcontext isn't adjusted and it is not supported to pass a ucontext structure from signal-handler to setcontext. We are not able to restore all registers and branching to psw.addr without clobbering one register.
* x86: Unifies 'strlen-evex' and 'strlen-evex512' implementations.Matthew Sterrett2023-12-185-472/+439
| | | | | | | | | | | | | | | | | | | | | | This commit uses a common implementation 'strlen-evex-base.S' for both 'strlen-evex' and 'strlen-evex512' The motivation is to reduce the number of implementations to maintain. This incidentally gives a small performance improvement. All tests pass on x86. Benchmarks were taken on SKX. https://www.intel.com/content/www/us/en/products/sku/123613/intel-core-i97900x-xseries-processor-13-75m-cache-up-to-4-30-ghz/specifications.html Geometric mean for strlen-evex512 over all benchmarks (N=10) was (new/old) 0.939 Geometric mean for wcslen-evex512 over all benchmarks (N=10) was (new/old) 0.965 Code Size Changes: strlen-evex512.S : +24 bytes wcslen-evex512.S : +54 bytes Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86/cet: Don't assume that SHSTK implies IBTH.J. Lu2023-12-183-11/+11
| | | | | | | Since shadow stack (SHSTK) is enabled in the Linux kernel without enabling indirect branch tracking (IBT), don't assume that SHSTK implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.
* x86/cet: Check user_shstk in /proc/cpuinfoH.J. Lu2023-12-171-1/+1
| | | | | Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as user_shstk, instead of shstk.
* powerpc: Add space for HWCAP3/HWCAP4 in the TCB for future Power.Manjunath Matti2023-12-157-1/+26
| | | | | | | | | | | | | | This patch reserves space for HWCAP3/HWCAP4 in the TCB of powerpc. These hardware capabilities bits will be used by future Power architectures. Versioned symbol '__parse_hwcap_3_4_and_convert_at_platform' advertises the availability of the new HWCAP3/HWCAP4 data in the TCB. This is an ABI change for GLIBC 2.39. Suggested-by: Peter Bergner <bergner@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* powerpc: Fix performance issues of strcmp power10Amrita H S2023-12-151-66/+95
| | | | | | | | | | | | | | | | | Current implementation of strcmp for power10 has performance regression for multiple small sizes and alignment combination. Most of these performance issues are fixed by this patch. The compare loop is unrolled and page crosses of unrolled loop is handled. Thanks to Paul E. Murphy for helping in fixing the performance issues. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com> Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com> Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* powerpc : Add optimized memchr for POWER10MAHESH BODAPATI2023-12-145-10/+367
| | | | | | Optimized memchr for POWER10 based on existing rawmemchr and strlen. Reordering instructions and loop unrolling helped in getting better performance. Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* x86: Check PT_GNU_PROPERTY earlyH.J. Lu2023-12-111-40/+80
| | | | | | | The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries with the PT_GNU_PROPERTY segment, we can check it to avoid scan of the PT_NOTE segment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* sysdeps/x86/Makefile: Split and sort testsH.J. Lu2023-12-111-32/+78
| | | | Put each test on a separate line and sort tests.
* powerpc: Optimized strcmp for power10Amrita H S2023-12-075-1/+240
| | | | | | | | | | | | | | | | | | | This patch is based on __strcmp_power9 and __strlen_power10. Improvements from __strcmp_power9: 1. Uses new POWER10 instructions - This code uses lxvp to decrease contention on load by loading 32 bytes per instruction. 2. Performance implication - This version has around 30% better performance on average. - Performance regression is seen for a specific combination of sizes and alignments. Some of them is observed without changes also, while rest may be induced by the patch. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com> Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* elf: Ignore LD_BIND_NOW and LD_BIND_NOT for setuid binariesAdhemerval Zanella2023-12-051-0/+2
| | | | | | | | To avoid any environment variable to change setuid binaries semantics. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* elf: Ignore loader debug env vars for setuidAdhemerval Zanella2023-12-051-0/+2
| | | | | | | | | | | | | Loader already ignores LD_DEBUG, LD_DEBUG_OUTPUT, and LD_TRACE_LOADED_OBJECTS. Both LD_WARN and LD_VERBOSE are similar to LD_DEBUG, in the sense they enable additional checks and debug information, so it makes sense to disable them. Also add both LD_VERBOSE and LD_WARN on filtered environment variables for setuid binaries. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* aarch64: correct CFI in rawmemchr (bug 31113)Andreas Schwab2023-12-051-1/+1
| | | | | | | The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive.
* math: Add new exp10 implementationJoe Ramsay2023-12-043-24/+135
| | | | | | | | | | | | | | | | | | | | | | | New implementation is based on the existing exp/exp2, with different reduction constants and polynomial. Worst-case error in round-to- nearest is 0.513 ULP. The exp/exp2 shared table is reused for exp10 - .rodata size of e_exp_data increases by 64 bytes. As for exp/exp2, targets with single-instruction rounding/conversion intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the necessary code to their math_private.h. Improvements on Neoverse V1 compared to current GLIBC master: exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] Tested on: aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: fix tested ifunc variantsSzabolcs Nagy2023-12-041-3/+3
| | | | | Don't test a64fx string functions when BTI is enabled since they are not BTI compatible.
* hurd: [!__USE_MISC] Do not #undef BSD macros in ioctlsSamuel Thibault2023-12-021-0/+2
| | | | | When e.g. including termios.h first and then sys/ioctl.h, without e.g. _BSD_SOURCE, the latter would #undef e.g. ECHO, without defining it.
* linux: Make fdopendir fail with O_PATH (BZ 30373)Adhemerval Zanella2023-11-303-1/+56
| | | | | | | | | It is not strictly required by the POSIX, since O_PATH is a Linux extension, but it is QoI to fail early instead of at readdir. Also the check is free, since fdopendir already checks if the file descriptor is opened for read. Checked on x86_64-linux-gnu.
* Avoid padding in _init and _fini. [BZ #31042]Stefan Liebler2023-11-302-3/+1
| | | | | | | | | | | | | | | | | | | | | The linker just concatenates the .init and .fini sections which results in the complete _init and _fini functions. If needed the linker adds padding bytes due to an alignment. GNU ld is adding NOPs, which is fine. But e.g. mold is adding traps which results in broken _init and _fini functions. Thus this patch removes the alignment in .init and .fini sections in crtn.S files. We keep the 4 byte function alignment in crti.S files. As the assembler now also outputs the start of _init and _fini functions as multiples of 4 byte, it perhaps has to fill it. Although GNU as is using NOPs here, to be sure, we just keep the alignment with 0x07 (=NOPs) at the end of crti.S. In order to avoid an obvious NOP slide in _fini, this patch also uses an lg instead of lgr instruction. Then the emitted instructions needs a multiple of 4 bytes.
* aarch64: Improve special-case handling in AdvSIMD double-precision libmvec ↵Joe Ramsay2023-11-291-1/+7
| | | | | | | routines Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.
* x86: Only align destination to 1x VEC_SIZE in memset 4x loopNoah Goldstein2023-11-281-1/+1
| | | | | | | | | Current code aligns to 2x VEC_SIZE. Aligning to 2x has no affect on performance other than potentially resulting in an additional iteration of the loop. 1x maintains aligned stores (the only reason to align in this case) and doesn't incur any unnecessary loop iterations. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
* Add TCP_MD5SIG_FLAG_IFINDEX from Linux 5.6 to netinet/tcp.h.Tobias Klauser2023-11-281-1/+2
| | | | | | | | This patch adds the TCP_MD5SIG_FLAG_IFINDEX constant from Linux 5.6 to sysdeps/gnu/netinet/tcp.h and updates struct tcp_md5sig accordingly to contain the device index. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Remove __access_noerrnoJoseph Myers2023-11-238-64/+5
| | | | | | | | | | | | | | | | A recent commit, apparently commit 6c6fce572fb8f583f14d898e54fd7d25ae91cf56 "elf: Remove /etc/suid-debug support", resulted in localplt failures for i686-gnu and x86_64-gnu: Missing required PLT reference: ld.so: __access_noerrno After that commit, __access_noerrno is actually no longer used at all. So rather than just removing the localplt expectation for that symbol for Hurd, completely remove all definitions of and references to that symbol. Tested for x86_64, and with build-many-glibcs.py for i686-gnu and x86_64-gnu.
* malloc: Use __get_nprocs on arena_get2 (BZ 30945)Adhemerval Zanella2023-11-222-7/+1
| | | | | | | | | | | | | | | | This restore the 2.33 semantic for arena_get2. It was changed by 11a02b035b46 to avoid arena_get2 call malloc (back when __get_nproc was refactored to use an scratch_buffer - 903bc7dcc2acafc). The __get_nproc was refactored over then and now it also avoid to call malloc. The 11a02b035b46 did not take in consideration any performance implication, which should have been discussed properly. The __get_nprocs_sched is still used as a fallback mechanism if procfs and sysfs is not acessible. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
* aarch64: Fix libmvec benchmarksJoe Ramsay2023-11-222-49/+81
| | | | | | | | These were broken by the new atan2 functions, as they were only set up for univariate functions. Arity is now detected from the input file - this revealed a mistake that the double-precision inputs were being used for both single- and double-precision routines, which is now remedied.
* elf: Remove LD_PROFILE for static binariesAdhemerval Zanella2023-11-2133-84/+144
| | | | | | | | | | | The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows to trim down libc.a of profile support. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* s390: Use dl-symbol-redir-ifunc.h on cpu-tunablesAdhemerval Zanella2023-11-212-15/+17
| | | | | | | | | Using the memcmp symbol directly allows the compile to inline the memcmp calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked with tst-tunables on s390x-linux-gnu (qemu system). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* x86: Use dl-symbol-redir-ifunc.h on cpu-tunablesAdhemerval Zanella2023-11-214-55/+32
| | | | | | | | | | | | | | | | The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* elf: Fix _dl_debug_vdprintf to work before self-relocationAdhemerval Zanella2023-11-211-0/+24
| | | | | | | | | | | | | The strlen might trigger and invalid GOT entry if it used before the process is self-relocated (for instance on dl-tunables if any error occurs). For i386, _dl_writev with PIE requires to use the old 'int $0x80' syscall mode because the calling the TLS register (gs) is not yet initialized. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* elf: Add all malloc tunable to unsecvarsAdhemerval Zanella2023-11-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | Some environment variables allow alteration of allocator behavior across setuid boundaries, where a setuid program may ignore the tunable, but its non-setuid child can read it and adjust the memory allocator behavior accordingly. Most library behavior tunings is limited to the current process and does not bleed in scope; so it is unclear how pratical this misfeature is. If behavior change across privilege boundaries is desirable, it would be better done with a wrapper program around the non-setuid child that sets these envvars, instead of using the setuid process as the messenger. The patch as fixes tst-env-setuid, where it fail if any unsecvars is set. It also adds a dynamic test, although it requires --enable-hardcoded-path-in-tests so kernel correctly sets the setuid bit (using the loader command directly would require to set the setuid bit on the loader itself, which is not a usual deployment). Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
* elf: Ignore GLIBC_TUNABLES for setuid/setgid binariesAdhemerval Zanella2023-11-211-1/+0
| | | | | | | | | | | | | | | | | | | | The tunable privilege levels were a retrofit to try and keep the malloc tunable environment variables' behavior unchanged across security boundaries. However, CVE-2023-4911 shows how tricky can be tunable parsing in a security-sensitive environment. Not only parsing, but the malloc tunable essentially changes some semantics on setuid/setgid processes. Although it is not a direct security issue, allowing users to change setuid/setgid semantics is not a good security practice, and requires extra code and analysis to check if each tunable is safe to use on all security boundaries. It also means that security opt-in features, like aarch64 MTE, would need to be explicit enabled by an administrator with a wrapper script or with a possible future system-wide tunable setting. Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com>
* elf: Add GLIBC_TUNABLES to unsecvarsAdhemerval Zanella2023-11-211-0/+1
| | | | | | | | | | | | setuid/setgid process now ignores any glibc tunables, and filters out all environment variables that might changes its behavior. This patch also adds GLIBC_TUNABLES, so any spawned process by setuid/setgid processes should set tunable explicitly. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* hurd: Prevent the final file_exec_paths call from signalsSamuel Thibault2023-11-201-1/+15
| | | | | | | Otherwise if the exec server started thrashing the old task, we won't be able to restart the exec. This notably fixes building ghc.
* aarch64: Add vector implementations of expm1 routinesJoe Ramsay2023-11-2013-0/+462
| | | | May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.
* linux: Use fchmodat2 on fchmod for flags different than 0 (BZ 26401)Adhemerval Zanella2023-11-202-53/+75
| | | | | | | | | | | | | | | | | | Linux 6.6 (09da082b07bbae1c) added support for fchmodat2, which has similar semantics as fchmodat with an extra flag argument. This allows fchmodat to implement AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH without the need for procfs. The syscall is registered on all architectures (with value of 452 except on alpha which is 562, commit 78252deb023cf087). The tst-lchmod.c requires a small fix where fchmodat checks two contradictory assertions ('(st.st_mode & 0777) == 2' and '(st.st_mode & 0777) == 3'). Checked on x86_64-linux-gnu on a 6.6 kernel. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* x86: Fix unchecked AVX512-VBMI2 usage in strrchr-evex-base.SNoah Goldstein2023-11-151-24/+51
| | | | | | | | | | | | | strrchr-evex-base used `vpcompress{b|d}` in the page cross logic but was missing the CPU_FEATURE checks for VBMI2 in the ifunc/ifunc-impl-list. The fix is either to add those checks or change the logic to not use `vpcompress{b|d}`. Choosing the latter here so that the strrchr-evex implementation is usable on SKX. New implementation is a bit slower, but this is in a cold path so its probably okay.
* sparc: Fix broken memset for sparc32 [BZ #31068]Andreas Larsson2023-11-151-2/+2
| | | | | | | | | | | Fixes commit a61933fe27df ("sparc: Remove bzero optimization") that after moving code jumped to the wrong label 4. Verfied by successfully running string/test-memset on sparc32. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* hurd: Fix spawni returning allocation errors.Samuel Thibault2023-11-141-2/+8
|
* AArch64: Remove Falkor memcpyWilco Dijkstra2023-11-137-331/+0
| | | | | | | | | | The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* AArch64: Add memset_zva64Wilco Dijkstra2023-11-136-68/+38
| | | | | | | | Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* AArch64: Cleanup emag memsetWilco Dijkstra2023-11-134-197/+90
| | | | | | | Cleanup emag memset - merge the memset_base64.S file, remove the unused ZVA code (since it is disabled on emag). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Add vector implementations of log1p routinesJoe Ramsay2023-11-1013-0/+500
| | | | May discard sign of zero.
* aarch64: Add vector implementations of atan2 routinesJoe Ramsay2023-11-1015-0/+535
|
* aarch64: Add vector implementations of atan routinesJoe Ramsay2023-11-1013-0/+407
|
* aarch64: Add vector implementations of acos routinesJoe Ramsay2023-11-1013-1/+440
|
* aarch64: Add vector implementations of asin routinesJoe Ramsay2023-11-1013-1/+407
|
* elf: Add glibc.mem.decorate_maps tunableAdhemerval Zanella2023-11-071-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PR_SET_VMA_ANON_NAME support is only enabled through a configurable kernel switch, mainly because assigning a name to a anonymous virtual memory area might prevent that area from being merged with adjacent virtual memory areas. For instance, with the following code: void *p1 = mmap (NULL, 1024 * 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); void *p2 = mmap (p1 + (1024 * 4096), 1024 * 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); The kernel will potentially merge both mappings resulting in only one segment of size 0x800000. If the segment is names with PR_SET_VMA_ANON_NAME with different names, it results in two mappings. Although this will unlikely be an issue for pthread stacks and malloc arenas (since for pthread stacks the guard page will result in a PROT_NONE segment, similar to the alignment requirement for the arena block), it still might prevent the mmap memory allocated for detail malloc. There is also another potential scalability issue, where the prctl requires to take the mmap global lock which is still not fully fixed in Linux [1] (for pthread stacks and arenas, it is mitigated by the stack cached and the arena reuse). So this patch disables anonymous mapping annotations as default and add a new tunable, glibc.mem.decorate_maps, can be used to enable it. [1] https://lwn.net/Articles/906852/ Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>