about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* This patch adds larger ulp errors for the log2p1 function.Paul Zimmermann2024-07-221-5/+5
| | | | | | | | Changes in v2: - added larger error for long double on AMD reported by Adhemerval (https://sourceware.org/pipermail/libc-alpha/2024-June/157755.html) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* linux: Trivial test output fix in tst-pkeyAndreas K. Hüttel2024-07-191-1/+1
| | | | Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996)Adhemerval Zanella2024-07-191-1/+7
| | | | | | | | | | | The powerpc pkey_get/pkey_set support was only added for 64-bit [1], and tst-pkey only checks if the support was present with pkey_alloc (which does not fail on powerpc32, at least running a 64-bit kernel). Checked on powerpc-linux-gnu. [1] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a803367bab167f5ec4fde1f0d0ec447707c29520 Reviewed-By: Andreas K. Huettel <dilfridge@gentoo.org>
* powerpc: Update soft-fp ulpsAdhemerval Zanella2024-07-191-0/+103
| | | | Results based on regen-ulps using gcc 11.2.1 on a POWER8 machine.
* Fix usage of _STACK_GROWS_DOWN and _STACK_GROWS_UP defines [BZ 31989]John David Anglin2024-07-191-1/+1
| | | | | Signed-off-by: John David Anglin <dave.anglin@bell.net> Reviewed-By: Andreas K. Hüttel <dilfridge@gentoo.org>
* x32: xfail elf/tst-platform-1 [BZ #22363]H.J. Lu2024-07-191-0/+6
| | | | | | | | Xfail elf/tst-platform-1 on x32 since kernel passes i686 in AT_PLATFORM. See https://sourceware.org/bugzilla/show_bug.cgi?id=22363 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
* Revert "LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic"Andreas K. Hüttel2024-07-175-258/+373
| | | | | | | | We're in freeze for the 2.40 release. This reverts commit 43224b1379d60b1ad98d29ef3d7905d55f828a9f. Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* htl: Fix __pthread_init_thread declaration and definitionSamuel Thibault2024-07-172-2/+3
| | | | | | 0e75c4a4634f ("hurd: Fix pthread_self() without libpthread") added a declaration for ___pthread_init_thread instead of __pthread_init_thread, and missed defining the external hidden symbol.
* hurd: Fix pthread_self() without libpthreadSamuel Thibault2024-07-173-11/+12
| | | | | | | | | | | | | 5476f8cd2e68 ("htl: move pthread_self info libc.") moved the htl pthread_self() function from libpthread to libc, replacing the previous libc stub that just returns 0. And 53da64d1cf36 ("htl: Initialize ___pthread_self early") added initialization code which is needed before being able to call pthread_self. It is currently in libpthread, and thus never called before programs can call pthread_self from libc, which then segfaults when accessing _pthread_self()->thread. This moves the initialization to libc itself, as initialized variables, so pthread_self can always be called fine.
* LoongArch: Add cfi instructions for _dl_tlsdesc_dynamicmengqinggang2024-07-175-373/+258
| | | | | | | | | | | | | | In _dl_tlsdesc_dynamic, there are three 'addi.d sp, sp, -size' instructions to allocate stack size for Float/LSX/LASX registers. Every 'addi.d sp, sp, -size' needs a cfi_adjust_cfa_offset because of sp is used to compute CFA. But only one 'addi.d sp, sp, -size' will be run according to HWCAP value. And all cfi_adjust_cfa_offset will be executed in stack unwinding, it result in incorrect CFA. Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic, _dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx. Conflicting cfi instructions can be distributed to the three functions. And cfi instructions can correspond to stack down instructions.
* x86: Disable non-temporal memset on Skylake ServerNoah Goldstein2024-07-165-12/+26
| | | | | | | | | | | | | | | | | The original commit enabling non-temporal memset on Skylake Server had erroneous benchmarks (actually done on ICX). Further benchmarks indicate non-temporal stores may in fact by a regression on Skylake Server. This commit may be over-cautious in some cases, but should avoid any regressions for 2.40. Tested using qemu on all x86_64 cpu arch supported by both qemu + GLIBC. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* Add pthread_getname_np and pthread_setname_np for HurdFlavio Cruz2024-07-167-13/+226
| | | | | | | | | | | We use thread_get_name and thread_set_name to get and set the thread name, so nothing is stored in the thread structure since these functions are supposed to be called sparingly. One notable difference with Linux is that the thread name is up to 32 chars, whereas Linux's is 16. Also added a mach_RPC_CHECK to check for the existing of gnumach RPCs.
* math: Update alpha ulpsAndreas K. Hüttel2024-07-141-0/+48
| | | | | | | | Linux alphadev 6.9.8-gentoo-alpha #1 Sun Jul 7 00:45:49 EDT 2024 alpha EV68CB Titan GNU/Linux gcc (Gentoo 14.1.1_p20240622 p2) 14.1.1 20240622 GNU ld (Gentoo 2.42 p6) 2.42.0 Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* tests: XFAIL audit tests failing on all mips configurations, bug 29404Andreas K. Hüttel2024-07-121-0/+9
| | | | Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* s390x: Fix segfault in wcsncmp [BZ #31934]Stefan Liebler2024-07-111-9/+1
| | | | | | | | | | | | | | | | | | | The z13/vector-optimized wcsncmp implementation segfaults if n=1 and there is only one character (equal on both strings) before the page end. Then it loads and compares one character and misses to check n again. The following load fails. This patch removes the extra load and compare of the first character and just start with the loop which uses vector-load-to-block-boundary. This code-path also checks n. With this patch both tests are passing: - the simplified one mentioned in the bugzilla 31934 - the full one in Florian Weimer's patch: "manual: Document a GNU extension for strncmp/wcsncmp" (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: Make __rseq_size useful for feature detection (bug 31965)Florian Weimer2024-07-093-10/+31
| | | | | | | | | | | The __rseq_size value is now the active area of struct rseq (so 20 initially), not the full struct size including padding at the end (32 initially). Update misc/tst-rseq to print some additional diagnostics. Reviewed-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
* math: Update m68k ULPsAndreas K. Hüttel2024-07-081-90/+361
| | | | | | | | | | | | | | | | This hasn't been looked at for a loong time (already guessing from the number of missing entries), and it ain't pretty. There are some 9-ulps results for float. - ZaZaZebra (qemu-system-m68k clone of PowerBook 190 system) - GCC 13.3.1 20240614 (Gentoo 13.3.1_p20240614 p17) - ld GNU ld (Gentoo 2.42 p6) 2.42.0 - Linux ZaZaZebra 4.19.0-5-m68k #1 Gentoo 4.19.37-5 (2019-06-19) m68k 68040 68040 GNU/Linux - manual build - ../glibc/configure --enable-fortify-source --prefix=/usr - Tested by Immolo (via Andreas K. Hüttel) Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* elf: Make dl-rseq-symbols Linux onlyAdhemerval Zanella2024-07-042-0/+68
| | | | | | And avoid a Hurd build failures. Checked on x86_64-linux-gnu.
* nptl: fix potential merge of __rseq_* relro symbolsMichael Jeanson2024-07-031-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | While working on a patch to add support for the extensible rseq ABI, we came across an issue where a new 'const' variable would be merged with the existing '__rseq_size' variable. We tracked this to the use of '-fmerge-all-constants' which allows the compiler to merge identical constant variables. This means that all 'const' variables in a compile unit that are of the same size and are initialized to the same value can be merged. In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t' are both 4 bytes and initialized to 0 which should trigger the merge. However for reasons we haven't delved into when the attribute 'section (".data.rel.ro")' is added to the mix, only variables of the same exact types are merged. As far as we know this behavior is not specified anywhere and could change with a new compiler version, hence this patch. Move the definitions of these variables into an assembler file and add hidden writable aliases for internal use. This has the added bonus of removing the asm workaround to set the values on rseq registration. Tested on Debian 12 with GCC 12.2. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
* riscv: Update nofpu libm test ulpsDarius Rad2024-07-031-0/+20
| | | | Fixes 32 test failures.
* hppa/vdso: Provide 64-bit clock_gettime() vDSO onlyJohn David Anglin2024-07-021-3/+0
| | | | | | | | | | Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime() vDSO calls won't be used by glibc on hppa, so there is no need to declare them. Both syscalls will be emulated by utilizing return values of the 64-bit clock_gettime() vDSO instead. Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
* MIPSr6/math: Use builtin fma and fmafYunQiang Su2024-07-011-0/+36
| | | | | | | | | | | | MIPSr6 has MADDF.s/MADDF.d instructions, which are fused. In MIPS ISA, double support can be subsetted. Only FMAF is enabled for this case. * sysdeps/mips/fpu/math-use-builtins-fma.h Signed-off-by: YunQiang Su <syq@gcc.gnu.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Support recursive use of dynamic TLS in interposed mallocFlorian Weimer2024-07-012-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that quite a few applications use bundled mallocs that have been built to use global-dynamic TLS (instead of the recommended initial-exec TLS). The previous workaround from commit afe42e935b3ee97bac9a7064157587777259c60e ("elf: Avoid some free (NULL) calls in _dl_update_slotinfo") does not fix all encountered cases unfortunatelly. This change avoids the TLS generation update for recursive use of TLS from a malloc that was called during a TLS update. This is possible because an interposed malloc has a fixed module ID and TLS slot. (It cannot be unloaded.) If an initially-loaded module ID is encountered in __tls_get_addr and the dynamic linker is already in the middle of a TLS update, use the outdated DTV, thus avoiding another call into malloc. It's still necessary to update the DTV to the most recent generation, to get out of the slow path, which is why the check for recursion is needed. The bookkeeping is done using a global counter instead of per-thread flag because TLS access in the dynamic linker is tricky. All this will go away once the dynamic linker stops using malloc for TLS, likely as part of a change that pre-allocates all TLS during pthread_create/dlopen. Fixes commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls access after dlopen [BZ #19924]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* x86: Set default non_temporal_threshold for Zhaoxin processorsMayShao-oc2024-06-302-2/+5
| | | | | | | | | Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86_64: Optimize large size copy in memmove-ssse3MayShao-oc2024-06-301-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch optimizes large size copy using normal store when src > dst and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S. Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non- temporal threshold, this patch updates that value to '__x86_shared_non_temporal_threshold'. Currently, the __x86_shared_non_temporal_threshold is cpu-specific, and different CPUs will have different values based on the related nt-benchmark results. However, in memmove-ssse3, the nontemporal threshold uses '__x86_shared_cache_size_half', which sounds unreasonable. The performance is not changed drastically although shows overall improvements without any major regressions or gains. Results on Zhaoxin KX-7000: bench-memcpy geometric_mean(N=20) New / Original: 0.999 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 0.978 bench-memmove geometric_mean(N=20) New / Original: 1.000 bench-memmmove-large geometric_mean(N=20) New / Original: 0.962 Results on Intel Core i5-6600K: bench-memcpy geometric_mean(N=20) New / Original: 1.001 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 1.001 bench-memmove geometric_mean(N=20) New / Original: 0.995 bench-memmmove-large geometric_mean(N=20) New / Original: 0.936 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processorsMayShao-oc2024-06-301-16/+35
| | | | | | | | | | | | Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX load are slower on KH-40000 and KX-7000, so disable the AVX_Fast_Unaligned_Load. Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use sse2_unaligned version of memset,strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Aarch64: Add new memset for Qualcomm's oryon-1 coreAndrew Pinski2024-06-304-0/+176
| | | | | | | | | | | | | | | | | | | | Qualcom's new core, oryon-1, has a different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed not to polute the L1/L2 caches. For zero values, using `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there. I started with the emag memset and added back the `dc zva` code. Changes since v1: * v3: Fix comment formating Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Aarch64: Add memcpy for qualcomm's oryon-1 coreAndrew Pinski2024-06-305-0/+316
| | | | | | | | | | | | | | | | | | | Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches. For smaller sizes, the characteristic are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there. Changes since v1: * v2: Fix ordering in Makefile. * v3: Fix comment grammar about the ldnp/stnp instructions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* arm: Avoid UB in elf_machine_rel()Palmer Dabbelt2024-06-261-5/+4
| | | | | | | | | | This recently came up during a cleanup to remove misaligned accesses from the RISC-V port. Link: https://sourceware.org/pipermail/libc-alpha/2022-June/139961.html Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Fangrui Song <maskray@google.com>
* LoongArch: Fix tst-gnu2-tls2 test casemengqinggang2024-06-261-143/+153
| | | | | | | | | | | | | | | | | | | | | | | | | | asm volatile ("movfcsr2gr $t0, $fcsr0" ::: "$t0"); asm volatile ("st.d $t0, %0" :"=m"(restore_fcsr)); generate to the following instructions with -Og flag: movfcsr2gr $t0, $zero addi.d $t0, $sp, 2047(0x7ff) addi.d $t0, $t0, 77(0x4d) st.w $t0, $t0, 0 fcsr0 register and restore_fcsr variable are both stored in t0 register. Change to: asm volatile ("movfcsr2gr %0, $fcsr0" :"=r"(restore_fcsr)); to avoid restore_fcsr address in t0. Comparing float value using memcmp because float value cannot be directly compared for equality. Put LOAD_REGISTER_FCSR and SAVE_REGISTER_FCC after LOAD_REGISTER_FLOAT. Some float instructions may change fcsr register.
* posix: Fix pidfd_spawn/pidfd_spawnp leak if execve fails (BZ 31695)Adhemerval Zanella2024-06-251-7/+16
| | | | | | | | | | | | | | | | | If the pidfd_spawn/pidfd_spawnp helper process succeeds, but evecve fails for some reason (either with an invalid/non-existent, memory allocation, etc.) the resulting pidfd is never closed, nor returned to caller (so it can call close). Since the process creation failed, it should be up to posix_spawn to also, close the file descriptor in this case (similar to what it does to reap the process). This patch also changes the waitpid with waitid (P_PIDFD) for pidfd case, to avoid a possible pid re-use. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Revert "MIPSr6/math: Use builtin fma and fmaf"Andreas K. Hüttel2024-06-251-13/+0
| | | | | | | Apologies, I mistakenly interpreted this to be already accepted. Reverting until v6 or later is reviewed and approved. This reverts commit 9e06e4a43b58519991acbed1d7f33abc40249226.
* RISC-V: Execute a PAUSE hint in spin loopsChristoph Müllner2024-06-241-0/+3
| | | | | | | | | | | | | | | | The atomic_spin_nop() macro can be used to run arch-specific code in the body of a spin loop to potentially improve efficiency. RISC-V's Zihintpause extension includes a PAUSE instruction for this use-case, which is encoded as a HINT, which means that it behaves like a NOP on systems that don't implement Zihintpause. Binutils supports Zihintpause since 2.36, so this patch uses the ".insn" directive to keep the code compatible with older toolchains. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
* MIPSr6/math: Use builtin fma and fmafYunQiang Su2024-06-241-0/+13
| | | | | | | | | | | | | MIPSr6 has MADDF.s/MADDF.d instructions, which are fused. In MIPS ISA, double support can be subsetted. Only FMAF is enabled for this case. * sysdeps/mips/fpu/math-use-builtins-fma.h Signed-off-by: YunQiang Su <syq@gcc.gnu.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* hppa/vdso: Add wrappers for vDSO functionsJohn David Anglin2024-06-231-0/+12
| | | | | | | | | | | | The upcoming parisc (hppa) v6.11 Linux kernel will include vDSO support for gettimeofday(), clock_gettime() and clock_gettime64() syscalls for 32- and 64-bit userspace. The patch below adds the necessary glue code for glibc. Signed-off-by: Helge Deller <deller@gmx.de> Changes in v2: - add vsyscalls for 64-bit too
* Update hppa libm-test-ulpsJohn David Anglin2024-06-231-0/+48
|
* Update hppa libm-test-ulpsJohn David Anglin2024-06-201-0/+16
|
* RISC-V: Update ulpsJulian Zhu2024-06-201-0/+80
| | | | | | For the exp10m1, exp2m1, log10p1 and log2p1 implementations. Signed-off-by: Julian Zhu <jz531210@gmail.com>
* MIPS: Update ulpsJulian Zhu2024-06-202-0/+108
| | | | | | Update mips32/mips64 ulps for the exp10m1, exp2m1, and log10p1 implementations. Signed-off-by: Julian Zhu <jz531210@gmail.com>
* i386: Update ulpsFlorian Weimer2024-06-201-2/+2
| | | | | This is from a -march=i686 -mtune=generic build with --disable-multi-arch, running on a Cascade Lake CPU.
* s390x: Capture grep output in static PIE checkFlorian Weimer2024-06-202-7/+7
| | | | | | | | | | | The test is not a run-time check, so update the description. Also use readelf -W for a more stable output format and fix an LC_ALL typo. This avoids garbled configure messages: checking for s390-specific static PIE requirements (runtime check)... 0x0000000000000017 (JMPREL) 0x280 yes
* powerpc: Update ulpsFlorian Weimer2024-06-201-1/+13
| | | | | Results based on POWER8 and POWER9 machines running powerpc64-linux-gnu, with and without --disable-multi-arch.
* i386: Update ulpsFlorian Weimer2024-06-202-8/+104
| | | | | | Based on a -march=x86-64-v4 -mfpmath=sse build, with and without --disable-multi-arch, running on a Zen 4 CPU. Also used different -march=x8i6-64-v… settings.
* LoongArch: Update ulpsXi Ruoyao2024-06-191-0/+60
| | | | | | Add ulps for recently added C23 exp10m1, exp2m1, and log10p1 functions. Signed-off-by: Xi Ruoyao <xry111@xry111.site>
* sparc: Regenerate ULPsAndreas K. Hüttel2024-06-191-0/+80
| | | | | | Linux catbus 5.15.110-gentoo-r1 #1 SMP Fri Jun 9 17:53:23 PDT 2023 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* s390x: Regenerate ULPs.Stefan Liebler2024-06-191-0/+60
| | | | | | | | Needed due to: - "Implement C23 log10p1" commit ID 55eb99e9a9d840ba452b128be14d6529c2dde039 - "Implement C23 exp2m1, exp10m1" commit ID 7ec903e028271d029818378fd60ddaf6b76b89ac
* LoongArch: Fix _dl_tlsdesc_dynamic in LSX casemengqinggang2024-06-191-9/+9
| | | | | | | HWCAP value is overwritten at the first comparison of the LASX case. The second comparison at LSX get incorrect result. Change to use t0 to save HWCAP value, and use t1 to save comparison result.
* arm: Update ulpsAdhemerval Zanella2024-06-181-0/+48
| | | | For the exp10m1, exp2m1, and log10p1 implementations.
* aarch64: Update ulpsAdhemerval Zanella2024-06-181-0/+60
| | | | For the exp10m1, exp2m1, and log10p1 implementations.
* powerpc: Update ulpsAdhemerval Zanella2024-06-181-0/+60
| | | | For the exp10m1, exp2m1, and log10p1 implementations.