path: root/sysdeps
Commit message | Author | Age | Files | Lines
* linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc | Adhemerval Zanella Netto | 2023-09-05 | 5 | -0/+40
    Not all architectures added clone3 syscall.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>
* mips: Add the clone3 wrapper | Adhemerval Zanella Netto | 2023-09-05 | 2 | -0/+141
    It follows the internal signature:

      extern int clone3 (struct clone_args *__cl_args, size_t __size,
                         int (*__func) (void *__arg), void *__arg);

    Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and mipsel-linux-gnu.
* arm: Add the clone3 wrapper | Adhemerval Zanella Netto | 2023-09-05 | 2 | -0/+81
    It follows the internal signature:

      extern int clone3 (struct clone_args *__cl_args, size_t __size,
                         int (*__func) (void *__arg), void *__arg);

    Checked on arm-linux-gnueabihf.
* htl: Fix stack information for main thread | Samuel Thibault | 2023-09-03 | 1 | -3/+27
    We can easily directly ask the kernel with vm_region rather than assuming a one-page stack.
* elf: Fix slow tls access after dlopen [BZ #19924] | Szabolcs Nagy | 2023-09-01 | 2 | -3/+4
    In short: __tls_get_addr checks the global generation counter and if the current dtv is older then _dl_update_slotinfo updates dtv up to the generation of the accessed module. So if the global generation is newer than the generation of the module then __tls_get_addr keeps hitting the slow dtv update path. The dtv update path includes a number of checks to see if any update is needed and this already causes measurable tls access slow down after dlopen.

    It may be possible to detect up-to-date dtv faster. But if there are many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at least walking the slotinfo list.

    This patch tries to update the dtv to the global generation instead, so after a dlopen the tls access slow path is only hit once. The modules with larger generation than the accessed one were not necessarily synchronized before, so additional synchronization is needed.

    This patch uses acquire/release synchronization when accessing the generation counter.

    Note: in the x86_64 version of dl-tls.c the generation is only loaded once, since relaxed mo is not faster than acquire mo load.

    I have not benchmarked this. Tested by Adhemerval Zanella on aarch64, powerpc, sparc, x86 who reported that it fixes the performance issue of bug 19924.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
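The idea in the message above can be illustrated with a small model. This is only a sketch under stated assumptions: `global_generation`, `struct toy_dtv`, `toy_dlopen`, and `toy_tls_access` are hypothetical names, not glibc's real data structures, and the real update path does far more work than bumping a number. It shows why syncing the dtv to the global generation means the slow path runs once per dlopen rather than on every access.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader's global generation counter.  */
static _Atomic size_t global_generation = 1;

/* Hypothetical per-thread state; not glibc's dtv layout.  */
struct toy_dtv
{
  size_t generation;        /* generation this dtv was last synced to */
  unsigned slow_path_hits;  /* how many times the update path ran */
};

/* Model of a dlopen publishing a new generation with release order.  */
static void
toy_dlopen (void)
{
  atomic_fetch_add_explicit (&global_generation, 1, memory_order_release);
}

/* Model of a TLS access: load the counter once with acquire order and,
   if the dtv is stale, bring it up to the global generation rather than
   only up to the generation of the accessed module.  */
static void
toy_tls_access (struct toy_dtv *dtv)
{
  size_t gen = atomic_load_explicit (&global_generation,
                                     memory_order_acquire);
  if (dtv->generation != gen)
    {
      dtv->slow_path_hits++;
      dtv->generation = gen;
    }
}
```

In this model, any number of `toy_tls_access` calls after a single `toy_dlopen` increments `slow_path_hits` only once, which is the behaviour the patch aims for.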
* x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643] | H.J. Lu | 2023-08-29 | 1 | -18/+13
    The old Intel software developer manual specified that the low byte of EAX of CPUID leaf 2 returned 1, which indicated the number of rounds of CPUID leaf 2 needed to retrieve the complete cache information. The newer Intel manual has been changed so that it should always return 1 and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used. In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If CPUID leaf 4 doesn't contain the cache information, cache information isn't available at all. This addresses BZ #30643.
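The decision described above can be sketched as a pure function. This is a simplified illustration, not the code from the patch: `pick_cache_info_source`, the enum, and the `leaf4_has_cache_info` flag are hypothetical, and the caller is assumed to have already executed CPUID and passed in the EAX value from leaf 2.

```c
#include <stdbool.h>
#include <stdint.h>

enum cache_info_source
{
  CACHE_INFO_LEAF2,  /* descriptors from CPUID leaf 2 are usable */
  CACHE_INFO_LEAF4,  /* fall back to CPUID leaf 4 */
  CACHE_INFO_NONE    /* no cache information available */
};

/* leaf2_eax: EAX as returned by CPUID leaf 2.
   leaf4_has_cache_info: hypothetical flag, true when leaf 4 reports
   cache parameters.  */
static enum cache_info_source
pick_cache_info_source (uint32_t leaf2_eax, bool leaf4_has_cache_info)
{
  /* Only the low byte of EAX is checked; it must be exactly 1.  */
  if ((leaf2_eax & 0xff) == 1)
    return CACHE_INFO_LEAF2;
  return leaf4_has_cache_info ? CACHE_INFO_LEAF4 : CACHE_INFO_NONE;
}
```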
* LoongArch: Change loongarch to LoongArch in comments | dengjianbo | 2023-08-29 | 24 | -24/+24
* LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx} | dengjianbo | 2023-08-29 | 7 | -0/+861
    According to glibc memcmp microbenchmark test results (Add generic memcmp), this implementation improves performance except when the length is less than 3. Details below:

      Name             Percent of time reduced
      memcmp-lasx      16%-74%
      memcmp-lsx       20%-50%
      memcmp-aligned   5%-20%
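For readers unfamiliar with the ifunc pattern these LoongArch commits rely on, the sketch below models the selection step with plain function pointers. It is a portable stand-in, not glibc's resolver: the `TOY_HWCAP_*` bits, the `toy_memcmp_*` variants (which all just forward to `memcmp`), and the LASX-over-LSX-over-aligned preference order are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn) (const void *, const void *, size_t);

/* Hypothetical feature bits; real code would query the CPU features.  */
#define TOY_HWCAP_LSX  (1u << 0)
#define TOY_HWCAP_LASX (1u << 1)

/* Stand-in variants.  Each forwards to memcmp so the sketch stays
   portable; the real ones are hand-written assembly.  */
static int
toy_memcmp_aligned (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);
}

static int
toy_memcmp_lsx (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);
}

static int
toy_memcmp_lasx (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n);
}

/* Resolver-style selection: pick the widest vector variant available
   (assumed order), otherwise use the aligned scalar variant.  */
static memcmp_fn
select_memcmp (unsigned int hwcap)
{
  if (hwcap & TOY_HWCAP_LASX)
    return toy_memcmp_lasx;
  if (hwcap & TOY_HWCAP_LSX)
    return toy_memcmp_lsx;
  return toy_memcmp_aligned;
}
```

With a real ifunc the selection runs once at symbol resolution time, so callers pay no per-call dispatch cost beyond the usual indirect call.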
* LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx} | dengjianbo | 2023-08-29 | 8 | -0/+688
    According to glibc memset microbenchmark test results, for the LSX and LASX versions a few cases with length less than 8 experience performance degradation. Overall, the LASX version could reduce the runtime by about 15%-75%, and the LSX version by about 15%-50%.

    The unaligned version uses unaligned memory access to set data whose length is less than 64 and to align the address to 8. For this part, the performance is better than the aligned version. Compared with the generic version, the performance is close when the length is larger than 128. When the length is 8-128, the unaligned version could reduce the runtime by about 30%-70%, and the aligned version by about 20%-50%.
* LoongArch: Add ifunc support for memrchr{lsx, lasx} | dengjianbo | 2023-08-29 | 7 | -0/+335
    According to glibc memrchr microbenchmark, this implementation could reduce the runtime as follows:

      Name           Percent of runtime reduced
      memrchr-lasx   20%-83%
      memrchr-lsx    20%-64%
* LoongArch: Add ifunc support for memchr{aligned, lsx, lasx} | dengjianbo | 2023-08-29 | 7 | -0/+401
    According to glibc memchr microbenchmark, this implementation could reduce the runtime as follows:

      Name             Percent of runtime reduced
      memchr-lasx      37%-83%
      memchr-lsx       30%-66%
      memchr-aligned   0%-15%
* LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx} | dengjianbo | 2023-08-29 | 7 | -0/+365
    According to glibc rawmemchr microbenchmark, a few cases tested with char '\0' experience performance degradation because the lasx and lsx versions don't handle '\0' separately. Overall, the rawmemchr-lasx implementation could reduce the runtime by about 40%-80%, the rawmemchr-lsx implementation by about 40%-66%, and the rawmemchr-aligned implementation by about 20%-40%.
* LoongArch: Micro-optimize LD_PCREL | Xi Ruoyao | 2023-08-29 | 1 | -6/+4
    We are requiring Binutils >= 2.41, so explicit relocation syntax is always supported by the assembler. Use it to reduce one instruction.

    Signed-off-by: Xi Ruoyao <xry111@xry111.site>
* LoongArch: Remove support code for old linker in start.S | Xi Ruoyao | 2023-08-29 | 1 | -16/+3
    We are requiring Binutils >= 2.41, so la.pcrel always works here.

    Signed-off-by: Xi Ruoyao <xry111@xry111.site>
* LoongArch: Simplify the autoconf check for static PIE | Xi Ruoyao | 2023-08-29 | 2 | -50/+16
    We are strictly requiring GAS >= 2.41 now, so we don't need to check assembler capability anymore.

    Signed-off-by: Xi Ruoyao <xry111@xry111.site>
* Add F_SEAL_EXEC from Linux 6.3 to bits/fcntl-linux.h. | Kir Kolyshkin | 2023-08-28 | 1 | -0/+1
    This patch adds the new F_SEAL_EXEC constant from Linux 6.3 (see Linux commit 6fd7353829c ("mm/memfd: add F_SEAL_EXEC")) to bits/fcntl-linux.h.

    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
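A caller might use the new constant roughly as follows. This is a sketch, not part of the patch: `seal_exec_bits` is a hypothetical helper, and the fallback definitions (with values taken from the Linux UAPI headers) are only there so the snippet compiles against headers that predate this change or that hide the GNU-specific names. The seal only takes effect on a kernel that supports it, and only on a memfd created with sealing allowed.

```c
#include <fcntl.h>

/* Fallbacks for older or stricter headers; values as in the Linux
   UAPI <linux/fcntl.h>.  */
#ifndef F_ADD_SEALS
# define F_ADD_SEALS 1033   /* F_LINUX_SPECIFIC_BASE (1024) + 9 */
#endif
#ifndef F_SEAL_EXEC
# define F_SEAL_EXEC 0x0020 /* prevent changes to the executable bits */
#endif

/* Hypothetical helper: ask the kernel to seal the executable bits of a
   memfd.  Returns 0 on success, or -1 with errno set (for example
   EBADF for an invalid descriptor, or EINVAL on kernels that do not
   know the seal).  */
static int
seal_exec_bits (int memfd)
{
  return fcntl (memfd, F_ADD_SEALS, F_SEAL_EXEC);
}
```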
* m68k: Use M68K_SCALE_AVAILABLE on __mpn_lshift and __mpn_rshift | Adhemerval Zanella | 2023-08-25 | 3 | -7/+14
    This patch adds a new macro, M68K_SCALE_AVAILABLE, similar to gmp scale_available_p (mpn/m68k/m68k-defs.m4), that expands to 1 if a scale factor can be used in addressing modes. This is used instead of __mc68020__ for some optimization decisions.

    Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
* m68k: Fix build with -mcpu=68040 or higher (BZ 30740) | Adhemerval Zanella | 2023-08-25 | 2 | -1/+21
    GCC currently does not define __mc68020__ for -mcpu=68040 or higher, which breaks the memcpy/memmove assumptions. Since this memory copy optimization seems only intended for m68020, disable it for other m680X0 variants.

    Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
* LoongArch: Add ifunc support for strncmp{aligned, lsx} | dengjianbo | 2023-08-24 | 6 | -0/+508
    Based on the glibc microbenchmark, only a few short inputs experience performance degradation with the strncmp-aligned and strncmp-lsx implementations. Overall, strncmp-aligned could reduce the runtime by 0%-10% for aligned comparison and 10%-25% for unaligned comparison; strncmp-lsx could reduce the runtime by about 0%-60%.
* LoongArch: Add ifunc support for strcmp{aligned, lsx} | dengjianbo | 2023-08-24 | 6 | -0/+426
    Based on the glibc microbenchmark, the strcmp-aligned implementation could reduce the runtime by 0%-10% for aligned comparison and 10%-20% for unaligned comparison; the strcmp-lsx implementation could reduce the runtime by 0%-50%.
* LoongArch: Add ifunc support for strnlen{aligned, lsx, lasx} | dengjianbo | 2023-08-24 | 7 | -0/+382
    Based on the glibc microbenchmark, the strnlen-aligned implementation could reduce the runtime by more than 10%, the strnlen-lsx implementation by about 50%-78%, and the strnlen-lasx implementation by about 50%-88%.
* htl: move pthread_attr_setdetachstate into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-11-gfleury@disroot.org>
* htl: move pthread_attr_getdetachstate into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-10-gfleury@disroot.org>
* htl: move pthread_attr_setschedpolicy into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-9-gfleury@disroot.org>
* htl: move pthread_attr_getschedpolicy into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-8-gfleury@disroot.org>
* htl: move pthread_attr_setinheritsched into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-7-gfleury@disroot.org>
* htl: move pthread_attr_getinheritsched into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-6-gfleury@disroot.org>
* htl: move pthread_attr_getschedparam into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -6/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-5-gfleury@disroot.org>
* htl: move pthread_setschedparam into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -6/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-4-gfleury@disroot.org>
* htl: move pthread_getschedparam into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-3-gfleury@disroot.org>
* htl: move pthread_equal into libc | Guy-Fleury Iteriteka | 2023-08-24 | 3 | -4/+0
    Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
    Message-Id: <20230716084414.107245-2-gfleury@disroot.org>
* Linux: Avoid conflicting types in ld.so --list-diagnostics | Florian Weimer | 2023-08-23 | 1 | -5/+8
    The path auxv[*].a_val could either be an integer or a string, depending on the a_type value. Use a separate field, a_val_string, to simplify mechanical parsing of the --list-diagnostics output.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86_64: Add log1p with FMA | H.J. Lu | 2023-08-21 | 4 | -0/+40
    On Skylake, it changes log1p bench performance by:

              Before     After     Improvement
      max     63.349     58.347      8%
      min      4.448      5.651    -30%
      mean    12.0674    10.336     14%

    The minimum code path is

      if (hx < 0x3FDA827A)                          /* x < 0.41422  */
        {
          if (__glibc_unlikely (ax >= 0x3ff00000))  /* x <= -1.0  */
            {
              ...
            }
          if (__glibc_unlikely (ax < 0x3e200000))   /* |x| < 2**-29  */
            {
              math_force_eval (two54 + x);          /* raise inexact  */
              if (ax < 0x3c900000)                  /* |x| < 2**-54  */
                {
                  ...
                }
              else
                return x - x * x * 0.5;

    FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster.

    Since log1p is called by asinh and atanh, it improves asinh performance by:

              Before     After     Improvement
      max     75.645     63.135     16%
      min     10.074     10.071      0%
      mean    15.9483    14.9089     6%

    and improves atanh performance by:

              Before     After     Improvement
      max     91.768     75.081     18%
      min     15.548     13.883     10%
      mean    18.3713    16.8011     8%
* Remove references to the defunct db2 subdir | Andreas Schwab | 2023-08-21 | 3 | -11/+0
    The db2 subdir was removed more than 20 years ago.
* s390x: Fix static PIE condition for toolchain bootstrapping. | Stefan Liebler | 2023-08-18 | 2 | -12/+138
    The static PIE configure check uses link tests. When bootstrapping a cross-toolchain, the link tests fail due to missing crt-files / libc.so.

    As we explicitly want to test an issue in binutils (ld), we now also explicitly check for known linker versions.

    See also commit 368b7c614b102122b86af3953daea2b30230d0a8 S390: Use compile-only instead of also link-tests in configure.
* m68k: fix __mpn_lshift and __mpn_rshift for non-68020 | Andreas Schwab | 2023-08-17 | 2 | -4/+4
    From revision 03f3d275d0d6 in the gmp repository.
* sysdeps: tst-bz21269: fix -Wreturn-type | Sam James | 2023-08-17 | 1 | -2/+0
    Thanks to Andreas Schwab for reporting.

    Fixes: 652b9fdb77d9fd056d4dd26dad2c14142768ab49
    Signed-off-by: Sam James <sam@gentoo.org>
* Loongarch: Add ifunc support for memcpy{aligned, unaligned, lsx, lasx} and memmove{aligned, unaligned, lsx, lasx} | dengjianbo | 2023-08-17 | 13 | -0/+2435
    These implementations improve the time to copy data in the glibc microbenchmark as below:

      memcpy-lasx        reduces the runtime about 8%-76%
      memcpy-lsx         reduces the runtime about 8%-72%
      memcpy-unaligned   reduces the runtime of unaligned data copying up to 40%
      memcpy-aligned     reduces the runtime of unaligned data copying up to 25%
      memmove-lasx       reduces the runtime about 20%-73%
      memmove-lsx        reduces the runtime about 50%
      memmove-unaligned  reduces the runtime of unaligned data moving up to 40%
      memmove-aligned    reduces the runtime of unaligned data moving up to 25%
* Loongarch: Add ifunc support for strchr{aligned, lsx, lasx} and strchrnul{aligned, lsx, lasx} | dengjianbo | 2023-08-17 | 12 | -0/+581
    These implementations improve the time to run strchr{nul} microbenchmark in glibc as below:

      strchr-lasx        reduces the runtime about 50%-83%
      strchr-lsx         reduces the runtime about 30%-67%
      strchr-aligned     reduces the runtime about 10%-20%
      strchrnul-lasx     reduces the runtime about 50%-83%
      strchrnul-lsx      reduces the runtime about 36%-65%
      strchrnul-aligned  reduces the runtime about 6%-10%
* sysdeps: tst-bz21269: handle ENOSYS & skip appropriately | Sam James | 2023-08-16 | 1 | -1/+10
    SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which some distributions may disable for hardening.

    Check if that's the case (unset) and mark the test as UNSUPPORTED if so.

    Reviewed-by: DJ Delorie <dj@redhat.com>
    Signed-off-by: Sam James <sam@gentoo.org>
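The skip logic described above boils down to telling "the kernel does not provide this syscall" apart from a genuine failure. A minimal sketch, assuming the caller passes in the raw syscall return value and the saved `errno`; the enum and `classify_syscall_result` are hypothetical names, not taken from the test itself:

```c
#include <errno.h>

enum toy_outcome
{
  TOY_CONTINUE,     /* syscall worked, keep running the test */
  TOY_UNSUPPORTED,  /* syscall is not available on this kernel */
  TOY_FAIL          /* any other error is a real failure */
};

/* ret: return value of the syscall wrapper (0 on success, -1 on error).
   err: errno value saved immediately after the call.  */
static enum toy_outcome
classify_syscall_result (long ret, int err)
{
  if (ret == 0)
    return TOY_CONTINUE;
  /* ENOSYS means the kernel was built without the syscall, for
     instance with CONFIG_MODIFY_LDT_SYSCALL unset.  */
  if (err == ENOSYS)
    return TOY_UNSUPPORTED;
  return TOY_FAIL;
}
```

A test would then report the unsupported case through whatever mechanism its harness provides instead of counting it as a failure.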
* sysdeps: tst-bz21269: fix test parameter | Sam James | 2023-08-16 | 1 | -1/+1
    All callers pass 1 or 0x11 anyway (same meaning according to man page), but still.

    Reviewed-by: DJ Delorie <dj@redhat.com>
    Signed-off-by: Sam James <sam@gentoo.org>
* hurd: Fix strictness of <mach/thread_state.h> | Samuel Thibault | 2023-08-16 | 1 | -3/+3
    Fixes: db25bc52026f ("hurd: Add prototype for and thus fix _hurdsig_abort_rpcs call")
* x86_64: Add expm1 with FMA | H.J. Lu | 2023-08-14 | 4 | -0/+55
    On Skylake, it improves expm1 bench performance by:

              Before     After     Improvement
      max     70.204     68.054      3%
      min     20.709     16.2       22%
      mean    22.1221    16.7367    24%

    NB: Add

      extern long double __expm1l (long double);
      extern long double __expm1f128 (long double);

    for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.
* Loongarch: Add ifunc support and add different versions of strlen | dengjianbo | 2023-08-14 | 9 | -0/+418
    strlen-lasx is implemented by LASX simd instructions (256bit)
    strlen-lsx is implemented by LSX simd instructions (128bit)
    strlen-align is implemented by LA basic instructions and never uses unaligned memory access
* LoongArch: Add minimum binutils required version | dengjianbo | 2023-08-14 | 4 | -8/+7
    LoongArch glibc can add some LASX/LSX vector instruction code; change the required minimum binutils version to 2.41, which supports vector instructions. HAVE_LOONGARCH_VEC_ASM is removed accordingly.
* LoongArch: Redefine macro LEAF/ENTRY. | dengjianbo | 2023-08-14 | 1 | -10/+26
    The following usages of the macro LEAF/ENTRY are all feasible:
      1. LEAF(fcn) -- the align value of fcn is .align 3 (default value)
      2. LEAF(fcn, 6) -- the align value of fcn is .align 6
* x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] | Noah Goldstein | 2023-08-11 | 1 | -4/+3
    The:

    ```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
    ```

    Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before af992e7abd).

    This patch fixes that by putting the division after the `else` block.
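To make the scoping issue concrete, here is a reduced model of the fixed control flow. Everything except the two-line division is hypothetical: `toy_shared_per_thread`, `have_topology`, and the two thread counts merely stand in for whatever the surrounding branches compute in the real code.

```c
/* Hypothetical helper mirroring the control flow described above.  */
static unsigned long
toy_shared_per_thread (unsigned long shared, int have_topology,
                       unsigned long topo_threads,
                       unsigned long fallback_threads)
{
  unsigned long shared_per_thread = shared;
  unsigned long threads;

  if (have_topology)
    threads = topo_threads;
  else
    threads = fallback_threads;

  /* The division sits after the else block, so it applies no matter
     which branch computed the thread count.  */
  if (shared_per_thread > 0 && threads > 0)
    shared_per_thread /= threads;

  return shared_per_thread;
}
```

Had the division stayed inside the `else` block, the `have_topology` path would return the undivided size, which is the regression the commit describes.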
* x86_64: Add log2 with FMA | H.J. Lu | 2023-08-11 | 3 | -0/+48
    On Skylake, it improves log2 bench performance by:

              Before     After     Improvement
      max     208.779    63.827     69%
      min       9.977     6.55      34%
      mean     10.366     6.8191    34%
* nscd: Do not rebuild getaddrinfo (bug 30709) | Florian Weimer | 2023-08-11 | 1 | -16/+1
    The nscd daemon caches hosts data from NSS modules verbatim, without filtering protocol families or sorting them (otherwise separate caches would be needed for certain ai_flags combinations). The cache implementation is completely separate from the getaddrinfo code. This means that rebuilding getaddrinfo is not needed. The only function actually used is __bump_nl_timestamp from check_pf.c, and this change moves it into nscd/connections.c.

    Tested on x86_64-linux-gnu with -fexceptions, built with build-many-glibcs.py. I also backported this patch into a distribution that still supports nscd and verified manually that caching still works.

    Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* x86_64: Sort fpu/multiarch/Makefile | H.J. Lu | 2023-08-10 | 1 | -20/+74
    Sort Makefile variables using scripts/sort-makefile-lines.py.

    No code generation changes observed in libm. No regressions on x86_64.