mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message	Author	Age	Files	Lines
...
*	x86: Optimize strnlen-evex.S and implement with VMM headers	Noah Goldstein	2022-10-19	3	-404/+572
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimizations are: 1. Use the fact that bsf(0) leaves the destination unchanged to save a branch in short string case. 2. Restructure code so that small strings are given the hot path. - This is a net-zero on the benchmark suite but in general makes sense as smaller sizes are far more common. 3. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 4. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. The optimizations (especially for point 2) make the strnlen and strlen code essentially incompatible so split strnlen-evex to a new file. Code Size Changes: strlen-evex.S : -23 bytes strnlen-evex.S : -167 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. strlen-evex.S : 0.992 (No real change) strnlen-evex.S : 0.947 Full results attached in email. Full check passes on x86-64.
*	x86: Shrink / minorly optimize strchr-evex and implement with VMM headers	Noah Goldstein	2022-10-19	1	-218/+340
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Size Optimizations: 1. Condense hot path for better cache-locality. - This is most impactful for strchrnul where the logic for strings with len <= VEC_SIZE or with a match in the first VEC now fits entirely in the first cache line. 2. Reuse common targets in first 4x VEC and after the loop. 3. Don't align targets so aggressively if it doesn't change the number of fetch blocks it will require and put more care in avoiding the case where targets unnecessarily split cache lines. 4. Align the loop better for DSB/LSD 5. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 6. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. Code Size Changes: strchr-evex.S : -63 bytes strchrnul-evex.S: -48 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. strchr-evex.S (Fixed) : 0.971 strchr-evex.S (Rand) : 0.932 strchrnul-evex.S : 0.965 Full results attached in email. Full check passes on x86-64.
*	x86: Optimize memchr-evex.S and implement with VMM headers	Noah Goldstein	2022-10-19	3	-410/+851
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimizations are: 1. Use the fact that tzcnt(0) -> VEC_SIZE for memchr to save a branch in short string case. 2. Restructure code so that small strings are given the hot path. - This is a net-zero on the benchmark suite but in general makes sense as smaller sizes are far more common. 3. Use more code-size efficient instructions. - tzcnt ... -> bsf ... - vpcmpb $0 ... -> vpcmpeq ... 4. Align labels less aggressively, especially if it doesn't save fetch blocks / causes the basic-block to span extra cache-lines. The optimizations (especially for point 2) make the memchr and rawmemchr code essentially incompatible so split rawmemchr-evex to a new file. Code Size Changes: memchr-evex.S : -107 bytes rawmemchr-evex.S : -53 bytes Net perf changes: Reported as geometric mean of all improvements / regressions from N=10 runs of the benchtests. Value as New Time / Old Time so < 1.0 is improvement and > 1.0 is regression. memchr-evex.S : 0.928 rawmemchr-evex.S : 0.986 (Fewer targets cross cache lines) Full results attached in email. Full check passes on x86-64.
*	x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr	Sunil K Pandey	2022-10-18	6	-0/+346
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the following evex512 versions of string functions. The evex512 versions take up to 30% fewer cycles than evex, depending on length and alignment. - memchr function using 512 bit vectors. - rawmemchr function using 512 bit vectors. - wmemchr function using 512 bit vectors. Code size data: memchr-evex.o 762 bytes memchr-evex512.o 576 bytes (-24%) rawmemchr-evex.o 461 bytes rawmemchr-evex512.o 412 bytes (-11%) wmemchr-evex.o 794 bytes wmemchr-evex512.o 552 bytes (-30%) Placeholder function, not used by any processor at the moment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources	Florian Weimer	2022-10-18	22	-46/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the future, this will result in a compilation failure if the macros are unexpectedly undefined (due to header inclusion ordering or header inclusion missing altogether). Assembler sources are more difficult to convert. In many cases, they are hand-optimized for the mangling and no-mangling variants, which is why they are not converted. sysdeps/s390/s390-32/__longjmp.c and sysdeps/s390/s390-64/__longjmp.c are special: These are C sources, but most of the implementation is in assembler, so the PTR_DEMANGLE macro has to be undefined in some cases, to match the assembler style. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Introduce <pointer_guard.h>, extracted from <sysdep.h>	Florian Weimer	2022-10-18	102	-509/+911
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to define a generic no-op version of PTR_MANGLE and PTR_DEMANGLE. In the future, we can use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources, avoiding an unintended loss of hardening due to missing include files or unlucky header inclusion ordering. In i386 and x86_64, we can avoid a <tls.h> dependency in the C code by using the computed constant from <tcb-offsets.h>. <sysdep.h> no longer includes these definitions, so there is no cyclic dependency anymore when computing the <tcb-offsets.h> constants. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
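The difference between a no-op pointer guard and a real one, as discussed in the commit above, can be modelled in a few lines of C. This is only a sketch: the function names, the fixed 64-bit width, the rotation count of 17 and the constant guard value are assumptions made for the example. glibc's own PTR_MANGLE and PTR_DEMANGLE are per-architecture macros that obtain the guard from thread or loader state.

```c
#include <stdint.h>

/* Hypothetical guard value.  A real implementation would use a random
   per-process secret rather than a compile-time constant.  */
static const uint64_t example_guard = 0x5bd1e995a1b2c3d4ULL;

/* Generic fallback modelled as a no-op: the pointer is stored as-is.  */
uint64_t mangle_noop (uint64_t p) { return p; }
uint64_t demangle_noop (uint64_t p) { return p; }

/* Rotate helpers; n must be in the range 1..63.  */
static inline uint64_t rotl64 (uint64_t v, unsigned n)
{
  return (v << n) | (v >> (64 - n));
}

static inline uint64_t rotr64 (uint64_t v, unsigned n)
{
  return (v >> n) | (v << (64 - n));
}

/* XOR with the guard, then rotate (assumed scheme for illustration).  */
uint64_t mangle_xor_rot (uint64_t p)
{
  return rotl64 (p ^ example_guard, 17);
}

/* Inverse: rotate back, then XOR with the same guard.  */
uint64_t demangle_xor_rot (uint64_t p)
{
  return rotr64 (p, 17) ^ example_guard;
}
```

Code that always calls the mangle/demangle pair compiles against either variant, which is the property the commit relies on when it provides a generic no-op definition.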
*	x86-64: Move LP_SIZE definition to its own header	Florian Weimer	2022-10-18	4	-11/+48
\| \| \| \| \| \| \| \| \|	This way, we can define the pointer guard macros without including <sysdep.h> on x86-64. Other architectures will not have such an inclusion dependency, and the implied header file inclusion would create a porting hazard. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	math: Fix asin and acos invalid exception with old gcc	Szabolcs Nagy	2022-10-17	1	-16/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This works around a gcc issue where it const folded inf/inf into nan, preventing the invalid exception from being signalled. (x-x)/(x-x) is more robust against optimizations and works for all out of bounds values including x==nan. The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 should be fixed on release branches starting from gcc-10, but it is better to change the code in case glibc is built with older gcc. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
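A minimal sketch of the (x-x)/(x-x) idiom from the commit above follows. The function names are made up for this example, and the `volatile` qualifier is an assumption added only so that a compiler does not fold the expression in this small demo; it is not claimed to appear in glibc's sources.

```c
/* Returns NaN for any input.  In asin/acos such a path would only be
   taken for out-of-range arguments (hypothetical helper name).  */
double invalid_result (double x)
{
  /* x - x is 0 for finite x and NaN for infinite or NaN x.  */
  volatile double d = x - x;
  /* 0/0 is an invalid operation under IEEE 754 and yields NaN;
     NaN/NaN simply propagates the NaN.  */
  return d / d;
}

/* NaN is the only value that compares unequal to itself.  */
int is_nan (double v)
{
  return v != v;
}
```

Because the result is computed at run time from x, the same expression covers finite out-of-range values, infinities and NaN inputs alike.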
*	x86: Update strlen-evex-base to use new reg/vec macros.	Noah Goldstein	2022-10-14	2	-76/+44
\| \| \| \| \| \| \| \| \| \|	To avoid duplicating the VMM / GPR / mask insn macros in all incoming evex512 files, use the macros defined in 'reg-macros.h' and '{vec}-macros.h'. This commit does not change libc.so. Tested build on x86-64.
*	x86: Remove now unused vec header macros.	Noah Goldstein	2022-10-14	7	-328/+0
\| \| \| \| \| \|	This commit does not change libc.so. Tested build on x86-64.
*	x86: Update memset to use new VEC macros	Noah Goldstein	2022-10-14	6	-99/+43
\| \| \| \| \| \| \| \|	Replace %VEC(n) -> %VMM(n). This commit does not change libc.so. Tested build on x86-64.
*	x86: Update memmove to use new VEC macros	Noah Goldstein	2022-10-14	6	-221/+132
\| \| \| \| \| \| \| \|	Replace %VEC(n) -> %VMM(n). This commit does not change libc.so. Tested build on x86-64.
*	x86: Update memrchr to use new VEC macros	Noah Goldstein	2022-10-14	1	-21/+21
\| \| \| \| \| \| \| \|	Replace %VEC(n) -> %VMM(n). This commit does not change libc.so. Tested build on x86-64.
*	x86: Update VEC macros to complete API for evex/evex512 impls	Noah Goldstein	2022-10-14	9	-0/+635
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Copy so that backport will be easier. 2) Make section only define if there is not a previous definition 3) Add `VEC_lo` definition for proper reg-width but in the ymm/zmm0-15 range. 4) Add macros for accessing GPRs based on VEC_SIZE This is to make it easier to do things like: ``` vpcmpb %VEC(0), %VEC(1), %k0 kmov{d\|q} %k0, %{eax\|rax} test %{eax\|rax} ``` It adds macros such that any GPR can get the proper width with: `V{upcase_GPR_name}` and any mask insn can get the proper width with: `{upcase_mask_insn_without_postfix}`. This commit does not change libc.so. Tested build on x86-64.
*	Add AArch64 HWCAP2_EBF16 from Linux 6.0 to bits/hwcap.h	Joseph Myers	2022-10-12	1	-0/+1
\| \| \| \| \| \| \|	Linux 6.0 adds a new AArch64 HWCAP2 bit, HWCAP2_EBF16. Add this to glibc's bits/hwcap.h. Tested with build-many-glibcs.py for aarch64-linux-gnu.
*	elf: Remove -fno-tree-loop-distribute-patterns usage on dl-support	Adhemerval Zanella	2022-10-10	8	-0/+221
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Besides the option being gcc specific, this approach is still fragile and not future proof since we do not know if this will be the only optimization option gcc will add that transforms loops to memset (or any libcall). This patch adds a new header, dl-symbol-redir-ifunc.h, that can be used to redirect the compiler generated libcalls to port the generic memset implementation if required. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	Expose all MAP_ constants in <sys/mman.h> unconditionally (bug 29375)	Andreas Schwab	2022-10-10	9	-110/+82
\| \| \| \| \|	POSIX reserves the MAP_ prefix for <sys/mman.h>, so there is no need to conditionalize their definitions on feature test macros.
*	LoongArch: Fix the condition to use PC-relative addressing in start.S	Xi Ruoyao	2022-10-08	3	-12/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A start.o compiled from start.S with -DPIC and no -DSHARED is used by both crt1.o and rcrt1.o. So the LoongArch static PIE patch unintentionally introduced PC-relative addressing for main and __libc_start_main into crt1.o. While the latest Binutils (trunk, which will be released as 2.40) supports the PC-relative relocs against an external function by creating a PLT entry, the 2.39 release branch doesn't (and won't) support this. An error is raised: "PLT stub does not represent and symbol not defined." So, we need the following changes: 1. Check if ld supports the PC-relative relocs against an external function. If it's not supported, we deem static PIE unsupported. 2. Change start.S. If static PIE is supported, use PC-relative addressing for main and __libc_start_main and rely on the linker to create PLT entries. Otherwise, restore the old behavior (using GOT to address these functions). An alternative would be adding a new "static-pie-start.S", and some custom logic into Makefile to build rcrt1.o with it. And, restore start.S to the state before static PIE change so crt1.o won't contain PC-relative relocs against external symbols. But I can't see any benefit of this alternative, so I'd just keep it simple. Tested by building glibc with the following configurations: 1. Binutils trunk + GCC trunk. Static PIE enabled. All tests passed. 2. Binutils 2.39 branch + GCC trunk. Static PIE disabled. Tests related to ifunc failed (it's a known issue). All other tests passed. 3. Binutils 2.39 branch + GCC 12 branch, cross compilation with build-many-glibcs.py from x86_64-linux-gnu. Static PIE disabled. Build succeeded.
*	arm: Enable USE_ATOMIC_COMPILER_BUILTINS (BZ #24774)	Adhemerval Zanella	2022-10-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As per other architectures. I have checked on armv8 hardware with the following configurations, without any regression:
	  arm-linux-gnueabihf (gcc built with --with-float=hard --with-cpu=arm926ej-s)
	  armv5-linux-gnueabihf (-march=armv5te -mfpu=vfpv3)
	  armv7-linux-gnueabihf (-march=armv7-a -mfpu=vfpv3)
	  armv7-thumb-linux-gnueabihf (-march=armv7-a -mfpu=vfpv3 -mthumb)
	  armv7-neon-linux-gnueabihf (-march=armv7-a -mfpu=neon)
	  armv7-neonhard-linux-gnueabihf (-march=armv7-a -mfpu=neon -mfloat-abi=hard)
	I haven't dug into the code, but since the Linux atomic-machine.h handles pre-ARMv6 and ARMv6 I expect the compiler might have some small room to optimize. The code size also improves in most of the configurations:
	* master
	   text    data     bss     dec     hex filename
	1727801    9720   37928 1775449  1b1759 arm-linux-gnueabihf/libc.so
	1691729    9720   37928 1739377  1a8a71 arm-linux-gnueabihf-armv7-disable-multi-arch/libc.so
	1725509    9720   37928 1773157  1b0e65 armv5-linux-gnueabihf/libc.so
	1700757    9720   37928 1748405  1aadb5 armv6-linux-gnueabihf/libc.so
	1698973    9720   37928 1746621  1aa6bd armv6t2-linux-gnueabihf/libc.so
	1695481    9752   37928 1743161  1a9939 armv7-linux-gnueabihf/libc.so
	1692917    9744   37928 1740589  1a8f2d armv7-neonhard-linux-gnueabihf/libc.so
	1692917    9744   37928 1740589  1a8f2d armv7-neon-linux-gnueabihf/libc.so
	1225353    9752   37928 1273033  136cc9 armv7-thumb-linux-gnueabihf/libc.so
	* patched
	   text    data     bss     dec     hex filename
	1726805    9720   37928 1774453  1b1375 arm-linux-gnueabihf/libc.so
	1689321    9720   37928 1736969  1a8109 arm-linux-gnueabihf-armv7-disable-multi-arch/libc.so
	1724433    9720   37928 1772081  1b0a31 armv5-linux-gnueabihf/libc.so
	1698301    9720   37928 1745949  1aa41d armv6-linux-gnueabihf/libc.so
	1696525    9720   37928 1744173  1a9d2d armv6t2-linux-gnueabihf/libc.so
	1693009    9752   37928 1740689  1a8f91 armv7-linux-gnueabihf/libc.so
	1690493    9744   37928 1738165  1a85b5 armv7-neonhard-linux-gnueabihf/libc.so
	1690493    9744   37928 1738165  1a85b5 armv7-neon-linux-gnueabihf/libc.so
	1223837    9752   37928 1271517  1366dd armv7-thumb-linux-gnueabihf/libc.so
	The idea is to eventually move all architectures to use compiler builtins. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
*	elf: Remove _dl_string_hwcap	Javier Pello	2022-10-06	10	-83/+0
\| \| \| \| \| \| \| \|	Removal of legacy hwcaps support from the dynamic loader left no users of _dl_string_hwcap. Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Remove hwcap parameter from add_to_cache signature	Javier Pello	2022-10-06	1	-1/+1
\| \| \| \| \| \| \| \|	Last commit made it so that the value passed for that parameter was always 0 at its only call site. Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	x86_64: Remove platform directory library loading test	Javier Pello	2022-10-06	3	-64/+0
\| \| \| \| \| \| \| \| \|	This was to test loading of shared libraries from platform subdirectories, but this functionality is going away in the following commits. Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Update kernel version to 6.0 in header constant tests	Joseph Myers	2022-10-05	3	-4/+4
\| \| \| \| \| \| \| \| \|	This patch updates the kernel version in the tests tst-mman-consts.py, tst-mount-consts.py and tst-pidfd-consts.py to 6.0. (There are no new constants covered by these tests in 6.0 that need any other header changes.) Tested with build-many-glibcs.py.
*	x86: Fix -Os build (BZ #29576)	Adhemerval Zanella Netto	2022-10-05	1	-0/+18
\| \| \| \| \| \| \| \| \| \| \|	The compiler might transform __stpcpy calls (which are routed to __builtin_stpcpy as an optimization) to strcpy and x86_64 strcpy multiarch implementation does not build any working symbol due ISA_SHOULD_BUILD not being evaluated for IS_IN(rtld). Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
*	Regenerate sysdeps/mach/hurd/bits/errno.h	Joseph Myers	2022-10-05	1	-0/+1
\| \| \| \| \| \| \|	This addition to the list of source headers in sysdeps/mach/hurd/bits/errno.h appears in the source tree after build-many-glibcs.py runs, I'm guessing resulting from gnumach commit c566ad85a2d6728ebc8ec0f461a3b35df300e96e.
*	Update syscall lists for Linux 6.0	Joseph Myers	2022-10-05	1	-2/+2
\| \| \| \| \| \| \|	Linux 6.0 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 6.0. Tested with build-many-glibcs.py.
*	x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations	Aurelien Jarno	2022-10-03	3	-3/+16
\| \| \| \| \| \| \| \| \| \| \|	The AVX2 strrchr and wcsrchr implementations use the 'blsmsk' instruction, which belongs to the BMI1 CPU feature, and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: df7e295d18ff ("x86: Optimize {str\|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation	Aurelien Jarno	2022-10-03	3	-2/+10
\| \| \| \| \| \| \| \| \| \| \|	The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. Fixes: af5306a735eb ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for AVX2 (raw\|w)memchr implementations	Aurelien Jarno	2022-10-03	1	-3/+9
\| \| \| \| \| \| \| \| \| \|	The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi' and 'sarx' instructions, which belong to the BMI2 CPU feature. Fixes: acfd088a1963 ("x86: Optimize memchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for AVX2 wcs(n)cmp implementations	Aurelien Jarno	2022-10-03	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AVX2 wcs(n)cmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: They also use the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for AVX2 strncmp implementation	Aurelien Jarno	2022-10-03	2	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AVX2 strncmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for AVX2 strcmp implementation	Aurelien Jarno	2022-10-03	2	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AVX2 strcmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for AVX2 str(n)casecmp implementations	Aurelien Jarno	2022-10-03	2	-8/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: They also use the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: b77b06e0e296 ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: include BMI1 and BMI2 in x86-64-v3 level	Aurelien Jarno	2022-10-03	1	-0/+2
\| \| \| \| \| \| \| \|	The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Cleanup pthread_spin_{try}lock.S	Noah Goldstein	2022-10-03	2	-12/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	Save a jmp on the lock path coming from an initial failure in pthread_spin_lock.S. This costs 4 bytes of code, but since the function still fits in the same number of 16-byte blocks (default function alignment) it has no effect on the total binary size of libc.so (unchanged after this commit). pthread_spin_trylock was using a CAS, which is often more expensive, when a simple xchg works. Full check passes on x86-64.
*	x86: Remove .tfloat usage	Adhemerval Zanella	2022-10-03	9	-26/+47
\| \| \| \| \|	Some compilers do not support it (such as clang's integrated assembler), and gcc does not emit it either.
*	hppa: Fix initialization of dp register [BZ 29635]	John David Anglin	2022-10-01	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \|	After upgrading glibc to Debian 2.35-1, gdb faulted on startup and dropped core in a function call in the main application. This was caused by not initializing the global dp register for the main application early enough. Restore the code to initialize dp in _dl_start_user. It was removed when code was added to initialize dp in elf_machine_runtime_setup. Signed-off-by: John David Anglin <dave.anglin@bell.net>
*	malloc: Do not clobber errno on __getrandom_nocancel (BZ #29624)	Adhemerval Zanella	2022-09-30	2	-3/+11
\| \| \| \| \| \| \| \| \| \|	Use INTERNAL_SYSCALL_CALL instead of INLINE_SYSCALL_CALL. This requires emulating the semantics for the Hurd call (so __arc4random_buf uses the fallback). Checked on x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
*	stdlib: Fix __getrandom_nocancel type and arc4random usage (BZ #29638)	Adhemerval Zanella	2022-09-30	1	-1/+1
\| \| \| \| \| \| \| \| \|	Using an unsigned type prevents the fallback from being used if the kernel does not support the getrandom syscall. Checked on x86_64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
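The class of bug described in the commit above can be shown with a small model. Everything here is hypothetical: `fake_getrandom` stands in for a wrapper that reports failure as a negative value, and the two callers are not glibc's code. The sketch only illustrates why an unsigned result variable hides the failure and so skips the fallback.

```c
/* Hypothetical stand-in for a syscall wrapper: returns the number of
   bytes produced, or a negative value if the call is unsupported.  */
static long fake_getrandom (int supported)
{
  return supported ? 16 : -1;
}

/* Buggy pattern: the result is stored in an unsigned type, so -1
   becomes a very large positive number and looks like success.
   Returns 1 if the caller would take the fallback path.  */
int uses_fallback_unsigned (int supported)
{
  unsigned long n = (unsigned long) fake_getrandom (supported);
  return !(n > 0);
}

/* Fixed pattern: a signed type keeps the failure visible.  */
int uses_fallback_signed (int supported)
{
  long n = fake_getrandom (supported);
  return !(n > 0);
}
```

With an unsupported call, only the signed variant reports that the fallback is needed; both variants agree when the call succeeds.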
*	LoongArch: Add static PIE support	Xi Ruoyao	2022-09-30	3	-3/+95
\| \| \| \| \| \| \|	If the compiler is new enough, enable static PIE support. In the static PIE version of _start (in rcrt1.o), use la.pcrel instead of la.got because in a static PIE we cannot use GOT entries until the dynamic relocations for GOT are resolved.
*	x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591]	Noah Goldstein	2022-09-28	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \|	Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch.
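The overflow described in the commit above can be reproduced with plain integer arithmetic. This is a sketch under assumptions: `CHAR_SIZE` is taken to be 4 (sizeof (wchar_t) on the targets concerned), the function names are invented, and the comparison direction is chosen for illustration rather than copied from the assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define CHAR_SIZE 4u /* assumed sizeof (wchar_t) */

/* Old approach as described: scale the caller's length to bytes.
   The multiplication wraps around for very large maxlen values.  */
int limit_reached_scaled (size_t maxlen, size_t bytes)
{
  return bytes >= maxlen * CHAR_SIZE;
}

/* Fixed approach: convert the byte count to characters instead.
   Division cannot overflow, so any maxlen is handled.  */
int limit_reached_divided (size_t maxlen, size_t bytes)
{
  return bytes / CHAR_SIZE >= maxlen;
}
```

For a length such as `SIZE_MAX / 4 + 2`, the scaled product wraps to 4, so the first function wrongly reports that the limit was reached after 16 bytes, while the second does not.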
*	Update _FloatN header support for C++ in GCC 13	Joseph Myers	2022-09-28	5	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking the installed glibc headers that assume such support is not present. GCC mostly works around this with fixincludes, but that doesn't help for building glibc and its tests (glibc doesn't itself contain C++ code, but there's C++ code built for tests). Update glibc's bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13 support directly. In general the changes match those made by fixincludes, though I think the ones in sysdeps/powerpc/bits/floatn.h, where the header tests __LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing fixincludes patterns. Some places involving special C++ handling in relation to _FloatN support are not changed. There's no need to change the __HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be matched by the fixincludes fixes) because it's only used in relation to macro definitions using features not supported for C++ (__builtin_types_compatible_p and _Generic). And there's no need to change the inline function overloads for issignaling, iszero and iscanonical in C++ because cases where types have the same format but are no longer compatible types are handled automatically by the C++ overload resolution rules. This patch also does not change the overload handling for iseqsig, and there I think changes are needed, beyond those in this patch or made by fixincludes. The way that overload is defined, via a template parameter to a structure type, requires overloads whenever the types are incompatible, even if they have the same format. 
So I think we need to add overloads with GCC 13 for every supported _FloatN and _FloatNx type, rather than just having one for _Float128 when it has a different ABI to long double as at present (but for older GCC, such overloads must not be defined for types that end up defined as typedefs for another type). Tested with build-many-glibcs.py: compilers build for aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu.
*	hurd: Fix typo	Samuel Thibault	2022-09-28	1	-1/+1
\|
*	get_nscd_addresses: Fix subscript typos [BZ #29605]	Jörg Sonnenberger	2022-09-28	1	-3/+3
\| \| \| \| \| \| \| \| \|	Fix the subscript on air->family, which was accidentally set to COUNT when it should have remained as I. Resolves: BZ #29605 Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	hurd: Increase SOMAXCONN to 4096	Samuel Thibault	2022-09-27	1	-1/+1
\| \| \| \|	Notably fakeroot-tcp may introduce a lot of parallel connections.
*	Use atomic_exchange_release/acquire	Wilco Dijkstra	2022-09-26	23	-28/+28
\| \| \| \| \| \| \|	Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire since these map to the standard C11 atomic builtins. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Use C11 atomics instead of atomic_decrement_and_test	Wilco Dijkstra	2022-09-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Replace atomic_decrement_and_test with atomic_fetch_add_relaxed. These are simple counters which do not protect any shared data from concurrent accesses. Also remove the unused file cond-perf.c. Passes regress on AArch64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
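As an illustration of the replacement described in the commit above, a decrement-and-test operation can be written with a relaxed fetch-add from standard C11 `<stdatomic.h>`. The helper name is hypothetical, and glibc's internal `atomic_fetch_add_relaxed` macro is not reproduced here; the standard `atomic_fetch_add_explicit` is used in its place.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement the counter and report whether it reached zero.
   atomic_fetch_add_explicit returns the value held before the
   addition, so the counter is now zero exactly when that value was 1.
   Relaxed ordering is enough for a plain counter that does not
   publish or protect other shared data.  */
bool decrement_and_test_relaxed (atomic_int *counter)
{
  return atomic_fetch_add_explicit (counter, -1, memory_order_relaxed) == 1;
}
```

If the counter guarded other data, a stronger ordering such as release or acquire-release would be needed instead of relaxed.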
*	Use C11 atomics instead of atomic_increment(_val)	Wilco Dijkstra	2022-09-23	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Replace atomic_increment and atomic_increment_val with atomic_fetch_add_relaxed. One case in sem_post.c uses release semantics (see comment above it). The others are simple counters and do not protect any shared data from concurrent accesses. Passes regress on AArch64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	riscv: Remove RV32 floating point functions	Alistair Francis	2022-09-21	8	-132/+40
\| \| \| \| \| \| \|	We don't need RV32 specific floating point functions, instead make them generic for RISC-V. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	riscv: Consolidate the libm-test-ulps	Alistair Francis	2022-09-21	4	-1406/+0
\| \| \| \| \| \| \|	Both RV32 and RV64 should have the same libm-test-ulps, so consolidate them into a single file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>