mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	x86: Shrink code size of memchr-evex.S	Noah Goldstein	2022-06-07	1	-21/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is not meant as a performance optimization. The previous code was far to liberal in aligning targets and wasted code size unnecissarily. The total code size saving is: 64 bytes There are no non-negligible changes in the benchmarks. Geometric Mean of all benchmarks New / Old: 1.000 Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Shrink code size of memchr-avx2.S	Noah Goldstein	2022-06-07	2	-50/+60
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is not meant as a performance optimization. The previous code was far to liberal in aligning targets and wasted code size unnecissarily. The total code size saving is: 59 bytes There are no major changes in the benchmarks. Geometric Mean of all benchmarks New / Old: 0.967 Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize memrchr-avx2.S	Noah Goldstein	2022-06-07	2	-278/+257
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new code: 1. prioritizes smaller user-arg lengths more. 2. optimizes target placement more carefully 3. reuses logic more 4. fixes up various inefficiencies in the logic. The biggest case here is the `lzcnt` logic for checking returns which saves either a branch or multiple instructions. The total code size saving is: 306 bytes Geometric Mean of all benchmarks New / Old: 0.760 Regressions: There are some regressions. Particularly where the length (user arg length) is large but the position of the match char is near the beginning of the string (in first VEC). This case has roughly a 10-20% regression. This is because the new logic gives the hot path for immediate matches to shorter lengths (the more common input). This case has roughly a 15-45% speedup. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize memrchr-evex.S	Noah Goldstein	2022-06-07	1	-271/+268
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new code: 1. prioritizes smaller user-arg lengths more. 2. optimizes target placement more carefully 3. reuses logic more 4. fixes up various inefficiencies in the logic. The biggest case here is the `lzcnt` logic for checking returns which saves either a branch or multiple instructions. The total code size saving is: 263 bytes Geometric Mean of all benchmarks New / Old: 0.755 Regressions: There are some regressions. Particularly where the length (user arg length) is large but the position of the match char is near the beginning of the string (in first VEC). This case has roughly a 20% regression. This is because the new logic gives the hot path for immediate matches to shorter lengths (the more common input). This case has roughly a 35% speedup. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize memrchr-sse2.S	Noah Goldstein	2022-06-07	1	-321/+292
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new code: 1. prioritizes smaller lengths more. 2. optimizes target placement more carefully. 3. reuses logic more. 4. fixes up various inefficiencies in the logic. The total code size saving is: 394 bytes Geometric Mean of all benchmarks New / Old: 0.874 Regressions: 1. The page cross case is now colder, especially re-entry from the page cross case if a match is not found in the first VEC (roughly 50%). My general opinion with this patch is this is acceptable given the "coldness" of this case (less than 4%) and generally performance improvement in the other far more common cases. 2. There are some regressions 5-15% for medium/large user-arg lengths that have a match in the first VEC. This is because the logic was rewritten to optimize finds in the first VEC if the user-arg length is shorter (where we see roughly 20-50% performance improvements). It is not always the case this is a regression. My intuition is some frontend quirk is partially explaining the data although I haven't been able to find the root cause. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Add COND_VZEROUPPER that can replace vzeroupper if no `ret`	Noah Goldstein	2022-06-07	2	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RTM vzeroupper mitigation has no way of replacing inline vzeroupper not before a return. This can be useful when hoisting a vzeroupper to save code size for example: ``` L(foo): cmpl %eax, %edx jz L(bar) tzcntl %eax, %eax addq %rdi, %rax VZEROUPPER_RETURN L(bar): xorl %eax, %eax VZEROUPPER_RETURN ``` Can become: ``` L(foo): COND_VZEROUPPER cmpl %eax, %edx jz L(bar) tzcntl %eax, %eax addq %rdi, %rax ret L(bar): xorl %eax, %eax ret ``` This code does not change any existing functionality. There is no difference in the objdump of libc.so before and after this patch. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Create header for VEC classes in x86 strings library	Noah Goldstein	2022-06-07	7	-0/+327
\| \| \| \| \| \| \| \| \| \|	This patch does not touch any existing code and is only meant to be a tool for future patches so that simple source files can more easily be maintained to target multiple VEC classes. There is no difference in the objdump of libc.so before and after this patch. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]	Matheus Castanho	2022-06-07	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__strncpy_power9 initializes VR 18 with zeroes to be used throughout the code, including when zero-padding the destination string. However, the v18 reference was mistakenly being used for stxv and stxvl, which take a VSX vector as operand. The code ended up using the uninitialized VSR 18 register by mistake. Both occurrences have been changed to use the proper VSX number for VR 18 (i.e. VSR 50). Tested on powerpc, powerpc64 and powerpc64le. Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
*	AArch64: Sort makefile entries	Wilco Dijkstra	2022-06-07	1	-6/+18
\| \| \| \|	Sort makefile entries to reduce conflicts.
*	AArch64: Add SVE memcpy	Wilco Dijkstra	2022-06-07	5	-42/+284
\| \| \| \| \| \|	Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE vectors which improves the random memcpy benchmark significantly. Cleanup the memcpy and memmove ifunc selectors.
*	x86_64: Add strstr function with 512-bit EVEX	Raghuveer Devulapalli	2022-06-06	4	-4/+242
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding a 512-bit EVEX version of strstr. The algorithm works as follows: (1) We spend a few cycles at the begining to peek into the needle. We locate an edge in the needle (first occurance of 2 consequent distinct characters) and also store the first 64-bytes into a zmm register. (2) We search for the edge in the haystack by looking into one cache line of the haystack at a time. This avoids having to read past a page boundary which can cause a seg fault. (3) If an edge is found in the haystack we first compare the first 64-bytes of the needle (already stored in a zmm register) before we proceed with a full string compare performed byte by byte. Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512) Geometric mean of all benchmarks: new / old = 0.66 Difficult skiptable(0) : new / old = 0.02 Difficult skiptable(1) : new / old = 0.01 Difficult 2-way : new / old = 0.25 Difficult testing first 2 : new / old = 1.26 Difficult skiptable(0) : new / old = 0.05 Difficult skiptable(1) : new / old = 0.06 Difficult 2-way : new / old = 0.26 Difficult testing first 2 : new / old = 1.05 Difficult skiptable(0) : new / old = 0.42 Difficult skiptable(1) : new / old = 0.24 Difficult 2-way : new / old = 0.21 Difficult testing first 2 : new / old = 1.04 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	grep: egrep -> grep -E, fgrep -> grep -F	Sam James	2022-06-05	5	-8/+8
\| \| \| \| \| \| \| \| \| \| \|	Newer versions of GNU grep (after grep 3.7, not inclusive) will warn on 'egrep' and 'fgrep' invocations. Convert usages within the tree to their expanded non-aliased counterparts to avoid irritating warnings during ./configure and the test suite. Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Fangrui Song <maskray@google.com>
*	linux: Add process_mrelease	Adhemerval Zanella	2022-06-02	38	-0/+124
\| \| \| \| \| \| \| \| \|	Added in Linux 5.15 (884a7e5964e06ed93c7771c0d7cf19c09a8946f1), the new syscalls allows a caller to free the memory of a dying target process. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	linux: Add process_madvise	Adhemerval Zanella	2022-06-02	38	-0/+214
\| \| \| \| \| \| \| \| \| \|	It was added on Linux 5.10 (ecb8ac8b1f146915aa6b96449b66dd48984caacc) with the same functionality as madvise but using a pidfd of the target process. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	linux: Set tst-pidfd-consts unsupported for kernels headers older than 5.10	Adhemerval Zanella	2022-06-02	1	-0/+3
\| \| \| \| \| \| \| \|	Instead of fail trying to build the compare source file. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
*	Linux: Adjust struct rseq definition to current kernel version	Florian Weimer	2022-06-02	1	-22/+6
\| \| \| \| \| \| \| \|	This definition is only used as a fallback with old kernel headers. The change follows kernel commit bfdf4e6208051ed7165b2e92035b4bf11 ("rseq: Remove broken uapi field layout on 32-bit little endian"). Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	socket: Use 64 bit stat for isfdtype (BZ# 29209)	Adhemerval Zanella	2022-06-01	1	-2/+2
\| \| \| \| \| \|	This is a missing spot initially from 52a5fe70a2c77935. Checked on i686-linux-gnu.
*	posix: Use 64 bit stat for fpathconf (_PC_ASYNC_IO) (BZ# 29208)	Adhemerval Zanella	2022-06-01	1	-2/+2
\| \| \| \| \| \|	This is a missing spot initially from 52a5fe70a2c77935. Checked on i686-linux-gnu.
*	posix: Use 64 bit stat for posix_fallocate fallback (BZ# 29207)	Adhemerval Zanella	2022-06-01	2	-4/+4
\| \| \| \| \| \|	This is a missing spot initially from 52a5fe70a2c77935. Checked on i686-linux-gnu.
*	linux: use statx for fstat if neither newfstatat nor fstatat64 is present	WANG Xuerui	2022-06-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	LoongArch is going to be the first architecture supported by Linux that has neither fstat* nor newfstatat [1], instead exclusively relying on statx. So in fstatat64's implementation, we need to also enable statx usage if neither fstatat64 nor newfstatat is present, to prepare for this new case of kernel ABI. [1]: https://lore.kernel.org/all/20220518092619.1269111-1-chenhuacai@loongson.cn/ Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Add MADV_DONTNEED_LOCKED from Linux 5.18 to bits/mman-linux.h	Joseph Myers	2022-06-01	1	-0/+2
\| \| \| \| \| \| \| \|	Linux 5.18 adds a constant MADV_DONTNEED_LOCKED (defined in multiple header files, but with the same value on all architectures). Add this constant to bits/mman-linux.h. Tested for x86_64.
*	Add HWCAP2_MTE3 from Linux 5.18 to AArch64 bits/hwcap.h	Joseph Myers	2022-06-01	1	-0/+1
\| \| \| \| \| \| \|	Linux 5.18 defines a new AArch64 HWCAP value HWCAP2_MTE3; add it to glibc's sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h. Tested with build-many-glibcs.py for aarch64-linux-gnu.
*	i686: Use generic sincosf implementation for SSE2 version	Adhemerval Zanella	2022-06-01	5	-585/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The generic implementation shows slight better performance (gcc 11.2.1 on a Ryzen 9 5900X): * s_sincosf-sse2.S: "sincosf": { "workload-random": { "duration": 3.89961e+09, "iterations": 9.5472e+07, "reciprocal-throughput": 40.8429, "latency": 40.8483, "max-throughput": 2.4484e+07, "min-throughput": 2.44808e+07 } } * generic s_cossinf.c: "sincosf": { "workload-random": { "duration": 3.71953e+09, "iterations": 1.48512e+08, "reciprocal-throughput": 25.0515, "latency": 25.0391, "max-throughput": 3.99177e+07, "min-throughput": 3.99375e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	i686: Use generic sinf implementation for SSE2 version	Adhemerval Zanella	2022-06-01	5	-565/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X), the generic algorithm shows slight better performance for the 'workload-huge.wrf' input set. * s_sinf-sse2.S: "sinf": { "": { "duration": 3.72405e+09, "iterations": 2.38374e+08, "max": 63.973, "min": 11.211, "mean": 15.6227 }, "workload-random.wrf": { "duration": 3.76923e+09, "iterations": 8.4e+07, "reciprocal-throughput": 17.6355, "latency": 72.108, "max-throughput": 5.67037e+07, "min-throughput": 1.38681e+07 }, "workload-huge.wrf": { "duration": 3.76943e+09, "iterations": 6e+07, "reciprocal-throughput": 29.3493, "latency": 96.2985, "max-throughput": 3.40724e+07, "min-throughput": 1.03844e+07 } } * generic s_sinf.c: "sinf": { "": { "duration": 3.70989e+09, "iterations": 2.18025e+08, "max": 69.782, "min": 11.1, "mean": 17.0159 }, "workload-random.wrf": { "duration": 3.77213e+09, "iterations": 9.6e+07, "reciprocal-throughput": 17.5402, "latency": 61.0459, "max-throughput": 5.70119e+07, "min-throughput": 1.63811e+07 }, "workload-huge.wrf": { "duration": 3.81576e+09, "iterations": 5.6e+07, "reciprocal-throughput": 38.2111, "latency": 98.0659, "max-throughput": 2.61704e+07, "min-throughput": 1.01972e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	i686: Use generic cosf implementation for SSE2 version	Adhemerval Zanella	2022-06-01	5	-552/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X): * s_cosf-sse2.S: "cosf": { "workload-random": { "duration": 3.74987e+09, "iterations": 9.616e+07, "reciprocal-throughput": 15.8141, "latency": 62.1782, "max-throughput": 6.32346e+07, "min-throughput": 1.60828e+07 } } * generic s_cosf.c: "cosf": { "workload-random": { "duration": 3.87298e+09, "iterations": 1.00968e+08, "reciprocal-throughput": 18.3448, "latency": 58.3722, "max-throughput": 5.45113e+07, "min-throughput": 1.71314e+07 } } Checked on i686-linux-gnu.
*	x86_64: Optimize sincos where sin/cos is optimized (bug 29193)	Andreas Schwab	2022-06-01	6	-3/+55
\| \| \| \| \| \| \|	The compiler may substitute calls to sin or cos with calls to sincos, thus we should have the same optimized implementations for sincos. The optimized implementations may produce results that differ, that also makes sure that the sincos call aggrees with the sin and cos calls.
*	Add SOL_SMC from Linux 5.18 to bits/socket.h	Joseph Myers	2022-05-31	1	-0/+1
\| \| \| \| \| \| \|	Linux 5.18 adds a constant SOL_SMC to the getsockopt / setsockopt levels; add this constant to bits/socket.h. Tested for x86_64.
*	elf: Remove _dl_skip_args	Adhemerval Zanella	2022-05-30	2	-5/+0
\| \| \| \| \| \|	Now that no architecture uses it anymore. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	x86_64: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	2	-22/+3
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	sparc: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	2	-79/+4
\| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on sparc64-linux-gnu and sparcv9-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	sh: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-15/+1
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	s390: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	2	-62/+0
\| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on s390x-linux-gnu and s390-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	riscv: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-11/+1
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	nios2: Remove _dl_skip_args usage (BZ# 29187)	Adhemerval Zanella	2022-05-30	1	-40/+10
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	mips: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-29/+2
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	microblaze: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	m68k: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	ia64: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-56/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. The startup code is changed to read the _dl_argc and _dl_argv values, and envp is calculated from argc and argv. Checked on ia64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	i686: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-11/+2
\| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	hppa: Remove _dl_skip_args usage (BZ# 29165)	Adhemerval Zanella	2022-05-30	1	-22/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Different than other architectures, hppa creates an unrelated stack frame where ld.so argc/argv adjustments done by ad43cac44a6860eaefc is not done on the argc/argv saved/restore by _dl_start_user. Instead load _dl_argc and _dl_argv directlty instead of adjust them using _dl_skip_args value. Checked on hppa-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	csky: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-18/+1
\| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the fixup_stack branch ununsed. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	arc: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-15/+2
\| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	arm: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-39/+0
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the _fixup_stack branch ununsed. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	alpha: Remove _dl_skip_args usage	Adhemerval Zanella	2022-05-30	1	-41/+0
\| \| \| \| \| \| \| \| \| \| \|	Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the fixup_stack branch ununsed. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT	H.J. Lu	2022-05-26	1	-2/+4
\| \| \| \| \| \| \| \|	According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we can ignore their r_addends. Reviewed-by: Fangrui Song <maskray@google.com>
*	x86_64: Implement evex512 version of strlen, strnlen, wcslen and wcsnlen	Sunil K Pandey	2022-05-26	7	-0/+346
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements following evex512 version of string functions. Perf gain for evex512 version is up to 50% as compared to evex, depending on length and alignment. Placeholder function, not used by any processor at the moment. - String length function using 512 bit vectors. - String N length using 512 bit vectors. - Wide string length using 512 bit vectors. - Wide string N length using 512 bit vectors. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	Update kernel version to 5.18 in header constant tests	Joseph Myers	2022-05-26	2	-2/+2
\| \| \| \| \| \| \| \| \| \|	This patch updates the kernel version in the tests tst-mman-consts.py and tst-pidfd-consts.py to 5.18. (There are no new constants covered by these tests in 5.18, or in 5.17 in the case of tst-pidfd-consts.py that previously used version 5.16, that need any other header changes.) Tested with build-many-glibcs.py.
*	Update syscall-names.list for Linux 5.18	Joseph Myers	2022-05-25	1	-2/+2
\| \| \| \| \| \| \|	Linux 5.18 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.18. Tested with build-many-glibcs.py.
*	Fix deadlock when pthread_atfork handler calls pthread_atfork or dlclose	Arjun Shankar	2022-05-25	5	-4/+372
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In multi-threaded programs, registering via pthread_atfork, de-registering implicitly via dlclose, or running pthread_atfork handlers during fork was protected by an internal lock. This meant that a pthread_atfork handler attempting to register another handler or dlclose a dynamically loaded library would lead to a deadlock. This commit fixes the deadlock in the following way: During the execution of handlers at fork time, the atfork lock is released prior to the execution of each handler and taken again upon its return. Any handler registrations or de-registrations that occurred during the execution of the handler are accounted for before proceeding with further handler execution. If a handler that hasn't been executed yet gets de-registered by another handler during fork, it will not be executed. If a handler gets registered by another handler during fork, it will not be executed during that particular fork. The possibility that handlers may now be registered or deregistered during handler execution means that identifying the next handler to be run after a given handler may register/de-register others requires some bookkeeping. The fork_handler struct has an additional field, 'id', which is assigned sequentially during registration. Thus, handlers are executed in ascending order of 'id' during 'prepare', and descending order of 'id' during parent/child handler execution after the fork. Two tests are included: * tst-atfork3: Adhemerval Zanella <adhemerval.zanella@linaro.org> This test exercises calling dlclose from prepare, parent, and child handlers. * tst-atfork4: This test exercises calling pthread_atfork and dlclose from the prepare handler. [BZ #24595, BZ #27054] Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	math: Add math-use-builtins-fabs (BZ#29027)	Adhemerval Zanella	2022-05-23	12	-208/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both float, double, and _Float128 are assumed to be supported (float and double already only uses builtins). Only long double is parametrized due GCC bug 29253 which prevents its usage on powerpc. It allows to remove i686, ia64, x86_64, powerpc, and sparc arch specific implementation. On ia64 it also fixes the sNAN handling: math/test-float64x-fabs math/test-ldouble-fabs Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.