path: root/sysdeps
...
* x86: Add sse42 implementation to strcmp's ifunc
  Noah Goldstein, 2022-06-14 (1 file changed, -0/+5)

  This has been missing since the ifuncs were added. The performance of
  SSE4.2 is preferable to SSE2.

  Measured on Tigerlake with N = 20 runs.
  Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906
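  The ifunc mechanism this commit extends can be sketched with GCC's
  `ifunc` attribute. This is a hedged illustration only: the `my_strcmp*`
  names and bodies are hypothetical stand-ins, and glibc's real selector
  (sysdeps/x86_64/multiarch/strcmp.c) uses its internal CPU-feature
  framework rather than `__builtin_cpu_supports`.

```c
#include <string.h>

/* Hypothetical stand-ins for the per-ISA bodies; glibc's real variants
   are hand-written __strcmp_sse2 and __strcmp_sse42 assembly.  */
static int
my_strcmp_sse2 (const char *a, const char *b)
{
  return strcmp (a, b);
}

static int
my_strcmp_sse42 (const char *a, const char *b)
{
  return strcmp (a, b);
}

typedef int (*strcmp_fn) (const char *, const char *);

/* The resolver runs once at load time and picks the variant.  */
static strcmp_fn
resolve_strcmp (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.2") ? my_strcmp_sse42
                                           : my_strcmp_sse2;
}

/* All calls to my_strcmp dispatch to whichever variant the resolver
   returned during relocation.  */
int my_strcmp (const char *a, const char *b)
  __attribute__ ((ifunc ("resolve_strcmp")));
```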
* x86: Fix misordered logic for setting `rep_movsb_stop_threshold`
  Noah Goldstein, 2022-06-14 (1 file changed, -12/+12)

  Move the setting of `rep_movsb_stop_threshold` to after the tunables
  have been collected so that `rep_movsb_stop_threshold` (which is used
  to redirect control flow to the non_temporal case) will use any user
  value for `non_temporal_threshold` (set using
  glibc.cpu.x86_non_temporal_threshold).
* elf: Refine direct extern access diagnostics to protected symbol
  Fangrui Song, 2022-06-14 (1 file changed, -23/+27)

  Refine commit 349b0441dab375099b1d7f6909c1742286a67da9:

  1. Copy relocations for extern protected data do not work properly,
     regardless of whether GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS
     is used. It makes sense to produce a warning unconditionally.

  2. A non-zero value of an undefined function symbol may break pointer
     equality, but may be benign in many cases (many programs don't take
     the address in the shared object and then compare it with the
     address in the executable). Reword the diagnostic to be clearer.

  3. Remove the unneeded condition
     !(undef_map->l_1_needed & GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS).
     If the executable does not have
     GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (which can only occur
     in error cases), the diagnostic should be emitted as well.

  When the defining shared object has
  GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS, report an error to apply
  the intended enforcement.
* Add bounds check to __libc_ifunc_impl_list
  Wilco Dijkstra, 2022-06-10 (8 files changed, -46/+16)

  Add a proper bounds check to __libc_ifunc_impl_list. This makes
  MAX_IFUNC redundant and fixes several targets that would write outside
  the array. To avoid unnecessarily large diffs, pass the maximum in the
  argument 'i' to IFUNC_IMPL_ADD - 'max' can be used in new ifunc
  definitions, and existing ones can be updated if desired.

  Passes build-many-glibcs.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Optimize svml_s_tanhf4_core_sse4.S
  Noah Goldstein, 2022-06-09 (1 file changed, -727/+138)

  Optimizations are:

  1. Reduce code size (-112 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Prefer registers which get short instruction encoding.
  5. Reduce rodata size (-4k+; rodata is shared with avx2).

  Result is roughly a 15-16% speedup:

  Function, New Time, Old Time, New / Old
  _ZGVbN4v_tanhf, 3.158, 3.749, 0.842
* x86: Optimize svml_s_tanhf8_core_avx2.S
  Noah Goldstein, 2022-06-09 (1 file changed, -741/+171)

  Optimizations are:

  1. Reduce code size (-81 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Prefer registers which get short instruction encoding.
  5. Reduce rodata size (-32 bytes).

  Result is roughly a 17-18% speedup:

  Function, New Time, Old Time, New / Old
  _ZGVdN8v_tanhf, 1.977, 2.402, 0.823
* x86: Add data file that can be shared by tanhf-avx2 and tanhf-sse4
  Noah Goldstein, 2022-06-09 (1 file changed, -0/+621)

  tanhf-avx2 and tanhf-sse4 use the same data tables, so we can save
  over 4 KiB by using a shared data table. This does increase the memory
  footprint of the sse4 version (as all the targets are now 32 bytes
  instead of 16), but generally it seems worth the code-size savings.

  NB: This patch doesn't do anything itself; it is setup for future
  patches.
* x86: Optimize svml_s_tanhf16_core_avx512.S
  Noah Goldstein, 2022-06-09 (1 file changed, -240/+287)

  Optimizations are:

  1. Reduce code size (-67 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Reduce rodata usage (-448 bytes).

  Result is roughly a 14% speedup:

  Function, New Time, Old Time, New / Old
  _ZGVeN16v_tanhf, 0.649, 0.752, 0.863
* x86: Improve svml_s_atanhf4_core_sse4.S
  Noah Goldstein, 2022-06-09 (1 file changed, -209/+169)

  Improvements are:

  1. Reduce code size (-62 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Prefer registers which get short instruction encoding.
  5. Reduce rodata usage (-16 bytes).

  The throughput improvement is not significant as the port 0 bottleneck
  is unavoidable.

  Function, New Time, Old Time, New / Old
  _ZGVbN4v_atanhf, 8.821, 8.903, 0.991
* x86: Improve svml_s_atanhf8_core_avx2.S
  Noah Goldstein, 2022-06-09 (1 file changed, -203/+202)

  Improvements are:

  1. Reduce code size (-60 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Prefer registers which get short instruction encoding.
  5. Shrink rodata usage (-32 bytes).

  The throughput improvement is not that significant (3-5%) as the
  port 0 bottleneck is unavoidable.

  Function, New Time, Old Time, New / Old
  _ZGVdN8v_atanhf, 2.799, 2.923, 0.958
* x86: Improve svml_s_atanhf16_core_avx512.S
  Noah Goldstein, 2022-06-09 (1 file changed, -230/+244)

  Improvements are:

  1. Reduce code size (-64 bytes).
  2. Remove redundant move instructions.
  3. Slightly improve instruction selection/scheduling where possible.
  4. Reduce rodata size ([-128, -188] bytes).

  The throughput improvement is not significant as the port 0 bottleneck
  is unavoidable.

  Function, New Time, Old Time, New / Old
  _ZGVeN16v_atanhf, 1.39, 1.408, 0.987
* x86: Align varshift table to 32-bytes
  Noah Goldstein, 2022-06-09 (2 files changed, -3/+5)

  This ensures the load will never split a cache line.
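  The fix amounts to an alignment attribute on the table. A minimal
  sketch, assuming a hypothetical table name: since x86 cache lines are
  64 bytes, an aligned 32-byte load can never straddle one.

```c
#include <stdint.h>

/* 32-byte alignment guarantees an aligned 32-byte (AVX) load of the
   table never crosses a 64-byte cache-line boundary.  Contents elided;
   the alignment, not the data, is the point of this sketch.  */
static const uint8_t shift_table[32] __attribute__ ((aligned (32))) = { 0 };
```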
* x86: Add copyright to strpbrk-c.c
  Noah Goldstein, 2022-06-09 (1 file changed, -0/+18)
* x86: Fix page cross case in rawmemchr-avx2 [BZ #29234]
  Noah Goldstein, 2022-06-08 (1 file changed, -8/+8)

  commit 6dcbb7d95dded20153b12d76d2f4e0ef0cda4f35
  Author: Noah Goldstein <goldstein.w.n@gmail.com>
  Date:   Mon Jun 6 21:11:33 2022 -0700

      x86: Shrink code size of memchr-avx2.S

  changed how the page cross case aligned the string (rdi) in rawmemchr.
  This was incompatible with how `L(cross_page_continue)` expected the
  pointer to be aligned, and would cause rawmemchr to read data starting
  before the beginning of the string. What it would read was in valid
  memory, but could contain CHAR matches, resulting in an incorrect
  return value.

  This commit fixes that issue by essentially reverting the changes to
  the L(page_cross) case, as they didn't really matter.

  Test cases were added and all pass with the new code (and were
  confirmed to fail with the old code).

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
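  The hazard can be illustrated with a small, hedged C sketch (the
  helper name is hypothetical): rounding a pointer down for an aligned
  vector load stays on the same page, but any "matches" in the bytes
  before the start of the string must be ignored by the caller, which is
  exactly the invariant the broken alignment violated.

```c
#include <stdint.h>

/* Round S down to a VEC_SIZE boundary.  The bytes in [result, s) are
   readable (aligning down never crosses backwards over a page boundary)
   but are NOT part of the string, so the caller must mask off any
   matches found in them.  */
static inline const char *
align_down (const char *s, uintptr_t vec_size)
{
  return (const char *) ((uintptr_t) s & ~(vec_size - 1));
}
```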
* nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214)
  Adhemerval Zanella, 2022-06-08 (2 files changed, -0/+83)

  This was due to a wrong revert done in 404656009b459658.

  Checked on x86_64-linux-gnu.
* x86: ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST expect no transactions
  Noah Goldstein, 2022-06-07 (1 file changed, -3/+3)

  Give the fall-through path to `vzeroupper` and the taken path to
  `vzeroall`.

  Generally, even on machines with RTM, the expectation is that the
  string-library functions will not be called in transactions.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Shrink code size of memchr-evex.S
  Noah Goldstein, 2022-06-07 (1 file changed, -21/+25)

  This is not meant as a performance optimization. The previous code was
  far too liberal in aligning targets and wasted code size unnecessarily.

  The total code size saving is: 64 bytes

  There are no non-negligible changes in the benchmarks.
  Geometric Mean of all benchmarks New / Old: 1.000

  Full xcheck passes on x86_64.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Shrink code size of memchr-avx2.S
  Noah Goldstein, 2022-06-07 (2 files changed, -50/+60)

  This is not meant as a performance optimization. The previous code was
  far too liberal in aligning targets and wasted code size unnecessarily.

  The total code size saving is: 59 bytes

  There are no major changes in the benchmarks.
  Geometric Mean of all benchmarks New / Old: 0.967

  Full xcheck passes on x86_64.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize memrchr-avx2.S
  Noah Goldstein, 2022-06-07 (2 files changed, -278/+257)

  The new code:

  1. prioritizes smaller user-arg lengths more.
  2. optimizes target placement more carefully.
  3. reuses logic more.
  4. fixes up various inefficiencies in the logic. The biggest case here
     is the `lzcnt` logic for checking returns, which saves either a
     branch or multiple instructions.

  The total code size saving is: 306 bytes
  Geometric Mean of all benchmarks New / Old: 0.760

  Regressions:
  There are some regressions, particularly where the length (user-arg
  length) is large but the position of the match char is near the
  beginning of the string (in the first VEC). This case has roughly a
  10-20% regression.

  This is because the new logic gives the hot path for immediate matches
  to shorter lengths (the more common input). That case has roughly a
  15-45% speedup.

  Full xcheck passes on x86_64.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize memrchr-evex.S
  Noah Goldstein, 2022-06-07 (1 file changed, -271/+268)

  The new code:

  1. prioritizes smaller user-arg lengths more.
  2. optimizes target placement more carefully.
  3. reuses logic more.
  4. fixes up various inefficiencies in the logic. The biggest case here
     is the `lzcnt` logic for checking returns, which saves either a
     branch or multiple instructions.

  The total code size saving is: 263 bytes
  Geometric Mean of all benchmarks New / Old: 0.755

  Regressions:
  There are some regressions, particularly where the length (user-arg
  length) is large but the position of the match char is near the
  beginning of the string (in the first VEC). This case has roughly a
  20% regression.

  This is because the new logic gives the hot path for immediate matches
  to shorter lengths (the more common input). That case has roughly a
  35% speedup.

  Full xcheck passes on x86_64.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize memrchr-sse2.S
  Noah Goldstein, 2022-06-07 (1 file changed, -321/+292)

  The new code:

  1. prioritizes smaller lengths more.
  2. optimizes target placement more carefully.
  3. reuses logic more.
  4. fixes up various inefficiencies in the logic.

  The total code size saving is: 394 bytes
  Geometric Mean of all benchmarks New / Old: 0.874

  Regressions:

  1. The page cross case is now colder, especially re-entry from the
     page cross case if a match is not found in the first VEC (roughly
     50%). My general opinion with this patch is that this is acceptable
     given the "coldness" of this case (less than 4%) and the general
     performance improvement in the other, far more common cases.

  2. There are some regressions of 5-15% for medium/large user-arg
     lengths that have a match in the first VEC. This is because the
     logic was rewritten to optimize finds in the first VEC if the
     user-arg length is shorter (where we see roughly 20-50% performance
     improvements). It is not always the case that this is a regression.
     My intuition is that some frontend quirk is partially explaining
     the data, although I haven't been able to find the root cause.

  Full xcheck passes on x86_64.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Add COND_VZEROUPPER that can replace vzeroupper if no `ret`
  Noah Goldstein, 2022-06-07 (2 files changed, -0/+19)

  The RTM vzeroupper mitigation has no way of replacing an inline
  vzeroupper that is not immediately before a return. This can be useful
  when hoisting a vzeroupper to save code size, for example:

```
L(foo):
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	VZEROUPPER_RETURN

L(bar):
	xorl	%eax, %eax
	VZEROUPPER_RETURN
```

  Can become:

```
L(foo):
	COND_VZEROUPPER
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	ret

L(bar):
	xorl	%eax, %eax
	ret
```

  This patch does not change any existing functionality. There is no
  difference in the objdump of libc.so before and after this patch.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Create header for VEC classes in x86 strings library
  Noah Goldstein, 2022-06-07 (7 files changed, -0/+327)

  This patch does not touch any existing code and is only meant to be a
  tool for future patches, so that simple source files can more easily
  be maintained to target multiple VEC classes.

  There is no difference in the objdump of libc.so before and after this
  patch.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]
  Matheus Castanho, 2022-06-07 (1 file changed, -2/+2)

  __strncpy_power9 initializes VR 18 with zeroes to be used throughout
  the code, including when zero-padding the destination string. However,
  the v18 reference was mistakenly being used for stxv and stxvl, which
  take a VSX vector as an operand. The code ended up using the
  uninitialized VSR 18 register by mistake.

  Both occurrences have been changed to use the proper VSX number for
  VR 18 (i.e. VSR 50).

  Tested on powerpc, powerpc64 and powerpc64le.

  Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
* AArch64: Sort makefile entries
  Wilco Dijkstra, 2022-06-07 (1 file changed, -6/+18)

  Sort makefile entries to reduce conflicts.
* AArch64: Add SVE memcpy
  Wilco Dijkstra, 2022-06-07 (5 files changed, -42/+284)

  Add an initial SVE memcpy implementation. Copies of up to 32 bytes use
  SVE vectors, which improves the random memcpy benchmark significantly.
  Clean up the memcpy and memmove ifunc selectors.
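  The core trick, predicated loads and stores covering exactly n lanes,
  can be sketched in C with the ACLE SVE intrinsics. This is a hedged
  illustration for copies no longer than one vector (it needs an
  SVE-enabled compiler target), not the glibc assembly implementation.

```c
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes (n <= vector length) with one predicated load/store
   pair: svwhilelt builds a predicate active for lanes [0, n), and
   inactive lanes are neither loaded nor stored, so no length branches
   are needed for short copies.  */
static void
sve_copy_small (uint8_t *dst, const uint8_t *src, size_t n)
{
  svbool_t pg = svwhilelt_b8_u64 (0, (uint64_t) n);
  svuint8_t v = svld1_u8 (pg, src);
  svst1_u8 (pg, dst, v);
}
```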
* x86_64: Add strstr function with 512-bit EVEX
  Raghuveer Devulapalli, 2022-06-06 (4 files changed, -4/+242)

  Adding a 512-bit EVEX version of strstr. The algorithm works as
  follows:

  (1) We spend a few cycles at the beginning to peek into the needle. We
      locate an edge in the needle (the first occurrence of two
      consecutive distinct characters) and also store the first 64 bytes
      into a zmm register.

  (2) We search for the edge in the haystack by looking into one cache
      line of the haystack at a time. This avoids having to read past a
      page boundary, which can cause a seg fault.

  (3) If an edge is found in the haystack, we first compare the first
      64 bytes of the needle (already stored in a zmm register) before
      we proceed with a full string compare performed byte by byte.

  Benchmarking results:
  (old = strstr_sse2_unaligned, new = strstr_avx512)

  Geometric mean of all benchmarks: new / old = 0.66

  Difficult skiptable(0)    : new / old = 0.02
  Difficult skiptable(1)    : new / old = 0.01
  Difficult 2-way           : new / old = 0.25
  Difficult testing first 2 : new / old = 1.26
  Difficult skiptable(0)    : new / old = 0.05
  Difficult skiptable(1)    : new / old = 0.06
  Difficult 2-way           : new / old = 0.26
  Difficult testing first 2 : new / old = 1.05
  Difficult skiptable(0)    : new / old = 0.42
  Difficult skiptable(1)    : new / old = 0.24
  Difficult 2-way           : new / old = 0.21
  Difficult testing first 2 : new / old = 1.04

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
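  Step (1)'s "edge" is easy to pin down in scalar C. A hedged sketch
  with a hypothetical helper name, not the EVEX code:

```c
#include <stddef.h>

/* Return the index of the first "edge" in the needle, i.e. the first
   position i where needle[i] != needle[i + 1], or -1 when the needle
   is shorter than 2 bytes or is a run of one repeated character.  */
static ptrdiff_t
find_edge (const char *needle, size_t len)
{
  for (size_t i = 0; i + 1 < len; i++)
    if (needle[i] != needle[i + 1])
      return (ptrdiff_t) i;
  return -1;
}
```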
* grep: egrep -> grep -E, fgrep -> grep -F
  Sam James, 2022-06-05 (5 files changed, -8/+8)

  Newer versions of GNU grep (newer than 3.7) warn on 'egrep' and
  'fgrep' invocations. Convert usages within the tree to their expanded
  non-aliased counterparts to avoid irritating warnings during
  ./configure and the test suite.

  Signed-off-by: Sam James <sam@gentoo.org>
  Reviewed-by: Fangrui Song <maskray@google.com>
* linux: Add process_mrelease
  Adhemerval Zanella, 2022-06-02 (38 files changed, -0/+124)

  Added in Linux 5.15 (884a7e5964e06ed93c7771c0d7cf19c09a8946f1), the
  new syscall allows a caller to free the memory of a dying target
  process.

  Checked on x86_64-linux-gnu.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
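  A hedged usage sketch of the kernel interface via a raw syscall (error
  handling elided; the wrapper this commit adds has the signature
  `process_mrelease (int pidfd, unsigned int flags)`):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Reap the memory of an already-dying process identified by a pidfd
   (e.g. obtained from pidfd_open).  flags must currently be 0;
   SYS_process_mrelease is only defined with Linux >= 5.15 headers.  */
static int
reap_memory (int pidfd)
{
  return (int) syscall (SYS_process_mrelease, pidfd, 0U);
}
```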
* linux: Add process_madvise
  Adhemerval Zanella, 2022-06-02 (38 files changed, -0/+214)

  It was added in Linux 5.10 (ecb8ac8b1f146915aa6b96449b66dd48984caacc)
  with the same functionality as madvise, but using a pidfd of the
  target process.

  Checked on x86_64-linux-gnu and i686-linux-gnu.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Set tst-pidfd-consts unsupported for kernel headers older than 5.10
  Adhemerval Zanella, 2022-06-02 (1 file changed, -0/+3)

  Instead of failing to build the compare source file.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Matheus Castanho <msc@linux.ibm.com>
  Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
* Linux: Adjust struct rseq definition to current kernel version
  Florian Weimer, 2022-06-02 (1 file changed, -22/+6)

  This definition is only used as a fallback with old kernel headers.
  The change follows kernel commit bfdf4e6208051ed7165b2e92035b4bf11
  ("rseq: Remove broken uapi field layout on 32-bit little endian").

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* socket: Use 64 bit stat for isfdtype (BZ# 29209)
  Adhemerval Zanella, 2022-06-01 (1 file changed, -2/+2)

  This is a spot initially missed in 52a5fe70a2c77935.

  Checked on i686-linux-gnu.
* posix: Use 64 bit stat for fpathconf (_PC_ASYNC_IO) (BZ# 29208)
  Adhemerval Zanella, 2022-06-01 (1 file changed, -2/+2)

  This is a spot initially missed in 52a5fe70a2c77935.

  Checked on i686-linux-gnu.
* posix: Use 64 bit stat for posix_fallocate fallback (BZ# 29207)
  Adhemerval Zanella, 2022-06-01 (2 files changed, -4/+4)

  This is a spot initially missed in 52a5fe70a2c77935.

  Checked on i686-linux-gnu.
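  All three fixes above are instances of one bug class: on 32-bit
  targets the legacy stat family truncates 64-bit sizes and inodes,
  failing with EOVERFLOW on large files. A hedged user-level
  illustration of the same class (the commits fix glibc internals, not
  application code):

```c
/* Must precede any include: makes stat/off_t 64-bit capable on 32-bit
   targets, the user-facing analogue of the internal fix.  */
#define _FILE_OFFSET_BITS 64
#include <sys/stat.h>

long long
file_size (const char *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return -1;      /* with 32-bit off_t this could instead fail with
                       EOVERFLOW on files larger than 2 GiB */
  return (long long) st.st_size;
}
```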
* linux: use statx for fstat if neither newfstatat nor fstatat64 is present
  WANG Xuerui, 2022-06-01 (1 file changed, -1/+2)

  LoongArch is going to be the first architecture supported by Linux
  that has neither fstat* nor newfstatat [1], instead exclusively
  relying on statx. So in fstatat64's implementation, we need to also
  enable statx usage if neither fstatat64 nor newfstatat is present, to
  prepare for this new case of kernel ABI.

  [1]: https://lore.kernel.org/all/20220518092619.1269111-1-chenhuacai@loongson.cn/

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
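  The shape of the fallback can be sketched in C. A hedged sketch that
  converts only a few fields; the real code in
  sysdeps/unix/sysv/linux/fstatat64.c fills the whole struct stat and
  handles errno and AT_* flag details:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* When neither newfstatat nor fstatat64 exists (as on LoongArch),
   fetch the metadata via the statx syscall and translate it.  */
static int
fstatat_via_statx (int fd, const char *path, struct stat *st, int flags)
{
  struct statx stx;
  if (syscall (SYS_statx, fd, path, flags, STATX_BASIC_STATS, &stx) != 0)
    return -1;
  st->st_mode = stx.stx_mode;
  st->st_uid  = stx.stx_uid;
  st->st_gid  = stx.stx_gid;
  st->st_size = (off_t) stx.stx_size;
  return 0;
}
```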
* Add MADV_DONTNEED_LOCKED from Linux 5.18 to bits/mman-linux.h
  Joseph Myers, 2022-06-01 (1 file changed, -0/+2)

  Linux 5.18 adds a constant MADV_DONTNEED_LOCKED (defined in multiple
  header files, but with the same value on all architectures). Add this
  constant to bits/mman-linux.h.

  Tested for x86_64.
* Add HWCAP2_MTE3 from Linux 5.18 to AArch64 bits/hwcap.h
  Joseph Myers, 2022-06-01 (1 file changed, -0/+1)

  Linux 5.18 defines a new AArch64 HWCAP value HWCAP2_MTE3; add it to
  glibc's sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h.

  Tested with build-many-glibcs.py for aarch64-linux-gnu.
* i686: Use generic sincosf implementation for SSE2 version
  Adhemerval Zanella, 2022-06-01 (5 files changed, -585/+12)

  The generic implementation shows slightly better performance
  (gcc 11.2.1 on a Ryzen 9 5900X):

  * s_sincosf-sse2.S:
    "sincosf": {
      "workload-random": {
        "duration": 3.89961e+09,
        "iterations": 9.5472e+07,
        "reciprocal-throughput": 40.8429,
        "latency": 40.8483,
        "max-throughput": 2.4484e+07,
        "min-throughput": 2.44808e+07
      }
    }

  * generic s_sincosf.c:
    "sincosf": {
      "workload-random": {
        "duration": 3.71953e+09,
        "iterations": 1.48512e+08,
        "reciprocal-throughput": 25.0515,
        "latency": 25.0391,
        "max-throughput": 3.99177e+07,
        "min-throughput": 3.99375e+07
      }
    }

  Checked on i686-linux-gnu.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic sinf implementation for SSE2 version
  Adhemerval Zanella, 2022-06-01 (5 files changed, -565/+13)

  Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X); the
  generic algorithm shows slightly better performance for the
  'workload-huge.wrf' input set.

  * s_sinf-sse2.S:
    "sinf": {
      "": {
        "duration": 3.72405e+09,
        "iterations": 2.38374e+08,
        "max": 63.973,
        "min": 11.211,
        "mean": 15.6227
      },
      "workload-random.wrf": {
        "duration": 3.76923e+09,
        "iterations": 8.4e+07,
        "reciprocal-throughput": 17.6355,
        "latency": 72.108,
        "max-throughput": 5.67037e+07,
        "min-throughput": 1.38681e+07
      },
      "workload-huge.wrf": {
        "duration": 3.76943e+09,
        "iterations": 6e+07,
        "reciprocal-throughput": 29.3493,
        "latency": 96.2985,
        "max-throughput": 3.40724e+07,
        "min-throughput": 1.03844e+07
      }
    }

  * generic s_sinf.c:
    "sinf": {
      "": {
        "duration": 3.70989e+09,
        "iterations": 2.18025e+08,
        "max": 69.782,
        "min": 11.1,
        "mean": 17.0159
      },
      "workload-random.wrf": {
        "duration": 3.77213e+09,
        "iterations": 9.6e+07,
        "reciprocal-throughput": 17.5402,
        "latency": 61.0459,
        "max-throughput": 5.70119e+07,
        "min-throughput": 1.63811e+07
      },
      "workload-huge.wrf": {
        "duration": 3.81576e+09,
        "iterations": 5.6e+07,
        "reciprocal-throughput": 38.2111,
        "latency": 98.0659,
        "max-throughput": 2.61704e+07,
        "min-throughput": 1.01972e+07
      }
    }

  Checked on i686-linux-gnu.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic cosf implementation for SSE2 version
  Adhemerval Zanella, 2022-06-01 (5 files changed, -552/+13)

  Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X):

  * s_cosf-sse2.S:
    "cosf": {
      "workload-random": {
        "duration": 3.74987e+09,
        "iterations": 9.616e+07,
        "reciprocal-throughput": 15.8141,
        "latency": 62.1782,
        "max-throughput": 6.32346e+07,
        "min-throughput": 1.60828e+07
      }
    }

  * generic s_cosf.c:
    "cosf": {
      "workload-random": {
        "duration": 3.87298e+09,
        "iterations": 1.00968e+08,
        "reciprocal-throughput": 18.3448,
        "latency": 58.3722,
        "max-throughput": 5.45113e+07,
        "min-throughput": 1.71314e+07
      }
    }

  Checked on i686-linux-gnu.
* x86_64: Optimize sincos where sin/cos is optimized (bug 29193)
  Andreas Schwab, 2022-06-01 (6 files changed, -3/+55)

  The compiler may substitute calls to sin or cos with calls to sincos,
  thus we should have the same optimized implementations for sincos.
  Since the optimized implementations may produce results that differ,
  this also makes sure that a sincos call agrees with the sin and cos
  calls.
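  The substitution in question: when a function computes both sin and
  cos of the same argument, GCC's sincos pass may contract the pair into
  one call to sincos (a GNU extension glibc provides). A small
  illustration; whether the contraction actually fires depends on
  compiler version and flags:

```c
#include <math.h>

/* GCC may replace this sin/cos pair with a single sincos call, which
   is why the sincos entry point must agree with the optimized sin and
   cos implementations.  */
void
polar_to_cartesian (double r, double theta, double *x, double *y)
{
  *x = r * cos (theta);
  *y = r * sin (theta);
}
```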
* Add SOL_SMC from Linux 5.18 to bits/socket.h
  Joseph Myers, 2022-05-31 (1 file changed, -0/+1)

  Linux 5.18 adds a constant SOL_SMC to the getsockopt / setsockopt
  levels; add this constant to bits/socket.h.

  Tested for x86_64.
* elf: Remove _dl_skip_args
  Adhemerval Zanella, 2022-05-30 (2 files changed, -5/+0)

  Now that no architecture uses it anymore.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86_64: Remove _dl_skip_args usage
  Adhemerval Zanella, 2022-05-30 (2 files changed, -22/+3)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked on x86_64-linux-gnu and i686-linux-gnu.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
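  What the generic shuffle (ad43cac44a) amounts to, in a heavily
  simplified, hedged sketch (hypothetical helper; the real loader also
  rewrites envp and the auxiliary vector in place on the initial stack):

```c
#include <string.h>

/* Drop the first SKIP entries of argv (ld.so's own name and options)
   by sliding the tail, including the NULL terminator, down to index 0;
   returns the adjusted argc the program will observe.  */
static int
drop_loader_args (int argc, char **argv, int skip)
{
  memmove (&argv[0], &argv[skip], (argc - skip + 1) * sizeof (char *));
  return argc - skip;
}
```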
* sparc: Remove _dl_skip_args usage
  Adhemerval Zanella, 2022-05-30 (2 files changed, -79/+4)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked on sparc64-linux-gnu and sparcv9-linux-gnu.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sh: Remove _dl_skip_args usage
  Adhemerval Zanella, 2022-05-30 (1 file changed, -15/+1)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked with qemu-user that arguments are correctly passed to both
  constructors and the main program.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* s390: Remove _dl_skip_args usage
  Adhemerval Zanella, 2022-05-30 (2 files changed, -62/+0)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked on s390x-linux-gnu and s390-linux-gnu.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* riscv: Remove _dl_skip_args usage
  Adhemerval Zanella, 2022-05-30 (1 file changed, -11/+1)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked with qemu-user that arguments are correctly passed to both
  constructors and the main program.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nios2: Remove _dl_skip_args usage (BZ# 29187)
  Adhemerval Zanella, 2022-05-30 (1 file changed, -40/+10)

  Since ad43cac44a the generic code already shuffles argv/envp/auxv on
  the stack to remove ld.so's own arguments, so _dl_skip_args is always
  0 and there is no need to adjust argc or argv.

  Checked with qemu-user that arguments are correctly passed to both
  constructors and the main program.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>