mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	malloc: Add Huge Page support for mmap	Adhemerval Zanella	2021-12-15	3	-0/+142
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the morecore hook removed, there is not easy way to provide huge pages support on with glibc allocator without resorting to transparent huge pages. And some users and programs do prefer to use the huge pages directly instead of THP for multiple reasons: no splitting, re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means and specific page size that is matched against the supported ones by the system. Currently only memory allocated on sysmalloc() is handled, the arenas still uses the default system page size. To test is a new rule is added tests-malloc-hugetlb2, which run the addes tests with the required GLIBC_TUNABLE setting. On systems without a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is required to create a pool with some allocated pages. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
*	malloc: Add madvise support for Transparent Huge Pages	Adhemerval Zanella	2021-12-15	4	-0/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Linux Transparent Huge Pages (THP) current supports three different states: 'never', 'madvise', and 'always'. The 'never' is self-explanatory and 'always' will enable THP for all anonymous pages. However, 'madvise' is still the default for some system and for such case THP will be only used if the memory range is explicity advertise by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting to a value diffent than 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disable is system does not support THP or if it has the mode set to "never" and on Linux only support one page size for THP, even if the architecture supports multiple sizes. To test is a new rule is added tests-malloc-hugetlb1, which run the addes tests with the required GLIBC_TUNABLE setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
*	powerpc: Use global register variable in <thread_pointer.h>	Florian Weimer	2021-12-15	2	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit 8dbeb0561eeb876f557ac9eef5721912ec074ea5 ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
*	Support target specific ALIGN for variable alignment test [BZ #28676]	H.J. Lu	2021-12-14	4	-0/+80
\| \| \| \| \| \| \| \| \| \| \|	Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	hurd: Do not set PIE_UNSUPPORTED	Samuel Thibault	2021-12-14	2	-11/+0
\| \| \| \|	This is now supported.
*	sysdeps: Simplify sin Taylor Series calculation	Akila Welihinda	2021-12-13	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	The macro TAYLOR_SIN adds the term `-0.5daa^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5daa^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
*	math: Remove the error handling wrapper from hypot and hypotf	Adhemerval Zanella	2021-12-13	31	-10/+101
\| \| \| \| \| \| \| \| \| \| \| \|	The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
*	math: Use fmin/fmax on hypot	Wilco Dijkstra	2021-12-13	1	-2/+3
\| \| \| \| \| \|	It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.
*	aarch64: Add math-use-builtins-f{max,min}.h	Adhemerval Zanella	2021-12-13	6	-112/+8
\| \| \| \|	It allows to remove the arch-specific implementations.
*	math: Add math-use-builtinds-fmin.h	Adhemerval Zanella	2021-12-13	2	-0/+5
\| \| \| \| \|	It allows the architecture to use the builtin instead of generic implementation.
*	math: Add math-use-builtinds-fmax.h	Adhemerval Zanella	2021-12-13	6	-0/+9
\| \| \| \| \|	It allows the architecture to use the builtin instead of generic implementation.
*	math: Remove powerpc e_hypot	Adhemerval Zanella	2021-12-13	9	-327/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The generic implementation is shows only slight worse performance: POWER10 reciprocal-throughput latency master 8.28478 13.7253 new hypot 7.21945 13.1933 POWER9 reciprocal-throughput latency master 13.4024 14.0967 new hypot 14.8479 15.8061 POWER8 reciprocal-throughput latency master 15.5767 16.8885 new hypot 16.5371 18.4057 One way to improve might to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it onl does for -ffast-math, clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).
*	i386: Move hypot implementation to C	Adhemerval Zanella	2021-12-13	3	-139/+48
\| \| \| \| \| \| \| \| \| \|	The generic hypotf is slight slower, mostly due the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu.
*	math: Use an improved algorithm for hypotl (ldbl-128)	Adhemerval Zanella	2021-12-13	1	-130/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf
*	math: Use an improved algorithm for hypotl (ldbl-96)	Adhemerval Zanella	2021-12-13	1	-133/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf
*	math: Improve hypot performance with FMA	Wilco Dijkstra	2021-12-13	1	-1/+16
\| \| \| \| \| \| \| \| \| \| \|	Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.
*	math: Use an improved algorithm for hypot (dbl-64)	Wilco Dijkstra	2021-12-13	1	-143/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with a random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc current implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance result are also only slight worse than current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf
*	math: Simplify hypotf implementation	Adhemerval Zanella	2021-12-13	2	-35/+37
\| \| \| \| \| \| \| \| \| \| \| \|	Use a more optimized comparison for check for NaN and infinite and add an inlined issignaling implementation for float. With gcc it results in 2 FP comparisons. The file Copyright is also changed to use GPL, the implementation was completely changed by 7c10fd3515f to use double precision instead of scaling and this change removes all the GET_FLOAT_WORD usage. Checked on x86_64-linux-gnu.
*	Cleanup encoding in comments	Siddhesh Poyarekar	2021-12-13	4	-32/+32
\| \| \| \| \| \| \| \| \|	Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8 equivalents so that files don't end up with mixed encodings. With this, all files (except tests that actually test different encodings) have a single encoding. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	Replace --enable-static-pie with --disable-default-pie	Siddhesh Poyarekar	2021-12-13	13	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Build glibc programs and tests as PIE by default and enable static-pie automatically if the architecture and toolchain supports it. Also add a new configuration option --disable-default-pie to prevent building programs as PIE. Only the following architectures now have PIE disabled by default because they do not work at the moment. hppa, ia64, alpha and csky don't work because the linker is unable to handle a pcrel relocation generated from PIE objects. The microblaze compiler is currently failing with an ICE. GNU hurd tries to enable static-pie, which does not work and hence fails. All these targets have default PIE disabled at the moment and I have left it to the target maintainers to enable PIE on their targets. build-many-glibcs runs clean for all targets. I also tested x86_64 on Fedora and Ubuntu, to verify that the default build as well as --disable-default-pie work as expected with both system toolchains. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	hurd: Add rules for static PIE build	Samuel Thibault	2021-12-12	1	-0/+2
\| \| \| \|	This fixes [BZ #28671].
*	hurd: Fix gmon-static	Samuel Thibault	2021-12-12	1	-1/+1
\| \| \| \|	We need to use crt0 for gmon-static too.
*	x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]	H.J. Lu	2021-12-10	4	-95/+0
\| \| \| \| \| \| \| \| \| \|	Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	nptl: Add one more barrier to nptl/tst-create1	Florian Weimer	2021-12-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without the bar_ctor_finish barrier, it was possible that thread2 re-locked user_lock before ctor had a chance to lock it. ctor then blocked in its locking operation, xdlopen from the main thread did not return, and thread2 was stuck waiting in bar_dtor: thread 1: started. thread 2: started. thread 2: locked user_lock. constructor started: 0. thread 1: in ctor: started. thread 3: started. thread 3: done. thread 2: unlocked user_lock. thread 2: locked user_lock. Fixes the test in commit 83b5323261bb72313bffcf37476c1b8f0847c736 ("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN	Florian Weimer	2021-12-09	21	-130/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely introduced to support a configuration where the thread pointer has not the same alignment as THREAD_SELF. Only ia64 seems to use that, but for the stack/pointer guard, not for storing tcbhead_t. Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift the thread pointer, potentially landing in a different residue class modulo the alignment, but the changes should not impact that. In general, given that TLS variables have their own alignment requirements, having different alignment for the (unshifted) thread pointer and struct pthread would potentially result in dynamic offsets, leading to more complexity. hppa had different values before: __alignof__ (tcbhead_t), which seems to be 4, and __alignof__ (struct pthread), which was 8 (old default) and is now 32. However, it defines THREAD_SELF as: /* Return the thread descriptor for the current thread. / # define THREAD_SELF \ ({ struct pthread __self; \ __self = __get_cr27(); \ __self - 1; \ }) So the thread pointer points after struct pthread (hence __self - 1), and they have to have the same alignment on hppa as well. Similarly, on ia64, the definitions were different. We have: # define TLS_PRE_TCB_SIZE \ (sizeof (struct pthread) \ + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \ ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \ & ~(__alignof__ (struct pthread) - 1)) \ : 0)) # define THREAD_SELF \ ((struct pthread ) ((char ) __thread_self - TLS_PRE_TCB_SIZE)) And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment (confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c). On m68k, we have a larger gap between tcbhead_t and struct pthread. But as far as I can tell, the port is fine with that. The definition of TCB_OFFSET is sufficient to handle the shifted TCB scenario. This fixes commit 23c77f60181eb549f11ec2f913b4270af29eee38 ("nptl: Increase default TCB alignment to 32"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	nptl: Add public rseq symbols and <sys/rseq.h>	Florian Weimer	2021-12-09	36	-5/+145
\| \| \| \| \| \| \| \| \| \| \| \| \|	The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	nptl: Add glibc.pthread.rseq tunable to control rseq registration	Florian Weimer	2021-12-09	6	-8/+126
\| \| \| \| \| \| \| \|	This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	Linux: Use rseq to accelerate sched_getcpu	Florian Weimer	2021-12-09	1	-2/+17
\| \| \| \| \|	Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	nptl: Add rseq registration	Florian Weimer	2021-12-09	14	-3/+935
\| \| \| \| \| \| \| \| \| \| \| \|	The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	nptl: Introduce THREAD_GETMEM_VOLATILE	Florian Weimer	2021-12-09	3	-0/+6
\| \| \| \| \| \|	This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	nptl: Introduce <tcb-access.h> for THREAD_* accessors	Florian Weimer	2021-12-09	21	-376/+301
\| \| \| \| \| \| \|	These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	nptl: Add <thread_pointer.h> for defining __thread_pointer	Florian Weimer	2021-12-09	3	-0/+99
\| \| \| \| \| \| \| \| \|	<tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI	H.J. Lu	2021-12-06	1	-2/+5
\| \| \| \| \| \|	Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.
*	linux: Add generic ioctl implementation	Adhemerval Zanella	2021-12-06	3	-31/+85
\| \| \| \|	The powerpc is refactor to use the default implementation.
*	linux: Add generic syscall implementation	Adhemerval Zanella	2021-12-06	4	-67/+65
\| \| \| \| \|	It allows also to remove hppa specific implementation and simplify riscv implementation a bit.
*	csu: Always use __executable_start in gmon-start.c	Florian Weimer	2021-12-05	3	-63/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current binutils defines __executable_start as the lowest text address, so using the entry point address as a fallback is no longer necessary. As a result, overriding <entry.h> is only necessary if the entry point is not called _start. The previous approach to define __ASSEMBLY__ to suppress the declaration breaks if headers included by <entry.h> are not compatible with __ASSEMBLY__. This happens with rseq integration because it is necessary to include kernel headers in more places. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	elf: execve statically linked programs instead of crashing [BZ #28648]	Florian Weimer	2021-12-05	2	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Programs without dynamic dependencies and without a program interpreter are now run via execve. Previously, the dynamic linker either crashed while attempting to read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT data), or the self-relocated in the static PIE executable crashed because the outer dynamic linker had already applied RELRO protection. <dl-execve.h> is needed because execve is not available in the dynamic loader on Hurd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86-64: Use notl in EVEX strcmp [BZ #28646]	Noah Goldstein	2021-12-03	1	-6/+8
\| \| \| \| \| \| \| \|	Must use notl %edi here as lower bits are for CHAR comparisons potentially out of range thus can be 0 without indicating mismatch. This fixes BZ #28646. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
*	nptl: Increase default TCB alignment to 32	Florian Weimer	2021-12-03	16	-49/+0
\| \| \| \| \| \| \| \| \| \|	rseq support will use a 32-byte aligned field in struct pthread, so the whole struct needs to have at least that alignment. nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h> to obtain the fallback definition. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	AArch64: Improve A64FX memcpy	Wilco Dijkstra	2021-12-02	1	-321/+225
\| \| \| \| \| \| \| \| \| \| \| \| \|	v2 is a complete rewrite of the A64FX memcpy. Performance is improved by streamlining the code, aligning all large copies and using a single unrolled loop for all sizes. The code size for memcpy and memmove goes down from 1796 bytes to 868 bytes. Performance is better in all cases: bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33% faster for large sizes, bench-memcpy-walk is 25% faster for small sizes and 20% for the largest sizes. The geomean of all tests in bench-memcpy is 5.1% faster, and total time is reduced by 4%. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	AArch64: Optimize memcmp	Wilco Dijkstra	2021-12-02	1	-107/+134
\| \| \| \| \| \| \| \|	Rewrite memcmp to improve performance. On small and medium inputs performance is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 30-50% faster depending on the size. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532]	Matheus Castanho	2021-11-30	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Syscalls based on the assembly templates are missing CFI for r31, which gets clobbered when scv is used, and info for LR is inaccurate, placed in the wrong LOC and not using the proper offset. LR was also being saved to the callee's frame, while the ABI mandates it to be saved to the caller's frame. These are fixed by this commit. After this change: $ readelf -wF libc.so.6 \| grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6 00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c LOC CFA r31 ra 000000000004b9d4 r1+0 u u 000000000004b9e4 r1+48 u u 000000000004b9e8 r1+48 c-16 u 000000000004b9fc r1+48 c-16 c+16 000000000004ba08 r1+48 c-16 000000000004ba18 r1+48 u 000000000004ba1c r1+0 u libc.so.6: file format elf64-powerpcle Disassembly of section .text: 000000000004b9d4 <kill>: 4b9d4: 1f 00 4c 3c addis r2,r12,31 4b9d8: 2c c3 42 38 addi r2,r2,-15572 4b9dc: 25 00 00 38 li r0,37 4b9e0: d1 ff 21 f8 stdu r1,-48(r1) 4b9e4: 20 00 e1 fb std r31,32(r1) 4b9e8: 98 8f ed eb ld r31,-28776(r13) 4b9ec: 10 00 ff 77 andis. r31,r31,16 4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38> 4b9f4: a6 02 28 7d mflr r9 4b9f8: 40 00 21 f9 std r9,64(r1) 4b9fc: 01 00 00 44 scv 0 4ba00: 40 00 21 e9 ld r9,64(r1) 4ba04: a6 03 28 7d mtlr r9 4ba08: 08 00 00 48 b 4ba10 <kill+0x3c> 4ba0c: 02 00 00 44 sc 4ba10: 00 00 bf 2e cmpdi cr5,r31,0 4ba14: 20 00 e1 eb ld r31,32(r1) 4ba18: 30 00 21 38 addi r1,r1,48 4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60> 4ba20: 01 f0 20 39 li r9,-4095 4ba24: 40 48 23 7c cmpld r3,r9 4ba28: 20 00 e0 4d bltlr+ 4ba2c: d0 00 63 7c neg r3,r3 4ba30: 08 00 00 48 b 4ba38 <kill+0x64> 4ba34: 20 00 e3 4c bnslr+ 4ba38: c8 32 fe 4b b 2ed00 <__syscall_error> ... 4ba44: 40 20 0c 00 .long 0xc2040 4ba48: 68 00 00 00 .long 0x68 4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3 4ba50: 6b 69 6c 6c xoris r12,r3,26987
*	linux: Implement pipe in terms of __NR_pipe2	Adhemerval Zanella	2021-11-30	10	-221/+3
\| \| \| \| \| \| \| \| \|	The syscall pipe2 was added in linux 2.6.27 and glibc requires linux 3.2.0. The patch removes the arch-specific implementation for alpha, ia64, mips, sh, and sparc which requires a different kernel ABI than the usual one. Checked on x86_64-linux-gnu and with a build for the affected ABIs.
*	linux: Implement mremap in C	Adhemerval Zanella	2021-11-30	3	-1/+42
\| \| \| \| \| \| \| \| \|	Variadic function calls in syscalls.list does not work for all ABIs (for instance where the argument are passed on the stack instead of registers) and might have underlying issues depending of the variadic type (for instance if a 64-bit argument is used). Checked on x86_64-linux-gnu.
*	linux: Add prlimit64 C implementation	Adhemerval Zanella	2021-11-30	18	-27/+47
\| \| \| \| \| \| \| \| \| \| \|	The LFS prlimit64 requires a arch-specific implementation in syscalls.list. Instead add a generic one that handles the required symbol alias for __RLIM_T_MATCHES_RLIM64_T. HPPA is the only outlier which requires a different default symbol. Checked on x86_64-linux-gnu and with build for the affected ABIs.
*	linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624)	Adhemerval Zanella	2021-11-25	1	-25/+35
\| \| \| \| \| \| \|	The /proc/statm fallback was removed by f13fb81ad3159 if sysfs is not available, reinstate it. Checked on x86_64-linux-gnu.
*	linux: Add fanotify_mark C implementation	Adhemerval Zanella	2021-11-25	18	-22/+42
\| \| \| \| \| \| \| \| \|	Passing 64-bit arguments on syscalls.list is tricky: it requires to reimplement the expected kernel abi in each architecture. This is way to better to represent in C code where we already have macros for this (SYSCALL_LL64). Checked on x86_64-linux-gnu.
*	linux: Only build fstatat fallback if required	Adhemerval Zanella	2021-11-25	1	-7/+11
\| \| \| \| \| \| \|	For 32-bit architecture with __ASSUME_STATX there is no need to build fstatat64_time64_stat. Checked on i686-linux-gnu.
*	x86-64: Add vector sin/sinf to libmvec microbenchmark	Sunil K Pandey	2021-11-24	3	-0/+8201
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add vector sin/sinf and input files to libmvec microbenchmark. libmvec-sin-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-sinf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86-64: Add vector pow/powf to libmvec microbenchmark	Sunil K Pandey	2021-11-24	3	-0/+8201
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add vector pow/powf and input files to libmvec microbenchmark. libmvec-pow-inputs: arg1: 90% Normal random distribution range: (0.0, 256.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (0.0, 256.0) arg2: 90% Normal random distribution range: (-127.0, 127.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-127.0, 127.0) libmvec-powf-inputs: arg1: 90% Normal random distribution range: (0.0f, 100.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (0.0f, 100.0f) arg2: 90% Normal random distribution range: (-10.0f, 10.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-10.0f, 10.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>