mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S	Noah Goldstein	2021-10-27	4	-9/+308
\| \| \| \| \| \| \| \| \| \| \| \| \|	No bug. This commit adds new optimized __memcmpeq implementation for avx2. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.
*	x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S	Noah Goldstein	2021-10-27	1	-4/+51
\| \| \| \| \| \|	No bug. This commit does not modify any of the memcmp implementation. It just adds __memcmpeq ifdefs to skip obvious cases where computing the proper 1/-1 required by memcmp is not needed.
*	x86_64: Add support for __memcmpeq using sse2, avx2, and evex	Noah Goldstein	2021-10-27	12	-9/+202
\| \| \| \| \| \|	No bug. This commit adds support for __memcmpeq to be implemented separately from memcmp. Support is added for versions optimized with sse2, avx2, and evex.
*	String: Add hidden defs for __memcmpeq() to enable internal usage	Noah Goldstein	2021-10-26	25	-0/+36
\| \| \| \| \| \| \| \|	No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.
*	String: Add support for __memcmpeq() ABI on all targets	Noah Goldstein	2021-10-26	60	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding target's memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works were added in string/tester.c. As discussed in the proposal "Add new ABI '__memcmpeq()' to libc", __memcmpeq() is essentially a reserved namespace for bcmp(). This means it shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. The build also succeeds on the x86_64-linux-gnu target.
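The relaxed return-value contract described in this message can be illustrated with a minimal sketch. The names `memcmpeq_ref` and `buffers_equal` are hypothetical and are not glibc code; the sketch only assumes the contract stated above (zero for equal byte sequences, any non-zero value otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference for the __memcmpeq() contract: return 0 when the
   first n bytes match and any non-zero value otherwise.  Unlike memcmp(),
   the sign of a non-zero result carries no meaning.  */
static int
memcmpeq_ref (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;
  unsigned char diff = 0;

  assert (n == 0 || (p1 != NULL && p2 != NULL));
  /* Accumulate differences; the first mismatching byte is never located.  */
  for (size_t i = 0; i < n; i++)
    diff |= p1[i] ^ p2[i];
  return diff;
}

/* A caller that uses only the boolean value of memcmp().  This is the
   kind of call site a compiler could, in principle, lower to __memcmpeq().  */
static int
buffers_equal (const void *a, const void *b, size_t n)
{
  return memcmp (a, b, n) == 0;
}
```

Because the reference never needs to know which byte differed or in which direction, it can fold all comparisons into a single accumulated value, which is the freedom the optimized implementations exploit.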
*	configure: Don't check LD -v --help for LIBC_LINKER_FEATURE	Fangrui Song	2021-10-25	1	-11/+8
\| \| \| \| \| \| \| \| \|	When LIBC_LINKER_FEATURE is used to check a linker option with the equal sign, it will likely fail because the LD -v --help output may look like `-z lam-report=[none\|warning\|error]` while the needle is something like `-z lam-report=warning`. The LD -v --help filter doesn't save much time, so just remove it.
*	x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S	Noah Goldstein	2021-10-23	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'. It could be dangerous to use SSE2 if this function is ever called without using 'vzeroupper' beforehand. While compilers appear to use 'vzeroupper' before function calls if AVX2 has been used, using SSE2 here is more brittle. Since it is not absolutely necessary, it should be avoided. It costs 2 extra bytes, but the extra bytes should only eat into alignment padding. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86_64: Add missing libmvec ABI tests	Sunil K Pandey	2021-10-22	43	-8/+152
\| \| \| \| \| \|	Add vector ABI tests for cos, exp, log, pow and sin functions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	elf: Fix e6fd79f379 build with --enable-tunables=no	Adhemerval Zanella	2021-10-21	1	-0/+9
\| \| \| \| \| \|	_dl_sort_maps_init() is not defined when tunables are not enabled. Checked on x86_64-linux-gnu.
*	elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)	Chung-Lin Tang	2021-10-21	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This second patch contains the actual implementation of a new sorting algorithm for shared objects in the dynamic loader, which solves the slow behavior that the current "old" algorithm falls into when the DSO set contains circular dependencies. The new algorithm implemented here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Another notable implementation adjustment related to this _dl_sort_maps change is the removing of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes. This has been changed to simply use two new bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes do along the way. Tunable support for switching between different sorting algorithms at runtime is also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1 (old algorithm) and 2 (new DFS algorithm) has been added. At time of commit of this patch, the default setting is 1 (old algorithm). Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
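The sorting approach described in this message (depth-first search producing a reverse post-order, with a one-bit visited flag, started from the last input entry and moving towards the first) can be sketched on a toy graph. This is only an illustration under assumptions: `struct node`, `dfs`, `sort_rpo`, and `MAX_DEPS` are hypothetical names, and the real loader works on `struct link_map` and `l_initfini[]` rather than index arrays.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4   /* Hypothetical limit for this toy graph.  */

struct node
{
  int deps[MAX_DEPS];        /* Indices of the nodes this node depends on.  */
  int ndeps;
  unsigned int visited : 1;  /* Plays the role of a one-bit visited flag.  */
};

/* Visit node I and everything reachable from it.  A node is written to
   OUT only after all of its dependencies (post-order); filling OUT from
   the back turns that into reverse post-order.  */
static void
dfs (struct node *nodes, int i, int *out, int *pos)
{
  if (nodes[i].visited)
    return;                  /* Also terminates circular dependencies.  */
  nodes[i].visited = 1;
  for (int d = 0; d < nodes[i].ndeps; d++)
    dfs (nodes, nodes[i].deps[d], out, pos);
  out[--*pos] = i;
}

/* Sort N nodes so that each node precedes its dependencies whenever the
   graph is acyclic.  The scan runs from the last input entry backwards.  */
static void
sort_rpo (struct node *nodes, int n, int *out)
{
  int pos = n;
  for (int i = 0; i < n; i++)
    nodes[i].visited = 0;
  for (int i = n - 1; i >= 0; i--)
    dfs (nodes, i, out, &pos);
  assert (pos == 0);         /* Every node was emitted exactly once.  */
}
```

Each node and each edge is visited once, so the cost is linear in the size of the graph even when the dependencies form cycles, which is the case the message describes as slow for the old algorithm.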
*	linux: Fix a possibly non-constant expression in _Static_assert	Fangrui Song	2021-10-20	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to C11 6.6p6, `const int` as an operand may not make up a constant expression. GCC -O0 errors: ../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression _Static_assert (allocation_size >= sizeof (struct dirent64), -O2 -Wpedantic has a similar warning. See https://gcc.gnu.org/PR102502 for GCC's inconsistency. Use enum which is guaranteed to be a constant expression. This also makes the file compilable with Clang. Fixes: 4b962c9e859de23b461d61f860dbd3f21311e83a ("linux: Simplify opendir buffer allocation")
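The fix described here, replacing a `const` object with an enumeration constant so that the `_Static_assert` operand is a true constant expression, can be sketched as follows. `struct record`, the value 32768, and `buffer_size` are hypothetical stand-ins chosen for illustration; they are not taken from opendir.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for struct dirent64.  */
struct record
{
  char name[256];
  long ino;
};

/* Not portable: a const-qualified object is not an integer constant
   expression in C, so a compiler may reject this form:

     const size_t allocation_size = 32768;
     _Static_assert (allocation_size >= sizeof (struct record), "...");  */

/* An enumeration constant is always an integer constant expression.  */
enum { allocation_size = 32768 };

_Static_assert (allocation_size >= sizeof (struct record),
                "allocation_size must hold at least one record");

/* The enumerator can still be used wherever the size is needed.  */
static size_t
buffer_size (void)
{
  return allocation_size;
}
```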
*	x86-64: Add sysdeps/x86_64/fpu/Makeconfig	H.J. Lu	2021-10-20	3	-139/+155
\| \| \| \| \| \| \| \| \| \|	1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which contains libmvec ABI test dependencies and CFLAGS, in the build directory. 2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS. Tested on SSE4, AVX, AVX2 and AVX512 machines. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	powerpc: Remove backtrace implementation	Adhemerval Zanella	2021-10-20	5	-277/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The powerpc optimization to provide a fast stacktrace requires some ad-hoc code to handle Linux signal frames, and the change is fragile once the kernel decides to slightly change its execution sequence [1]. The generic implementation works as-is and should be future-proof, since the kernel provides the expected CFI directives in the vDSO shared page. Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and powerpc64-linux-gnu. [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
*	ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340]	H.J. Lu	2021-10-19	4	-27/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Define DL_RO_DYN_SECTION to initialize bootstrap_map.l_ld_readonly before calling elf_get_dynamic_info to get dynamic info in bootstrap_map. 2. Define a single static inline bool dl_relocate_ld (const struct link_map *l) { /* Don't relocate dynamic section if it is readonly */ return !(l->l_ld_readonly \|\| DL_RO_DYN_SECTION); } This updates the BZ #28340 fix.
*	timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469)	Stafford Horne	2021-10-18	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was found when testing the OpenRISC port I am working on. These two tests fail with SIGSEGV: FAIL: misc/tst-ntp_gettime FAIL: misc/tst-ntp_gettimex This was found to be due to the kernel overwriting the stack space allocated by the timex structure. The reason for the overwrite being that the kernel timex has 64-bit fields and user space code only allocates enough stack space for timex with 32-bit fields. On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined. This causes the timex structure to use 32-bit fields with type __syscall_slong_t. This patch adjusts the ifdef condition to allow 32-bit systems with TIMESIZE=64 to use the 64-bit long long timex definition. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	hurd if_index: Explicitly use AF_INET for if index discovery	Samuel Thibault	2021-10-18	1	-3/+3
\| \| \| \| \| \| \|	5bf07e1b3a74 ("Linux: Simplify __opensock and fix race condition [BZ #28353]") made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET knows about network interfaces, so better actually specify that in if_index.
*	hurd: Fix intr-msg parameter/stack kludge	Samuel Thibault	2021-10-18	1	-10/+39
\| \| \| \| \| \| \| \| \| \| \|	INTR_MSG_TRAP was tinkering with esp to make it point to _hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is meaningless in C. Instead, just push the parameters on the stack, which also avoids leaving local variables of _hurd_intr_rpc_mach_msg below esp. We now also properly express that OPTION and TIMEOUT may be updated during the trap call.
*	x86-64: Add test-vector-abi.h/test-vector-abi-sincos.h	H.J. Lu	2021-10-14	17	-172/+80
\| \| \| \| \|	Add templates for vector ABI test and use them for vector sincos/sincosf ABI tests.
*	elf: Fix dynamic-link.h usage on rtld.c	Adhemerval Zanella	2021-10-14	29	-110/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on rtld.c due to two issues: 1. RTLD_BOOTSTRAP is also used on dl-machine.h on various architectures and it changes the semantics of various machine relocation functions. 2. The elf_get_dynamic_info() change was done sideways: prior to 490e6c62aa, get-dynamic-info.h was included by the first dynamic-link.h include without RTLD_BOOTSTRAP being defined. It means that the code within elf_get_dynamic_info() that uses RTLD_BOOTSTRAP is in fact unused. To fix 1., this patch now includes dynamic-link.h only once with RTLD_BOOTSTRAP defined. The ELF_DYNAMIC_RELOCATE call will now have the relocation functions with the expected semantics for the loader. And to fix 2., part of 4af6982e4c is reverted (the check argument of elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP pieces are removed. To reorganize the includes, the static TLS definition is moved to its own header to avoid a circular dependency (it is defined on dynamic-link.h and dl-machine.h requires it, while at the same time other dynamic-link.h definitions require dl-machine.h definitions). Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL are moved to their own header. Only ancient ABIs need special values (arm, i386, and mips), so a generic one is used as default. The powerpc Elf64_FuncDesc is also moved to its own header, since csu code required its definition (which would require either including the elf/ folder or adding a full path with elf/). Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32, and powerpc64le. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
*	x86: Optimize memset-vec-unaligned-erms.S	Noah Goldstein	2021-10-12	5	-95/+232
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No bug. The optimizations are: 1. Change control flow for L(more_2x_vec) to fall through to loop and jump for L(less_4x_vec) and L(less_8x_vec). This uses less code size and saves jumps for length > 4x VEC_SIZE. 2. For EVEX/AVX512 move L(less_vec) closer to entry. 3. Avoid complex address mode for length > 2x VEC_SIZE. 4. Slightly better aligning code for the loop from the perspective of code size and uops. 5. Align targets so they make full use of their fetch block and if possible cache line. 6. Try to reduce the total number of icache lines that will need to be pulled in for a given length. 7. Include "local" version of stosb target. For AVX2/EVEX/AVX512 jumping to the stosb target in the sse2 code section will almost certainly be to a new page. The new version does increase code size marginally by duplicating the target but should get better iTLB behavior as a result. test-memset, test-wmemset, and test-bzero are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize memcmp-evex-movbe.S for frontend behavior and size	Noah Goldstein	2021-10-12	1	-192/+242
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No bug. The frontend optimizations are to: 1. Reorganize logically connected basic blocks so they are either in the same cache line or adjacent cache lines. 2. Avoid cases when basic blocks unnecessarily cross cache lines. 3. Try to 32-byte align any basic blocks possible without sacrificing code size. Smaller / Less hot basic blocks are used for this. Overall code size shrunk by 168 bytes. This should make up for any extra costs due to aligning to 64 bytes. In general, performance before deviated a great deal depending on whether entry alignment % 64 was 0, 16, 32, or 48. These changes essentially make it so that the current implementation is at least equal to the best alignment of the original for any arguments. The only additional optimization is in the page cross case. Branch on equals case was removed from the size == [4, 7] case. Also, the [4, 7] and [2, 3] cases were swapped, as [4, 7] is likely a hotter argument size. test-memcmp and test-wmemcmp are both passing.
*	elf: Fix elf_get_dynamic_info definition	Adhemerval Zanella	2021-10-12	3	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before 490e6c62aa31a8a ('elf: Avoid nested functions in the loader [BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on the first dynamic-link.h include and later within _dl_start(). The former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used on setup_vdso() (since it is a global definition), while the latter does define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation. With the commit change, the function is now included and defined once instead of defined as a nested function. So rtld.c defines it without defining RTLD_BOOTSTRAP, which breaks at least powerpc32. This patch fixes it by moving the get-dynamic-info.h include out of dynamic-link.h, so that the caller can correctly set the expected semantics by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or RESOLVE_MAP. It was also required to enable some asserts only for the loader bootstrap to avoid issues when called from setup_vdso(). As a side note, this is another issue with nested functions: it is not clear from pre-processed output (-E -dD) how the function will be built and what its semantics are (since the nested function will be local and extra C defines may change it). I checked on x86_64-linux-gnu (w/o --enable-static-pie), i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4, aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and s390x-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
*	Fix nios2 localplt failure	Joseph Myers	2021-10-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Building for nios2-linux-gnu has recently started showing a localplt test failure, arising from a reference to __floatunsidf from getloadavg after commit b5c8a3aa82f66f49b731ca5204104cee48bccfa5 ("Linux: implement getloadavg(3) using sysinfo(2)") (this is an architecture with soft-fp in libc). Add this as a permitted local PLT reference in localplt.data. Tested with build-many-glibcs.py for nios2-linux-gnu.
*	elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT)	Fangrui Song	2021-10-11	10	-183/+5
\| \| \| \| \| \| \| \| \| \|	Intel MPX failed to gain wide adoption and has been deprecated for a while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019. This patch removes the support code from the dynamic loader. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Modify ENTRY in sysdep.h so that p2align can be specified	Noah Goldstein	2021-10-08	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second argument, log2 of the desired function alignment. The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this doesn't affect any existing functionality. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	Linux: implement getloadavg(3) using sysinfo(2)	Cristian Rodríguez	2021-10-08	1	-36/+14
\| \| \| \| \|	Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Avoid nested functions in the loader [BZ #27220]	Fangrui Song	2021-10-07	21	-216/+272
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dynamic-link.h is included more than once in some elf/ files (rtld.c, dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested functions. This harms readability and the nested functions usage is the biggest obstacle prevents Clang build (Clang doesn't support GCC nested functions). The key idea for unnesting is to add extra parameters (struct link_map and struct r_scope_elm []) to RESOLVE_MAP, ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a], elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired by Stan Shebs' ppc64/x86-64 implementation in the google/grte/v5-2.27/master which uses mixed extra parameters and static variables.) Future simplification: * If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM, elf_machine_runtime_setup can drop the `scope` parameter. * If TLSDESC no longer need to be in elf_machine_lazy_rel, elf_machine_lazy_rel can drop the `scope` parameter. Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32, sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64. In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
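The unnesting idea described in this message, passing what a nested function used to capture as extra parameters, can be sketched on a toy example. All names here (`struct scope`, `resolve`, `relocate_all`, `bias`) are hypothetical and are not the loader's actual types or macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the state a nested function would capture.  */
struct scope
{
  const int *values;
  size_t count;
};

/* GCC-only nested form, shown as a comment because Clang rejects it:

     static int
     relocate_all (const struct scope *sc, int bias)
     {
       int resolve (size_t i) { return sc->values[i] + bias; }
       ...
     }

   The nested resolve() silently captures SC and BIAS from its parent.  */

/* Unnested form: the captured variables become explicit parameters.  */
static int
resolve (const struct scope *sc, int bias, size_t i)
{
  assert (i < sc->count);
  return sc->values[i] + bias;
}

static int
relocate_all (const struct scope *sc, int bias)
{
  int sum = 0;
  for (size_t i = 0; i < sc->count; i++)
    sum += resolve (sc, bias, i);
  return sum;
}
```

The unnested helper is ordinary ISO C, so its behavior no longer depends on which definitions happen to be in scope at the point where a header is included.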
*	Add run-time check for indirect external access	H.J. Lu	2021-10-07	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \|	When performing symbol lookup for references in executable without indirect external access: 1. Disallow copy relocations in executable against protected data symbols in a shared object with indirect external access. 2. Disallow non-zero symbol values of undefined function symbols in executable, which are used as the function pointer, against protected function symbols in a shared object with indirect external access. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Initial support for GNU_PROPERTY_1_NEEDED	H.J. Lu	2021-10-07	4	-7/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Add GNU_PROPERTY_1_NEEDED: #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO to indicate the needed properties by the object file. 2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) to indicate that the object file requires canonical function pointers and cannot be used with copy relocation. 3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
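How the two macros combine when a property is scanned can be sketched as follows. The macro definitions repeat those quoted in the message; the numeric value of GNU_PROPERTY_UINT32_OR_LO is taken from elf.h as I recall it. `struct toy_map`, `record_property`, and `needs_indirect_extern_access` are hypothetical, and whether the loader assigns or ORs the value into l_1_needed is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the "OR" range of 32-bit GNU properties (value as in elf.h).  */
#define GNU_PROPERTY_UINT32_OR_LO 0xb0008000

/* Definitions as quoted in the commit message.  */
#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
#define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)

/* Hypothetical stand-in for the link_map field named in the message.  */
struct toy_map
{
  uint32_t l_1_needed;
};

/* Record one property note entry; other property types are ignored.
   Plain assignment is an assumption made for this sketch.  */
static void
record_property (struct toy_map *m, uint32_t type, uint32_t value)
{
  assert (m != NULL);
  if (type == GNU_PROPERTY_1_NEEDED)
    m->l_1_needed = value;
}

/* True if the object requires canonical function pointers and cannot
   be used with copy relocations.  */
static int
needs_indirect_extern_access (const struct toy_map *m)
{
  return (m->l_1_needed
          & GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS) != 0;
}
```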
*	S390: Add PCI_MIO and SIE HWCAPs	Stefan Liebler	2021-10-07	3	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	Both new HWCAPs were introduced in these kernel commits: - 7e8403ecaf884f307b627f3c371475913dd29292 "s390: add HWCAP_S390_PCI_MIO to ELF hwcaps" - 7e82523f2583e9813e4109df3656707162541297 "s390/hwcaps: make sie capability regular hwcap" Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e "s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros from "HWCAP_S390_" to "HWCAP_". For compatibility reasons, we do not change the prefix in public glibc header file.
*	S390: update libm test ulps	Stefan Liebler	2021-10-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Update after commit 6bbf7298323bf31bc43494b2201465a449778e10. Fixed inaccuracy of j0f (BZ #28185) See also e.g. commit c75b106145c30e6c7bcf87f384a5c68ce56406e9 aarch64: update libm test ulps
*	powerpc: update libm test ulps	Adhemerval Zanella	2021-10-06	1	-1/+1
\| \| \| \| \|	Update after commit 6bbf7298323bf31bc43494b2201465a449778e10 (Fixed inaccuracy of j0f (BZ #28185)).
*	y2038: Use a common definition for stat for sparc32	Adhemerval Zanella	2021-10-06	1	-23/+31
\| \| \| \| \| \|	sparc32 lacks the support added by 4e8521333bea6. Checked on sparcv9-linux-gnu.
*	aarch64: update libm test ulps	Szabolcs Nagy	2021-10-05	1	-1/+1
\| \| \| \| \| \| \|	Update after commit 6bbf7298323bf31bc43494b2201465a449778e10. Fixed inaccuracy of j0f (BZ #28185)
*	Fixed inaccuracy of j0f (BZ #28185)	Paul Zimmermann	2021-10-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	The largest errors over the full binary32 range are after this patch (on x86_64): RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7 Inputs that were yielding huge errors have been added to "make check". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	elf: Avoid deadlock between pthread_create and ctors [BZ #28357]	Szabolcs Nagy	2021-10-04	4	-3/+176
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fix for bug 19329 caused a regression such that pthread_create can deadlock when concurrent ctors from dlopen are waiting for it to finish. Use a new GL(dl_load_tls_lock) in pthread_create that is not taken around ctors in dlopen. The new lock is also used in __tls_get_addr instead of GL(dl_load_lock). The new lock is held in _dl_open_worker and _dl_close_worker around most of the logic before/after the init/fini routines. When init/fini routines are running then TLS is in a consistent, usable state. In _dl_open_worker the new lock requires catching and reraising dlopen failures that happen in the critical section. The new lock is reinitialized in a fork child, to keep the existing behaviour and it is kept recursive in case malloc interposition or TLS access from signal handlers can retake it. It is not obvious if this is necessary or helps, but avoids changing the preexisting behaviour. The new lock may be more appropriate for dl_iterate_phdr too than GL(dl_load_write_lock), since TLS state of an incompletely loaded module may be accessed. If the new lock can replace the old one, that can be a separate change. Fixes bug 28357. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	nptl: pthread_kill must send signals to a specific thread [BZ #28407]	Florian Weimer	2021-10-01	2	-0/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The choice between the kill vs tgkill system calls is not just about the TID reuse race, but also about whether the signal is sent to the whole process (and any thread in it) or to a specific thread. This was caught by the openposix test suite: LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset. <https://gitlab.com/cki-project/kernel-tests/-/issues/764> Fixes commit 526c3cf11ee9367344b6b15d669e4c3cb461a2be ("nptl: Fix race between pthread_kill and thread exit (bug 12889)"). Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
*	nptl: Add CLOCK_MONOTONIC support for PI mutexes	Adhemerval Zanella	2021-10-01	2	-15/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Linux added FUTEX_LOCK_PI2 to support clock selection (commit bf22a6976897977b0a3f1aeba6823c959fc4fdae). With the new flag we can now properly support CLOCK_MONOTONIC for pthread_mutex_clocklock with Priority Inheritance. If the kernel does not support it, EINVAL is returned instead. The difference is that the futex operation will be issued and the kernel will report the missing support (instead of a hard-coded error return). Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11, and 4.15.
*	nptl: Use FUTEX_LOCK_PI2 when available	Adhemerval Zanella	2021-10-01	2	-54/+5
\| \| \| \| \| \| \| \| \| \| \|	This patch uses the new futex PI operation provided by Linux v5.14 when it is required. The futex_lock_pi64() is moved to futex-internal.c (since it is used in two different places and its code size might be large depending on the kernel configuration) and clockid is added as an argument. Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>
*	Linux: Add FUTEX_LOCK_PI2	Kurt Kanzenbach	2021-10-01	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	Linux v5.14.0 introduced a new futex operation called FUTEX_LOCK_PI2. This kernel feature can be used to implement pthread_mutex_clocklock(MONOTONIC)/PI. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Update alpha libm-test-ulps	Adhemerval Zanella	2021-09-30	1	-49/+53
\|
*	powerpc: Fix unrecognized instruction errors with recent binutils	Paul A. Clarke	2021-09-29	2	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent versions of binutils (with commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky" options across a base `.machine` directive, nullifying the use of passing "-many" through GCC to the assembler. As a result, some instructions which were recognized even under older, more stringent `.machine` directives become unrecognized instructions in that context. In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended mnemonic became unrecognized, as the default compilation with GCC for 32bit powerpc adds a `.machine ppc` in the resulting assembly, so the command line option `-Wa,-many` is essentially ignored, and the ISA 2.06 instructions and mnemonics, like `mfppr32`, are unrecognized. The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with: Error: unrecognized opcode: `mfppr32' Add appropriate `.machine` directives in the assembly to bracket the `mfppr32` instruction. Part of a 2019 fix (commit 9250e6610fdb0f3a6f238d2813e319a41fb7a810) to the above test's Makefile to add `-many` to the compilation when GCC itself stopped passing `-many` to the assembler no longer has any effect, so remove that. Reported-by: Joseph Myers <joseph@codesourcery.com>
*	Add fmaximum, fminimum functions	Joseph Myers	2021-09-28	42	-1/+1967
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	C2X adds new <math.h> functions for floating-point maximum and minimum, corresponding to the new operations that were added in IEEE 754-2019 because of concerns about the old operations not being associative in the presence of signaling NaNs. fmaximum and fminimum handle NaNs like most <math.h> functions (any NaN argument means the result is a quiet NaN). fmaximum_num and fminimum_num handle both quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if one argument is a number and the other is a NaN, return the number), but still raise "invalid" for a signaling NaN argument, making them exceptions to the normal rule that a function with a floating-point result raising "invalid" also returns a quiet NaN. fmaximum_mag, fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding functions returning the argument with greatest or least absolute value. All these functions also treat +0 as greater than -0. There are also corresponding <tgmath.h> type-generic macros. Add these functions to glibc. The implementations use type-generic templates based on those for fmax, fmin, fmaxmag and fminmag, and test inputs are based on those for those functions with appropriate adjustments to the expected results. The RISC-V maintainers might wish to add optimized versions of fmaximum_num and fminimum_num (for float and double), since RISC-V (F extension version 2.2 and later) provides instructions corresponding to those functions - though it might be at least as useful to add architecture-independent built-in functions to GCC and teach the RISC-V back end to expand those functions inline, which is what you generally want for functions that can be implemented with a single instruction. Tested for x86_64 and x86, and with build-many-glibcs.py.
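The NaN and signed-zero rules described in this message can be sketched with portable reference functions. `ref_fmaximum` and `ref_fmaximum_num` are hypothetical names, not the glibc implementations, and the sketch deliberately does not model the "invalid" exception for signaling NaNs or the quieting of a NaN result in the both-NaN case.

```c
#include <assert.h>
#include <math.h>

/* Reference sketch of fmaximum: any NaN argument gives a NaN result,
   and +0 is treated as greater than -0.  */
static double
ref_fmaximum (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;                 /* Arithmetic on a NaN yields a NaN.  */
  if (x == y)
    /* Equal values include the +0 / -0 pair; prefer the operand
       without the sign bit.  */
    return signbit (x) ? y : x;
  return x > y ? x : y;
}

/* Reference sketch of fmaximum_num: a NaN is treated as missing data,
   so a number wins over a NaN.  If both arguments are NaN the result
   is a NaN.  */
static double
ref_fmaximum_num (double x, double y)
{
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return ref_fmaximum (x, y);
}
```

The only difference between the two sketches is the handling of a single NaN argument, which mirrors the distinction the message draws between fmaximum and fmaximum_num.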
*	Linux: Simplify __opensock and fix race condition [BZ #28353]	Florian Weimer	2021-09-28	2	-116/+0
\| \| \| \| \| \| \| \| \| \| \| \|	AF_NETLINK support is not quite optional on modern Linux systems anymore, so it is likely that the first attempt will always succeed. Consequently, there is no need to cache the result. Keep AF_UNIX and the Internet address families as a fallback, for the rare case that AF_NETLINK is missing. The other address families previously probed are totally obsolete by now, so remove them. Use this simplified version as the generic implementation, disabling Netlink support as needed.
*	pthread/tst-cancel28: Fix barrier re-init race condition	Stafford Horne	2021-09-28	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running this test on the OpenRISC port I am working on this test fails with a timeout. The test passes when being straced or debugged. Looking at the code there seems to be a race condition in that: 1 main thread: calls xpthread_cancel 2 sub thread : receives cancel signal 3 sub thread : cleanup routine waits on barrier 4 main thread: re-inits barrier 5 main thread: waits on barrier After getting to 5 the main thread and sub thread wait forever as the 2 barriers are no longer the same. Removing the barrier re-init seems to fix this issue. Also, the barrier does not need to be reinitialized as that is done by default. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOC	Fangrui Song	2021-09-27	2	-4/+0
\| \| \| \|	Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
*	posix: Remove spawni.c	Adhemerval Zanella	2021-09-27	1	-343/+0
\| \| \| \| \| \| \| \|	Although it provides an alternate implementation that communicates using pipe() instead of shared memory, no port uses it and it adds extra burden for posix_spawn() extensions. Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	Disable symbol hack in libc_nonshared.a	H.J. Lu	2021-09-27	2	-2/+4
\| \| \| \| \|	Don't reference __GI_memmove, __GI_memset, __GI_memcpy, __divdi3_internal, __udivdi3_internal and __moddi3_internal in libc_nonshared.a.
*	linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310)	Adhemerval Zanella	2021-09-27	1	-5/+134
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The use of sched_getaffinity on get_nproc and sysconf (_SC_NPROCESSORS_ONLN) done in 903bc7dcc2acafc40 (BZ #27645) breaks the top command in common hypervisor configurations and also other monitoring tools. The main issue is that using sched_getaffinity changed the symbols' semantics from a system-wide count of online CPUs to a per-process one (which can be changed with kernel cpusets or boot parameters in a VM). This patch reverts most of 903bc7dcc2acafc40, with the following exceptions: * No more cached values and atomic updates, since they are inherently racy. * No /proc/cpuinfo fallback, since /proc/stat is already used and it would require reverting more arch-specific code. * The alloca is replaced with a static buffer of 1024 bytes. So the implementation first consults sysfs, and falls back to procfs. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
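The difference between the two semantics discussed in this message, a per-process affinity mask versus a system-wide count of online CPUs, can be observed with a short Linux-specific sketch. The helper names `cpus_in_affinity_mask` and `cpus_online` are hypothetical; the sketch assumes a Linux system with glibc and a fixed-size `cpu_set_t`, which may be too small on very large machines.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Number of CPUs the calling thread is allowed to run on.  This is a
   per-process (per-thread) view that cpusets or taskset can shrink.
   Returns -1 if the fixed-size mask cannot hold the kernel's answer.  */
static int
cpus_in_affinity_mask (void)
{
  cpu_set_t set;
  if (sched_getaffinity (0, sizeof set, &set) != 0)
    return -1;
  return CPU_COUNT (&set);
}

/* Number of CPUs currently online, as reported by sysconf.  After the
   revert described above this is again a system-wide value.  */
static long
cpus_online (void)
{
  return sysconf (_SC_NPROCESSORS_ONLN);
}
```

Running a program that prints both values under a restricted affinity (for example with taskset) would be one way to see the two numbers diverge; monitoring tools generally want the second one.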
*	linux: Simplify get_nprocs	Adhemerval Zanella	2021-09-27	1	-50/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch simplifies the memory allocation code and uses the sched routines instead of reimplementing them. This still uses a stack allocation buffer, so it can be used on malloc initialization code. Linux currently supports a maximum of 4096 cpus for most architectures: $ find -iname Kconfig \| xargs git grep -A10 -w NR_CPUS \| grep -w range arch/alpha/Kconfig- range 2 32 arch/arc/Kconfig- range 2 4096 arch/arm/Kconfig- range 2 16 if DEBUG_KMAP_LOCAL arch/arm/Kconfig- range 2 32 if !DEBUG_KMAP_LOCAL arch/arm64/Kconfig- range 2 4096 arch/csky/Kconfig- range 2 32 arch/hexagon/Kconfig- range 2 6 if SMP arch/ia64/Kconfig- range 2 4096 arch/mips/Kconfig- range 2 256 arch/openrisc/Kconfig- range 2 32 arch/parisc/Kconfig- range 2 32 arch/riscv/Kconfig- range 2 32 arch/s390/Kconfig- range 2 512 arch/sh/Kconfig- range 2 32 arch/sparc/Kconfig- range 2 32 if SPARC32 arch/sparc/Kconfig- range 2 4096 if SPARC64 arch/um/Kconfig- range 1 1 arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range. arch/x86/Kconfig- range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END arch/xtensa/Kconfig- range 2 32 With x86 supporting 8192: arch/x86/Kconfig 976 config NR_CPUS_RANGE_END 977 int 978 depends on X86_64 979 default 8192 if SMP && CPUMASK_OFFSTACK 980 default 512 if SMP && !CPUMASK_OFFSTACK 981 default 1 if !SMP So using a maximum of 32k cpus should cover all cases (and I would expect that once we start to have many more CPUs, Linux would provide a more straightforward way to query for such information). A test is added to check if sched_getaffinity can successfully return with large buffers. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>