about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* x86: Optimize memset-vec-unaligned-erms.SNoah Goldstein2021-10-125-95/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No bug. Optimization are 1. change control flow for L(more_2x_vec) to fall through to loop and jump for L(less_4x_vec) and L(less_8x_vec). This uses less code size and saves jumps for length > 4x VEC_SIZE. 2. For EVEX/AVX512 move L(less_vec) closer to entry. 3. Avoid complex address mode for length > 2x VEC_SIZE 4. Slightly better aligning code for the loop from the perspective of code size and uops. 5. Align targets so they make full use of their fetch block and if possible cache line. 6. Try and reduce total number of icache lines that will need to be pulled in for a given length. 7. Include "local" version of stosb target. For AVX2/EVEX/AVX512 jumping to the stosb target in the sse2 code section will almost certainly be to a new page. The new version does increase code size marginally by duplicating the target but should get better iTLB behavior as a result. test-memset, test-wmemset, and test-bzero are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize memcmp-evex-movbe.S for frontend behavior and sizeNoah Goldstein2021-10-121-192/+242
| | | | | | | | | | | | | | | | | | | | | | | | | | No bug. The frontend optimizations are to: 1. Reorganize logically connected basic blocks so they are either in the same cache line or adjacent cache lines. 2. Avoid cases when basic blocks unnecissarily cross cache lines. 3. Try and 32 byte align any basic blocks possible without sacrificing code size. Smaller / Less hot basic blocks are used for this. Overall code size shrunk by 168 bytes. This should make up for any extra costs due to aligning to 64 bytes. In general performance before deviated a great deal dependending on whether entry alignment % 64 was 0, 16, 32, or 48. These changes essentially make it so that the current implementation is at least equal to the best alignment of the original for any arguments. The only additional optimization is in the page cross case. Branch on equals case was removed from the size == [4, 7] case. As well the [4, 7] and [2, 3] case where swapped as [4, 7] is likely a more hot argument size. test-memcmp and test-wmemcmp are both passing.
* elf: Fix elf_get_dynamic_info definitionAdhemerval Zanella2021-10-123-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before to 490e6c62aa31a8a ('elf: Avoid nested functions in the loader [BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on the first dynamic-link.h include and later within _dl_start(). The former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used on setup_vdso() (since it is a global definition), while the former does define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation. With the commit change, the function is now included and defined once instead of defined as a nested function. So rtld.c defines without defining RTLD_BOOTSTRAP and it brokes at least powerpc32. This patch fixes by moving the get-dynamic-info.h include out of dynamic-link.h, which then the caller can corirectly set the expected semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or RESOLVE_MAP. It also required to enable some asserts only for the loader bootstrap to avoid issues when called from setup_vdso(). As a side note, this is another issues with nested functions: it is not clear from pre-processed output (-E -dD) how the function will be build and its semantic (since nested function will be local and extra C defines may change it). I checked on x86_64-linux-gnu (w/o --enable-static-pie), i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4, aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and s390x-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
* Fix nios2 localplt failureJoseph Myers2021-10-111-0/+1
| | | | | | | | | | | Building for nios2-linux-gnu has recently started showing a localplt test failure, arising from a reference to __floatunsidf from getloadavg after commit b5c8a3aa82f66f49b731ca5204104cee48bccfa5 ("Linux: implement getloadavg(3) using sysinfo(2)") (this is an architecture with soft-fp in libc). Add this as a permitted local PLT reference in localplt.data. Tested with build-many-glibcs.py for nios2-linux-gnu.
* elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT)Fangrui Song2021-10-1110-183/+5
| | | | | | | | | | Intel MPX failed to gain wide adoption and has been deprecated for a while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019. This patch removes the support code from the dynamic loader. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Modify ENTRY in sysdep.h so that p2align can be specifiedNoah Goldstein2021-10-081-2/+5
| | | | | | | | | | | | No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second argument, log2 of the desired function alignment. The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this doesn't affect any existing functionality. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Linux: implement getloadavg(3) using sysinfo(2)Cristian Rodríguez2021-10-081-36/+14
| | | | | Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Avoid nested functions in the loader [BZ #27220]Fangrui Song2021-10-0721-216/+272
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | dynamic-link.h is included more than once in some elf/ files (rtld.c, dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested functions. This harms readability and the nested functions usage is the biggest obstacle prevents Clang build (Clang doesn't support GCC nested functions). The key idea for unnesting is to add extra parameters (struct link_map *and struct r_scope_elm *[]) to RESOLVE_MAP, ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a], elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired by Stan Shebs' ppc64/x86-64 implementation in the google/grte/v5-2.27/master which uses mixed extra parameters and static variables.) Future simplification: * If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM, elf_machine_runtime_setup can drop the `scope` parameter. * If TLSDESC no longer need to be in elf_machine_lazy_rel, elf_machine_lazy_rel can drop the `scope` parameter. Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32, sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64. In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Add run-time check for indirect external accessH.J. Lu2021-10-071-0/+54
| | | | | | | | | | | | | When performing symbol lookup for references in executable without indirect external access: 1. Disallow copy relocations in executable against protected data symbols in a shared object with indirect external access. 2. Disallow non-zero symbol values of undefined function symbols in executable, which are used as the function pointer, against protected function symbols in a shared object with indirect external access. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Initial support for GNU_PROPERTY_1_NEEDEDH.J. Lu2021-10-074-7/+26
| | | | | | | | | | | | | | | | | 1. Add GNU_PROPERTY_1_NEEDED: #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO to indicate the needed properties by the object file. 2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) to indicate that the object file requires canonical function pointers and cannot be used with copy relocation. 3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* S390: Add PCI_MIO and SIE HWCAPsStefan Liebler2021-10-073-3/+12
| | | | | | | | | | | | | Both new HWCAPs were introduced in these kernel commits: - 7e8403ecaf884f307b627f3c371475913dd29292 "s390: add HWCAP_S390_PCI_MIO to ELF hwcaps" - 7e82523f2583e9813e4109df3656707162541297 "s390/hwcaps: make sie capability regular hwcap" Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e "s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros from "HWCAP_S390_" to "HWCAP_". For compatibility reasons, we do not change the prefix in public glibc header file.
* S390: update libm test ulpsStefan Liebler2021-10-061-1/+1
| | | | | | | | | | | Update after commit 6bbf7298323bf31bc43494b2201465a449778e10. Fixed inaccuracy of j0f (BZ #28185) See also e.g. commit c75b106145c30e6c7bcf87f384a5c68ce56406e9 aarch64: update libm test ulps
* powerpc: update libm test ulpsAdhemerval Zanella2021-10-061-1/+1
| | | | | Update after commit 6bbf7298323bf31bc43494b2201465a449778e10 (Fixed inaccuracy of j0f (BZ #28185)).
* y2038: Use a common definition for stat for sparc32Adhemerval Zanella2021-10-061-23/+31
| | | | | | The sparc32 misses support for support done by 4e8521333bea6. Checked on sparcv9-linux-gnu.
* aarch64: update libm test ulpsSzabolcs Nagy2021-10-051-1/+1
| | | | | | | Update after commit 6bbf7298323bf31bc43494b2201465a449778e10. Fixed inaccuracy of j0f (BZ #28185)
* Fixed inaccuracy of j0f (BZ #28185)Paul Zimmermann2021-10-051-3/+3
| | | | | | | | | | | | | The largest errors over the full binary32 range are after this patch (on x86_64): RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7 Inputs that were yielding huge errors have been added to "make check". Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>
* elf: Avoid deadlock between pthread_create and ctors [BZ #28357]Szabolcs Nagy2021-10-044-3/+176
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fix for bug 19329 caused a regression such that pthread_create can deadlock when concurrent ctors from dlopen are waiting for it to finish. Use a new GL(dl_load_tls_lock) in pthread_create that is not taken around ctors in dlopen. The new lock is also used in __tls_get_addr instead of GL(dl_load_lock). The new lock is held in _dl_open_worker and _dl_close_worker around most of the logic before/after the init/fini routines. When init/fini routines are running then TLS is in a consistent, usable state. In _dl_open_worker the new lock requires catching and reraising dlopen failures that happen in the critical section. The new lock is reinitialized in a fork child, to keep the existing behaviour and it is kept recursive in case malloc interposition or TLS access from signal handlers can retake it. It is not obvious if this is necessary or helps, but avoids changing the preexisting behaviour. The new lock may be more appropriate for dl_iterate_phdr too than GL(dl_load_write_lock), since TLS state of an incompletely loaded module may be accessed. If the new lock can replace the old one, that can be a separate change. Fixes bug 28357. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: pthread_kill must send signals to a specific thread [BZ #28407]Florian Weimer2021-10-012-0/+93
| | | | | | | | | | | | | | | | | The choice between the kill vs tgkill system calls is not just about the TID reuse race, but also about whether the signal is sent to the whole process (and any thread in it) or to a specific thread. This was caught by the openposix test suite: LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset. <https://gitlab.com/cki-project/kernel-tests/-/issues/764> Fixes commit 526c3cf11ee9367344b6b15d669e4c3cb461a2be ("nptl: Fix race between pthread_kill and thread exit (bug 12889)"). Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Add CLOCK_MONOTONIC support for PI mutexesAdhemerval Zanella2021-10-012-15/+28
| | | | | | | | | | | | | | | Linux added FUTEX_LOCK_PI2 to support clock selection (commit bf22a6976897977b0a3f1aeba6823c959fc4fdae). With the new flag we can now proper support CLOCK_MONOTONIC for pthread_mutex_clocklock with Priority Inheritance. If kernel does not support, EINVAL is returned instead. The difference is the futex operation will be issued and the kernel will advertise the missing support (instead of hard-code error return). Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11, and 4.15.
* nptl: Use FUTEX_LOCK_PI2 when availableAdhemerval Zanella2021-10-012-54/+5
| | | | | | | | | | | This patch uses the new futex PI operation provided by Linux v5.14 when it is required. The futex_lock_pi64() is moved to futex-internal.c (since it used on two different places and its code size might be large depending of the kernel configuration) and clockid is added as an argument. Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>
* Linux: Add FUTEX_LOCK_PI2Kurt Kanzenbach2021-10-011-0/+8
| | | | | | | | | | Linux v5.14.0 introduced a new futex operation called FUTEX_LOCK_PI2. This kernel feature can be used to implement pthread_mutex_clocklock(MONOTONIC)/PI. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update alpha libm-test-ulpsAdhemerval Zanella2021-09-301-49/+53
|
* powerpc: Fix unrecognized instruction errors with recent binutilsPaul A. Clarke2021-09-292-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent versions of binutils (with commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky" options across a base `.machine` directive, nullifying the use of passing "-many" through GCC to the assembler. As a result, some instructions which were recognized even under older, more stringent `.machine` directives become unrecognized instructions in that context. In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended mnemonic became unrecognized, as the default compilation with GCC for 32bit powerpc adds a `.machine ppc` in the resulting assembly, so the command line option `-Wa,-many` is essentially ignored, and the ISA 2.06 instructions and mnemonics, like `mfppr32`, are unrecognized. The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with: Error: unrecognized opcode: `mfppr32' Add appropriate `.machine` directives in the assembly to bracket the `mfppr32` instruction. Part of a 2019 fix (commit 9250e6610fdb0f3a6f238d2813e319a41fb7a810) to the above test's Makefile to add `-many` to the compilation when GCC itself stopped passing `-many` to the assember no longer has any effect, so remove that. Reported-by: Joseph Myers <joseph@codesourcery.com>
* Add fmaximum, fminimum functionsJoseph Myers2021-09-2842-1/+1967
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C2X adds new <math.h> functions for floating-point maximum and minimum, corresponding to the new operations that were added in IEEE 754-2019 because of concerns about the old operations not being associative in the presence of signaling NaNs. fmaximum and fminimum handle NaNs like most <math.h> functions (any NaN argument means the result is a quiet NaN). fmaximum_num and fminimum_num handle both quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if one argument is a number and the other is a NaN, return the number), but still raise "invalid" for a signaling NaN argument, making them exceptions to the normal rule that a function with a floating-point result raising "invalid" also returns a quiet NaN. fmaximum_mag, fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding functions returning the argument with greatest or least absolute value. All these functions also treat +0 as greater than -0. There are also corresponding <tgmath.h> type-generic macros. Add these functions to glibc. The implementations use type-generic templates based on those for fmax, fmin, fmaxmag and fminmag, and test inputs are based on those for those functions with appropriate adjustments to the expected results. The RISC-V maintainers might wish to add optimized versions of fmaximum_num and fminimum_num (for float and double), since RISC-V (F extension version 2.2 and later) provides instructions corresponding to those functions - though it might be at least as useful to add architecture-independent built-in functions to GCC and teach the RISC-V back end to expand those functions inline, which is what you generally want for functions that can be implemented with a single instruction. Tested for x86_64 and x86, and with build-many-glibcs.py.
* Linux: Simplify __opensock and fix race condition [BZ #28353]Florian Weimer2021-09-282-116/+0
| | | | | | | | | | | | AF_NETLINK support is not quite optional on modern Linux systems anymore, so it is likely that the first attempt will always succeed. Consequently, there is no need to cache the result. Keep AF_UNIX and the Internet address families as a fallback, for the rare case that AF_NETLINK is missing. The other address families previously probed are totally obsolete be now, so remove them. Use this simplified version as the generic implementation, disabling Netlink support as needed.
* pthread/tst-cancel28: Fix barrier re-init race conditionStafford Horne2021-09-281-1/+0
| | | | | | | | | | | | | | | | | | | | When running this test on the OpenRISC port I am working on this test fails with a timeout. The test passes when being straced or debugged. Looking at the code there seems to be a race condition in that: 1 main thread: calls xpthread_cancel 2 sub thread : receives cancel signal 3 sub thread : cleanup routine waits on barrier 4 main thread: re-inits barrier 5 main thread: waits on barrier After getting to 5 the main thread and sub thread wait forever as the 2 barriers are no longer the same. Removing the barrier re-init seems to fix this issue. Also, the barrier does not need to be reinitialized as that is done by default. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOCFangrui Song2021-09-272-4/+0
| | | | Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
* posix: Remove spawni.cAdhemerval Zanella2021-09-271-343/+0
| | | | | | | | Although it provide an alternate implementation that communicates using pipe() instead of shared memory, no port uses and it adds extra burden for posix_spawn() extensions. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Disable symbol hack in libc_nonshared.aH.J. Lu2021-09-272-2/+4
| | | | | Don't reference __GI_memmove, __GI_memset, __GI_memcpy, __divdi3_internal, __udivdi3_internal and __moddi3_internal in libc_nonshared.a.
* linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310)Adhemerval Zanella2021-09-271-5/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of sched_getaffinity on get_nproc and sysconf (_SC_NPROCESSORS_ONLN) done in 903bc7dcc2acafc40 (BZ #27645) breaks the top command in common hypervisor configurations and also other monitoring tools. The main issue using sched_getaffinity changed the symbols semantic from system-wide scope of online CPUs to per-process one (which can be changed with kernel cpusets or book parameters in VM). This patch reverts mostly of the 903bc7dcc2acafc40, with the exceptions: * No more cached values and atomic updates, since they are inherent racy. * No /proc/cpuinfo fallback, since /proc/stat is already used and it would require to revert more arch-specific code. * The alloca is replace with a static buffer of 1024 bytes. So the implementation first consult the sysfs, and fallbacks to procfs. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* linux: Simplify get_nprocsAdhemerval Zanella2021-09-271-50/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies the memory allocation code and uses the sched routines instead of reimplement it. This still uses a stack allocation buffer, so it can be used on malloc initialization code. Linux currently supports at maximum of 4096 cpus for most architectures: $ find -iname Kconfig | xargs git grep -A10 -w NR_CPUS | grep -w range arch/alpha/Kconfig- range 2 32 arch/arc/Kconfig- range 2 4096 arch/arm/Kconfig- range 2 16 if DEBUG_KMAP_LOCAL arch/arm/Kconfig- range 2 32 if !DEBUG_KMAP_LOCAL arch/arm64/Kconfig- range 2 4096 arch/csky/Kconfig- range 2 32 arch/hexagon/Kconfig- range 2 6 if SMP arch/ia64/Kconfig- range 2 4096 arch/mips/Kconfig- range 2 256 arch/openrisc/Kconfig- range 2 32 arch/parisc/Kconfig- range 2 32 arch/riscv/Kconfig- range 2 32 arch/s390/Kconfig- range 2 512 arch/sh/Kconfig- range 2 32 arch/sparc/Kconfig- range 2 32 if SPARC32 arch/sparc/Kconfig- range 2 4096 if SPARC64 arch/um/Kconfig- range 1 1 arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range. arch/x86/Kconfig- range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END arch/xtensa/Kconfig- range 2 32 With x86 supporting 8192: arch/x86/Kconfig 976 config NR_CPUS_RANGE_END 977 int 978 depends on X86_64 979 default 8192 if SMP && CPUMASK_OFFSTACK 980 default 512 if SMP && !CPUMASK_OFFSTACK 981 default 1 if !SMP So using a maximum of 32k cpu should cover all cases (and I would expect once we start to have many more CPUs that Linux would provide a more straightforward way to query for such information). A test is added to check if sched_getaffinity can successfully return with large buffers. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* misc: Add __get_nprocs_schedAdhemerval Zanella2021-09-272-0/+12
| | | | | | | | | | | This is an internal function meant to return the number of avaliable processor where the process can scheduled, different than the __get_nprocs which returns a the system available online CPU. The Linux implementation currently only calls __get_nprocs(), which in tuns calls sched_getaffinity. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* htl: make pthread_sigstate read/write set/oset outside sigstate sectionSamuel Thibault2021-09-261-5/+11
| | | | so that if a segfault occurs, the handler can run fine.
* Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor caseJoseph Myers2021-09-241-2/+6
| | | | | | | | | | | | | | | | | | | | It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c does not cover all cases where the x86 fenv_private.h macros might manipulate one of the SSE and 387 floating-point state, while the actual fma implementation uses the other. Specifically, in the 32-bit case, with a compiler not defaulting to -mfpmath=sse, but testing on a processor with hardware FMA support, the multiarch fma function implementations will end up using SSE, while the fenv_private.h macros will use the 387 state for double. Change the conditional to use the default macros rather than the optimized ones in all cases except when the compiler inlines an fma instruction (in which case, since all those instructions are SSE instructions and -mfpmath=sse must be in effect for them to be inlined, the optimized macros will only use the SSE state and it's OK for them to only use the SSE state). Tested for x86_64 and x86. H.J. reports in <https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html> that it fixes the problems he observed.
* Linux: Avoid closing -1 on failure in __closefrom_fallbackFlorian Weimer2021-09-241-1/+1
| | | | Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i386: Port elf_machine_{load_address,dynamic} from x86-64Fangrui Song2021-09-241-16/+9
| | | | | | | | | This drops reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time address of _DYNAMIC. The code sequence length does not change. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* aarch64: Disable A64FX memcpy/memmove BTI unconditionallyNaohiro Tamura2021-09-241-0/+3
| | | | | | | | | This patch disables A64FX memcpy/memmove BTI instruction insertion unconditionally such as A64FX memset patch [1] for performance. [1] commit 07b427296b8d59f439144029d9a948f6c1ce0a31 Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* powerpc64le: Avoid conflicting types for f64xfmaf128 when IFUNC is not usedTulio Magno Quites Machado Filho2021-09-231-0/+2
| | | | | | | | | Avoid defining f64xfmaf128 twice when building s_fmaf128.c. This can be reproduced on powerpc64le whenever f128 functions do not have IFUNC enabled, e.g. using "--with-cpu=power8 --disable-multi-arch", or when using "-with-cpu=power9". Fixes: b3f27d8150d4f ("Add narrowing fma functions")
* Fix ffma use of round-to-odd on x86Joseph Myers2021-09-231-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 32-bit x86 with -mfpmath=sse, and on x86_64 with --disable-multi-arch, the tests of ffma and its aliases (fma narrowing from binary64 to binary32) fail. This is probably the issue reported by H.J. in <https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>. The problem is the use of fenv_private.h macros in the round-to-odd implementation. Those macros are set up to manipulate only one of the SSE and 387 floating-point state, whichever is relevant for the type indicated by the suffix on the macro name. But x86 configurations sometimes use the ldbl-96 implementation of binary64 fma (that's where --disable-multi-arch is relevant for x86_64: it causes the ldbl-96 implementation to be used, instead of an IFUNC implementation that falls back to the dbl-64 version), contrary to the expectations of those macros for functions operating on double when __SSE2_MATH__ is defined. This can be addressed by using the default versions of those macros (giving x86 its own version of s_ffma.c), as is done for the *f128 macro variants where it depends on the details of how GCC was configured when building libgcc which floating-point state is affected by _Float128 arithmetic. The issue only applies when __SSE2_MATH__ is defined, and doesn't apply when __FP_FAST_FMA is defined (because in that case, fma will be inlined by the compiler, meaning it's definitely an SSE operation; for the same reason, this is not an issue for narrowing sqrt, as hardware sqrt is always inlined in that implementation for x86), but in other cases it's safest to use the default versions of the fenv_private.h macros to ensure things work whichever fma implementation is used. Tested for x86_64 (with and without --disable-multi-arch) and x86 (with and without -mfpmath=sse).
* nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]Florian Weimer2021-09-232-0/+62
| | | | | | | | | | | | | | | | | | | | | | As part of the fix for bug 12889, signals are blocked during thread exit, so that application code cannot run on the thread that is about to exit. This would cause problems if the application expected signals to be delivered after the signal handler revealed the thread to still exist, despite pthread_kill can no longer be used to send signals to it. However, glibc internally uses the SIGSETXID signal in a way that is incompatible with signal blocking, due to the way the setxid handshake delays thread exit until the setxid operation has completed. With a blocked SIGSETXID, the handshake can never complete, causing a deadlock. As a band-aid, restore the previous handshake protocol by not blocking SIGSETXID during thread exit. The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on a downstream test by Martin Osvald. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
* Add narrowing fma functionsJoseph Myers2021-09-2267-1/+947
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the narrowing fused multiply-add functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal, f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x, f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128, f64xfmaf128 for configurations with _Float64x and _Float128; __f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case (for calls to ffmal and dfmal when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, especially that for sqrt, so the description of those generally applies to this patch as well. As with sqrt, I reused the same test inputs in auto-libm-test-in as for non-narrowing fma rather than adding extra or separate inputs for narrowing fma. The tests in libm-test-narrow-fma.inc also follow those for non-narrowing fma. The non-narrowing fma has a known bug (bug 6801) that it does not set errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather than fixing this or having narrowing fma check for errors when non-narrowing does not (complicating the cases when narrowing fma can otherwise be an alias for a non-narrowing function), this patch does not attempt to check for errors from narrowing fma and set errno; the CHECK_NARROW_FMA macro is still present, but as a placeholder that does nothing, and this missing errno setting is considered to be covered by the existing bug rather than needing a separate open bug. missing-errno annotations are duly added to many of the auto-libm-test-in test inputs for fma. This completes adding all the new functions from TS 18661-1 to glibc, so will be followed by corresponding stdc-predef.h changes to define __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support for TS 18661-1 will be at a similar level to that for C standard floating-point facilities up to C11 (pragmas not implemented, but library functions done). (There are still further changes to be done to implement changes to the types of fromfp functions from N2548.) Tested as followed: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).
* ld.so: Replace DL_RO_DYN_SECTION with dl_relocate_ld [BZ #28340]H.J. Lu2021-09-226-14/+98
| | | | | | | | | | | | | | | | | We can't relocate entries in dynamic section if it is readonly: 1. Add a l_ld_readonly field to struct link_map to indicate if dynamic section is readonly and set it based on p_flags of PT_DYNAMIC segment. 2. Replace DL_RO_DYN_SECTION with dl_relocate_ld to decide if dynamic section should be relocated. 3. Remove DL_RO_DYN_TEMP_CNT. 4. Don't use a static dynamic section to make readonly dynamic section in vDSO writable. 5. Remove the temp argument from elf_get_dynamic_info. This fixes BZ #28340. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Adjust new narrowing div/mul tests for IBM long double, update powerpc ULPsJoseph Myers2021-09-221-0/+3
| | | | | | Testing for powerpc shows some of the new narrowing div/mul tests need XFAILing for IBM long double and some ULPs updates are needed for those tests.
* Fix f64xdivf128, f64xmulf128 spurious underflows (bug 28358)Joseph Myers2021-09-2114-14/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | As described in bug 28358, the round-to-odd computations used in the libm functions that round their results to a narrower format can yield spurious underflow exceptions in the following circumstances: the narrowing only narrows the precision of the type and not the exponent range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or ia64), the architecture does after-rounding tininess detection (which applies to all those architectures), the result is inexact, tiny before rounding but not tiny after rounding (with the chosen rounding mode) for _Float64x (which is possible for narrowing mul, div and fma, not for narrowing add, sub or sqrt), so the underflow exception resulting from the toward-zero computation in _Float128 is spurious for _Float64x. Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the problem cases (as indicated by an extra argument to the macro); there is never any need to preserve underflow exceptions from this part of the computation, because the conversion of the round-to-odd value to the narrower type will underflow in exactly the cases in which the function should raise that exception, but it may be more efficient to avoid the extra manipulation of the floating-point environment when not needed. Tested for x86_64 and x86, and with build-many-glibcs.py.
* nptl: Fix type of pthread_mutexattr_getrobust_np, ↵Florian Weimer2021-09-211-2/+2
| | | | | | | pthread_mutexattr_setrobust_np (bug 28036) Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
* powerpc: Fix unrecognized instruction errors with recent GCCPaul A. Clarke2021-09-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | Recent binutils commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a changes the behavior of `.machine` directives to override, rather than augment, the base CPU. This can result in _reduced_ functionality when, for example, compiling for default machine "power8", but explicitly asking for ".machine power5", which loses Altivec instructions. In tst-ucontext-ppc64-vscr.c, while the instructions provoking the new error messages are bracketed by ".machine power5", which is ostensibly Power ISA 2.03 (POWER5), the POWER5 processor did not support the VSX subset, so these instructions are not recognized as "power5". Error: unrecognized opcode: `vspltisb' Error: unrecognized opcode: `vpkuwus' Error: unrecognized opcode: `mfvscr' Error: unrecognized opcode: `stvx' Manually adding the VSX subset via ".machine altivec" is sufficient. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* nptl: pthread_kill needs to return ESRCH for old programs (bug 19193)Florian Weimer2021-09-201-2/+19
| | | | | | The fix for bug 19193 breaks some old applications which appear to use pthread_kill to probe if a thread is still running, something that is not supported by POSIX.
* Extend struct r_debug to support multiple namespaces [BZ #15971]H.J. Lu2021-09-191-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Glibc does not provide an interface for debugger to access libraries loaded in multiple namespaces via dlmopen. The current rtld-debugger interface is described in the file: elf/rtld-debugger-interface.txt under the "Standard debugger interface" heading. This interface only provides access to the first link-map (LM_ID_BASE). 1. Bump r_version to 2 when multiple namespaces are used. This triggers the GDB bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28236 2. Add struct r_debug_extended to extend struct r_debug into a linked-list, where each element correlates to an unique namespace. 3. Initialize the r_debug_extended structure. Bump r_version to 2 for the new namespace and add the new namespace to the namespace linked list. 4. Add _dl_debug_update to return the address of struct r_debug' of a namespace. 5. Add a hidden symbol, _r_debug_extended, for struct r_debug_extended. 6. Provide the symbol, _r_debug, with size of struct r_debug, as an alias of _r_debug_extended, for programs which reference _r_debug. This fixes BZ #15971. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* elf: Remove THREAD_GSCOPE_IN_TCBSergey Bugaev2021-09-1621-32/+0
| | | | | | | | | All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all support for !THREAD_GSCOPE_IN_TCB, along with the definition itself. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20210915171110.226187-4-bugaevc@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
* htl: Reimplement GSCOPESergey Bugaev2021-09-163-20/+76
| | | | | | | | | | | | | | | | This is a new implementation of GSCOPE which largely mirrors its NPTL counterpart. Same as in NPTL, instead of a global flag shared between threads, there is now a per-thread GSCOPE flag stored in each thread's TCB. This makes entering and exiting a GSCOPE faster at the expense of making THREAD_GSCOPE_WAIT () slower. The largest win is the elimination of many redundant gsync_wake () RPC calls; previously, even simplest programs would make dozens of fully redundant gsync_wake () calls. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20210915171110.226187-3-bugaevc@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>