Commit message [Author, Age, Files, Lines]
...
* nptl: Add backoff mechanism to spinlock loop [Wangyang Guo, 2022-05-09, 4 files, -2/+89]
  When multiple threads are waiting for a lock at the same time, once the lock owner releases it, all waiters see the lock as available and try to acquire it, which may cause an expensive CAS storm.

  Binary exponential backoff with random jitter is introduced. As the number of try-lock attempts increases, it becomes more likely that a larger number of threads are competing for the adaptive mutex lock, so the wait time is increased exponentially. A random jitter is also added to avoid synchronous try-lock attempts from other threads.

  v2: Remove the read-check before try-lock for performance.
  v3:
    1. Restore the read-check, since it works well on some platforms.
    2. Make backoff arch-dependent, and enable it for x86_64.
    3. Limit the maximum backoff to reduce latency in large critical sections.
  v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h
  v5: Commit log updated for the regression in large critical sections.
  Result of pthread-mutex-locks bench
  Test platform: Xeon 8280L (2 sockets, 112 CPUs in total)
  First row: thread number
  First column (cs-len): critical-section length
  Values: backoff vs. upstream, time-based, lower is better

  non-critical-length: 1
  cs-len     1     2     4     8    16    32    64   112   140
       0  0.99  0.58  0.52  0.49  0.43  0.44  0.46  0.52  0.54
       1  0.98  0.43  0.56  0.50  0.44  0.45  0.50  0.56  0.57
       2  0.99  0.41  0.57  0.51  0.45  0.47  0.48  0.60  0.61
       4  0.99  0.45  0.59  0.53  0.48  0.49  0.52  0.64  0.65
       8  1.00  0.66  0.71  0.63  0.56  0.59  0.66  0.72  0.71
      16  0.97  0.78  0.91  0.73  0.67  0.70  0.79  0.80  0.80
      32  0.95  1.17  0.98  0.87  0.82  0.86  0.89  0.90  0.90
      64  0.96  0.95  1.01  1.01  0.98  1.00  1.03  0.99  0.99
     128  0.99  1.01  1.01  1.17  1.08  1.12  1.02  0.97  1.02

  non-critical-length: 32
  cs-len     1     2     4     8    16    32    64   112   140
       0  1.03  0.97  0.75  0.65  0.58  0.58  0.56  0.70  0.70
       1  0.94  0.95  0.76  0.65  0.58  0.58  0.61  0.71  0.72
       2  0.97  0.96  0.77  0.66  0.58  0.59  0.62  0.74  0.74
       4  0.99  0.96  0.78  0.66  0.60  0.61  0.66  0.76  0.77
       8  0.99  0.99  0.84  0.70  0.64  0.66  0.71  0.80  0.80
      16  0.98  0.97  0.95  0.76  0.70  0.73  0.81  0.85  0.84
      32  1.04  1.12  1.04  0.89  0.82  0.86  0.93  0.91  0.91
      64  0.99  1.15  1.07  1.00  0.99  1.01  1.05  0.99  0.99
     128  1.00  1.21  1.20  1.22  1.25  1.31  1.12  1.10  0.99

  non-critical-length: 128
  cs-len     1     2     4     8    16    32    64   112   140
       0  1.02  1.00  0.99  0.67  0.61  0.61  0.61  0.74  0.73
       1  0.95  0.99  1.00  0.68  0.61  0.60  0.60  0.74  0.74
       2  1.00  1.04  1.00  0.68  0.59  0.61  0.65  0.76  0.76
       4  1.00  0.96  0.98  0.70  0.63  0.63  0.67  0.78  0.77
       8  1.01  1.02  0.89  0.73  0.65  0.67  0.71  0.81  0.80
      16  0.99  0.96  0.96  0.79  0.71  0.73  0.80  0.84  0.84
      32  0.99  0.95  1.05  0.89  0.84  0.85  0.94  0.92  0.91
      64  1.00  0.99  1.16  1.04  1.00  1.02  1.06  0.99  0.99
     128  1.00  1.06  0.98  1.14  1.39  1.26  1.08  1.02  0.98

  There is a regression for large critical sections, but the adaptive mutex is aimed at "quick" locks, and small critical sections are more common when users choose an adaptive pthread_mutex.

  Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
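As a rough illustration of the idea above, here is a minimal C11 sketch of a test-and-test-and-set spinlock with binary exponential backoff and random jitter. It is not the glibc code: the type and function names, the BACKOFF_MAX cap, and the xorshift generator (standing in for the jitter source) are all hypothetical choices made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical cap on the backoff window (the commit limits the maximum
   backoff, but its real value is architecture-dependent). */
#define BACKOFF_MAX 256u

typedef struct { atomic_int locked; } backoff_lock;

/* Cheap xorshift32 PRNG used as the jitter source; *state must be nonzero. */
static uint32_t jitter_next(uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Binary exponential growth of the wait window, capped at BACKOFF_MAX. */
static unsigned int next_backoff(unsigned int backoff)
{
  return backoff >= BACKOFF_MAX / 2 ? BACKOFF_MAX : backoff * 2;
}

/* Busy-wait for ITERS iterations; volatile keeps the loop from being
   optimized away. */
static void spin_pause(unsigned int iters)
{
  for (volatile unsigned int i = 0; i < iters; i++)
    ;
}

static void backoff_lock_acquire(backoff_lock *l, uint32_t *rng)
{
  unsigned int backoff = 1;
  for (;;)
    {
      /* Read-check: only attempt the CAS when the lock looks free. */
      if (atomic_load_explicit(&l->locked, memory_order_relaxed) == 0)
        {
          int expected = 0;
          if (atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return;
        }
      /* Wait a random number of iterations in [1, backoff] so that
         competing waiters retry at different times. */
      spin_pause(1 + jitter_next(rng) % backoff);
      backoff = next_backoff(backoff);
    }
}

static void backoff_lock_release(backoff_lock *l)
{
  atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The read-check before the CAS corresponds to the v3 note, and capping the window corresponds to the v3 limit on maximum backoff, which bounds how long a waiter can keep spinning after the lock has become free.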
* Linux: Implement a useful version of _startup_fatal [Florian Weimer, 2022-05-09, 3 files, -19/+65]
  On i386 and ia64, the TCB is not available at this point.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* ia64: Always define IA64_USE_NEW_STUB as a flag macro [Florian Weimer, 2022-05-09, 2 files, -13/+15]
  And keep the previous definition if it exists. This allows disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
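A small sketch of the "flag macro" convention, using a hypothetical macro USE_NEW_STUB: the macro is always defined to 0 or 1 and tested with `#if`, and an existing definition is kept, so a port or build can switch the feature off (for example with `-DUSE_NEW_STUB=0`) without having to undefine it.

```c
/* Hypothetical flag macro: keep any existing definition, else default to 1. */
#ifndef USE_NEW_STUB
# define USE_NEW_STUB 1
#endif

/* Test the flag's value with #if, not its presence with #ifdef, so that a
   definition of 0 really disables the feature. */
static int use_new_stub(void)
{
#if USE_NEW_STUB
  return 1;
#else
  return 0;
#endif
}
```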
* linux: Fix posix_spawn return code if clone fails (BZ#29109) [Adhemerval Zanella, 2022-05-06, 1 file, -1/+1]
  __clone_internal reports the error through errno.
  Checked on x86_64-linux-gnu.
* benchtests: Add wcrtomb microbenchmark [Siddhesh Poyarekar, 2022-05-06, 2 files, -0/+140]
  Add a simple benchmark that measures wcrtomb performance with various locales with 1-4 byte characters.
  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* clock_settime/clock_gettime: Use __nonnull to avoid null pointer [Xiaoming Ni, 2022-05-05, 2 files, -6/+9]
  Add __nonnull((2)) to clock_settime(), clock_settime64(), clock_gettime() and clock_gettime64() to avoid null pointer access.
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084
  Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
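In glibc, `__nonnull ((2))` expands to GCC's `__attribute__ ((__nonnull__ (2)))`. The sketch below applies the same attribute to a hypothetical wrapper, checked_gettime, so that a call passing a literal NULL as the second argument is diagnosed by -Wnonnull (enabled by -Wall). The attribute also lets the compiler assume the pointer is non-null, so it is a compile-time diagnostic aid rather than a runtime check.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical wrapper declared with the attribute glibc's __nonnull ((2))
   expands to; checked_gettime (CLOCK_REALTIME, NULL) would be diagnosed. */
int checked_gettime(clockid_t clk, struct timespec *ts)
  __attribute__((__nonnull__(2)));

int checked_gettime(clockid_t clk, struct timespec *ts)
{
  /* No runtime NULL check: the attribute documents the contract. */
  return clock_gettime(clk, ts);
}
```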
* clock_adjtime: Use __nonnull to avoid null pointer [Xiaoming Ni, 2022-05-05, 2 files, -3/+3]
  Add __nonnull((2)) to clock_adjtime()/clock_adjtime64() to avoid null pointer access.
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084
  Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* ntp_xxxtimex: Use __nonnull to avoid null pointer [Xiaoming Ni, 2022-05-05, 2 files, -8/+8]
  Add __nonnull((1)) to ntp_gettime(), ntp_gettime64(), ntp_gettimex(), ntp_gettimex64() and ntp_adjtime() to avoid null pointer access.
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084
  Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* adjtimex/adjtimex64: Use __nonnull to avoid null pointer [Xiaoming Ni, 2022-05-05, 2 files, -4/+4]
  Add __nonnull((1)) to the adjtimex()/adjtimex64() function declaration to avoid null pointer access.
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662
  Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084
  Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* hurd spawni: Fix reauthenticating closed fds [Samuel Thibault, 2022-05-05, 1 file, -1/+1]
  When an fd is closed, the port cell remains, but the port becomes MACH_PORT_NULL, so we have to guard against it.
* Linux: Define MMAP_CALL_INTERNAL [Florian Weimer, 2022-05-04, 3 files, -12/+30]
  Unlike MMAP_CALL, this avoids a TCB dependency for an errno update on failure.

  <mmap_internal.h> cannot be included as is on several architectures due to the definition of page_unit, so introduce a separate header file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL, <mmap_call.h>.

  Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
* i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls [Florian Weimer, 2022-05-04, 3 files, -3/+37]
  Introduce an int-80h-based version of __libc_do_syscall and use it if I386_USE_SYSENTER is defined as 0.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S [Florian Weimer, 2022-05-04, 1 file, -3/+0]
  After commit a78e6a10d0b50d0ca80309775980fc99944b1727 ("i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771)"), it is never defined.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* manual: Clarify that abbreviations of long options are allowed [Siddhesh Poyarekar, 2022-05-04, 1 file, -1/+2]
  The man page and code comments clearly state that abbreviations of long option names are recognized correctly as long as they are unique. Document this fact in the glibc manual as well.

  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
  Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
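To illustrate the documented rule, the sketch below parses a single argument with getopt_long against two hypothetical long options, --verbose and --version. A unique prefix such as --verb is accepted as an abbreviation of --verbose, while --ver matches both options and is reported as ambiguous ('?'). The helper parse_one and the option table are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <getopt.h>
#include <stddef.h>

/* Parse one argument ARG and return getopt_long's result for it. */
static int parse_one(char *arg)
{
  static const struct option opts[] = {
    { "verbose", no_argument, NULL, 'v' },
    { "version", no_argument, NULL, 'V' },
    { NULL, 0, NULL, 0 },
  };
  char prog[] = "prog";
  char *argv[] = { prog, arg, NULL };
  optind = 0;  /* 0 makes glibc's getopt fully reinitialize its state */
  opterr = 0;  /* suppress the error message for ambiguous options */
  return getopt_long(2, argv, "", opts, NULL);
}
```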
* elf: Remove fallback to the start of DT_STRTAB for dladdr [Fangrui Song, 2022-05-02, 1 file, -11/+5]
  When neither DT_HASH nor DT_GNU_HASH is present, the code scans [DT_SYMTAB, DT_STRTAB). However, there is no guarantee that .dynstr immediately follows .dynsym (e.g. lld typically places .gnu.version after .dynsym).

  In the absence of a hash table, symbol lookup will always fail (map->l_nbuckets == 0 in dl-lookup.c) as if the object has no symbol, so it seems fair for dladdr to do the same.

  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* powerpc32: Remove unused HAVE_PPC_SECURE_PLT [Fangrui Song, 2022-05-02, 3 files, -44/+0]
  82a79e7d1843f9d90075a0bf2f04557040829bb0 removed the only user of HAVE_PPC_SECURE_PLT.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo [Florian Weimer, 2022-04-29, 5 files, -5/+159]
  The information is theoretically available via dl_iterate_phdr as well, but that approach is very slow if there are many shared objects.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@rehdat.com>
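For comparison, this is roughly what the dl_iterate_phdr approach looks like: the callback runs once for every loaded object, so finding one object's program headers costs time proportional to the number of loaded objects. The structure and function names are hypothetical; the sketch only records the program-header count of the main program, which dl_iterate_phdr visits first.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Hypothetical accumulator: number of objects visited, and the number of
   program headers of the first one (the main program). */
struct phdr_scan
{
  int objects;
  int main_phnum;
};

static int scan_cb(struct dl_phdr_info *info, size_t size, void *data)
{
  struct phdr_scan *scan = data;
  (void) size;
  if (scan->objects == 0)
    scan->main_phnum = info->dlpi_phnum;
  scan->objects++;
  return 0;  /* 0 means "continue with the next object" */
}

static struct phdr_scan scan_loaded_objects(void)
{
  struct phdr_scan scan = { 0, 0 };
  dl_iterate_phdr(scan_cb, &scan);
  return scan;
}
```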
* manual: Document the dlinfo function [Florian Weimer, 2022-04-29, 1 file, -1/+70]
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@rehdat.com>
* Do not use --hash-style=both for building glibc shared objects [Florian Weimer, 2022-04-29, 5 files, -61/+0]
  The comment indicates that --hash-style=both was used to maintain compatibility with static dlopen, but we have had many internal ABI changes since then, so this compatibility does not add value anymore.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* benchtests: Better libmvec integration [Siddhesh Poyarekar, 2022-04-29, 2 files, -19/+17]
  Improve libmvec benchmark integration so that in future other architectures may be able to run their libmvec benchmarks as well. This now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`.
  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* benchtests: Add UNSUPPORTED benchmark status [Siddhesh Poyarekar, 2022-04-29, 2 files, -11/+24]
  The libmvec benchmarks print a message indicating that a certain CPU feature is unsupported and exit prematurely, which breaks the JSON in bench.out. Handle this more elegantly in the bench makefile target by adding support for an UNSUPPORTED exit status (77) so that bench.out continues to have output for valid tests.
  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
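A minimal sketch of the exit-status convention, assuming a hypothetical bench_main that is told whether the required CPU feature is present: instead of printing partial JSON, it returns 77 (the value glibc's test harness also uses for EXIT_UNSUPPORTED), so the makefile can record the benchmark as UNSUPPORTED and keep bench.out valid.

```c
#include <stdbool.h>
#include <stdio.h>

/* Conventional "unsupported/skipped" exit status. */
#define BENCH_EXIT_UNSUPPORTED 77

/* Hypothetical benchmark entry point. */
static int bench_main(bool feature_available)
{
  if (!feature_available)
    {
      /* Diagnostic goes to stderr so stdout (the JSON) stays clean. */
      fprintf(stderr, "required CPU feature not supported\n");
      return BENCH_EXIT_UNSUPPORTED;
    }
  /* ... run the benchmark and print its JSON fragment here ... */
  return 0;
}
```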
* linux: Fix fchmodat with AT_SYMLINK_NOFOLLOW for 64 bit time_t (BZ#29097) [Adhemerval Zanella, 2022-04-28, 4 files, -6/+30]
  The AT_SYMLINK_NOFOLLOW emulation uses the default 32-bit stat internal calls, which fail with EOVERFLOW if the file contains timestamps beyond 2038.
  Checked on i686-linux-gnu.
* Use __ehdr_start rather than _begin in _dl_start_final [Alan Modra, 2022-04-28, 2 files, -6/+4]
  __ehdr_start is already used in rtld.c:dl_main, and can serve the same purpose as _begin. Besides tidying the code, using linker-defined section-relative symbols rather than "-defsym _begin=0" better reflects the intent of _dl_start_final's use of _begin, which is to refer to the load address of ld.so rather than absolute address zero.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* sysdeps: Add 'get_fast_jitter' interface in fast-jitter.h [Noah Goldstein, 2022-04-27, 1 file, -0/+42]
  'get_fast_jitter' is meant to be used purely for performance purposes. In all cases where it is used, it should be acceptable to get no randomness (see the default case). An example use case is setting jitter for retries between threads at a lock. There is a performance benefit to having jitter, but only if the jitter can be generated very quickly, and ultimately there is no serious issue if no jitter is generated.

  The implementation generally uses 'HP_TIMING_NOW' if and only if it is inlined (to avoid any potential syscall paths).
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
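A hedged sketch of the same idea, assuming GCC or Clang: read the x86 time-stamp counter with the `__rdtsc` intrinsic when it is available and otherwise return 0, which callers must accept as "no jitter". fast_jitter_sketch and jitter_in_window are hypothetical names, not the fast-jitter.h interface.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
# include <x86intrin.h>
#endif

/* Cheap, low-quality jitter: the low bits of the TSC on x86, else 0. */
static inline uint32_t fast_jitter_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
  return (uint32_t) __rdtsc();
#else
  return 0;  /* no inline timer available: no jitter, which is acceptable */
#endif
}

/* Mask the jitter into [0, backoff) for a power-of-two BACKOFF. */
static inline uint32_t jitter_in_window(uint32_t backoff)
{
  return fast_jitter_sketch() & (backoff - 1);
}
```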
* posix/glob.c: update from gnulib [DJ Delorie, 2022-04-27, 2 files, -12/+59]
  Copied from gnulib/lib/glob.c in order to fix rhbz 1982608.
  Also fixes swbz 25659.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@redhat.com>
* benchtests: Add pthread-mutex-locks bench [Wangyang Guo, 2022-04-27, 2 files, -0/+290]
  Benchmark for testing pthread mutex lock performance with different thread counts and critical sections. The test configuration consists of 3 parts:
  1. thread number
  2. critical-section length
  3. non-critical-section length

  The thread number starts from 1 and doubles until the number of CPU cores (nprocs). An additional over-saturation case (1.25 * nprocs) is also included.

  The critical section is represented by a loop of shared do_filler(); its length is determined by the number of loop iterations.

  The non-critical section is similar to the critical section, except that it is based on non-shared do_filler().

  Currently, the adaptive pthread_mutex lock is tested.
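The thread-count schedule can be sketched as below. The exact boundary handling is an assumption (doubling while below nprocs, then nprocs itself, then 1.25 * nprocs rounded down), and bench_thread_counts is a hypothetical name. With nprocs = 112 it yields 1, 2, 4, 8, 16, 32, 64, 112, 140, the same thread columns reported for the 112-CPU Xeon in the nptl backoff commit.

```c
#include <stddef.h>

/* Fill OUT with the thread counts to benchmark and return how many were
   written (never more than MAX). */
static size_t bench_thread_counts(int nprocs, int *out, size_t max)
{
  size_t n = 0;
  /* 1, 2, 4, ... while strictly below the number of CPUs. */
  for (int t = 1; t < nprocs && n < max; t *= 2)
    out[n++] = t;
  if (n < max)
    out[n++] = nprocs;               /* one thread per CPU */
  if (n < max)
    out[n++] = nprocs + nprocs / 4;  /* over-saturation: 1.25 * nprocs */
  return n;
}
```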
* linux: Fix missing internal 64 bit time_t stat usage [Adhemerval Zanella, 2022-04-27, 2 files, -4/+4]
  These are two spots missed by the initial change in 52a5fe70a2c77935.
  Checked on i686-linux-gnu.
* elf: Fix DFS sorting algorithm for LD_TRACE_LOADED_OBJECTS with missing libraries (BZ #28868) [Adhemerval Zanella, 2022-04-27, 14 files, -1/+200]
  In _dl_map_object, the underlying file is not opened in trace mode (in other cases where the underlying file can't be opened, _dl_map_object quits with an error). If there are any missing libraries being processed, they will not be counted in the final nlist size passed to _dl_sort_maps later in the function. That size is then used by _dl_sort_maps_dfs for the stack-allocated working maps:

      /* Array to hold RPO sorting results, before we copy back to maps[].  */
      struct link_map *rpo[nmaps];

      /* The 'head' position during each DFS iteration. Note that we start at
         one past the last element due to first-decrement-then-store (see the
         bottom of above dfs_traversal() routine).  */
      struct link_map **rpo_head = &rpo[nmaps];

  However, while traversing 'l_initfini' in dfs_traversal, it will still consider the l_faked maps and thus update rpo more times than the allocated working 'rpo' can hold, overflowing the stack object.

  As suggested in bugzilla, one option would be to avoid sorting the maps for trace mode. However, I think ignoring l_faked objects does make sense (there is one less constraint to call the sorting function), it allows slightly less stack usage for trace, and it is a slightly simpler solution.

  The test does trigger the stack overflow; however, I tried to make it more generic to check different scenarios of missing objects.

  Checked on x86_64-linux-gnu.
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* posix: Remove unused definition on _Fork [Adhemerval Zanella, 2022-04-26, 1 file, -3/+0]
  Checked on x86_64-linux-gnu.
* NEWS: Mention DT_RELR support [H.J. Lu, 2022-04-26, 1 file, -1/+6]
* elf: Add more DT_RELR tests [H.J. Lu, 2022-04-26, 10 files, -3/+286]
  Verify that:
  1. A DT_RELR shared library without DT_NEEDED works.
  2. A DT_RELR shared library without DT_VERNEED works.
  3. A DT_RELR shared library without libc.so on DT_NEEDED works.
* elf: Properly handle zero DT_RELA/DT_REL values [H.J. Lu, 2022-04-26, 2 files, -7/+23]
  With DT_RELR, there may be no relocations in DT_RELA/DT_REL and their entry values are zero. Don't relocate DT_RELA/DT_REL and update the combined relocation start address if their entry values are zero.
* elf: Support DT_RELR relative relocation format [BZ #27924] [Fangrui Song, 2022-04-26, 7 files, -0/+170]
  PIE and shared objects usually have many relative relocations. In 2017/2018, SHT_RELR/DT_RELR was proposed on https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/GxjM0L-PBAAJ ("Proposal for a new section type SHT_RELR") and is a pre-standard. RELR usually takes 3% or less of the space taken by R_*_RELATIVE relocations. The virtual memory size of a mostly statically linked PIE is typically 5~10% smaller.

  ---

  Notes I will not include in the submitted commit:

  Available on https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/maskray/relr

  "pre-standard": even Solaris folks are happy with the refined generic-abi proposal. Cary Coutant will apply the change https://sourceware.org/pipermail/libc-alpha/2021-October/131781.html

  This patch is simpler than Chrome OS's glibc patch and makes ELF_DYNAMIC_DO_RELR available to all ports. I don't think the current glibc implementation supports ia64 in an ELFCLASS32 container. That said, the style I used works with an ELFCLASS32 container for a 64-bit machine if ElfW(Addr) is 64-bit.

  * Chrome OS folks have carried a local patch since 2018 (latest version: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/main/sys-libs/glibc/files/local/glibc-2.32). I.e. this feature has been battle tested.
  * Android bionic has supported it since 2018 and switched to DT_RELR==36 in 2020.
  * The Linux kernel has supported CONFIG_RELR since 2019-08 (https://git.kernel.org/linus/5cf896fb6be3effd9aea455b22213e27be8bdb1d).
  * A musl patch (by me) exists but is not applied: https://www.openwall.com/lists/musl/2019/03/06/3
  * rtld-elf from FreeBSD 14 will support DT_RELR.

  I believe upstream glibc should support DT_RELR to benefit all Linux distributions.
  I filed some feature requests to get their attention:
  * Gentoo: https://bugs.gentoo.org/818376
  * Arch Linux: https://bugs.archlinux.org/task/72433
  * Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996598
  * Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=2014699

  As for linker support (to the best of my knowledge):
  * LLD supports DT_RELR.
  * https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/main/sys-devel/binutils/files/ has a gold patch.
  * GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27923

  Changes from the original patch:
  1. Check the linker option, -z pack-relative-relocs, which adds a GLIBC_ABI_DT_RELR symbol version dependency on the shared C library if it provides a GLIBC_2.XX symbol version.
  2. Change the make variable to have-dt-relr.
  3. Rename tst-relr-no-pie to tst-relr-pie for --disable-default-pie.
  4. Use TEST_VERIFY in tst-relr.c.
  5. Add the check-tst-relr-pie.out test to check for the linker-generated libc.so version dependency on GLIBC_ABI_DT_RELR.
  6. Move ELF_DYNAMIC_DO_RELR before ELF_DYNAMIC_DO_REL.
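A minimal decoder sketch for 64-bit RELR tables, following the encoding in the generic-abi proposal: an even entry is the address of a word to relocate, and an odd entry is a bitmap whose bits 1 to 63 mark the 63 words that follow the current position. The function name relr_decode is hypothetical, and instead of applying `*where += base` it only collects the offsets. One 8-byte bitmap can stand for up to 63 relocations, whereas each R_*_RELATIVE entry is a 24-byte Elf64_Rela, which is where the size savings come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode N RELR entries into OUT (capacity MAX) as offsets from the load
   base of the words that need relocating; returns the number stored. */
static size_t relr_decode(const uint64_t *relr, size_t n,
                          uint64_t *out, size_t max)
{
  const uint64_t word = sizeof(uint64_t);
  uint64_t where = 0;
  size_t count = 0;
  for (size_t k = 0; k < n; k++)
    {
      uint64_t entry = relr[k];
      if ((entry & 1) == 0)
        {
          /* Address entry: relocate this word, continue right after it. */
          where = entry;
          if (count < max)
            out[count++] = where;
          where += word;
        }
      else
        {
          /* Bitmap entry: after dropping the marker bit, bit i refers to
             where + i words. */
          for (uint64_t i = 0; (entry >>= 1) != 0; i++)
            if ((entry & 1) != 0 && count < max)
              out[count++] = where + i * word;
          /* A bitmap always covers the next 63 words. */
          where += 63 * word;
        }
    }
  return count;
}
```

For example, the table `{ 0x1000, 0xb, 0x3 }` describes the words at offsets 0x1000, 0x1008, 0x1018 and 0x1200.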
* Add GLIBC_ABI_DT_RELR for DT_RELR support [H.J. Lu, 2022-04-26, 6 files, -5/+60]
  The EI_ABIVERSION field of the ELF header in executables and shared libraries can be bumped to indicate the minimum ABI requirement on the dynamic linker. However, EI_ABIVERSION in executables is checked by neither the Linux kernel ELF loader nor the existing dynamic linker, so executables will crash mysteriously if the dynamic linker doesn't support the ABI features required by the EI_ABIVERSION field. The dynamic linker should be changed to check EI_ABIVERSION in executables.

  Add a glibc version, GLIBC_ABI_DT_RELR, to indicate DT_RELR support so that the existing dynamic linkers will issue an error on executables with a GLIBC_ABI_DT_RELR dependency. When there is a DT_VERNEED entry with libc.so on DT_NEEDED, issue an error if there is a DT_RELR entry without a GLIBC_ABI_DT_RELR dependency.

  Support __placeholder_only_for_empty_version_map as the placeholder symbol used only for an empty version map, to generate GLIBC_ABI_DT_RELR without any symbols.
* elf: Define DT_RELR related macros and types [H.J. Lu, 2022-04-26, 2 files, -2/+15]
* elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC [Fangrui Song, 2022-04-26, 41 files, -82/+46]
  PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement.

  Not defining PI_STATIC_AND_HIDDEN is legacy, and we expect that all new ports will define PI_STATIC_AND_HIDDEN. More current ports define PI_STATIC_AND_HIDDEN than do not. Change the configure default. No functional change.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* i386: Regenerate ulps [Carlos O'Donell, 2022-04-26, 2 files, -2/+2]
  These failures were caught while building glibc master for Fedora Rawhide, which is built with '-mtune=generic -msse2 -mfpmath=sse' using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon processor.
* dlfcn: Do not use rtld_active () to determine ld.so state (bug 29078) [Florian Weimer, 2022-04-26, 14 files, -14/+159]
  When audit modules are loaded, ld.so initialization is not yet complete, and rtld_active () returns false even though ld.so is mostly working. Instead, the static dlopen hook is used, but that does not work at all because this is not a static dlopen situation.

  Commit 466c1ea15f461edb8e3ffaf5d86d708876343bbf ("dlfcn: Rework static dlopen hooks") moved the hook pointer into _rtld_global_ro, which means that separate protection is not needed anymore and the hook pointer can be checked directly.

  The guard for disabling libio vtable hardening in _IO_vtable_check should stay for now.

  Fixes commit 8e1472d2c1e25e6eabc2059170731365f6d5b3d1 ("ld.so: Examine GLRO to detect inactive loader [BZ #20204]").

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* INSTALL: Rephrase --with-default-link documentation [Florian Weimer, 2022-04-26, 2 files, -9/+9]
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Move post-relocation code of _dl_start into _dl_start_final [Fangrui Song, 2022-04-25, 1 file, -15/+10]
  On non-PI_STATIC_AND_HIDDEN architectures, getting the address of _rtld_local_ro (for GLRO (dl_final_object)) goes through a GOT entry. The GOT load may be reordered before self-relocation, leading to an unrelocated/incorrect _rtld_local_ro address.

  84e02af1ebc9988126eebe60bf19226cea835623 tickled GCC on powerpc32 into reordering the GOT load before relative relocations, leading to an ld.so crash. This is similar to the m68k jump table reordering issue fixed by a8e9b5b8079d18116ca69c9797e77804ecf2ee7e.

  Move code after self-relocation into _dl_start_final to avoid the reordering. This fixes powerpc32 and may help other architectures when ELF_DYNAMIC_RELOCATE is simplified in the future.
* misc: Fix rare fortify crash on wchar funcs. [BZ 29030] [Joan Bruguera, 2022-04-25, 2 files, -6/+11]
  If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` is of unknown size), fortify checks should pass, and `__whatever_alias` should be called.

  Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly checked, but in commit a643f60c53876b this was moved into `__glibc_safe_or_unknown_len`. A comment says the -1 case should work as: "The -1 check is redundant because since it implies that __glibc_safe_len_cond is true.". But this fails when:
    * `__s > 1`
    * `__osz == -1` (i.e. unknown size at compile time)
    * `__l` is big enough
    * `__l * __s <= __osz` can be folded to a constant (I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)

  In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn` will be called by `__glibc_fortify` or `__glibc_fortify_n` and crash the program.

  This commit adds the explicit `__osz == -1` check again. moc crashes on startup due to this, see: https://bugs.archlinux.org/task/74041

  Minimal test case (test.c):

      #include <wchar.h>

      int main (void)
      {
          const char *hw = "HelloWorld";
          mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
          return 0;
      }

  Build with:

      gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test

  Output:

      *** buffer overflow detected ***: terminated

  Fixes: BZ #29030
  Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
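A simplified model of the check, assuming the division form `__l <= __osz / __s` (the overflow-free way of writing `__l * __s <= __osz`) and ignoring the `__builtin_constant_p` and signedness parts of the real macros; the function names are hypothetical. With the values from the mbsrtowcs test case above (`__l = (size_t) -1`, `__s = sizeof (wchar_t)`, which is 4 on Linux, and unknown `__osz`), the length condition alone is false, so only the explicit unknown-size test lets the call through.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* L elements of S bytes each fit in an object of OSZ bytes. */
static bool safe_len_cond(size_t l, size_t s, size_t osz)
{
  return l <= osz / s;
}

/* Before the fix: the unknown-size case (OSZ == SIZE_MAX) was assumed to be
   covered by safe_len_cond, which fails for large L when S > 1. */
static bool safe_or_unknown_before(size_t l, size_t s, size_t osz)
{
  return safe_len_cond(l, s, osz);
}

/* After the fix: an unknown object size always passes. */
static bool safe_or_unknown_after(size_t l, size_t s, size_t osz)
{
  return osz == (size_t) -1 || safe_len_cond(l, s, osz);
}
```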
* elf: Remove unused enum allowmask [Fangrui Song, 2022-04-25, 1 file, -11/+0]
  Unused since 52a01100ad011293197637e42b5be1a479a2f4ae ("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* scripts/glibcelf.py: Mark as UNSUPPORTED on Python 3.5 and earlier [Florian Weimer, 2022-04-25, 1 file, -0/+6]
  enum.IntFlag and enum.EnumMeta._missing_ support are not part of earlier Python versions.
* x86: Optimize {str|wcs}rchr-evex [Noah Goldstein, 2022-04-22, 1 file, -181/+290]
  The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR.

  Geometric Mean of all benchmarks New / Old: 0.755
  See email for all results.

  Full xcheck passes on x86_64 with and without multiarch enabled.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize {str|wcs}rchr-avx2 [Noah Goldstein, 2022-04-22, 1 file, -157/+269]
  The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR.

  Geometric Mean of all benchmarks New / Old: 0.832
  See email for all results.

  Full xcheck passes on x86_64 with and without multiarch enabled.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86: Optimize {str|wcs}rchr-sse2 [Noah Goldstein, 2022-04-22, 4 files, -444/+339]
  The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR.

  Geometric Mean of all benchmarks New / Old: 0.741
  See email for all results.

  Full xcheck passes on x86_64 with and without multiarch enabled.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
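All three optimized versions must preserve plain strrchr semantics. A scalar reference sketch (strrchr_ref is a hypothetical name) makes the two comparisons per character explicit, one against the search CHAR and one against the terminator; roughly speaking, the SSE2/AVX2/EVEX code performs these comparisons on a whole vector of characters at a time, which is why reducing the work per search-CHAR comparison pays off.

```c
#include <stddef.h>

/* One pass over S, remembering the most recent position of C.  Searching
   for '\0' returns a pointer to the terminator itself. */
static const char *strrchr_ref(const char *s, int c)
{
  const char *last = NULL;
  const char ch = (char) c;  /* strrchr compares against c converted to char */
  for (;; s++)
    {
      if (*s == ch)
        last = s;
      if (*s == '\0')
        return last;
    }
}
```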
* benchtests: Improve bench-strrchr [Noah Goldstein, 2022-04-22, 1 file, -44/+82]
  1. Use json-lib for printing results.
  2. Expose all parameters (previously pos, seek_char, and max_char were not printed).
  3. Add benchmarks that test multiple occurrences of seek_char in the string.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32 [H.J. Lu, 2022-04-22, 2 files, -0/+8]
  Clear the upper 32 bits in RDX (memory size) for x32 to fix

      FAIL: string/tst-size_t-memcmp
      FAIL: string/tst-size_t-memcmp-2
      FAIL: string/tst-size_t-memcpy
      FAIL: wcsmbs/tst-size_t-wmemcmp

  on x32, introduced by

      8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S
      26b2478322 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3

  Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Default to --with-default-link=no (bug 25812) [Florian Weimer, 2022-04-22, 7 files, -118/+190]
  This is necessary to place the libio vtables into the RELRO segment. New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to verify that this is what actually happens.

  The new tests fail on ia64 due to lack of (default) RELRO support in binutils, so they are XFAILed there.
* scripts: Add glibcelf.py module [Florian Weimer, 2022-04-22, 3 files, -0/+1402]
  Hopefully, this will lead to tests that are easier to maintain. The current approach of parsing readelf -W output using regular expressions is not necessarily easier than parsing the ELF data directly.

  This module is still somewhat incomplete (e.g., coverage of relocation types and versioning information is missing), but it is sufficient to perform basic symbol analysis or program header analysis.

  The EM_* mapping for architecture-specific constant classes (e.g., SttX86_64) is not yet implemented. The classes are defined for the benefit of elf/tst-glibcelf.py.

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
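glibcelf.py itself is Python; as a C analog of "parsing the ELF data directly", the sketch below reads an ELF header into the `<elf.h>` types instead of scraping readelf output. read_elf_header is a hypothetical helper, and it assumes the file is at least as large as an Elf64_Ehdr (true for any real ELF32 or ELF64 executable).

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the start of PATH into *OUT and verify the ELF magic.
   Returns 0 on success, -1 on failure. */
static int read_elf_header(const char *path, Elf64_Ehdr *out)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL)
    return -1;
  size_t got = fread(out, 1, sizeof *out, f);
  fclose(f);
  /* e_ident has the same layout for ELFCLASS32 and ELFCLASS64; the other
     fields are only meaningful if e_ident[EI_CLASS] == ELFCLASS64. */
  if (got != sizeof *out || memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)
    return -1;
  return 0;
}
```

On Linux, passing "/proc/self/exe" inspects the running program itself.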