glibc commit log, branch release/2.35/master (newest first):
* elf: Disable some subtests of ifuncmain1, ifuncmain5 for !PIE  [release/2.35/master]
  Florian Weimer, 7 days; 2 files, -0/+22
  (cherry picked from commit 9cc9d61ee12f2f8620d8e0ea3c42af02bf07fe1e)
* malloc: Exit early on test failure in tst-realloc
  Florian Weimer, 7 days; 1 file, -31/+15
  This addresses more (correct) use-after-free warnings reported by GCC 12
  on some targets.

  Fixes commit c094c232eb3246154265bb035182f92fe1b17ab8 ("Avoid
  -Wuse-after-free in tests [BZ #26779].").

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  (cherry picked from commit d653fd2d9ebe23c2b16b76edf717c5dbd5ce9b77)
* nscd: Use time_t for return type of addgetnetgrentX
  Florian Weimer, 7 days; 1 file, -2/+2
  Using int may give false results for future dates (timeouts after the
  year 2038).

  Fixes commit c04a21e050d64a1193a6daab872bca2528bda44b ("CVE-2024-33601,
  CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX
  (bug 31680)").

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit 4bbca1a44691a6e9adcee5c6798a707b626bc331)
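
  The int return type matters because the timeouts are time_t values:
  once a timeout exceeds INT_MAX seconds, narrowing it through int flips
  it negative. A standalone sketch of just the narrowing (illustrative
  names, not nscd code), assuming 64-bit time_t:

    #include <stdio.h>
    #include <time.h>

    static int
    timeout_via_int (time_t t)       /* the buggy narrowing */
    {
      return t;
    }

    int
    main (void)
    {
      time_t future = 2208988800LL;  /* 2040-01-01, above INT_MAX */
      printf ("as time_t: %lld  through int: %d\n",
              (long long) future, timeout_via_int (future));
      return 0;
    }

  On typical targets the int value wraps to a large negative number, an
  "already expired" timeout.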
* login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)
  Florian Weimer, 7 days; 18 files, -22/+165
  These structs describe file formats under /var/log, and should not
  depend on the definition of _TIME_BITS. This is achieved by defining
  __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that support 32-bit
  time_t values (where __time_t is 32 bits).

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 9abdae94c7454c45e02e97e4ed1eb1b1915d13d8)
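
  A hedged way to observe the property this commit establishes: print
  the struct sizes and compare a default build against one built with
  -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 on an affected 32-bit port; the
  sizes should be identical in both builds.

    #include <stdio.h>
    #include <utmp.h>
    #include <utmpx.h>

    int
    main (void)
    {
      printf ("utmp: %zu  utmpx: %zu  lastlog: %zu\n",
              sizeof (struct utmp), sizeof (struct utmpx),
              sizeof (struct lastlog));
      return 0;
    }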
* login: Check default sizes of structs utmp, utmpx, lastlog
  Florian Weimer, 7 days; 17 files, -1/+88
  The default <utmp-size.h> is for ports with a 64-bit time_t. Ports with
  a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1 need to override it.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)
* sparc: Remove 64 bit check on sparc32 wordsize (BZ 27574)
  Adhemerval Zanella, 7 days; 1 file, -9/+4
  sparc32 is always 32 bits.

  Checked on sparcv9-linux-gnu.
  (cherry picked from commit dd57f5e7b652772499cb220d78157c1038d24f06)
* elf: Also compile dl-misc.os with $(rtld-early-cflags)
  H.J. Lu, 14 days; 1 file, -0/+1
  Also compile dl-misc.os with $(rtld-early-cflags) to avoid

    Program received signal SIGILL, Illegal instruction.
    0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
        endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
    156       bool positive = true;
    (gdb) bt
    #0  0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
        endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
    #1  0x00007ffff7fdb1a9 in tunable_initialize (
        cur=cur@entry=0x7ffff7ffbc00 <tunable_list+2176>,
        strval=strval@entry=0x7fffffffe2c9 "2", len=len@entry=1)
        at dl-tunables.c:131
    #2  0x00007ffff7fdb3a2 in parse_tunables (valstring=<optimized out>)
        at dl-tunables.c:258
    #3  0x00007ffff7fdb5d9 in __GI___tunables_init (envp=0x7fffffffdd58)
        at dl-tunables.c:288
    #4  0x00007ffff7fe44c3 in _dl_sysdep_start (
        start_argptr=start_argptr@entry=0x7fffffffdcb0,
        dl_main=dl_main@entry=0x7ffff7fe5f80 <dl_main>)
        at ../sysdeps/unix/sysv/linux/dl-sysdep.c:110
    #5  0x00007ffff7fe5cae in _dl_start_final (arg=0x7fffffffdcb0) at rtld.c:494
    #6  _dl_start (arg=0x7fffffffdcb0) at rtld.c:581
    #7  0x00007ffff7fe4b38 in _start ()
    (gdb)

  when setting GLIBC_TUNABLES in glibc compiled with APX.

  Reviewed-by: Florian Weimer <fweimer@redhat.com>
  (cherry picked from commit 049b7684c912dd32b67b1b15b0f43bf07d5f512e)
* CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in
  addgetnetgrentX (bug 31680)
  Florian Weimer, 2024-04-25; 1 file, -98/+121
  This avoids potential memory corruption when the underlying NSS callback
  function does not use the buffer space to store all strings (e.g., for
  constant strings).

  Instead of custom buffer management, two scratch buffers are used. This
  increases stack usage somewhat.

  Scratch buffer allocation failure is handled by returning -1 (an invalid
  timeout value) instead of terminating the process. This fixes bug 31679.

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  (cherry picked from commit c04a21e050d64a1193a6daab872bca2528bda44b)
* CVE-2024-33600: nscd: Avoid null pointer crashes after notfound response
  (bug 31678)
  Florian Weimer, 2024-04-25; 1 file, -4/+7
  The addgetnetgrentX call in addinnetgrX may have failed to produce a
  result, so the result variable in addinnetgrX can be NULL. Use
  db->negtimeout as the fallback value if there is no result data; the
  timeout is also overwritten below.

  Also avoid sending a second not-found response. (The client disconnects
  after receiving the first response, so the data stream did not go out of
  sync even without this fix.) It is still beneficial to add the negative
  response to the mapping, so that the client can get it from there in the
  future, instead of going through the socket.

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  (cherry picked from commit b048a482f088e53144d26a61c390bed0210f49f2)
* CVE-2024-33600: nscd: Do not send missing not-found response in
  addgetnetgrentX (bug 31678)
  Florian Weimer, 2024-04-25; 1 file, -8/+6
  If we failed to add a not-found response to the cache, the dataset
  pointer can be null, resulting in a null pointer dereference.

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  (cherry picked from commit 7835b00dbce53c3c87bbbb1754a95fb5e58187aa)
* CVE-2024-33599: nscd: Stack-based buffer overflow in netgroup cache (bug 31677)
  Florian Weimer, 2024-04-25; 1 file, -2/+3
  Using alloca matches what other caches do. The request length is
  bounded by MAXKEYLEN.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit 87801a8fd06db1d654eea3e4f7626ff476a9bdaa)
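
  A sketch of the bounded-alloca pattern described above, with
  illustrative names and an illustrative MAXKEYLEN value (not the nscd
  sources): the length check is what makes the stack allocation safe.

    #include <alloca.h>
    #include <string.h>

    #define MAXKEYLEN 1024             /* illustrative bound */

    static void
    handle_request (const char *key, size_t key_len)
    {
      if (key_len > MAXKEYLEN)
        return;                          /* oversized request rejected */
      char *key_copy = alloca (key_len); /* bounded, so stack-safe */
      memcpy (key_copy, key, key_len);
      /* ... use key_copy while this frame is live ... */
    }

    int
    main (void)
    {
      handle_request ("example.net,user,domain", 23);
      return 0;
    }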
* iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape
  sequence (CVE-2024-2961)
  Charles Fol, 2024-04-17; 3 files, -1/+144
  ISO-2022-CN-EXT uses escape sequences to indicate character set changes
  (as specified by RFC 1922). While the SOdesignation has the expected
  bounds checks, neither SS2designation nor SS3designation does, allowing
  a write overflow of 1, 2, or 3 bytes with fixed values: '$+I', '$+J',
  '$+K', '$+L', '$+M', or '$*H'.

  Checked on aarch64-linux-gnu.

  Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit f9dc609e06b1136bb0408be9605ce7973a767ada)
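
  A minimal sketch of the kind of check the fix adds, using illustrative
  names in the style of gconv loops (outptr/outend); the escape is
  written only after verifying four bytes of room.

    #include <stdbool.h>

    /* Emit the 4-byte designation escape ESC '$' '+' final, e.g.
       "\x1b$+I", only when it fits in the remaining output buffer.  */
    static bool
    emit_designation (unsigned char **outptr, unsigned char *outend,
                      unsigned char final)
    {
      if (*outptr + 4 > outend)     /* the missing bounds check */
        return false;               /* caller reports "output full" */
      (*outptr)[0] = 0x1b;
      (*outptr)[1] = '$';
      (*outptr)[2] = '+';
      (*outptr)[3] = final;         /* 'I' .. 'M' */
      *outptr += 4;
      return true;
    }

    int
    main (void)
    {
      unsigned char buf[3];         /* deliberately too small */
      unsigned char *p = buf;
      return emit_designation (&p, buf + sizeof buf, 'I') ? 1 : 0;
    }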
* powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
  Florian Weimer, 2024-04-14; 1 file, -0/+19
  This seems to have stopped working with some GCC 14 versions, which
  clobber r2. With other compilers, the kernel-provided r2 value is still
  available at this point.

  Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
  (cherry picked from commit 14e56bd4ce15ac2d1cc43f762eb2e6b83fec1afe)
* AArch64: Check kernel version for SVE ifuncs
  Wilco Dijkstra, 2024-04-08; 5 files, -2/+53
  Old Linux kernels disable SVE after every system call. Calling the
  SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
  As a result, applications with a high use of syscalls may run slower
  with the SVE memcpy. This affects kernels from 4.15.0 up to (but not
  including) 6.2.0, except for 5.14.0, which was patched. Avoid this by
  checking the kernel version and selecting the SVE ifunc on modern
  kernels.

  Parse the kernel version reported by uname() into a 24-bit
  kernel.major.minor value without calling any library functions. If
  uname() is not supported or if the version format is not recognized,
  assume the kernel is modern.

  Tested-by: Florian Weimer <fweimer@redhat.com>
  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)
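
  A rough standalone sketch of the version packing described above (not
  the glibc code, which avoids library calls entirely): parse major.minor
  from the uname() release string into one comparable value.

    #include <stdio.h>
    #include <sys/utsname.h>

    /* Pack "major.minor" as (major << 16 | minor << 8); return a large
       value on failure so unknown kernels are treated as modern.  */
    static unsigned int
    kernel_version (void)
    {
      struct utsname buf;
      if (uname (&buf) != 0)
        return 0xffffff;
      unsigned int major = 0, minor = 0;
      unsigned int *field = &major;
      for (const char *s = buf.release; *s != '\0'; s++)
        {
          if (*s >= '0' && *s <= '9')
            *field = *field * 10 + (unsigned int) (*s - '0');
          else if (*s == '.' && field == &major)
            field = &minor;
          else
            break;
        }
      return (major << 16) | (minor << 8);
    }

    int
    main (void)
    {
      unsigned int v = kernel_version ();
      printf ("kernel %u.%u -> 0x%06x\n", v >> 16, (v >> 8) & 0xff, v);
      return 0;
    }

  A kernel such as 6.2.0 packs to 0x060200, so plain integer comparison
  orders versions correctly.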
* aarch64: fix check for SVE support in assembler
  Szabolcs Nagy, 2024-04-08; 2 files, -4/+6
  Due to GCC bug 110901, -mcpu can override the -march setting when
  compiling asm code, so a compiler targeting a specific cpu can fail the
  configure check even when binutils gas supports SVE.

  The workaround is that an explicit .arch directive overrides both -mcpu
  and -march, and since that's what the actual SVE memcpy uses, the
  configure check should use it too, even if the GCC issue is fixed
  independently.

  Reviewed-by: Florian Weimer <fweimer@redhat.com>
  (cherry picked from commit 73c26018ed0ecd9c807bb363cc2c2ab4aca66a82)
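
  A hedged sketch of such a probe (illustrative, not the literal
  configure test): file-scope asm sets .arch explicitly, so the file
  assembles based on gas's SVE support regardless of -mcpu/-march. It
  compiles only on AArch64 with an SVE-capable assembler, which is
  exactly what the check wants to know.

    __asm__ (".arch armv8.2-a+sve\n\t"
             "ptrue p0.b, all");

    int
    main (void)
    {
      return 0;
    }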
* aarch64: correct CFI in rawmemchr (bug 31113)
  Andreas Schwab, 2024-04-08; 1 file, -1/+1
  The .cfi_return_column directive changes the return column for the whole
  FDE range. But the actual intent is to tell the unwinder that the value
  in x30 (lr) now resides in x15 after the move, and that is expressed by
  the .cfi_register directive.

  (cherry picked from commit 3f798427884fa57770e8e2291cf58d5918254bb5)
* AArch64: Remove Falkor memcpy
  Wilco Dijkstra, 2024-04-08; 8 files, -332/+1
  The latest implementations of memcpy are actually faster than the Falkor
  implementations [1], so remove the falkor/phecda ifuncs for memcpy and
  the now unused IS_FALKOR/IS_PHECDA defines.

  [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 2f5524cc5381eb75fef55f7901bb907bd5628333)
* AArch64: Add memset_zva64
  Wilco Dijkstra, 2024-04-08; 6 files, -68/+38
  Add a specialized memset for the common ZVA size of 64 to avoid the
  overhead of reading the ZVA size. Since the code is identical to
  __memset_falkor, remove the latter.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 3d7090f14b13312320e425b27dcf0fe72de026fd)
* AArch64: Cleanup emag memset
  Wilco Dijkstra, 2024-04-08; 4 files, -197/+90
  Cleanup emag memset: merge the memset_base64.S file, remove the unused
  ZVA code (since it is disabled on emag).

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 9627ab99b50d250c6dd3001a3355aa03692f7fe5)
* AArch64: Cleanup ifuncs
  Wilco Dijkstra, 2024-04-08; 18 files, -125/+41
  Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather
  than ENTRY_ALIGN, remove unnecessary defines and conditional
  compilation. Rename strlen_mte to strlen_generic. Remove rtld-memset.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 9fd3409842b3e2d31cff5dbd6f96066c430f0aa2)
* AArch64: Add support for MOPS memcpy/memmove/memset
  Wilco Dijkstra, 2024-04-08; 11 files, -1/+141
  Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using
  MOPS for memcpy, memmove and memset (use .inst for now so it works with
  all binutils versions without needing complex configure and conditional
  compilation).

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 2bd00179885928fd95fcabfafc50e7b5c6e660d2)
* Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.h
  Joseph Myers, 2024-04-08; 1 file, -0/+22
  Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to
  glibc's bits/hwcap.h.

  Tested with build-many-glibcs.py for aarch64-linux-gnu.
  (cherry picked from commit ff5d2abd18629e0efac41e31699cdff3be0e08fa)
* AArch64: Improve SVE memcpy and memmove
  Wilco Dijkstra, 2024-04-08; 1 file, -20/+14
  Improve SVE memcpy by copying 2 vectors if the size is small enough.
  This improves performance of random memcpy by ~9% on Neoverse V1, and
  33-64 byte copies are ~16% faster.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit d2d3f3720ce627a4fe154d8dd14db716a32bcc6e)
* AArch64: Improve strrchr
  Wilco Dijkstra, 2024-04-08; 1 file, -25/+33
  Use shrn for narrowing the mask, which simplifies the code and speeds up
  small strings. Unroll the first search loop to improve performance on
  large strings.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 55599d480437dcf129b41b95be32b48f2a9e5da9)
* AArch64: Optimize strnlen
  Wilco Dijkstra, 2024-04-08; 1 file, -21/+18
  Optimize strnlen using the shrn instruction and improve the main loop.
  Small strings are around 10% faster, and large strings are 40% faster
  on modern CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit ad098893ba3c3344a5f2f6ab1627c47204afdb47)
* AArch64: Optimize strlen
  Wilco Dijkstra, 2024-04-08; 1 file, -8/+12
  Optimize strlen by unrolling the main loop. Large strings are 64% faster
  on modern CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 03c8ce5000198947a4dd7b2c14e5131738fda62b)
* AArch64: Optimize strcpy
  Wilco Dijkstra, 2024-04-08; 1 file, -17/+19
  Unroll the main loop. Large strings are around 20% faster on modern
  CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 349e48c01e85bd96006860084e76d322e6ca02f1)
* AArch64: Improve strchrnul
  Wilco Dijkstra, 2024-04-08; 1 file, -2/+10
  Unroll the main loop, which improves performance slightly.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 09ebd8549b2ce5a3a6c0c7c5f3e62227faf50a99)
* AArch64: Optimize strchr
  Wilco Dijkstra, 2024-04-08; 1 file, -28/+24
  Simplify calculation of the mask using shrn. Unroll the main loop.
  Small strings are 20% faster on modern CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 51541a229740801882490177fa178e49264b13fb)
* AArch64: Improve strlen_asimd
  Wilco Dijkstra, 2024-04-08; 1 file, -12/+4
  Use shrn for the mask, merge tst+bne into cbnz, and tweak code
  alignment. Performance improves slightly as a result.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 1bbb1a2022e126f21810d3d0ebe0a975d5243e43)
* AArch64: Optimize memrchr
  Wilco Dijkstra, 2024-04-08; 1 file, -9/+11
  Optimize the main loop: large strings are 43% faster on modern CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 00776241776e67fc666b896c1e85770f4f3ec1e1)
* AArch64: Optimize memchr
  Wilco Dijkstra, 2024-04-08; 1 file, -13/+14
  Optimize the main loop: large strings are 40% faster on modern CPUs.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit ce758d4f063820c2bc743e12797d7454c66be718)
* aarch64: Use memcpy_simd as the default memcpy
  Wilco Dijkstra, 2024-04-08; 6 files, -370/+81
  Since __memcpy_simd is the fastest memcpy on almost all cores, replace
  the generic memcpy with it. If SVE is available, an SVE memcpy will be
  used by default (including for Neoverse N2).

  (cherry picked from commit e6f3fe362f1aab78b1448d69ecdbd9e3872636d3)
* aarch64: Cleanup memset ifunc
  Wilco Dijkstra, 2024-04-08; 2 files, -17/+26
  Cleanup memset ifunc selectors. The A64FX memset relies on a ZVA size of
  256, so add an explicit check.

  (cherry picked from commit a8e72913fea0c6e2832c50523c60907ffa3b753b)
* AArch64: Fix typo in sve configure check (BZ# 29394)
  Wilco Dijkstra, 2024-04-08; 2 files, -4/+4
  Fix a typo in the SVE configure check. This fixes [BZ# 29394].

  (cherry picked from commit 12182ba18dabda791a4f63a11ee2e9d828f40f9b)
* aarch64: Optimize string functions with shrn instruction
  Danila Kutenin, 2024-04-08; 6 files, -102/+59
  We found that string functions were using AND+ADDP to find the
  nibble/syndrome mask, but there is a simpler alternative:
  `SHRN dst.8b, src.8h, 4` (shift every 16-bit lane right by 4 and narrow
  to 8 bits), which has the same latency as ADDP on all SIMD ARMv8
  targets. There are also possible gaps for memcmp, but that's for
  another patch.

  We see 10-20% savings for small and mid-size cases (<= 128 bytes),
  which are the primary cases for general workloads.

  (cherry picked from commit 3c9980698988ef64072f1fac339b180f52792faf)
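
  The same trick, sketched with ACLE intrinsics rather than the
  hand-written assembly the commit actually touches (AArch64-only;
  vshrn_n_u16 is the SHRN instruction):

    #include <arm_neon.h>
    #include <stdint.h>

    /* Turn a 128-bit per-byte 0x00/0xff comparison mask into a 64-bit
       syndrome (one nibble per byte) with a single SHRN.  */
    static inline uint64_t
    syndrome (uint8x16_t eq)
    {
      uint8x8_t nibbles = vshrn_n_u16 (vreinterpretq_u16_u8 (eq), 4);
      return vget_lane_u64 (vreinterpret_u64_u8 (nibbles), 0);
    }

    int
    main (void)
    {
      uint8x16_t eq = vceqq_u8 (vdupq_n_u8 ('a'), vdupq_n_u8 ('a'));
      return syndrome (eq) == UINT64_MAX ? 0 : 1;
    }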
* AArch64: Sort makefile entries
  Wilco Dijkstra, 2024-04-08; 1 file, -6/+18
  Sort makefile entries to reduce conflicts.

  (cherry picked from commit eea282d9c665392d6959f6d7112ba4bef27701c9)
* AArch64: Add SVE memcpy
  Wilco Dijkstra, 2024-04-08; 5 files, -42/+284
  Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE
  vectors, which improves the random memcpy benchmark significantly.
  Cleanup the memcpy and memmove ifunc selectors.

  (cherry picked from commit 9f298bfe1f183804bb54b54ff9071afc0494906c)
* linux: Use rseq area unconditionally in sched_getcpu (bug 31479)
  Florian Weimer, 2024-03-18; 1 file, -8/+0
  Originally, nptl/descr.h included <sys/rseq.h>, but we removed that in
  commit 2c6b4b272e6b4d07303af25709051c3e96288f2d ("nptl: Unconditionally
  use a 32-byte rseq area"). After that, it was not ensured that the
  RSEQ_SIG macro was defined during sched_getcpu.c compilation that
  provided a definition. This commit always checks the rseq area for CPU
  number information before using the other approaches.

  This adds an unnecessary (but well-predictable) branch on architectures
  which do not define RSEQ_SIG, but its cost is small compared to the
  system call. Most architectures that have vDSO acceleration for getcpu
  also have rseq support.

  Fixes: 2c6b4b272e6b4d07303af25709051c3e96288f2d
  Fixes: 1d350aa06091211863e41169729cee1bca39f72f
  Reviewed-by: Arjun Shankar <arjun@redhat.com>
  (cherry picked from commit 7a76f218677d149d8b7875b336722108239f7ee9)
  Fixes: c9ee9cc8b8e4f8671c1d487f83db333b6be6a925
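
  A user-space approximation of that lookup order, assuming the
  __rseq_offset export that glibc 2.35 declares in <sys/rseq.h> and a
  compiler that provides __builtin_thread_pointer on this target; not
  the actual sched_getcpu implementation.

    #include <stdio.h>
    #include <sys/rseq.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int
    rseq_getcpu (void)
    {
      struct rseq *rs
        = (struct rseq *) ((char *) __builtin_thread_pointer ()
                           + __rseq_offset);
      int cpu = (int) rs->cpu_id;   /* negative while unregistered */
      if (cpu >= 0)
        return cpu;
      unsigned int c;               /* fallback: the system call */
      return syscall (SYS_getcpu, &c, NULL, NULL) == 0 ? (int) c : -1;
    }

    int
    main (void)
    {
      printf ("cpu %d\n", rseq_getcpu ());
      return 0;
    }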
* Include sys/rseq.h in tst-rseq-disable.c
  Stefan Liebler, 2024-03-18; 1 file, -0/+1
  Starting with commit 2c6b4b272e6b4d07303af25709051c3e96288f2d ("nptl:
  Unconditionally use a 32-byte rseq area"), the testcase
  misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined. The
  mentioned commit removes the inclusion of sys/rseq.h in nptl/descr.h.
  Thus just include sys/rseq.h in tst-rseq-disable.c, as is also done in
  tst-rseq.c and tst-rseq-nptl.c.

  Reviewed-by: Florian Weimer <fweimer@redhat.com>
  (cherry picked from commit 637aac2ae3980de31a6baab236a9255fe853cc76)
* nptl: Unconditionally use a 32-byte rseq area
  Florian Weimer, 2024-03-18; 1 file, -4/+14
  If the kernel headers provide a larger struct rseq, we used that size as
  the argument to the rseq system call. As a result, rseq registration
  would fail on older kernels which only accept size 32.

  (cherry picked from commit 2c6b4b272e6b4d07303af25709051c3e96288f2d)
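
  A hedged sketch of registering with the fixed 32-byte size the commit
  adopts. The 0x53053053 signature is the RSEQ_SIG value used on x86
  (other architectures define different values), and EBUSY is the
  expected outcome when glibc has already registered its own area.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/rseq.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    enum { RSEQ_AREA_SIZE = 32 };   /* what old kernels insist on */

    int
    main (void)
    {
      static struct rseq area;      /* the type itself carries the
                                       required 32-byte alignment */
      long r = syscall (SYS_rseq, &area, RSEQ_AREA_SIZE, 0, 0x53053053);
      printf ("rseq: %s\n", r == 0 ? "registered" : strerror (errno));
      return 0;
    }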
* make ‘struct pthread’ a complete type
  Paul Eggert, 2024-03-18; 1 file, -4/+4
  * nptl/descr.h (struct pthread): Remove end_padding member, which made
    this type incomplete.
    (PTHREAD_STRUCT_END_PADDING): Stop using end_padding.

  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
  (cherry picked from commit 3edc4ff2ceff4a59587ebecb94148d3bcfa1df62)
* support: use 64-bit time_t (bug 30111)
  Andreas Schwab, 2024-03-12; 6 files, -8/+22
  Ensure to use 64-bit time_t in the test infrastructure.

  (cherry picked from commit 3bfdc4e2bceb601b90c81a9baa73c1904db58b2f)
* malloc: Use __get_nprocs on arena_get2 (BZ 30945)
  Adhemerval Zanella, 2024-02-12; 5 files, -18/+2
  This restores the 2.33 semantics of arena_get2. They were changed by
  commit 11a02b035b46 to avoid arena_get2 calling malloc (back when
  __get_nprocs was refactored to use a scratch_buffer, commit
  903bc7dcc2acafc). __get_nprocs has since been refactored again and now
  also avoids calling malloc. Commit 11a02b035b46 did not take any
  performance implications into consideration, which should have been
  discussed properly.

  __get_nprocs_sched is still used as a fallback mechanism if procfs and
  sysfs are not accessible.

  Checked on x86_64-linux-gnu.
  Reviewed-by: DJ Delorie <dj@redhat.com>
  (cherry picked from commit 472894d2cfee5751b44c0aaa71ed87df81c8e62e)
* x86_64: Optimize ffsll function code size.
  Sunil K Pandey, 2024-01-31; 1 file, -5/+5
  The ffsll function randomly regresses by ~20%, depending on how the code
  is aligned in memory. The function's code size is 17 bytes. Since the
  default function alignment is 16 bytes, the code can start at a 16-,
  32-, 48-, or 64-byte aligned address. When it starts at a 16-, 32-, or
  64-byte aligned address, the entire code fits in a single 64-byte cache
  line. When it starts at a 48-byte aligned address, the code splits
  across two cache lines, hence the random regression.

  Reducing the function size from 17 bytes to 12 bytes ensures that it
  always fits in a single 64-byte cache line.

  This patch fixes the ffsll function's random performance regression.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit 9d94997b5f9445afd4f2bccc5fa60ff7c4361ec1)
* NEWS: Mention bug fixes for 29039/30745/30843
  H.J. Lu, 2023-12-23; 1 file, -0/+3
* x86-64: Fix the tcb field load for x32 [BZ #31185]
  H.J. Lu, 2023-12-23; 2 files, -2/+3
  _dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
  via the tcb field in TCB:

    _dl_tlsdesc_undefweak:
        _CET_ENDBR
        movq 8(%rax), %rax
        subq %fs:0, %rax
        ret

    _dl_tlsdesc_dynamic:
        ...
        subq %fs:0, %rax
        movq -8(%rsp), %rdi
        ret

  Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
  not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since
  _dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
  returns void *, RAX_LP is appropriate here for x32 and x86-64. This
  fixes BZ #31185.

  (cherry picked from commit 81be2a61dafc168327c1639e97b6dae128c7ccf3)
* x86-64: Fix the dtv field load for x32 [BZ #31184]
  H.J. Lu, 2023-12-23; 2 files, -1/+2
  On x32, I got

    FAIL: elf/tst-tlsgap

    $ gdb elf/tst-tlsgap
    ...
    open tst-tlsgap-mod1.so

    Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
    [Switching to LWP 2268754]
    _dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
    108             movq    (%rsi), %rax
    (gdb) p/x $rsi
    $4 = 0xf7dbf9005655fb18
    (gdb)

  This is caused by

    _dl_tlsdesc_dynamic:
        _CET_ENDBR
        /* Preserve call-clobbered registers that we modify.
           We need two scratch regs anyway.  */
        movq %rsi, -16(%rsp)
        movq %fs:DTV_OFFSET, %rsi

  Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
  location, not 64-bit. Load the dtv field to RSI_LP instead of rsi.
  This fixes BZ #31184.

  (cherry picked from commit 3502440397bbb840e2f7223734aa5cc2cc0e29b6)
* elf: Fix TLS modid reuse generation assignment (BZ 29039)
  Hector Martin, 2023-12-22; 1 file, -0/+1
  _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
  does *not* do anything to the generation counter. The first time this
  happens, the generation is zero and map_generation() returns the
  current generation to be used during relocation processing. However, if
  a slotinfo entry is later reused, it will already have a generation
  assigned. If this generation has fallen behind the current global max
  generation, then this causes an obsolete generation to be assigned
  during relocation processing, as map_generation() returns this
  generation if nonzero. _dl_add_to_slotinfo() eventually resets the
  generation, but by then it is too late. This causes DTV updates to be
  skipped, leading to NULL or broken TLS slot pointers and segfaults.

  Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
  so it behaves the same as the first time a slot is assigned.
  _dl_add_to_slotinfo() will still assign the correct static generation
  later during module load, but relocation processing will no longer use
  an obsolete generation.

  Note that slotinfo entry (aka modid) reuse typically happens after a
  dlclose and only TLS access via dynamic tlsdesc is affected. Because
  tlsdesc is optimized to use the optional part of static TLS, dynamic
  tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
  tunable to a large enough value, or by LD_PRELOAD-ing the affected
  modules.

  Fixes bug 29039.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 3921c5b40f293c57cb326f58713c924b0662ef59)
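
  A sketch of the fix's idea using stand-in types (the actual one-line
  change lives in ld.so's _dl_assign_tls_modid): a reused slot's
  generation goes back to zero, so it behaves like a fresh assignment.

    #include <stddef.h>

    struct link_map;              /* stand-ins for ld.so internals */
    struct slotinfo
    {
      size_t gen;                 /* generation of the slot's last user */
      struct link_map *map;
    };

    static void
    assign_tls_modid_slot (struct slotinfo *slot, struct link_map *new_map)
    {
      slot->gen = 0;              /* reset: behave like a fresh slot */
      slot->map = new_map;
    }

    int
    main (void)
    {
      struct slotinfo slot = { 42, 0 };  /* stale gen after a dlclose */
      assign_tls_modid_slot (&slot, 0);
      return slot.gen != 0;
    }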
* Revert "elf: Move l_init_called_next to old place of l_text_end in link map"Florian Weimer2023-10-191-4/+4
| | | | | | This reverts commit 59ee83b0c27a67a34dc53b312424c9435423bfc9. Reason for revert: Preserve internal ABI.