Commit message | Author | Age | Files | Lines
* nptl: Move pthread_attr_setstack into libc | Florian Weimer | 2021-05-11 | 64 | -52/+96
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_setguardsize into libc | Florian Weimer | 2021-05-11 | 64 | -33/+75
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_getstacksize into libc | Florian Weimer | 2021-05-11 | 64 | -33/+74
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_getstackaddr into libc | Florian Weimer | 2021-05-11 | 64 | -33/+74
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_getstack into libc | Florian Weimer | 2021-05-11 | 64 | -33/+74
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_getguardsize into libc | Florian Weimer | 2021-05-11 | 64 | -33/+75
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_attr_getaffinity_np into libc | Florian Weimer | 2021-05-11 | 64 | -50/+93
  The symbol was moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
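These attribute functions are ordinary POSIX interfaces, so moving them into libc does not change how callers use them (from glibc 2.34 on, such programs no longer need to link against libpthread). A minimal sketch exercising several of them; the helpers `where_am_i` and `run_on_own_stack` are hypothetical, and the sizes in the usage note are arbitrary values above PTHREAD_STACK_MIN.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical thread body: report the address of one of its locals.  */
static void *
where_am_i (void *arg)
{
  int local = 0;
  *(uintptr_t *) arg = (uintptr_t) &local;
  return NULL;
}

/* Hypothetical helper: run one thread on a caller-allocated stack and
   check the round trip through the attribute getters.  Returns 0 on
   success, an error number from the calls, or -1 if a check fails.  */
static int
run_on_own_stack (size_t stack_size, size_t guard_size)
{
  long page = sysconf (_SC_PAGESIZE);
  void *stack = NULL;
  int err = posix_memalign (&stack, page > 0 ? (size_t) page : 4096,
                            stack_size);
  if (err != 0)
    return err;

  pthread_attr_t attr;
  err = pthread_attr_init (&attr);
  if (err != 0)
    {
      free (stack);
      return err;
    }

  err = pthread_attr_setstack (&attr, stack, stack_size);
  /* The guard size is stored, but POSIX says it is ignored at thread
     creation while a caller-supplied stack is set.  */
  if (err == 0)
    err = pthread_attr_setguardsize (&attr, guard_size);

  void *got_stack = NULL;
  size_t got_size = 0, got_guard = 0;
  if (err == 0)
    err = pthread_attr_getstack (&attr, &got_stack, &got_size);
  if (err == 0)
    err = pthread_attr_getguardsize (&attr, &got_guard);
  if (err == 0 && (got_stack != stack || got_size != stack_size
                   || got_guard != guard_size))
    err = -1;

  uintptr_t local_addr = 0;
  pthread_t thr;
  if (err == 0)
    err = pthread_create (&thr, &attr, where_am_i, &local_addr);
  if (err == 0)
    {
      pthread_join (thr, NULL);
      /* The thread's local variable must lie inside the supplied block.  */
      uintptr_t lo = (uintptr_t) stack;
      if (local_addr < lo || local_addr >= lo + stack_size)
        err = -1;
    }

  pthread_attr_destroy (&attr);
  free (stack);
  return err;
}
```

For example, `run_on_own_stack ((size_t) 1 << 20, 4096)` is expected to return 0.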
* elf: Fix DTV gap reuse logic [BZ #27135] | Szabolcs Nagy | 2021-05-11 | 3 | -15/+6
  For some reason only dlopen failure caused DTV gaps to be reused. It is possible that the intent was to never reuse modids for a different module, but after dlopen failure all gaps are reused, not just the ones caused by the unfinished dlopen.
  So the code already has to handle reused modids, which seems to work; however, the data races at thread creation and TLS access (see bug 19329 and bug 27111) may be more severe if slots are reused, so this is scheduled after those fixes. I think fixing the races is not simpler if reuse is disallowed, and reuse has other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are removed from the middle of the slotinfo list. The value does not have to be correct: an incorrect true value causes the next modid query to do a slotinfo walk, and an incorrect false value leaves gaps, so new entries are added at the end.
  Fixes bug 27135.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
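The gap hint can be modelled in a few lines. A minimal sketch, not glibc's slotinfo code: `struct modid_table`, `modid_alloc` and `modid_free` are hypothetical, a fixed-size boolean array stands in for the slotinfo list, and `gaps` plays the role of GL(dl_tls_dtv_gaps). It shows why the hint may be wrong in either direction without breaking allocation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODULES 64   /* hypothetical fixed capacity */

/* slot[i] is true while module ID i is in use; ID 0 is never handed out.  */
struct modid_table
{
  bool slot[MAX_MODULES];
  size_t max_id;   /* highest ID currently allocated */
  bool gaps;       /* hint only: "there may be a free ID below max_id" */
};

/* Return a free module ID, or 0 if the table is full.  */
static size_t
modid_alloc (struct modid_table *t)
{
  if (t->gaps)
    {
      /* A true hint costs one walk.  After a gap is reused the hint stays
         set and may now be stale; the next walk then finds nothing and
         clears it.  */
      for (size_t id = 1; id <= t->max_id; ++id)
        if (!t->slot[id])
          {
            t->slot[id] = true;
            return id;
          }
      t->gaps = false;
    }
  /* No known gap: append at the end.  A stale false hint only means an
     existing gap stays unused for now.  */
  if (t->max_id + 1 >= MAX_MODULES)
    return 0;
  t->slot[++t->max_id] = true;
  return t->max_id;
}

static void
modid_free (struct modid_table *t, size_t id)
{
  t->slot[id] = false;
  if (id == t->max_id)
    /* Removing the last entry shrinks the table instead of leaving a gap.  */
    while (t->max_id > 0 && !t->slot[t->max_id])
      --t->max_id;
  else
    /* Removing from the middle leaves a reusable gap.  */
    t->gaps = true;
}
```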
* elf: Add test case for [BZ #19329] | Szabolcs Nagy | 2021-05-11 | 3 | -2/+76
  Test concurrent dlopen and pthread_create when the loaded modules have TLS. This triggers dl-tls assertion failures more reliably than the nptl/tst-stack4 test.
  The dlopened module has 100 DT_NEEDED dependencies with TLS; they were reused from an existing TLS test. The number of threads created during dlopen depends on filesystem speed and hardware, but at most 3 threads are alive at a time to limit resource usage.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Use relaxed atomics for racy accesses [BZ #19329] | Szabolcs Nagy | 2021-05-11 | 4 | -17/+42
  This is a follow-up patch to the fix for bug 19329. This adds relaxed MO atomics to accesses that were previously data races but are now race conditions, and where relaxed MO is sufficient.
  The race conditions all follow the pattern that the write is behind the dlopen lock, but a read can happen concurrently (e.g. during TLS access) without holding the lock. For slotinfo entries the read value only matters if it reads from a synchronized write in dlopen or dlclose; otherwise the related DTV entry is not valid to access, so it is fine to leave it in an inconsistent state. The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the algorithm relies on the fact that the read of the last synchronized write is an increasing value.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
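The "writes under a lock, unlocked relaxed reads of an increasing counter" pattern can be sketched with C11 atomics. This is a minimal illustration, not glibc's code: `load_lock` stands in for the dlopen lock, `generation` for a counter such as GL(dl_tls_generation), and `writer_bump` and `reader_sees_update` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Writers hold LOAD_LOCK; readers load GENERATION without it.  */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic size_t generation;

/* Writer side: all updates happen under the lock, so they are totally
   ordered and the counter only ever increases.  Relaxed MO suffices here
   because readers do not use the value to access other published data.  */
static void
writer_bump (void)
{
  pthread_mutex_lock (&load_lock);
  size_t g = atomic_load_explicit (&generation, memory_order_relaxed);
  atomic_store_explicit (&generation, g + 1, memory_order_relaxed);
  pthread_mutex_unlock (&load_lock);
}

/* Reader side, lock-free.  A relaxed load may return a stale value, but
   read-read coherence on a single atomic object means it never returns a
   value older than one this thread already read, so *CACHED never
   decreases.  Returns 1 if a newer generation was observed.  */
static int
reader_sees_update (size_t *cached)
{
  size_t g = atomic_load_explicit (&generation, memory_order_relaxed);
  if (g == *cached)
    return 0;
  *cached = g;
  return 1;
}
```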
* elf: Fix data races in pthread_create and TLS access [BZ #19329] | Szabolcs Nagy | 2021-05-11 | 1 | -16/+47
  DTV setup at thread creation (_dl_allocate_tls_init) is changed to take the dlopen lock, GL(dl_load_lock). Avoiding data races here without locks would require design changes: the map that is accessed for static TLS initialization here may be concurrently freed by dlclose. That use after free may be solved by only locking around static TLS setup or by ensuring dlclose does not free modules with static TLS; however, currently every link map with TLS has to be accessed at least to see if it needs static TLS. And even if that's solved, a lot of atomics would still be needed to synchronize DTV-related globals without a lock. So fix both bug 19329 and bug 27111 with a lock that prevents DTV setup from running concurrently with dlopen or dlclose.
  _dl_update_slotinfo at TLS access still does not use any locks, so CONCURRENCY NOTES are added to explain the synchronization. The early exit from the slotinfo walk when max_modid is reached is not strictly necessary, but does not hurt either.
  An incorrect acquire load was removed from _dl_resize_dtv: it did not synchronize with any release store or fence, and synchronization is now handled separately at thread creation and TLS access time.
  There are still a number of racy read accesses to globals that will be changed to relaxed MO atomics in a follow-up patch. This should not introduce regressions compared to existing behaviour and avoids cluttering the main part of the fix.
  Not all TLS-access-related data races got fixed here: there are additional races at lazy tlsdesc relocations; see bug 27137.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
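The core of this fix is that per-thread setup takes the same lock as load and unload, so it can never read a module that is being freed. A minimal model of that idea, not glibc's data structures: `struct module`, `modules`, `module_load`, `module_unload` and `thread_tls_init` are all hypothetical, and a fixed-size per-thread block replaces the real DTV.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODS 16    /* hypothetical capacity */
#define IMAGE_MAX 32   /* hypothetical maximum TLS image size */

/* Stand-in for a link map carrying an initial TLS image.  */
struct module
{
  size_t init_size;
  unsigned char init_image[IMAGE_MAX];
};

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static struct module *modules[MAX_MODS];   /* NULL means unused */

/* "dlopen": build the module, then publish it under the lock.  */
static int
module_load (size_t slot, const void *image, size_t size)
{
  if (slot >= MAX_MODS || size > IMAGE_MAX)
    return -1;
  struct module *m = malloc (sizeof *m);
  if (m == NULL)
    return -1;
  m->init_size = size;
  memcpy (m->init_image, image, size);

  pthread_mutex_lock (&load_lock);
  int ok = modules[slot] == NULL;
  if (ok)
    modules[slot] = m;
  pthread_mutex_unlock (&load_lock);
  if (!ok)
    free (m);
  return ok ? 0 : -1;
}

/* "dlclose": unpublish under the lock.  Freeing afterwards is safe
   because readers only touch modules while holding the lock.  */
static void
module_unload (size_t slot)
{
  if (slot >= MAX_MODS)
    return;
  pthread_mutex_lock (&load_lock);
  struct module *m = modules[slot];
  modules[slot] = NULL;
  pthread_mutex_unlock (&load_lock);
  free (m);
}

/* Thread-creation path: copy each module's initial image while holding
   the lock, so no module can be freed mid-copy.  Returns the number of
   modules copied.  */
static size_t
thread_tls_init (unsigned char out[MAX_MODS][IMAGE_MAX])
{
  size_t copied = 0;
  pthread_mutex_lock (&load_lock);
  for (size_t i = 0; i < MAX_MODS; ++i)
    if (modules[i] != NULL)
      {
        memcpy (out[i], modules[i]->init_image, modules[i]->init_size);
        ++copied;
      }
  pthread_mutex_unlock (&load_lock);
  return copied;
}
```

The trade-off matches the commit's reasoning: one coarse lock is simpler to get right than many atomics on the shared globals.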
* write_archive_locales: Fix memory leak | Siddhesh Poyarekar | 2021-05-11 | 1 | -0/+2
  Fix memory leak identified by Coverity.
* nptl: Move thread join functions into libc | Florian Weimer | 2021-05-11 | 71 | -178/+436
  The symbols pthread_clockjoin_np, pthread_join, pthread_timedjoin_np, pthread_tryjoin_np, and thrd_join were moved using scripts/move-symbol-to-libc.py.
  Moving the symbols at the same time avoids the need for temporary exports.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
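Two of the moved join variants are GNU extensions: pthread_tryjoin_np fails with EBUSY instead of blocking, and pthread_clockjoin_np (glibc 2.31 and later) waits until an absolute deadline on a chosen clock. A minimal sketch using both; `gate`, `worker` and `join_demo` are hypothetical, and the five-second deadline is arbitrary.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* The worker cannot finish while the main thread holds GATE.  */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void *
worker (void *arg)
{
  pthread_mutex_lock (&gate);
  pthread_mutex_unlock (&gate);
  return arg;   /* handed back to the joiner */
}

/* Returns the worker's result, or -1 on an unexpected outcome.  */
static intptr_t
join_demo (void)
{
  pthread_t thr;
  void *ret = NULL;

  pthread_mutex_lock (&gate);
  if (pthread_create (&thr, NULL, worker, (void *) (intptr_t) 42) != 0)
    {
      pthread_mutex_unlock (&gate);
      return -1;
    }

  /* The worker is blocked on GATE, so a non-blocking join must fail.  */
  int busy = pthread_tryjoin_np (thr, &ret);
  pthread_mutex_unlock (&gate);

  /* Now wait, but at most five seconds measured on CLOCK_MONOTONIC.  */
  struct timespec deadline;
  clock_gettime (CLOCK_MONOTONIC, &deadline);
  deadline.tv_sec += 5;
  int err = pthread_clockjoin_np (thr, &ret, CLOCK_MONOTONIC, &deadline);

  if (busy != EBUSY || err != 0)
    return -1;
  return (intptr_t) ret;
}
```

`join_demo ()` is expected to return 42. Using CLOCK_MONOTONIC keeps the timeout immune to wall-clock adjustments, which is the main reason to prefer the clock-taking variant over pthread_timedjoin_np.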
* nptl: Move pthread_detach, thrd_detach into libc | Florian Weimer | 2021-05-11 | 67 | -66/+157
  The symbols were moved using scripts/move-symbol-to-libc.py.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move __free_tcb into libc | Florian Weimer | 2021-05-11 | 7 | -29/+52
  Under the name __nptl_free_tcb.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move stack cache management, __libpthread_freeres into libc | Florian Weimer | 2021-05-11 | 10 | -158/+212
  This replaces the FREE_P macro with the __nptl_stack_in_use inline function. stack_list_del is renamed to __nptl_stack_list_del, stack_list_add to __nptl_stack_list_add, __deallocate_stack to __nptl_deallocate_stack, and free_stacks to __nptl_free_stacks.
  It is convenient to move __libpthread_freeres into libc at the same time. This removes the temporary __default_pthread_attr_freeres export and restores full freeres coverage for __default_pthread_attr.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Move pthread_setattr_default_np into libc | Florian Weimer | 2021-05-11 | 65 | -35/+80
  The symbol was moved using scripts/move-symbol-to-libc.py.
  The export of __default_pthread_attr_freeres is temporary. There is a minor regression in freeres coverage because in the dynamic case, __default_pthread_attr_freeres is no longer called if libpthread is not linked in.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Remove always-disabled debugging support | Florian Weimer | 2021-05-11 | 3 | -75/+5
  This removes the DEBUGGING_P macro and the __pthread_debug variable. The __find_in_stack_list function is now unused and deleted as well.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* nptl: Replace pthread_sigqueue implementation with Linux one | Florian Weimer | 2021-05-11 | 2 | -75/+38
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* get-translit.py: Fix typo | Siddhesh Poyarekar | 2021-05-11 | 1 | -1/+1
* _dl_exception_create_format: Add missing va_end | Siddhesh Poyarekar | 2021-05-11 | 1 | -0/+1
  Coverity discovered a missing va_end.
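Every va_start (and every va_copy) must be matched by a va_end in the same function, on every return path; early error returns are where one is typically missed. A minimal sketch of a "measure, allocate, format" helper that keeps the pairs balanced. `format_message` is hypothetical and is not the body of _dl_exception_create_format.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string; returns NULL on failure.  */
static char *
format_message (const char *fmt, ...)
{
  va_list ap, ap2;
  va_start (ap, fmt);
  va_copy (ap2, ap);              /* second traversal for the real write */

  int len = vsnprintf (NULL, 0, fmt, ap);
  va_end (ap);                    /* pairs with va_start */
  if (len < 0)
    {
      va_end (ap2);               /* pairs with va_copy on the error path */
      return NULL;
    }

  char *buf = malloc ((size_t) len + 1);
  if (buf != NULL)
    vsnprintf (buf, (size_t) len + 1, fmt, ap2);
  va_end (ap2);                   /* pairs with va_copy on the normal path */
  return buf;
}
```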
* linux: Move funlockfile/_IO_funlockfile into libc | Adhemerval Zanella | 2021-05-10 | 32 | -92/+3
  The nptl version is used as the default since, now that the symbol is always present, the single-thread optimization is tricky. Hurd is not changed; it uses its own lock scheme (which calls _cthreads_funlockfile).
  Checked on x86_64-linux-gnu.
* linux: Move ftrylockfile/_IO_ftrylockfile into libc | Adhemerval Zanella | 2021-05-10 | 32 | -94/+3
  The nptl version is used as the default since, now that the symbol is always present, the single-thread optimization is tricky. Hurd is not changed; it uses its own lock scheme (which calls _cthreads_ftrylockfile).
  Checked on x86_64-linux-gnu.
* linux: Move flockfile/_IO_flockfile into libc | Adhemerval Zanella | 2021-05-10 | 32 | -94/+3
  The nptl version is used as the default since, now that the symbol is always present, the single-thread optimization is tricky. Hurd is not changed; it uses its own lock scheme (which calls _cthreads_flockfile).
  Checked on x86_64-linux-gnu.
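From the caller's side, flockfile and funlockfile group several stdio calls so that output from other threads sharing the same FILE cannot interleave with them, and ftrylockfile is the non-blocking form (0 means the lock was acquired). The lock is recursive for the owning thread, so stdio calls that lock internally still work inside the region. A minimal sketch; `write_record` and `try_write_record` are hypothetical helpers.

```c
#include <stdio.h>

/* Write "key=value\n" as one unit with respect to other threads.  */
static void
write_record (FILE *f, const char *key, int value)
{
  flockfile (f);
  fputs (key, f);
  putc_unlocked ('=', f);         /* allowed: we hold the stream lock */
  fprintf (f, "%d\n", value);     /* relocks recursively, which is fine */
  funlockfile (f);
}

/* Non-blocking variant: returns 1 if written, 0 if another thread
   currently holds the stream lock.  */
static int
try_write_record (FILE *f, const char *key, int value)
{
  if (ftrylockfile (f) != 0)
    return 0;
  fputs (key, f);
  putc_unlocked ('=', f);
  fprintf (f, "%d\n", value);
  funlockfile (f);
  return 1;
}
```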
* Use a #pragma to suppress a bogus GCC 10 warning instead of an assert [BZ 27832]. | Martin Sebor | 2021-05-10 | 1 | -1/+11
  Reviewed-by: fweimer@redhat.com
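The usual shape of a targeted suppression is push, ignore, pop around only the offending statement, so the warning stays enabled everywhere else. A minimal sketch; the commit text does not say which warning BZ 27832 concerns, so the option `-Wmaybe-uninitialized` and the function `pick` are illustrative assumptions (here GCC would not actually warn; imagine it reporting a false positive for `value`).

```c
/* Hypothetical example: suppose GCC 10 falsely reports VALUE as "may be
   used uninitialized" at the multiplication.  Instead of adding an assert
   to placate the compiler, silence just that warning for just that line.  */
static int
pick (int have_a, int a, int b)
{
  int value;
  if (have_a)
    value = a;
  else
    value = b;
#if defined __GNUC__ && !defined __clang__
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif
  int result = value * 2;
#if defined __GNUC__ && !defined __clang__
# pragma GCC diagnostic pop
#endif
  return result;
}
```

Unlike an assert, the pragma has no run-time cost and documents that the diagnostic, not the code, is wrong.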
* Add PTRACE_SYSEMU and PT_SYSEMU_SINGLESTEP from Linux 5.12 for s390 | Joseph Myers | 2021-05-10 | 1 | -0/+10
  Linux 5.12 adds the constants PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP for s390. Add these to glibc.
  Tested with build-many-glibcs.py for s390-linux-gnu and s390x-linux-gnu.
* add workload traces for cbrtl | Paul Zimmermann | 2021-05-10 | 3 | -0/+1011
  These workload traces cover the whole "long double" range.
  This patch was prepared with the help of Adhemerval Zanella.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: Move __reclaim_stacks into the fork implementation in libc | Florian Weimer | 2021-05-10 | 5 | -119/+110
  As a result, __libc_pthread_init is no longer needed.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Move __default_pthread_attr, __default_pthread_attr_lock into libc | Florian Weimer | 2021-05-10 | 4 | -6/+14
  The GLIBC_PRIVATE exports for these symbols are expected to be temporary.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Simplify resetting the in-flight stack in __reclaim_stacks | Florian Weimer | 2021-05-10 | 1 | -3/+3
  stack_list_del overwrites the in-flight stack variable.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Move changing of stack permissions into ld.so | Florian Weimer | 2021-05-10 | 9 | -83/+100
  All the stack lists are now in _rtld_global, so it is possible to change stack permissions directly from there, instead of calling into libpthread to do the change.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Simplify the change_stack_perm calling convention | Florian Weimer | 2021-05-10 | 1 | -24/+5
  Only ia64 needs the page mask, and it is straightforward to compute the value within the function itself.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Move more stack management variables into _rtld_global | Florian Weimer | 2021-05-10 | 4 | -30/+36
  Permissions of the cached stacks may have to be updated if an object is loaded that requires executable stacks, so the dynamic loader needs to know about these cached stacks.
  The move of in_flight_stack and stack_cache_actsize is a requirement for merging __reclaim_stacks into the fork implementation in libc.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Introduce __tls_pre_init_tp | Florian Weimer | 2021-05-10 | 6 | -40/+60
  This is an early variant of __tls_init_tp, primarily for initializing thread-related elements of _rtld_global/GL.
  Some existing initialization code not needed for NPTL is moved into the generic version of this function.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Eliminate __pthread_multiple_threads | Florian Weimer | 2021-05-10 | 4 | -17/+3
  It is no longer needed after the SINGLE_THREADED_P consolidation.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: Simplify and fix the definition of SINGLE_THREAD_P | Florian Weimer | 2021-05-10 | 1 | -29/+7
  Always use __libc_multiple_threads if beneficial, and do not assume that the dynamic loader is single-threaded. This assumption could become incorrect by accident once more code is moved from libpthread into it.
  The previous commit introducing the NO_SYSCALL_CANCEL_CHECKING macro enables this change.
  Do not hint to the compiler that multi-threaded programs are unlikely (which is not quite true anymore).
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
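The idea behind a SINGLE_THREAD_P-style check is a cheap test of a process-wide flag that selects a lock-free fast path until a second thread exists. A minimal sketch with hypothetical names (`multiple_threads` plays the role of __libc_multiple_threads; `spawn_thread`, `counter_add` and `add_many` are invented), not glibc's macro. In line with the commit, the branch carries no likely/unlikely hint.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set once, before the first additional thread starts; never cleared.  */
static atomic_bool multiple_threads;

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void
counter_add (long n)
{
  /* No __builtin_expect: multi-threaded processes are common.  */
  if (!atomic_load_explicit (&multiple_threads, memory_order_relaxed))
    {
      counter += n;             /* single-threaded fast path, no lock */
      return;
    }
  pthread_mutex_lock (&counter_lock);
  counter += n;
  pthread_mutex_unlock (&counter_lock);
}

/* Set the flag before the new thread exists.  pthread_create
   synchronizes with the start of the new thread, so it sees both the
   flag and all earlier unlocked updates to COUNTER.  */
static int
spawn_thread (pthread_t *thr, void *(*fn) (void *), void *arg)
{
  atomic_store_explicit (&multiple_threads, true, memory_order_relaxed);
  return pthread_create (thr, NULL, fn, arg);
}

static void *
add_many (void *arg)
{
  for (int i = 0; i < 1000; ++i)
    counter_add (1);
  return arg;
}
```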
* Linux: Explicitly disable cancellation checking in the dynamic loader | Florian Weimer | 2021-05-10 | 1 | -2/+9
  Historically, SINGLE_THREAD_P is defined to 1 in the dynamic loader. This has the side effect of disabling cancellation points. In order to enable future use of SINGLE_THREAD_P for single-thread optimizations in the dynamic loader (which becomes important once more code is moved from libpthread), introduce a new NO_SYSCALL_CANCEL_CHECKING macro which is always 1 for IS_IN (rtld), independently of the actual SINGLE_THREAD_P value.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Export __libc_multiple_threads from libc as an internal symbol | Florian Weimer | 2021-05-10 | 8 | -30/+13
  This allows the elimination of the __libc_multiple_threads_ptr variable in libpthread and its initialization procedure.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf, nptl: Resolve recursive lock implementation early | Florian Weimer | 2021-05-10 | 7 | -24/+120
  If libpthread is included in libc, it is not necessary to delay initialization of the lock/unlock function pointers until libpthread is loaded. This eliminates two unprotected function pointers from _rtld_global and removes some initialization code from libpthread.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* scripts/versions.awk: Add strings and hashes to <first-versions.h> | Florian Weimer | 2021-05-10 | 1 | -0/+36
  This generates new macros of this form:
  They are useful for symbol lookups using _dl_lookup_direct.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Hurd: Add missing hidden proto definition for __ttyname_r | Florian Weimer | 2021-05-10 | 1 | -1/+1
* x86: Add EVEX optimized memchr family not safe for RTM | Noah Goldstein | 2021-05-08 | 10 | -41/+217
  No bug.
  This commit adds a new implementation for EVEX memchr that is not safe for RTM because it uses vzeroupper. The benefit is that by using ymm0-ymm15 it can use vpcmpeq and vpternlogd in the 4x loop, which is faster than the RTM-safe version, which cannot use vpcmpeq because there is no EVEX encoding for the instruction.
  All parts of the implementation aside from the 4x loop are the same for the two versions, and the optimization is only relevant for large sizes.
  Tigerlake:
  size , algn , Pos  , Cur T , New T , Win    , Dif
  512  , 6    , 192  , 9.2   , 9.04  , no-RTM , 0.16
  512  , 7    , 224  , 9.19  , 8.98  , no-RTM , 0.21
  2048 , 0    , 256  , 10.74 , 10.54 , no-RTM , 0.2
  2048 , 0    , 512  , 14.81 , 14.87 , RTM    , 0.06
  2048 , 0    , 1024 , 22.97 , 22.57 , no-RTM , 0.4
  2048 , 0    , 2048 , 37.49 , 34.51 , no-RTM , 2.98 <--
  Icelake:
  size , algn , Pos  , Cur T , New T , Win    , Dif
  512  , 6    , 192  , 7.6   , 7.3   , no-RTM , 0.3
  512  , 7    , 224  , 7.63  , 7.27  , no-RTM , 0.36
  2048 , 0    , 256  , 8.48  , 8.38  , no-RTM , 0.1
  2048 , 0    , 512  , 11.57 , 11.42 , no-RTM , 0.15
  2048 , 0    , 1024 , 17.92 , 17.38 , no-RTM , 0.54
  2048 , 0    , 2048 , 30.37 , 27.34 , no-RTM , 3.03 <--
  test-memchr, test-wmemchr, and test-rawmemchr are all passing.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
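The "4x loop" tests four vectors per iteration and combines their match masks (vpternlogd) so the loop branches only once. As a portable illustration of that loop structure only, not the EVEX code, here is a scalar analogue: four 64-bit words per iteration, each tested with the well-known "has zero byte" bit trick, ORed together before a single branch. `memchr_4x` and `has_zero_byte` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of X is zero (classic SWAR test).  */
static inline uint64_t
has_zero_byte (uint64_t x)
{
  return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

static const void *
memchr_4x (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  const unsigned char ch = (unsigned char) c;
  /* CH replicated into every byte; XOR turns matching bytes into zeros.  */
  const uint64_t pattern = 0x0101010101010101ULL * ch;

  /* Main loop: 32 bytes per iteration, combined test, one branch.  */
  while (n >= 32)
    {
      uint64_t w[4];
      memcpy (w, p, sizeof w);          /* alignment-safe loads */
      uint64_t any = has_zero_byte (w[0] ^ pattern)
                     | has_zero_byte (w[1] ^ pattern)
                     | has_zero_byte (w[2] ^ pattern)
                     | has_zero_byte (w[3] ^ pattern);
      if (any != 0)
        break;                          /* a match is in these 32 bytes */
      p += 32;
      n -= 32;
    }

  /* Pinpoint the match inside the block, or scan the short tail.  */
  for (; n > 0; ++p, --n)
    if (*p == ch)
      return p;
  return NULL;
}
```

As in the commit's numbers, the combined loop only pays off once the search runs long enough to spend most of its time in the main loop.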
* x86-64: Fix an unknown vector operation in memchr-evex.S | Alice Xu | 2021-05-07 | 1 | -1/+1
  An unknown vector operation occurred in commit 2a76821c308. Fixed it by using "ymm{k1}{z}" instead of "ymm {k1} {z}".
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* powerpc64le: Fix ifunc selection for memset, memmove, bzero and bcopy | Raoni Fassina Firmino | 2021-05-07 | 5 | -20/+22
  The hwcap2 check for the aforementioned functions should check for both PPC_FEATURE2_ARCH_3_1 and PPC_FEATURE2_HAS_ISEL, but was mistakenly checking for either one of them, enabling the ISA 3.1 versions of the functions on incompatible processors such as POWER8.
  Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
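The bug class here is testing a multi-bit mask with "any bit set" when "all bits set" is required. A minimal sketch of both checks; `FEATURE_ARCH_3_1` and `FEATURE_HAS_ISEL` are local stand-ins with placeholder values, not the real PPC_FEATURE2_* definitions, and the selector functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration; real code uses PPC_FEATURE2_ARCH_3_1
   and PPC_FEATURE2_HAS_ISEL from the system headers.  */
#define FEATURE_ARCH_3_1 (UINT32_C (1) << 18)
#define FEATURE_HAS_ISEL (UINT32_C (1) << 27)

/* Buggy: true if EITHER bit is set, so a CPU reporting only HAS_ISEL
   would be given the ISA 3.1 implementation.  */
static bool
use_isa31_buggy (uint32_t hwcap2)
{
  return (hwcap2 & (FEATURE_ARCH_3_1 | FEATURE_HAS_ISEL)) != 0;
}

/* Fixed: true only if BOTH bits are set.  */
static bool
use_isa31_fixed (uint32_t hwcap2)
{
  const uint32_t need = FEATURE_ARCH_3_1 | FEATURE_HAS_ISEL;
  return (hwcap2 & need) == need;
}
```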
* malloc: Make tunable callback functions static | H.J. Lu | 2021-05-07 | 1 | -2/+2
  Since malloc tunable callback functions are only used within the same file, we should make them static.
* linux: implement ttyname as a wrapper around ttyname_r. | Érico Nogueira | 2021-05-07 | 3 | -161/+15
  Big win in binary size and avoids duplicating the logic in multiple places.
  On x86_64, dropped from 1883206 to 1881790, a 1416 byte decrease.
  Also changed logic to track if ttyname_buf has been allocated by checking if it's NULL instead of tracking buflen as an additional variable.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
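A minimal sketch of the wrapper approach: the non-reentrant function keeps a lazily allocated static buffer (a NULL pointer means "not allocated yet", so no separate length variable is needed) and delegates all real work to ttyname_r, translating its returned error number into errno. `my_ttyname` is hypothetical, and the fixed 4096-byte buffer is an assumption (it equals PATH_MAX on Linux); this is not glibc's source.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

static char *
my_ttyname (int fd)
{
  static char *buf;                 /* NULL until first successful use */
  enum { BUF_SIZE = 4096 };         /* assumed buffer size */

  if (buf == NULL)
    {
      buf = malloc (BUF_SIZE);
      if (buf == NULL)
        return NULL;                /* malloc has set errno */
    }

  int err = ttyname_r (fd, buf, BUF_SIZE);
  if (err != 0)
    {
      errno = err;                  /* ttyname reports errors via errno */
      return NULL;
    }
  return buf;                       /* shared buffer: not thread-safe */
}
```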
* linux: use fd_to_filename instead of _fitoa_word in ttyname_r. | Érico Nogueira | 2021-05-07 | 1 | -5/+3
  Simplifies the logic and makes intent clearer, while at the same time decreasing binary size.
  On x86_64, dropped from 1883270 to 1883206, a 64 byte decrease.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* misc: use _fitoa_word to implement __fd_to_filename. | Érico Nogueira | 2021-05-07 | 1 | -5/+2
  In a default build for x86_64, size decreased by 24 bytes: 1883294 to 1883270.
  Additionally, this avoids repeating the number printing logic in multiple places.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
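In glibc, the fd-to-filename helper produces a /proc/self/fd/N path; the two commits above are about sharing one integer formatter instead of duplicating digit-printing code. A minimal sketch of the same shape; `put_decimal` only plays the role of an internal formatter such as _fitoa_word (its behaviour here is invented, not a description of that function), and `fd_to_filename_sketch` and `FD_NAME_SIZE` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define FD_PREFIX "/proc/self/fd/"
/* Prefix, up to 10 decimal digits of a non-negative 32-bit int, NUL.  */
#define FD_NAME_SIZE (sizeof FD_PREFIX + 10)

/* Write VALUE in decimal at OUT and return a pointer just past the last
   digit.  Assumes a 32-bit unsigned int (at most 10 digits).  */
static char *
put_decimal (char *out, unsigned int value)
{
  char tmp[10];
  size_t n = 0;
  do
    tmp[n++] = (char) ('0' + value % 10);   /* digits in reverse order */
  while ((value /= 10) != 0);
  while (n > 0)
    *out++ = tmp[--n];                      /* emit most significant first */
  return out;
}

/* Store "/proc/self/fd/FD" in BUF and return BUF.  FD must be >= 0.  */
static char *
fd_to_filename_sketch (int fd, char buf[FD_NAME_SIZE])
{
  memcpy (buf, FD_PREFIX, sizeof FD_PREFIX - 1);
  char *end = put_decimal (buf + sizeof FD_PREFIX - 1, (unsigned int) fd);
  *end = '\0';
  return buf;
}
```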
* linux: Remove /proc/cpuinfo fallback on alpha and sparc | Adhemerval Zanella | 2021-05-07 | 3 | -97/+1
  There is not much gain in falling back to cpuinfo if sysfs is not present; in restricted environments, usually neither will be present.
  It also simplifies the code and makes all architectures use sched_getaffinity as the sysfs fallback.
  Checked on sparc64-linux-gnu.
* linux: Use sched_getaffinity for __get_nprocs (BZ #27645) | Adhemerval Zanella | 2021-05-07 | 8 | -338/+31
  Both the sysfs and procfs parsing (through GET_NPROCS_PARSER) are removed in favor of the syscall. The initial scratch buffer should fit most common usage (1024 bytes, which maps to 8192 CPUs).
  Checked on x86_64-linux-gnu and aarch64-linux-gnu.
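The "1024 bytes maps to 8192 CPUs" figure is one bit per CPU: 1024 × 8 = 8192. A minimal user-space sketch of the same approach, not glibc's implementation: start with a mask sized for 8192 CPUs and double it if the kernel rejects the size with EINVAL (its own mask is larger). `count_cpus_affinity` and the upper bound on retries are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Number of CPUs in this process's affinity mask, or -1 on error.  */
static int
count_cpus_affinity (void)
{
  for (int ncpus = 8192; ncpus <= (1 << 20); ncpus *= 2)
    {
      cpu_set_t *set = CPU_ALLOC (ncpus);
      if (set == NULL)
        return -1;
      size_t size = CPU_ALLOC_SIZE (ncpus);   /* 1024 bytes for 8192 */
      int r = sched_getaffinity (0, size, set);
      int saved = errno;
      int count = (r == 0) ? CPU_COUNT_S (size, set) : -1;
      CPU_FREE (set);
      if (r == 0)
        return count;
      if (saved != EINVAL)
        return -1;              /* EINVAL means "mask too small": retry */
    }
  return -1;
}
```

Note that this counts the CPUs the process may run on, which can be fewer than the CPUs online when an affinity mask is in effect.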