path: root/src/env
* protect stack canary from leak via read-as-string by zeroing second byte (jvoisin, 2022-03-08; 1 file, -0/+9)

This reduces entropy of the canary from 64-bit to 56-bit, in exchange for mitigating non-terminated C string overflows, by setting the second byte of the canary to nul, so that an off-by-one write overflow with a nul byte can still be detected. Idea from GrapheneOS bionic commit 7024d880b51f03a796ff8832f1298f2f1531fd7b.

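A minimal sketch of the scheme (the names and the entropy source are illustrative, not musl's exact code):

    #include <stdint.h>

    static uintptr_t stack_chk_guard;

    /* Fill the canary with random bits, then zero its second byte. A
     * C-string overflow that reaches the canary must deposit a
     * terminating nul; byte 1 being nul stops string operations from
     * reading or writing past it, while an off-by-one nul write still
     * corrupts byte 0 and is detected. Costs 8 of 64 bits of entropy. */
    static void init_canary(uintptr_t entropy)
    {
        stack_chk_guard = entropy;
        ((unsigned char *)&stack_chk_guard)[1] = 0;
    }
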
* fix inconsistent signature of __libc_start_main (Rich Felker, 2021-01-30; 1 file, -1/+2)

commit 7586360badcae6e73f04eb1b8189ce630281c4b2 removed the unused arguments from the definition of __libc_start_main, making it incompatible with the declaration at the point of call, which still passed 6 arguments. calls with mismatched function type have undefined behavior, breaking LTO and any other tooling that checks for function signature mismatch.

removing the extra arguments from the point of call (crt1) is not an option for fixing this, since that would be a change in ABI surface between application and libc.

adding back the extra arguments requires some care. on archs that pass arguments on the stack or that reserve argument spill space for the callee on the stack, it imposes an ABI requirement on the caller to provide such space. the modern crt1.c entry point provides such space, but originally there was arch-specific asm for the call to __libc_start_main. the last of this asm was removed in commit 6fef8cafbd0f6f185897bc87feb1ff66e2e204e1, and manual review of the code removed and its prior history was performed to check that all archs/variants passed the legacy init/fini/ldso_fini arguments.

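The restored declaration looks roughly like this; the three trailing parameters are dummies that exist only so the definition's type matches the six-argument call made from crt1:

    int __libc_start_main(int (*main)(int, char **, char **),
                          int argc, char **argv,
                          void (*init_dummy)(), void (*fini_dummy)(),
                          void (*ldso_dummy)());
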
* remove redundant pthread struct members repeated for layout purposes (Rich Felker, 2020-08-27; 2 files, -2/+2)

dtv_copy, canary2, and canary_at_end existed solely to match multiple ABI and asm-accessed layouts simultaneously. now that pthread_arch.h can be included before struct __pthread is defined, the struct layout can depend on macros defined by pthread_arch.h.

* add secure_getenv function (Petr Vaněk, 2019-08-08; 1 file, -0/+8)

This function is a GNU extension introduced in glibc 2.17.

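The function is essentially a guard around getenv; a sketch, with `secure` standing in for musl's internal flag derived at startup from AT_SECURE and the uid/euid, gid/egid comparison:

    #include <stdlib.h>

    extern int secure; /* stand-in for musl's internal libc.secure flag */

    char *secure_getenv(const char *name)
    {
        /* never expose the environment to setuid/setgid/AT_SECURE code */
        return secure ? NULL : getenv(name);
    }
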
* fix tls offsets when p_vaddr%p_align != 0 on TLS_ABOVE_TP targets (Szabolcs Nagy, 2019-05-16; 1 file, -1/+2)

currently the bfd linker does not seem to create tls segments where p_vaddr%p_align != 0, but this is valid in ELF, and then the runtime computed tls offset must satisfy

offset%p_align == (base+p_vaddr)%p_align

and in case of local exec tls (main executable) the smallest such offset must be used (otherwise it is incompatible with the offset computed by the static linker).

the !TLS_ABOVE_TP case is handled correctly (the offset is negative then in the formula). the ldso code for TLS_ABOVE_TP is changed so the static tls offset of each module satisfies the formula.

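The smallest conforming offset can be computed as below; a sketch assuming p_align is a power of two, the load base is itself p_align-aligned, and `min_off` (a hypothetical name) is the lowest offset the layout allows:

    #include <stddef.h>

    /* smallest off >= min_off with off % p_align == p_vaddr % p_align */
    static size_t tls_offset(size_t min_off, size_t p_vaddr, size_t p_align)
    {
        return min_off + ((p_vaddr - min_off) & (p_align - 1));
    }
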
* overhaul i386 syscall mechanism not to depend on external asm source (Rich Felker, 2019-04-10; 2 files, -1/+3)

this is the first part of a series of patches intended to make __syscall fully self-contained in the object file produced using syscall.h, which will make it possible for crt1 code to perform syscalls.

the (confusingly named) i386 __vsyscall mechanism, which this commit removes, was introduced before the presence of a valid thread pointer was mandatory; back then the thread pointer was set up lazily only if threads were used. the intent was to be able to perform syscalls using the kernel's fast entry point in the VDSO, which can use the sysenter (Intel) or syscall (AMD) instruction instead of int $128, but without inlining an access to the __syscall global at the point of each syscall, which would incur a significant size cost from PIC setup everywhere. the mechanism also shuffled registers/calling convention around to avoid spills of call-saved registers, and to avoid allocating ebx or ebp via asm constraints, since there are plenty of broken-but-supported compiler versions which are incapable of allocating ebx with -fPIC or ebp with -fno-omit-frame-pointer.

the new mechanism preserves the properties of avoiding spills and avoiding allocation of ebx/ebp in constraints, but does it inline, using some fairly simple register shuffling, and uses a field of the thread structure rather than global data for the vdso-provided syscall code address. for now, the external __syscall function is refactored not to use the old __vsyscall so it can be kept, but the intent is to remove it too.

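The register shuffling looks roughly like this single-argument case (a sketch of the technique, not musl's exact syscall_arch.h):

    static inline long syscall1(long n, long a1)
    {
        unsigned long ret;
        /* a1 travels in edx and is swapped into ebx around the trap, so
         * the constraints never ask the compiler to allocate ebx, which
         * broken-but-supported compilers cannot do under -fPIC */
        __asm__ __volatile__ (
            "xchg %%ebx,%%edx ; int $128 ; xchg %%ebx,%%edx"
            : "=a"(ret) : "a"(n), "d"(a1) : "memory");
        return ret;
    }
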
* track all live threads in an AS-safe, fully-consistent linked list (Rich Felker, 2019-02-15; 1 file, -1/+4)

the hard problem here is unlinking threads from a list when they exit without creating a window of inconsistency where the kernel task for a thread still exists and is still executing instructions in userspace, but is not reflected in the list. the magic solution here is getting rid of per-thread exit futex addresses (set_tid_address), and instead using the exit futex to unlock the global thread list.

since pthread_join can no longer see the thread enter a detach_state of EXITED (which depended on the exit futex address pointing to the detach_state), it must now observe the unlocking of the thread list lock before it can unmap the joined thread and return. it doesn't actually have to take the lock. for this, a __tl_sync primitive is offered, with a signature that will allow it to be enhanced for quick return even under contention on the lock, if needed. for now, the exiting thread always performs a futex wake on its detach_state. a future change could optimize this out except when there is already a joiner waiting.

initial/dynamic variants of detached state no longer need to be tracked separately, since the futex address is always set to the global list lock, not a thread-local address that could become invalid on detached thread exit. all detached threads, however, must perform a second sigprocmask syscall to block implementation-internal signals, since locking the thread list with them already blocked is not permissible.

the arch-independent C version of __unmapself no longer needs to take a lock or set up its own futex address to release the lock, since it must necessarily be called with the thread list lock already held, guaranteeing exclusive access to the temporary stack.

changes to libc.threads_minus_1 no longer need to be atomic, since they are guarded by the thread list lock. it is largely vestigial at this point, and can be replaced with a cheaper boolean indicating whether the process is multithreaded at some point in the future.

* __libc_start_main: slightly simplify stage2 pointer setup (Alexander Monakov, 2018-11-02; 1 file, -3/+4)

Use "+r" in the asm instead of implementing a non-transparent copy by applying "0" constraint to the source value. Introduce a typedef for the function type to avoid spelling it out twice.

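In miniature, with `stage2_fn` as a hypothetical stand-in for the stage-2 function (the empty asm is the init barrier discussed in the 2018-10-17 entries below):

    typedef int lsm2_fn(long *, void *);
    extern lsm2_fn stage2_fn; /* hypothetical stage-2 function */

    static lsm2_fn *get_stage2_old(void)
    {
        lsm2_fn *f;
        /* "0": copy the input into a separate output operand */
        __asm__ ( "" : "=r"(f) : "0"(&stage2_fn) : "memory" );
        return f;
    }

    static lsm2_fn *get_stage2_new(void)
    {
        lsm2_fn *f = &stage2_fn;
        /* "+r": one read-write operand, modified (notionally) in place */
        __asm__ ( "" : "+r"(f) : : "memory" );
        return f;
    }
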
* use prototype for function pointer in static link libc init barrier (Rich Felker, 2018-10-18; 1 file, -1/+1)

this is not needed for correctness, but doesn't hurt, and in some cases the compiler may pessimize the call assuming the callee might be variadic when it lacks a prototype.

* fix error in constraints for static link libc init barrier (Rich Felker, 2018-10-18; 1 file, -1/+1)

commit 4390383b32250a941ec616e8bff6f568a801b1c0 inadvertently used "r" instead of "0" for the input constraint, which only happened to work for the configuration I tested it on because it usually makes sense for the compiler to choose the same input and output register.

* document and make explicit desired noinline property for __init_libc (Rich Felker, 2018-10-17; 1 file, -0/+6)

on multiple occasions I've started to flatten/inline the code in __init_libc, only to rediscover the reason it was not inlined: GCC fails to deallocate its stack (and now, with the changes in commit 4390383b32250a941ec616e8bff6f568a801b1c0, fails to produce a tail call to the stage 2 function; see PR #87639) before calling main if it was inlined.

document this with a comment and use an explicit noinline attribute if __GNUC__ is defined so that even with CFLAGS that heavily favor inlining it won't get inlined.

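The attribute as applied, per the commit message:

    #ifdef __GNUC__
    __attribute__((__noinline__))
    #endif
    void __init_libc(char **envp, char *pn)
    {
        /* startup work whose stack must be fully deallocated (and, with
         * the asm barrier, tail-called away) before main runs */
    }
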
* impose barrier between thread pointer setup and use for static linking (Rich Felker, 2018-10-17; 1 file, -0/+13)

this is the analog of commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for static linking. unlike with dynamic linking, we don't have symbolic lookup to use as a barrier. use a dummy (target-agnostic) degenerate inline asm fragment instead. this technique has precedent in commit 05ac345f895098657cf44d419b5d572161ebaf43 where it's used for explicit_bzero. if it proves problematic in any way, loading the address of the stage 2 function from a pointer object whose address leaks to kernelspace during thread pointer init could be used as an even stronger barrier.

* combine arch ABI's DTP_OFFSET into DTV pointers (Rich Felker, 2018-10-12; 2 files, -13/+12)

as explained in commit 6ba5517a460c6c438f64d69464fdfc3269a4c91a, some archs use an offset (typically -0x8000) with their DTPOFF relocations, which __tls_get_addr needs to invert. on affected archs, which lack direct support for large immediates, this can cost multiple extra instructions in the hot path. instead, incorporate the DTP_OFFSET into the DTV entries. this means they are no longer valid pointers, so store them as an array of uintptr_t rather than void *; this also makes it easier to access slot 0 as a valid slot count.

commit e75b16cf93ebbc1ce758d3ea6b2923e8b2457c68 left behind cruft in two places, __reset_tls and __tls_get_new, from back when it was possible to have uninitialized gap slots indicated by a null pointer in the DTV. since the concept of null pointer is no longer meaningful with an offset applied, remove this cruft.

presently there are no archs with both TLSDESC and nonzero DTP_OFFSET, but the dynamic TLSDESC relocation code is also updated to apply an inverted offset to its offset field, so that the offset DTV would not impose a runtime cost in TLSDESC resolver functions.

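With the offset folded in, the hot path of __tls_get_addr is one load and one add; a sketch matching the description (names abbreviated, `tls_get_new` being the slow path that installs missing entries):

    #include <stdint.h>

    void *tls_get_new(uintptr_t *); /* slow path: install missing entry */

    /* v[0] is the module id, v[1] the DTPOFF addend; dtv entries are
     * uintptr_t with DTP_OFFSET already applied, and dtv[0] holds the
     * slot count */
    void *tls_get_addr(uintptr_t *v, uintptr_t *dtv)
    {
        if (v[0] <= dtv[0])
            return (void *)(dtv[v[0]] + v[1]);
        return tls_get_new(v);
    }
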
* support setting of default thread stack size via PT_GNU_STACK header (Rich Felker, 2018-09-18; 1 file, -0/+5)

this facilitates building software that assumes a large default stack size without any patching to call pthread_setattr_default_np or pthread_attr_setstacksize at each thread creation site, using just LDFLAGS.

normally the PT_GNU_STACK header is used only to reflect whether executable stack is desired, but with GNU ld at least, passing -Wl,-z,stack-size=N will set a size on the program header. with this patch, that size will be incorporated into the default stack size (subject to increase-only rule and DEFAULT_STACK_MAX limit).

both static and dynamic linking honor the program header. for dynamic linking, all libraries loaded at program start, including preloaded ones, are considered. dlopened libraries are not considered, for several reasons. extra logic would be needed to defer processing until the load of the new library is committed, synchronization would be needed since other threads may be running concurrently, and the effectiveness would be limited since the larger size would not apply to threads that already existed at the time of dlopen. programs that will dlopen code expecting a large stack need to declare the requirement themselves, or pthread_setattr_default_np can be used.

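Requesting the size is a pure link-time change, e.g. `cc -Wl,-z,stack-size=8388608 app.c`. On the libc side the scan is roughly as below (a sketch; the constants are illustrative, not musl's values):

    #include <elf.h>
    #include <stddef.h>

    #define DEFAULT_STACK_MAX (8 << 20) /* illustrative cap */

    static size_t default_stacksize = 128 * 1024; /* illustrative default */

    static void scan_gnu_stack(const Elf64_Phdr *ph, size_t phnum)
    {
        for (size_t i = 0; i < phnum; i++) {
            if (ph[i].p_type != PT_GNU_STACK) continue;
            /* increase-only: a header may raise the default, never lower it */
            if (ph[i].p_memsz > default_stacksize)
                default_stacksize = ph[i].p_memsz < DEFAULT_STACK_MAX
                    ? ph[i].p_memsz : DEFAULT_STACK_MAX;
        }
    }
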
* reduce spurious inclusion of libc.h (Rich Felker, 2018-09-12; 5 files, -5/+1)

libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it.

remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases.

in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h.

declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.

* overhaul internally-public declarations using wrapper headers (Rich Felker, 2018-09-12; 6 files, -11/+5)

commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfaces between subsystems of libc.

rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to.

along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features.

over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.

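A sketch of the pattern for one wrapped header (the specific declarations are illustrative, not an exact copy of musl's file):

    /* src/include/stdlib.h -- found before the public header on the
     * include path; pulls in the public header, then adds internals.
     * 'hidden' comes from the wrapped features.h described above. */
    #ifndef STDLIB_H
    #define STDLIB_H

    #include "../../include/stdlib.h"

    hidden int __putenv(char *, size_t, char *);
    hidden char *__randname(char *);

    #endif
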
* define and use internal macros for hidden visibility, weak refs (Rich Felker, 2018-09-05; 3 files, -6/+3)

this cleans up what had become widespread direct inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful to recovering some of the efficiency lost when the protected visibility hack was dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially on archs where the PLT ABI is costly.

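The macros in question are thin wrappers over the GNU C attributes; roughly:

    #define hidden __attribute__((__visibility__("hidden")))
    #define weak __attribute__((__weak__))
    #define weak_alias(old, new) \
        extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))
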
* fix TLS layout of TLS variant I when there is a gap above TP (Szabolcs Nagy, 2018-06-02; 1 file, -2/+8)

In TLS variant I the TLS is above TP (or above a fixed offset from TP) but on some targets there is a reserved gap above TP before TLS starts. This matters for the local-exec tls access model when the offsets of TLS variables from the TP are hard coded by the linker into the executable, so the libc must compute these offsets the same way as the linker. The tls offset of the main module has to be alignup(GAP_ABOVE_TP, main_tls_align).

If there is no TLS in the main module then the gap can be ignored since musl does not use it and the tls access models of shared libraries are not affected.

The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0 (i.e. TLS did not require large alignment) because the gap was treated as a fixed offset from TP. Now the TP points at the end of the pthread struct (which is aligned) and there is a gap above it (which may also need alignment).

The fix required changing TP_ADJ and __pthread_self on affected targets (aarch64, arm and sh) and in the tlsdesc asm the offset to access the dtv changed too.

* improve joinable/detached thread state handling (Rich Felker, 2018-05-05; 1 file, -2/+2)

previously, some accesses to the detached state (from pthread_join and pthread_getattr_np) were unsynchronized; they were harmless in programs with well-defined behavior, but ugly. other accesses (in pthread_exit and pthread_detach) were synchronized by a poorly named "exitlock", with an ad-hoc trylock operation on it open-coded in pthread_detach, whose only purpose was establishing protocol for which thread is responsible for deallocation of detached-thread resources.

instead, use an atomic detach_state and unify it with the futex used to wait for thread exit. this eliminates 2 members from the pthread structure, gets rid of the hackish lock usage, and makes rigorous the trap added in commit 80bf5952551c002cf12d96deb145629765272db0 for catching attempts to join detached threads. it should also make attempts to detach an already-detached thread reliably trap.

* use a dedicated futex object for pthread_join instead of tid field (Rich Felker, 2018-05-02; 1 file, -1/+2)

the tid field in the pthread structure is not volatile, and really shouldn't be, so as not to limit the compiler's ability to reorder, merge, or split loads in code paths that may be relevant to performance (like controlling lock ownership).

however, use of objects which are not volatile or atomic with futex wait is inherently broken, since the compiler is free to transform a single load into multiple loads, thereby using a different value for the controlling expression of the loop and the value passed to the futex syscall, leading the syscall to block instead of returning. reportedly glibc's pthread_join was actually affected by an equivalent issue on s390.

add a separate, dedicated join_futex object for pthread_join to use.

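The hazard and the fix, in miniature (`futex_wait` is a hypothetical wrapper for the FUTEX_WAIT syscall, and the field names are illustrative):

    struct thr { int tid; int join_futex; };
    void futex_wait(int *addr, int val); /* hypothetical FUTEX_WAIT wrapper */

    void join_wait(struct thr *t)
    {
        /* broken: the compiler may load t->tid separately for the test
         * and for the argument; the two loads can disagree, so the
         * kernel may be told to wait on a value the loop never saw */
        while (t->tid)
            futex_wait(&t->tid, t->tid);

        /* fix: a dedicated futex word, read exactly once per iteration */
        int val;
        while ((val = *(volatile int *)&t->join_futex))
            futex_wait(&t->join_futex, val);
    }
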
* prevent bypass of guarantee that suids start with fd 0/1/2 open (Rich Felker, 2018-04-05; 1 file, -0/+2)

it was reported by Erik Bosman that poll fails without setting revents when the nfds argument exceeds the current value for RLIMIT_NOFILE, causing the subsequent open calls to be bypassed. if the rlimit is either 1 or 2, this leaves fd 0 and 1 potentially closed but openable when the application code is reached.

based on a brief reading of the poll syscall documentation and code, it may be possible for poll to fail under other attacker-controlled conditions as well. if it turns out these are reasonable conditions that may happen in the real world, we may have to go back and implement fallbacks to probe each fd individually if poll fails, but for now, keep things simple and treat all poll failures as fatal.

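A sketch of the hardened check (using the public poll for readability where musl makes the raw syscall; `a_crash` is musl's internal abort primitive):

    #include <poll.h>
    #include <fcntl.h>

    void a_crash(void); /* musl-internal: terminate immediately */

    static void check_std_fds(void)
    {
        struct pollfd pfd[3] = { { .fd = 0 }, { .fd = 1 }, { .fd = 2 } };
        /* any poll failure is now fatal, so RLIMIT_NOFILE tricks cannot
         * skip the re-open of closed std fds below */
        if (poll(pfd, 3, 0) < 0) a_crash();
        for (int i = 0; i < 3; i++)
            if (pfd[i].revents & POLLNVAL)
                if (open("/dev/null", O_RDWR) < 0) a_crash();
    }
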
* for executing init array functions, use function type with prototype (Rich Felker, 2017-10-13; 1 file, -1/+1)

this is for consistency with the way it's done in the dynamic linker, avoiding a deprecated C feature (non-prototype function types), and improving code generation. GCC unnecessarily uses the variadic calling convention (e.g. clearing rax on x86_64) when making a call where the argument types are not known, for compatibility with wrong code which calls variadic functions this way. (C on the other hand is clear that such calls have undefined behavior.)

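The loop with the prototyped function-pointer type; the cast is the part this commit changed, from `void (**)()` to `void (**)(void)`:

    #include <stdint.h>

    extern void (*const __init_array_start)(void), (*const __init_array_end)(void);

    static void run_init_array(void)
    {
        uintptr_t a = (uintptr_t)&__init_array_start;
        for (; a < (uintptr_t)&__init_array_end; a += sizeof(void (*)(void)))
            (*(void (**)(void))a)();
    }
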
* free allocations in clearenv (Alexander Monakov, 2017-09-04; 1 file, -2/+6)

This aligns clearenv with the Linux man page by setting 'environ' rather than '*environ' to NULL, and stops it from leaking entries allocated by the libc.

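The resulting function is tiny; roughly (`__env_rm_add` is the internal hook, described in the next entry, that frees entries the libc allocated):

    extern char **__environ;
    void __env_rm_add(char *old, char *new); /* weak no-op unless setenv.c is linked */

    int clearenv(void)
    {
        char **e = __environ;
        __environ = 0; /* null environ itself, not *environ */
        if (e) while (*e) __env_rm_add(*e++, 0);
        return 0;
    }
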
* overhaul environment functions (Alexander Monakov, 2017-09-04; 4 files, -81/+86)

Rewrite environment access functions to slim down code, fix bugs and avoid invoking undefined behavior.

* avoid using int-typed iterators where size_t would be correct;
* use strncmp instead of memcmp consistently;
* tighten prologues by invoking __strchrnul;
* handle NULL environ.

putenv:
* handle "=value" input via unsetenv too (will return -1/EINVAL);
* rewrite and simplify __putenv; fix the leak caused by failure to deallocate entry added by preceding setenv when called from putenv.

setenv:
* move management of libc-allocated entries to this translation unit, and use no-op weak symbols in putenv/unsetenv.

unsetenv:
* rewrite; this fixes UB caused by testing a free'd pointer against NULL on entry to subsequent loops.

Not changed: failure to extend the allocation tracking array (previously __env_map, now env_alloced) is ignored rather than reporting -1/ENOMEM to the caller; the worst-case consequence is leaking this allocation when it is removed or replaced in a subsequent environment access.

The UB in unsetenv was initially reported by Alexander Cherepanov. Using a weak alias to avoid pulling in malloc via unsetenv was suggested by Rich Felker.

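A sketch along the lines of the rewritten unsetenv: the loop compacts environ in place, and the weak `__env_rm_add` hook (a no-op unless setenv.c is linked in) frees libc-owned entries without pulling malloc into static programs that never call setenv:

    #include <string.h>
    #include <errno.h>

    extern char **__environ;
    char *__strchrnul(const char *, int);
    void __env_rm_add(char *, char *);

    int unsetenv(const char *name)
    {
        size_t l = __strchrnul(name, '=') - name;
        if (!l || name[l]) {
            errno = EINVAL;
            return -1;
        }
        if (__environ) {
            char **e = __environ, **eo = e;
            for (; *e; e++)
                if (strncmp(name, *e, l) || (*e)[l] != '=')
                    *eo++ = *e;          /* keep: shift down over removed slots */
                else
                    __env_rm_add(*e, 0); /* drop, freeing if libc-allocated */
            if (eo != e) *eo = 0;
        }
        return 0;
    }
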
* __init_libc: add fallbacks for __progname setup (Alexander Monakov, 2017-08-29; 1 file, -4/+4)

It is possible for argv[0] to be a null pointer, but the __progname variable is used to implement functions in src/legacy/err.c that do not expect it to be null. It is also available to the user via the program_invocation_name alias as a GNU extension, and the implementation in Glibc initializes it to a pointer to empty string rather than NULL.

Since argv[0] is usually non-null and it's preferable to keep those variables in BSS, implement the fallbacks in __init_libc, which also allows an intermediate fallback to AT_EXECFN.

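The fallback chain, roughly as it looks inside __init_libc:

    #include <elf.h>
    #include <stddef.h>

    char *__progname, *__progname_full;

    static void init_progname(char *pn, size_t *aux)
    {
        if (!pn) pn = (void *)aux[AT_EXECFN]; /* intermediate fallback */
        if (!pn) pn = "";                     /* final fallback: empty string */
        __progname = __progname_full = pn;
        for (size_t i = 0; pn[i]; i++)
            if (pn[i] == '/') __progname = pn + i + 1;
    }
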
* fix support for initialized TLS in static PIE binaries (Rich Felker, 2016-12-20; 1 file, -0/+5)

the static-linked version of __init_tls needs to locate the TLS initialization image via the ELF program headers, which requires determining the base address at which the program was loaded. the existing code attempted to do this by comparing the actual address of the program headers (obtained via auxv) with the virtual address for the PT_PHDR record in the program headers. however, the linker seems to produce a PT_PHDR record only when a program interpreter (dynamic linker) is used. thus the computation failed and used the default base address of 0, leading to a crash when trying to access the TLS image at the wrong address.

the dynamic linker entry point and static-PIE rcrt1.o startup code compute the base address instead by taking the difference between the run-time address of _DYNAMIC and the virtual address in the PT_DYNAMIC record. this patch copies the approach they use, but with a weak symbolic reference to _DYNAMIC instead of obtaining the address from the crt_arch.h asm. this works because relocations have already been performed at the time __init_tls is called.

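The computation, roughly as adopted; the weak reference means `_DYNAMIC` evaluates as a null pointer when there is no PT_DYNAMIC (non-PIE static binaries), in which case base stays 0:

    #include <elf.h>
    #include <stddef.h>

    extern const size_t _DYNAMIC[]
        __attribute__((__weak__, __visibility__("hidden")));

    static size_t find_base(const Elf64_Phdr *ph, size_t phnum)
    {
        size_t base = 0;
        for (size_t i = 0; i < phnum; i++)
            if (ph[i].p_type == PT_DYNAMIC && _DYNAMIC)
                base = (size_t)_DYNAMIC - ph[i].p_vaddr;
        return base;
    }
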
* env: avoid leaving dangling pointers in __env_map (Alexander Monakov, 2016-03-06; 1 file, -0/+1)

This is the minimal fix for __putenv leaving a pointer to freed heap storage in the __env_map array, which could later lead to errors such as double-free.

* remove undef weak refs to init/fini array symbols in libc.so (Rich Felker, 2015-11-19; 1 file, -4/+6)

commit ad1cd43a86645ba2d4f7c8747240452a349d6bc1 eliminated preprocessor-level omission of references to the init/fini array symbols from object files going into libc.so. the references are weak, and the intent was that the linker would resolve them to zero in libc.so, but instead it leaves undefined references that could be satisfied at runtime.

normally these references would be harmless, since the code using them does not even get executed, but some older binutils versions produce a linking error: when linking a program against libc.so, ld first tries to use the hidden init/fini array symbols produced by the linker script to satisfy the references in libc.so, then produces an error because the definitions are hidden. ideally ld would have already provided definitions of these symbols when linking libc.so, but the linker script for -shared omits them.

to avoid this situation, the dynamic linker now provides its own dummy definitions of the init/fini array symbols for libc.so. since they are hidden, everything binds at ld time and no references remain in the dynamic symbol table. with modern binutils and --gc-sections, both the dummy empty array objects and the code referencing them get dropped at link time, anyway.

the _init and _fini symbols are also switched back to using weak definitions rather than weak references since the latter behave somewhat problematically in general, and the weak definition approach was known to work well.

* unify static and dynamic linked implementations of thread-local storage (Rich Felker, 2015-11-12; 2 files, -45/+49)

this both allows removal of some of the main remaining uses of the SHARED macro and clears one obstacle to static-linked dlopen support, which may be added at some point in the future.

specialized single-TLS-module versions of __copy_tls and __reset_tls are removed and replaced with code adapted from their dynamic-linked versions, capable of operating on a whole chain of TLS modules, and use of the dynamic linker's DSO chain (which contains large struct dso objects) by these functions is replaced with a new chain of struct tls_module objects containing only the information needed for implementing TLS.

this may also yield some performance benefit initializing TLS for a new thread when a large number of modules without TLS have been loaded, since there is no need to walk structures for modules without TLS.

* unify static and dynamic libc init/fini code paths (Rich Felker, 2015-11-11; 1 file, -15/+11)

use weak definitions that the dynamic linker can override instead of preprocessor conditionals on SHARED so that the same libc start and exit code can be used for both static and dynamic linking.

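The pattern, sketched: libc defines a weak default that handles the static-linked case, and the dynamic linker overrides it with its own strong definition:

    #define weak_alias(old, new) \
        extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

    static void dummy(void) {}
    weak_alias(dummy, _init); /* satisfied by crt code when present */

    static void libc_start_init(void)
    {
        _init();
        /* ... walk the init array, as in the 2017-10-13 entry above ... */
    }
    weak_alias(libc_start_init, __libc_start_init);
    /* for dynamic linking, ldso provides a strong __libc_start_init
     * that runs the constructors of every loaded DSO instead */
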
* eliminate use of SHARED macro to suppress visibility attributes (Rich Felker, 2015-11-11; 1 file, -10/+1)

this is the first and simplest stage of removal of the SHARED macro, which will eventually allow libc.a and libc.so to be produced from the same object files.

the original motivation for these #ifdefs which are now being removed was to allow building a static-only libc using a compiler that does not support visibility. however, SHARED was the wrong condition to test for this anyway; various assembly-language sources refer to hidden symbols and declare them with the .hidden directive, making it wrong to define the referenced symbols as non-hidden. if there is a need in the future to build libc using compilers that lack visibility, support could be moved to the build system or perhaps the __PIC__ macro could be checked instead of SHARED.

* move calls to application init functions after crt1 entry point (Rich Felker, 2015-09-22; 1 file, -0/+3)

this change is needed to be compatible with fdpic, where some of the main application's relocations may be performed as part of the crt1 entry point. if we call init functions before passing control, these relocations will not yet have been performed, and the init code will potentially make use of invalid pointers.

conceptually, no code provided by the application or third-party libraries should run before the application entry point. the difference is not observable to programs using the crt1 we provide, but it could come into play if custom entry point code is used, so it's better to be doing this right anyway.

* provide __stack_chk_fail_local in libc.a (Rich Felker, 2015-06-20; 1 file, -0/+4)

this symbol is needed only on archs where the PLT call ABI is klunky, and only for position-independent code compiled with stack protector. thus references usually only appear in shared libraries or PIE executables, but they can also appear when linking statically if some of the object files being linked were built as PIC/PIE.

normally libssp_nonshared.a from the compiler toolchain should provide __stack_chk_fail_local, but reportedly it appears prior to -lc in the link order, thus failing to satisfy references from libc itself (which arise only if libc.a was built as PIC/PIE with stack protector enabled).

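What providing the symbol amounts to, sketched (musl's real handler uses its internal a_crash primitive; the hidden visibility is what lets PIC callers bind it locally, without PLT indirection):

    void __stack_chk_fail(void)
    {
        __builtin_trap(); /* stand-in for musl's a_crash() */
    }

    __attribute__((__visibility__("hidden")))
    void __stack_chk_fail_local(void)
    {
        __stack_chk_fail();
    }
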
* fix stack protector crashes on x32 & powerpc due to misplaced TLS canary (Rich Felker, 2015-05-06; 1 file, -1/+1)

i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64. on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be.

in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.

* fix misalignment of dtv in static-linked programs with odd-sized TLS (Rich Felker, 2015-04-23; 1 file, -1/+2)

both static and dynamic linked versions of the __copy_tls function have a hidden assumption that the alignment of the beginning or end of the memory passed is suitable for storing an array of pointers for the dtv. pthread_create satisfies this requirement except when libc.tls_size is misaligned, which cannot happen with dynamic linking due to the way update_tls_size computes the total size, but could happen with static linking and odd-sized TLS.

* remove dead store from static __init_tls (Rich Felker, 2015-04-23; 1 file, -2/+0)

commit dab441aea240f3b7c18a26d2ef51979ea36c301c, which made thread pointer init mandatory for all programs, rendered this store obsolete by removing the early-return path for static programs with no TLS.

* make __init_tp function static when static linking (Rich Felker, 2015-04-23; 1 file, -0/+3)

this slightly reduces the code size cost of TLS/thread-pointer for static linking since __init_tp can be inlined into its only caller and removed. this is analogous to the handling of __init_libc in __libc_start_main, where the function only has external linkage when it needs to be called from the dynamic linker.

* fix inconsistent visibility for __hwcap and __sysinfo symbols (Rich Felker, 2015-04-22; 1 file, -3/+0)

these are used as hidden by asm files (and such use is the whole reason they exist), but their actual definitions were not hidden.

* remove useless visibility application from static-linking-only code (Rich Felker, 2015-04-22; 2 files, -3/+2)

part of the goal here is to eliminate use of the ATTR_LIBC_VISIBILITY macro outside of libc.h, since it was never intended to be 'public'.

* allow libc itself to be built with stack protector enabled (Rich Felker, 2015-04-13; 1 file, -0/+10)

this was already essentially possible as a result of the previous commits changing the dynamic linker/thread pointer bootstrap process. this commit mainly adds build system infrastructure:

configure no longer attempts to disable stack protector. instead it simply determines how, so the makefile can disable stack protector for a few translation units used during early startup.

stack protector is also disabled for memcpy and memset since compilers (incorrectly) generate calls to them on some archs to implement struct initialization and assignment, and such calls may creep into early initialization.

no explicit attempt to enable stack protector is made by configure at this time; any stack protector option supported by the compiler can be passed to configure in CFLAGS, and if the compiler uses stack protector by default, this default is respected.

* remove remnants of support for running in no-thread-pointer mode (Rich Felker, 2015-04-13; 2 files, -5/+3)

since 1.1.0, musl has nominally required a thread pointer to be set up. most of the remaining code that was checking for its availability was doing so for the sake of being usable by the dynamic linker. as of commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer necessary; the thread pointer is now valid before any libc code (outside of dynamic linker bootstrap functions) runs.

this commit essentially concludes "phase 3" of the "transition path for removing lazy init of thread pointer" project that began during the 1.1.0 release cycle.

* optimize out setting up robust list with kernel when not needed (Rich Felker, 2015-04-10; 1 file, -0/+1)

as a result of commit 12e1e324683a1d381b7f15dd36c99b37dd44d940, kernel processing of the robust list is only needed for process-shared mutexes. previously the first attempt to lock any owner-tracked mutex resulted in robust list initialization and a set_robust_list syscall. this is no longer necessary, and since the kernel's record of the robust list must now be cleared at thread exit time for detached threads, optimizing it out is more worthwhile than before too.

* copy the dtv pointer to the end of the pthread struct for TLS_ABOVE_TP archs (Szabolcs Nagy, 2015-03-11; 1 file, -1/+1)

There are two main abi variants for thread local storage layout:

(1) TLS is above the thread pointer at a fixed offset and the pthread struct is below that. So the end of the struct is at known offset.

(2) the thread pointer points to the pthread struct and TLS starts below it. So the start of the struct is at known (zero) offset.

Assembly code for the dynamic TLSDESC callback needs to access the dynamic thread vector (dtv) pointer which is currently at the front of the pthread struct. So in case of (1) the asm code needs to hard code the offset from the end of the struct which can easily break if the struct changes.

This commit adds a copy of the dtv at the end of the struct. New members must not be added after dtv_copy, only before it. The size of the struct is increased a bit, but there is opportunity for size optimizations.

* fix over-alignment of TLS, insufficient builtin TLS on 64-bit archs (Rich Felker, 2015-03-06; 1 file, -2/+8)

a conservative estimate of 4*sizeof(size_t) was used as the minimum alignment for thread-local storage, despite the only requirements being alignment suitable for struct pthread and void* (which struct pthread already contains). additional alignment required by the application or libraries is encoded in their headers and is already applied.

over-alignment prevented the builtin_tls array from ever being used in dynamic-linked programs on 64-bit archs, thereby requiring allocation at startup even in programs with no TLS of their own.

* fix #ifdef inside a macro argument list in __init_tls.c (Szabolcs Nagy, 2014-08-13; 1 file, -4/+3)

C99 6.10.3p11 disallows such constructs, so use an #ifdef outside of the argument list of __syscall.

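The shape of the fix, on a generic mmap example (a sketch; it assumes SYS_mmap exists wherever SYS_mmap2 does not, which holds on common archs):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void *map_anon(size_t size)
    {
        /* broken form (directive inside the macro's argument list is
         * undefined behavior per C99 6.10.3p11):
         *
         *     return (void *)syscall(
         *     #ifdef SYS_mmap2
         *         SYS_mmap2,
         *     #else
         *         SYS_mmap,
         *     #endif
         *         0, size, ...);
         *
         * fixed form, with the conditional hoisted out of the call: */
    #ifdef SYS_mmap2
        return (void *)syscall(SYS_mmap2, 0, size, PROT_READ|PROT_WRITE,
                               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    #else
        return (void *)syscall(SYS_mmap, 0, size, PROT_READ|PROT_WRITE,
                               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    #endif
    }
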
* eliminate use of cached pid from thread structure (Rich Felker, 2014-07-05; 1 file, -1/+1)

the main motivation for this change is to remove the assumption that the tid of the main thread is also the pid of the process. (the value returned by the set_tid_address syscall was used to fill both fields despite it semantically being the tid.) this is historically and presently true on linux and unlikely to change, but it conceivably could be false on other systems that otherwise reproduce the linux syscall api/abi.

only a few parts of the code were actually still using the cached pid. in a couple places (aio and synccall) it was a minor optimization to avoid a syscall. caching could be reintroduced, but lazily as part of the public getpid function rather than at program startup, if it's deemed important for performance later.

in other places (cancellation and pthread_kill) the pid was completely unnecessary; the tkill syscall can be used instead of tgkill. this is actually a rather subtle issue, since tgkill is supposedly a solution to race conditions that can affect use of tkill. however, as documented in the commit message for commit 7779dbd2663269b465951189b4f43e70839bc073, tgkill does not actually solve this race; it just limits it to happening within one process rather than between processes. we use a lock that avoids the race in pthread_kill, and the use in the cancellation signal handler is self-targeted and thus not subject to tid reuse races, so both are safe regardless of which syscall (tgkill or tkill) is used.

* add locale framework (Rich Felker, 2014-07-02; 1 file, -0/+1)

this commit adds non-stub implementations of setlocale, duplocale, newlocale, and uselocale, along with the data structures and minimal code needed for representing the active locale on a per-thread basis and optimizing the common case where thread-local locale settings are not in use.

at this point, the data structures only contain what is necessary to represent LC_CTYPE (a single flag) and LC_MESSAGES (a name for use in finding message translation files). representation for the other categories will be added later; the expectation is that a single pointer will suffice for each.

for LC_CTYPE, the strings "C" and "POSIX" are treated as special; any other string is accepted and treated as "C.UTF-8". for other categories, any string is accepted after being truncated to a maximum supported length (currently 15 bytes). for LC_MESSAGES, the name is kept regardless of whether libc itself can use such a message translation locale, since applications using catgets or gettext should be able to use message locales libc is not aware of. for other categories, names which are not successfully loaded as locales (which, at present, means all names) are treated as aliases for "C". setlocale never fails.

locale settings are not yet used anywhere, so this commit should have no visible effects except for the contents of the string returned by setlocale.

* fix typo in a comment in __libc_start_main (Rich Felker, 2014-07-01; 1 file, -1/+1)

* separate __tls_get_addr implementation from dynamic linker/init_tls (Rich Felker, 2014-06-19; 1 file, -5/+0)

such separation serves multiple purposes:

- by having the common path for __tls_get_addr alone in its own function with a tail call to the slow case, code generation is greatly improved.
- by having __tls_get_addr in its own file, it can be replaced on a per-arch basis as needed, for optimization or ABI-specific purposes.
- by removing __tls_get_addr from __init_tls.c, a few bytes of code are shaved off of static binaries (which are unlikely to use this function unless the linker messed up).

* simplify errno implementation (Rich Felker, 2014-06-10; 1 file, -1/+0)

the motivation for the errno_ptr field in the thread structure, which this commit removes, was to allow the main thread's errno to keep its address when lazy thread pointer initialization was used. &errno was evaluated prior to setting up the thread pointer and stored in errno_ptr for the main thread; subsequently created threads would have errno_ptr pointing to their own errno_val in the thread structure.

since lazy initialization was removed, there is no need for this extra level of indirection; __errno_location can simply return the address of the thread's errno_val directly. this does cause &errno to change, but the change happens before entry to application code, and thus is not observable.

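After the change, the whole of __errno_location reduces to the following (this matches musl's src/errno/__errno_location.c; __pthread_self and errno_val come from the internal pthread_impl.h):

    #include "pthread_impl.h"

    int *__errno_location(void)
    {
        return &__pthread_self()->errno_val;
    }
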