| Commit message (Collapse) | Author | Age | Files | Lines |
| |
this change is needed to correctly handle the case where a constructor
creates a new thread which calls dlopen. previously, the lock was not
held in this case. the reason for the complex logic to avoid locking
whenever possible is that, since the mutex is recursive, it will need
to inspect the thread pointer to get the current thread's tid, and
this requires initializing the thread pointer. we do not want
non-multi-threaded programs to attempt to access the thread pointer
unnecessarily; doing so could make them crash on ancient kernels that
don't support threads but which may otherwise be capable of running
the program.
| |
previously, the path string was being used despite being invalid. with
this change, an empty path file or an error reading the path file is
treated as an empty path. this is preferable to falling back to a
default path, so that attacks which prevent reading of the path file
cannot result in loading incorrect and possibly dangerous (outdated or
mismatching ABI) libraries from the default path.
the code to strip the final newline has also been removed; now that
newline is accepted as a delimiter, it's harmless to leave it in
place.
| |
apparently the original commit was never tested properly, since
getline was only ever reading one line. the intent was to read the
entire file, so use getdelim with the null byte as delimiter as a
cheap way to read a whole file into memory.
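the getdelim trick described above can be sketched roughly as follows. the name slurp is made up for this illustration and is not taken from the actual source; the sketch assumes the file is text, so nothing after an embedded null byte needs to be read.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* read everything remaining in f into a malloc'd, null-terminated buffer.
 * returns NULL on an empty file or a read error; the caller frees the
 * result. slurp is a hypothetical name used only in this sketch. */
char *slurp(FILE *f)
{
	char *buf = NULL;
	size_t cap = 0;
	assert(f != NULL);
	/* a text file contains no null byte, so getdelim runs to EOF */
	if (getdelim(&buf, &cap, '\0', f) <= 0) {
		free(buf); /* getdelim may allocate even when it fails */
		return NULL;
	}
	return buf;
}
```

with getline in place of getdelim, the same call would stop at the first newline, which matches the symptom described above.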
| |
this allows /etc/ld-musl-$(ARCH).path to contain one path per line,
which is much more convenient for users than the :-delimited format,
which was a source of repeated and unnecessary confusion. for
simplicity, \n is also accepted in environment variables, though it
should probably not be used there.
at the same time, issues with overly long paths invoking UB or getting
truncated have been fixed. such issues should not have arisen with the
environment (which is size-limited) but could have been generated by a
path file larger than 2**31 bytes in length.
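as an illustration of accepting both delimiters, a path list can be walked with strspn/strcspn. this is only a sketch; next_path_elem is a hypothetical helper name, and using size_t for the lengths is what avoids truncation on very long inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* advance *s to the next non-empty element of a search path whose
 * elements are separated by ':' or '\n'. returns the element's length
 * and leaves *s pointing at its first byte; returns 0 at the end of
 * the list. next_path_elem is a hypothetical name for this sketch. */
size_t next_path_elem(const char **s)
{
	assert(s && *s);
	*s += strspn(*s, ":\n");    /* skip any run of delimiters */
	return strcspn(*s, ":\n");  /* length up to the next delimiter or the end */
}
```

a caller would use the element at *s, add the returned length to *s, and call again until 0 is returned. a trailing newline in the file simply produces no further element.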
| |
this bug seems to have been introduced when the map_library signature
was changed to return the mapping in a temp dso structure instead of
into separate variables.
| |
based on patch by Pierre Carrier <pierre@gcarrier.fr> that just added
the flag constant, but with minimal additional code so that it
actually works as documented. this is a nonstandard option but some
major software (reportedly, Firefox) uses it and it was easy to add
anyway.
| |
struct dso was not defined in this case, and it's not needed in the
code that was using it anyway; void pointers work just as well.
| |
this is wasteful and useless from a standpoint of sane programs, but
it is required by the standard, and the current requirements were
upheld with the closure of Austin Group issue #639:
http://austingroupbugs.net/view.php?id=639
| |
previously, shared library constructors were being called before
important internal things like the environment (extern char **environ)
and hwcap flags (needed for sjlj to work right with float on arm) were
initialized in __libc_start_main. rather than trying to have the
dynamic linker make sure this stuff all gets initialized right, I've
opted to just defer calling shared library constructors until after
the main program's entry point is reached. this also fixes the order
of ctors to be the exact reverse of dtors, which is a desirable
property and possibly even mandated by some languages.
the main practical effect of this change is that shared libraries
calling getenv from ctors will no longer fail.
| |
actually, the hard-coded name should be eliminated too, and replaced
by a search for the soname in the headers, but that can be done
separately later.
| |
fortunately the memory corruption could not hurt anything, but it
prevented clearing the final newline and thus prevented the last path
element from working.
| |
this change was originally intended just to avoid repeated attempts to
open a nonexistent /etc/ld-musl-$(ARCH).path file, but I realized it
also prevents the default paths from being searched when such a path
file exists. despite the potential to break existing usage, I believe
the new behavior is the right behavior, and it's better to fix it
sooner rather than later. with the old behavior, it was impossible to
inhibit search of default paths which might contain musl-incompatible
libs (or even libs from a different cpu arch, on multi-arch machines).
| |
some of these were coming from stdio functions locking files without
unlocking them. I believe it's useful for this to throw a warning, so
I added a new macro, self-documenting that the file will never be
unlocked, to avoid the warning in the few places where it's wrong.
| |
patches by Alex Caudill (npx). the dynamic-linked version is almost
identical to the final submitted patch; I just added a couple missing
lines for saving the phdr address when the dynamic linker is invoked
directly to run a program, and removed a couple to avoid introducing
another unnecessary type. the static-linked version is based on npx's
draft. it could use some improvements which are contingent on the
startup code saving some additional information for later use.
| |
this was broken during the early dynamic-linked TLS commits, which
rearranged some of the code for handling new relocation types.
| |
despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
| |
this change brings the behavior in line with the static-linked code,
which seems to be correct.
| |
with this change, the #undef libc and the __libc name are no longer
needed. they were problematic because the "accessor function" mode for
accessing the libc struct could not be used, breaking build on any
compiler without (working) visibility.
| |
the code in __libc_start_main is now responsible for parsing auxv,
rather than duplicating the parsing all over the place. this should
shave off a few cycles and some code size. __init_libc is left as an
external-linkage function despite the fact that it could be static, to
prevent it from being inlined and permanently wasting stack space when
main is called.
a few other minor changes are included, like eliminating per-thread
ssp canaries (they were likely broken when combined with certain
dlopen usages, and completely unnecessary) and some other unnecessary
checks. since this code gets linked into every program, it should be
as small and simple as possible.
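parsing auxv once into a table could look roughly like the sketch below. decode_auxv and the table size AUX_CNT are assumptions made for this illustration; the only thing relied on is that auxv is an array of (type, value) pairs ending with a pair whose type is 0.

```c
#include <assert.h>
#include <stddef.h>

#define AUX_CNT 38 /* hypothetical table size for this sketch */

/* copy the values of an auxv-style array of (type, value) pairs,
 * terminated by a pair whose type is 0, into a table indexed by type.
 * types too large for the table are ignored. */
void decode_auxv(size_t *aux, const size_t *auxv)
{
	size_t i;
	assert(aux && auxv);
	for (i = 0; i < AUX_CNT; i++) aux[i] = 0;
	for (i = 0; auxv[i]; i += 2)
		if (auxv[i] < AUX_CNT) aux[auxv[i]] = auxv[i + 1];
}
```

after one pass, later code can read entries such as the page size (type 6, AT_PAGESZ, on Linux) from the table instead of walking auxv again.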
| |
at initial program load, all libraries must be loaded before the
thread pointer can be set up, since the TP-relative addresses of all
initial TLS objects must be constant.
| |
this is needed to ensure async-cancel-safety, i.e. to make it safe to
access TLS objects when async cancellation is enabled. otherwise, if
cancellation were acted upon after the atomic fetch/add but before the
thread saved the obtained memory, another access to the same TLS in
the cancellation handler could end up performing the atomic fetch/add
again, consuming more memory than is actually available and
overflowing into other objects on the heap.
| |
symbol value of 0 is not "undefined" for TLS; it's the address of the
first symbol in the TLS segment. however, non-definition TLS
references also have values of 0, so check the section.
hopefully the new logic is more clear, too.
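the distinction drawn above might be expressed along these lines. this is a simplified sketch, not the actual lookup code: struct sym and the trailing-underscore constants are stand-ins defined here (their values follow the ELF specification), and the real lookup has additional conditions that are left out.

```c
#include <assert.h>
#include <stdint.h>

/* minimal stand-ins for the ELF symbol fields this sketch needs */
enum { SHN_UNDEF_ = 0, STT_TLS_ = 6 };
struct sym { uint64_t value; uint16_t shndx; unsigned char info; };

/* nonzero if the symbol can satisfy a lookup. sym_is_definition is a
 * hypothetical helper name. */
int sym_is_definition(const struct sym *s)
{
	int type;
	assert(s);
	type = s->info & 0xf;              /* low 4 bits hold the symbol type */
	if (type == STT_TLS_)
		return s->shndx != SHN_UNDEF_; /* value 0 is a valid TLS offset */
	return s->value != 0;              /* non-TLS: 0 still means undefined */
}
```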
| |
compute offsets from the thread pointer statically when loading the
library, rather than repeating the logic on each thread creation. not
only is the latter less efficient at runtime; it also fails to provide
solid guarantees that the offsets will remain the same when the
initial alignment of memory is different. the new alignment handling
is both more rigorous and simpler.
the old code was also clobbering TLS bss with random image data in
some cases due to using tls_size (size of TLS segment) instead of
tls_len (length of the TLS data image).
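the tls_len versus tls_size distinction can be illustrated with a small sketch. the helper names are hypothetical, and the explicit memset is an assumption of this example; code that starts from already-zeroed memory would only need the memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* initialize one thread's copy of a TLS segment. image/len describe the
 * initialized data; size is the whole segment, so the trailing
 * size - len bytes are the zero-filled "bss" part. */
void init_tls_copy(unsigned char *dst, const void *image, size_t len, size_t size)
{
	assert(dst && image && len <= size);
	memcpy(dst, image, len);          /* copying `size` bytes here would read past the image */
	memset(dst + len, 0, size - len);
}

/* round a running offset up to a power-of-two alignment, as one would
 * when assigning each module a fixed offset at load time */
size_t align_up(size_t off, size_t align)
{
	assert(align && !(align & (align - 1)));
	return (off + align - 1) & -align;
}
```

copying tls_size bytes from the image instead of tls_len is the kind of mistake that fills the bss part with whatever follows the image in memory.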
| |
some libraries call dlopen from their constructors, resulting in
recursive calls to dlopen. previously, this resulted in deadlock. I'm
now unlocking the dlopen lock before running constructors (this is
especially important since the lock also blocked pthread_create and
was being held while application code runs!) and using a separate
recursive mutex protecting the ctor/dtor state instead.
in order to prevent the same ctor from being called more than once, a
module is considered "constructed" just before the ctor runs.
also, switch from using atexit to register each dtor to using a single
atexit call to register the dynamic linker's dtor processing as just
one handler. this is necessary because atexit performs allocation and
may fail, but the library has already been loaded and cannot be
backed-out at the time dtor registration is performed. this change
also ensures that all dtors run after all atexit functions, rather
than in mixed order.
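the ordering rules above can be sketched as follows. every name here is hypothetical, and the recursive mutex is left out to keep the example single-threaded; the point is only that a module is marked constructed before its ctor runs, and that one dtor pass walks the modules in reverse construction order.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical module record; none of these names come from the real source */
struct module {
	struct module *next;       /* next module in load order */
	struct module *fini_next;  /* next module in destructor order */
	void (*ctor)(void);
	void (*dtor)(void);
	int constructed;
};

static struct module *fini_head;

void run_ctors(struct module *head)
{
	struct module *m;
	for (m = head; m; m = m->next) {
		if (m->constructed) continue;
		m->constructed = 1;        /* marked first, so a nested pass skips m */
		m->fini_next = fini_head;  /* push: last constructed, first destroyed */
		fini_head = m;
		if (m->ctor) m->ctor();
	}
}

/* the one handler that would be registered with atexit */
void run_dtors(void)
{
	struct module *m;
	for (m = fini_head; m; m = m->fini_next)
		if (m->dtor) m->dtor();
}

/* demo hooks that record call order; ctor_a re-enters run_ctors the way
 * a constructor calling dlopen would */
struct module *demo_head;
char trace[8];
size_t ntrace;
void note(char c) { assert(ntrace < sizeof trace - 1); trace[ntrace++] = c; }
void ctor_a(void) { note('A'); run_ctors(demo_head); }
void ctor_b(void) { note('B'); }
void dtor_a(void) { note('a'); }
void dtor_b(void) { note('b'); }
```

in a real program, run_dtors would be registered once with atexit, so a failed allocation inside atexit cannot happen after a library has already been loaded.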
| |
libraries loaded more than once by pathname should not get shortnames
that would cause them to later be used to satisfy non-pathname load
requests.
| |
unlike other implementations, this one reserves memory for new TLS in
all pre-existing threads at dlopen-time, and dlopen will fail with no
resources consumed and no new libraries loaded if memory is not
available. memory is not immediately distributed to running threads;
that would be too complex and too costly. instead, assurances are made
that threads needing the new TLS can obtain it in an async-signal-safe
way from a buffer belonging to the dynamic linker/new module (via
atomic fetch-and-add based allocator).
I've re-appropriated the lock that was previously used for __synccall
(synchronizing set*id() syscalls between threads) as a general
pthread_create lock. it's a "backwards" rwlock where the "read"
operation is safe atomic modification of the live thread count, which
multiple threads can perform at the same time, and the "write"
operation is making sure the count does not increase during an
operation that depends on it remaining bounded (__synccall or dlopen).
in static-linked programs that don't use __synccall, this lock is a
no-op and has no cost.
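a fetch-and-add based allocator of the kind mentioned above might look like this minimal sketch. struct pool and pool_take are hypothetical names, C11 atomics stand in for whatever atomic primitive the real code uses, and the pool is assumed to have been sized in advance so that requests can never exceed it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* hypothetical pool reserved up front (for example, at dlopen time) */
struct pool {
	unsigned char *base;
	size_t size;
	atomic_size_t used;
};

/* hand out n bytes from the pool. the only shared state is updated by a
 * single atomic fetch-and-add, so no lock is taken and nothing is left
 * half-updated if the caller is interrupted. */
void *pool_take(struct pool *p, size_t n)
{
	size_t off = atomic_fetch_add(&p->used, n);
	/* the reservation step is expected to guarantee this */
	assert(off <= p->size && n <= p->size - off);
	return p->base + off;
}
```

because the space was reserved earlier, obtaining memory this way cannot fail at the point where a thread first touches the new TLS.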
| |
orig_tail was being saved before the lock was obtained, allowing
dlopen failure to roll-back other dlopens that had succeeded.
| |
currently, only i386 is tested. x86_64 and arm should probably work.
the necessary relocation types for mips and microblaze have not been
added because I don't understand how they're supposed to work, and I'm
not even sure if it's defined yet on microblaze. I may be able to
reverse engineer the requirements out of gcc/binutils output.
| |
this was an optimization to save/recover a minimal amount of extra
memory for use by malloc, that's becoming increasingly costly to keep
around. freeing this data:
1. breaks debugging with gdb (it can't find library symbols)
2. breaks thread-local storage in shared libraries
it would be possible to disable freeing when TLS is used, but in
addition to the above breakages, tracking whether dlopen/dlsym is used
adds a cost to every symbol lookup, possibly making program startup
slower for large programs. combined with the complexity, it's not
worth it. we already save/recover plenty of memory in the dynamic
linker with reclaim_gaps.
| |
this code will not work yet because the necessary relocations are not
supported, and cannot be supported without some internal changes to
how relocation processing works (coming soon).
| |
only TLS in the main program is supported so far; TLS defined in
shared libraries will not work yet.
| |
the design for TLS in dynamic-linked programs is mostly complete too,
but I have not yet implemented it. cost is nonzero but still low for
programs which do not use TLS and/or do not use threads (a few hundred
bytes of new code, plus dependency on memcpy). I believe it can be
made smaller at some point by merging __init_tls and __init_security
into __libc_start_main and avoiding duplicate auxv-parsing code.
at the same time, I've also slightly changed the logic pthread_create
uses to allocate guard pages to ensure that guard pages are not
counted towards commit charge.
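one way to keep guard pages out of the commit charge is to map the whole region inaccessible and then enable access only for the usable part. the sketch below assumes that approach; the message above does not spell out the mechanism, and map_with_guard is a hypothetical name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* map guard + size bytes with the guard region at the low end. both
 * lengths must be multiples of the page size. returns the start of the
 * usable part, or NULL on failure. */
void *map_with_guard(size_t guard, size_t size)
{
	unsigned char *p;
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	assert(size > 0 && guard % pg == 0 && size % pg == 0);
	/* reserve everything with no access first... */
	p = mmap(NULL, guard + size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) return NULL;
	/* ...then make only the usable part readable and writable */
	if (mprotect(p + guard, size, PROT_READ | PROT_WRITE)) {
		munmap(p, guard + size);
		return NULL;
	}
	return p + guard;
}
```

the guard region stays PROT_NONE for its whole lifetime, so it never becomes writable memory that the kernel would have to account for.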
| |
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
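the idea can be sketched as below. MY_RESTRICT is a made-up name standing in for the real macro (application code should not define __restrict itself), and the exact set of compiler checks is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* MY_RESTRICT is a hypothetical name used only in this sketch */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT /* pre-c99 compiler without an equivalent: expands to nothing */
#endif

/* the qualifier follows the '*' (the "*restrict" form); the array form
 * "[restrict]" is deliberately not used */
void copy_bytes(char *MY_RESTRICT dst, const char *MY_RESTRICT src, size_t n)
{
	assert(dst && src);
	while (n--) *dst++ = *src++;
}
```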
| |
based on patches submitted by boris brezillon. this commit also fixes
the issue whereby the main application and libc don't have the address
ranges of their mappings stored, which was theoretically a problem for
RTLD_NEXT support in dlsym; it didn't actually matter because libc
never calls dlsym, and it seemed to be doing the right thing (by
chance) for symbols in the main program as well.
| |
wrong hash was being passed; just a copy/paste error. did not affect
lookups in the global namespace; this is probably why it was not
caught in testing.