| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
the aio operations that lead to calling __aio_get_queue with the
possibility to expand the fd map are not AS-safe, but if they are
interrupted by a signal handler, the signal handler may call close,
which is required to be AS-safe. due to __aio_get_queue taking the
write lock without blocking signals, such a call to close from a
signal handler could deadlock.
change __aio_get_queue to block signals if it needs to obtain a write
lock, and restore when finished.
|
|
|
|
|
|
|
| |
aio_suspend waits on a dummy futex in the corner case when the array of
requests contains NULL pointers only. But the value of this futex was
left uninitialized, so if it happens to be non-zero, aio_suspend
degrades to spinning instead of blocking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as reported by Alexey Izbyshev, there is a lock order inversion
deadlock between the malloc lock and aio maplock at MT-fork time:
_Fork attempts to take the aio maplock while fork already has the
malloc lock, but a concurrent aio operation holding the maplock may
attempt to allocate memory.
move the __aio_atfork calls in the parent from _Fork to fork, and
reorder the lock before most other locks, since nothing else depends
on aio(*). this leaves us with the possibility that the child will not
be able to obtain the read lock, if _Fork is used directly and happens
concurrent with an aio operation. however, in that case, the child
context is an async signal context that cannot call any further aio
functions, so all we need is to ensure that close does not attempt to
perform any aio cancellation. this can be achieved just by nulling out
the map pointer.
(*) even if other functions call close, they will only need a read
lock, not a write lock, and read locks being recursive ensures they
can obtain it. moreover, the number of read references held is bounded
by something like twice the number of live threads, meaning that the
read lock count cannot saturate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
originally the namespace-infringing "large file support" interfaces
were included as part of glibc-ABI-compat, with the intent that they
not be used for linking, since our off_t is and always has been
unconditionally 64-bit and since we usually do not aim to support
nonstandard interfaces when there is an equivalent standard interface.
unfortunately, having the symbols present and available for linking
caused configure scripts to detect them and attempt to use them
without declarations, producing all the expected ill effects that
entails.
as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made
to prevent this, using macros to redirect the LFS64 names to the
standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE.
however, this has turned out to be a source of further problems,
especially since g++ defines _GNU_SOURCE by default. in particular,
the presence of these names as macros breaks a lot of valid code.
this commit removes all the LFS64 symbols and replaces them with a
mechanism in the dynamic linker symbol lookup failure path to retry
with the spurious "64" removed from the symbol name. in the future,
if/when the rest of glibc-ABI-compat is moved out of libc, this can be
removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pthread_once is not compatible with MT-fork constraints (commit
167390f05564e0a4d3fcb4329377fd7743267560) and is not needed here
anyway; we already have a lock suitable for initialization.
while changing this, fix a corner case where AT_MINSIGSTKSZ gives a
value that's more than MINSIGSTKSZ but by a margin of less than
2048, thereby causing the size to be reduced. it shouldn't matter but
the intent was to be the larger of a 2048-byte margin over the legacy
fixed minimum stack requirement or a 512-byte margin over the minimum
the kernel reports at runtime.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this change lifts undocumented restrictions on calls by replacement
mallocs to libc functions that might take these locks, and sets the
stage for lifting restrictions on the child execution environment
after multithreaded fork.
care is taken to #define macros to replace all four functions (malloc,
calloc, realloc, free) even if not all of them will be used, using an
undefined symbol name for the ones intended not to be used so that any
inadvertent future use will be caught at compile time rather than
directed to the wrong implementation.
|
|
|
|
|
| |
also fix the lack of declaration (and thus hidden visibility) in
__stdio_close's use of __aio_close.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, if a file descriptor had aio operations pending in the
parent before fork, attempting to close it in the child would attempt
to cancel a thread belonging to the parent. this could deadlock, fail,
or crash the whole process of the cancellation signal handler was not
yet installed in the parent. in addition, further use of aio from the
child could malfunction or deadlock.
POSIX specifies that async io operations are not inherited by the
child on fork, so clear the entire aio fd map in the child, and take
the aio map lock (with signals blocked) across the fork so that the
lock is kept in a consistent state.
|
|
|
|
|
|
|
| |
these functions cannot provide the glibc lfs64-ABI-compatible symbols
when time_t differs from what it was in that ABI. instead, the aliases
need to be provided by the time32 compat shims or through some other
mechanism.
|
|
|
|
|
|
|
| |
The old/new parameters to pthread_sigmask, sigprocmask, and setitimer
are marked restrict, so passing the same address to both is
prohibited. Modify callers of these functions to use a separate object
for each argument.
|
|
|
|
|
|
|
|
| |
it's not clear whether this is required, but it seems arguable that it
should happen. for example aio_suspend is supposed to return
immediately if any of the operations has "completed", which includes
ending with an error status asynchonously and might also be
interpreted to include doing so synchronously.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the map structures in particular are permanent once created, and thus
a large number of aio function calls with invalid file descriptors
could exhaust memory, whereas, assuming normal resource limits, only a
very small number of entries ever need to be allocated. check validity
of the fd before allocating anything new, so that allocation of large
amounts of memory is only possible when resource limits have been
increased and a large number of files are actually open.
this change also improves error reporting for bad file descriptors to
happen at the time the aio submission call is made, as opposed to
asynchronously.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
since commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732, it has been
possible that the allocator is application-provided code, which cannot
necessarily run safely on io thread stacks, and which should not be
able to see the existence of io threads, since they are an
implementation detail.
instead of having the io thread request and possibly allocate its
queue (and the map structures leading to it), make the submitting
thread responsible for this, and pass the queue pointer into the io
thread via its args structure. this eliminates the only early error
case in io threads, making it no longer necessary to pass an error
status back to the submitting thread via the args structure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
aio threads not using SIGEV_THREAD notification are created with small
stacks and no guard page, which is possible since they only run the
code for the requested io operation, not any application code. the
motivation is not creating a lot of VMAs. however, the io thread needs
to be able to receive a cancellation signal in case aio_cancel
(implemented via pthread_cancel) is called. this requires sufficient
stack space for a signal frame, which PTHREAD_STACK_MIN does not
necessarily include.
in principle MINSIGSTKSZ from signal.h should give us sufficient space
for a signal frame, but the value is incorrect on some existing archs
due to kernel addition of new vector register support without
consideration for impact on ABI. some powerpc models exceed
MINSIGSTKSZ by about 0.5k, and x86[_64] with AVX-512 can exceed it by
up to about 1.5k. so use MINSIGSTKSZ+2048 to allow for the discrepancy
plus some working space.
unfortunately, it's possible that signal frame sizes could continue to
grow, and some archs (aarch64) explicitly specify that they may.
passing of a runtime value for MINSIGSTKSZ via AT_MINSIGSTKSZ in the
aux vector was added to aarch64 linux, and presumably other archs will
use this mechanism to report if they further increase the signal frame
size. when AT_MINSIGSTKSZ is present, assume it's correct, so that we
only need a small amount of working space in addition to it; in this
case just add 512.
|
|
|
|
|
|
| |
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
|
|
|
|
| |
these were overlooked for various reasons in earlier stages.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, the __timedwait function was optionally a cancellation
point depending on whether it was passed a pointer to a cleaup
function and context to register. as of now, only one caller actually
used such a cleanup function (and it may face removal soon); most
callers either passed a null pointer to disable cancellation or a
dummy cleanup function.
now, __timedwait is never a cancellation point, and __timedwait_cp is
the cancellable version. this makes the intent of the calling code
more obvious and avoids ugly dummy functions and long argument lists.
|
|
|
|
|
|
|
| |
a_store is only valid for int, but ssize_t may be defined as long or
another type. since there is no valid way for another thread to acess
the return value without first checking the error/completion status of
the aiocb anyway, an atomic store is not necessary.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, aio operations were not tracked by file descriptor; each
operation was completely independent. this resulted in non-conforming
behavior for non-seekable/append-mode writes (which are required to be
ordered) and made it impossible to implement aio_cancel, which in turn
made closing file descriptors with outstanding aio operations unsafe.
the new implementation is significantly heavier (roughly twice the
size, and seems to be slightly slower) and presently aims mainly at
correctness, not performance.
most of the public interfaces have been moved into a single file,
aio.c, because there is little benefit to be had from splitting them.
whenever any aio functions are used, aio_cancel and the internal
queue lifetime management and fd-to-queue mapping code must be linked,
and these functions make up the bulk of the code size.
the close function's interaction with aio is implemented with weak
alias magic, to avoid pulling in heavy aio cancellation code in
programs that don't use aio, and the expensive cancellation path
(which includes signal blocking) is optimized out when there are no
active aio queues.
|
|
|
|
|
|
| |
versionsort64, aio*64 and lio*64 symbols were missing, they are
only needed for glibc ABI compatibility, on the source level
dirent.h and aio.h already redirect them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the main motivation for this change is to remove the assumption that
the tid of the main thread is also the pid of the process. (the value
returned by the set_tid_address syscall was used to fill both fields
despite it semantically being the tid.) this is historically and
presently true on linux and unlikely to change, but it conceivably
could be false on other systems that otherwise reproduce the linux
syscall api/abi.
only a few parts of the code were actually still using the cached pid.
in a couple places (aio and synccall) it was a minor optimization to
avoid a syscall. caching could be reintroduced, but lazily as part of
the public getpid function rather than at program startup, if it's
deemed important for performance later. in other places (cancellation
and pthread_kill) the pid was completely unnecessary; the tkill
syscall can be used instead of tgkill. this is actually a rather
subtle issue, since tgkill is supposedly a solution to race conditions
that can affect use of tkill. however, as documented in the commit
message for commit 7779dbd2663269b465951189b4f43e70839bc073, tgkill
does not actually solve this race; it just limits it to happening
within one process rather than between processes. we use a lock that
avoids the race in pthread_kill, and the use in the cancellation
signal handler is self-targeted and thus not subject to tid reuse
races, so both are safe regardless of which syscall (tgkill or tkill)
is used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PAGE_SIZE was hardcoded to 4096, which is historically what most
systems use, but on several archs it is a kernel config parameter,
user space can only know it at execution time from the aux vector.
PAGE_SIZE and PAGESIZE are not defined on archs where page size is
a runtime parameter, applications should use sysconf(_SC_PAGE_SIZE)
to query it. Internally libc code defines PAGE_SIZE to libc.page_size,
which is set to aux[AT_PAGESZ] in __init_libc and early in __dynlink
as well. (Note that libc.page_size can be accessed without GOT, ie.
before relocations are done)
Some fpathconf settings are hardcoded to 4096, these should be actually
queried from the filesystem using statfs.
|
|
|
|
|
|
|
| |
issue found and patch provided by Jens Gustedt. after the atomic store
to the error code field of the aiocb, the application is permitted to
free or reuse the storage, so further access is invalid. instead, use
the local copy that was already made.
|
| |
|
| |
|
|
|
|
|
|
|
| |
for some reason I have not been able to determine, gcc 3.2 rejects the
array notation. this seems to be a gcc bug, but since it's easy to
work around, let's do the workaround and avoid gratuitously requiring
newer compilers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this mirrors the stdio_impl.h cleanup. one header which is not
strictly needed, errno.h, is left in pthread_impl.h, because since
pthread functions return their error codes rather than using errno,
nearly every single pthread function needs the errno constants.
in a few places, rather than bringing in string.h to use memset, the
memset was replaced by direct assignment. this seems to generate much
better code anyway, and makes many functions which were previously
non-leaf functions into leaf functions (possibly eliminating a great
deal of bloat on some platforms where non-leaf functions require ugly
prologue and/or epilogue).
|
|
|
|
|
|
|
|
| |
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
|
| |
|
|
|
|
|
| |
i blame this one on posix for using hideous const-qualified double
pointers which are unusable without hideous casts.
|
| |
|
|
|
|
|
| |
previous fix was backwards and propagated the wrong type rather than
the right one...
|
| |
|
|
some features are not yet supported, and only minimal testing has been
performed. should be considered experimental at this point.
|