mirror/musl - mirror of git://git.musl-libc.org/musl

	Commit message	Author	Age	Files	Lines
*	add C11 thread creation and related thread functions	Rich Felker	2014-09-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	based on patch by Jens Gustedt. the main difficulty here is handling the difference between start function signatures and thread return types for C11 threads versus POSIX threads. pointers to void are assumed to be able to represent faithfully all values of int. the function pointer for the thread start function is cast to an incorrect type for passing through pthread_create, but is cast back to its correct type before calling so that the behavior of the call is well-defined. changes to the existing threads implementation were kept minimal to reduce the risk of regressions, and duplication of code that carries implementation-specific assumptions was avoided for ease and safety of future maintenance.
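The cast-and-cast-back technique described in this message can be illustrated without any threads at all. The following is a minimal sketch, not the code from the commit: the type names, `struct start_args`, `pack_c11_start`, `run_c11_start`, and `example_start` are all hypothetical. It stores an `int (*)(void *)` function under the pthread-style pointer type, restores the real type before the call, and carries the `int` result in a `void *`.

```c
#include <stdint.h>

typedef int (*c11_start_fn)(void *);       /* shape of a C11 thread start function */
typedef void *(*posix_start_fn)(void *);   /* shape of a pthread start routine */

/* Hypothetical carrier: the start function is stored under the "wrong" type. */
struct start_args {
    posix_start_fn fn;   /* really a c11_start_fn in disguise */
    void *arg;
};

/* Pack a C11 start function so it can travel through a pthread-style path. */
static struct start_args pack_c11_start(c11_start_fn fn, void *arg)
{
    struct start_args a = { (posix_start_fn)fn, arg };
    return a;
}

/* Cast back to the true type before calling, so the call itself is
 * well-defined, then represent the int result as a void pointer. */
static void *run_c11_start(struct start_args *a)
{
    c11_start_fn real = (c11_start_fn)a->fn;
    int ret = real(a->arg);
    return (void *)(intptr_t)ret;
}

/* Example start function, for illustration only. */
static int example_start(void *arg)
{
    return *(int *)arg + 1;
}
```

Converting the result back with `(int)(intptr_t)` recovers the original `int`, which is the assumption the message states about pointers to void representing all values of `int`.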
*	fix false ownership of stdio FILEs due to tid reuse	Rich Felker	2014-08-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	this is analogous to commit fffc5cda10e0c5c910b40f7be0d4fa4e15bb3f48 which fixed the corresponding issue for mutexes. the robust list can't be used here because the locks do not share a common layout with mutexes. at some point it may make sense to simply incorporate a mutex object into the FILE structure and use it, but that would be a much more invasive change, and it doesn't mesh well with the current design that uses a simpler code path for internal locking and pulls in the recursive-mutex-like code when the flockfile API is used explicitly.
*	fix fallback checks for kernels without private futex support	Rich Felker	2014-08-22	1	-1/+1
\| \| \| \|	for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.
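A fallback of this kind might look like the following Linux-only sketch. It is an illustration, not the code from the commit: the function name `futex_wake_compat` is hypothetical, and it goes through the generic `syscall()` wrapper rather than musl's internal syscall macros. The private-flag call is tried first and retried without the flag only when the kernel reports `ENOSYS`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake up to cnt waiters on addr. If priv is nonzero, try the private
 * futex operation first and fall back to the shared one on ENOSYS. */
static long futex_wake_compat(int *addr, int cnt, int priv)
{
    int op = FUTEX_WAKE | (priv ? FUTEX_PRIVATE_FLAG : 0);
    long r = syscall(SYS_futex, addr, op, cnt);
    if (r == -1 && errno == ENOSYS && priv)
        r = syscall(SYS_futex, addr, FUTEX_WAKE, cnt);
    return r;   /* number of waiters woken, or -1 with errno set */
}
```

With no waiters on the word, the call simply reports that zero threads were woken.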
*	redesign cond var implementation to fix multiple issues	Rich Felker	2014-08-17	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the immediate issue that was reported by Jens Gustedt and needed to be fixed was corruption of the cv/mutex waiter states when switching to using a new mutex with the cv after all waiters were unblocked but before they finished returning from the wait function. self-synchronized destruction was also handled poorly and may have had race conditions. and the use of sequence numbers for waking waiters admitted a theoretical missed-wakeup if the sequence number wrapped through the full 32-bit space. the new implementation is largely documented in the comments in the source. the basic principle is to use linked lists initially attached to the cv object, but detachable on signal/broadcast, made up of nodes residing in automatic storage (stack) on the threads that are waiting. this eliminates the need for waiters to access the cv object after they are signaled, and allows us to limit wakeup to one waiter at a time during broadcasts even when futex requeue cannot be used. performance is also greatly improved, roughly double some tests. basically nothing is changed in the process-shared cond var case, where this implementation does not work, since processes do not have access to one another's local storage.
*	make pointers used in robust list volatile	Rich Felker	2014-08-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	when manipulating the robust list, the order of stores matters, because the code may be asynchronously interrupted by a fatal signal and the kernel will then access the robust list in what is essentially an async-signal context. previously, aliasing considerations made it seem unlikely that a compiler could reorder the stores, but proving that they could not be reordered incorrectly would have been extremely difficult. instead I've opted to make all the pointers used as part of the robust list, including those in the robust list head and in the individual mutexes, volatile. in addition, the format of the robust list has been changed to point back to the head at the end, rather than ending with a null pointer. this is to match the documented kernel robust list ABI. the null pointer, which was previously used, only worked because faults during access terminate the robust list processing.
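The list shape described here, ending by pointing back at the head instead of at a null pointer, can be sketched as follows. The names `rl_node`, `rl_init`, `rl_push`, and `rl_count` are hypothetical and the structure is far simpler than the real robust list; the point is only the circular termination and the `volatile` link, which keeps the compiler from reordering or eliding the two stores in `rl_push`.

```c
#include <stddef.h>

/* Hypothetical node type; each lock would embed one of these. */
struct rl_node {
    struct rl_node *volatile next;
};

/* An empty list is a head that points at itself, not at NULL. */
static void rl_init(struct rl_node *head)
{
    head->next = head;
}

/* Link the new node fully before publishing it from the head, so a
 * walker that interrupts between the two stores still sees a valid list. */
static void rl_push(struct rl_node *head, struct rl_node *n)
{
    n->next = head->next;
    head->next = n;
}

/* Walk until the chain returns to the head. */
static size_t rl_count(struct rl_node *head)
{
    size_t cnt = 0;
    for (struct rl_node *p = head->next; p != head; p = p->next)
        cnt++;
    return cnt;
}
```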
*	make futex operations use private-futex mode when possible	Rich Felker	2014-08-15	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	private-futex uses the virtual address of the futex int directly as the hash key rather than requiring the kernel to resolve the address to an underlying backing for the mapping in which it lies. for certain usage patterns it improves performance significantly. in many places, the code using futex __wake and __wait operations was already passing a correct fixed zero or nonzero flag for the priv argument, so no change was needed at the site of the call, only in the __wake and __wait functions themselves. in other places, especially where the process-shared attribute for a synchronization object was not previously tracked, additional new code is needed. for mutexes, the only place to store the flag is in the type field, so additional bit masking logic is needed for accessing the type. for non-process-shared condition variable broadcasts, the futex requeue operation is unable to requeue from a private futex to a process-shared one in the mutex structure, so requeue is simply disabled in this case by waking all waiters. for robust mutexes, the kernel always performs a non-private wake when the owner dies. in order not to introduce a behavioral regression in non-process-shared robust mutexes (when the owning thread dies), they are simply forced to be treated as process-shared for now, giving correct behavior at the expense of performance. this can be fixed by adding explicit code to pthread_exit to do the right thing for non-shared robust mutexes in userspace rather than relying on the kernel to do it, and will be fixed in this way later. since not all supported kernels have private futex support, the new code detects EINVAL from the futex syscall and falls back to making the call without the private flag. 
no attempt to cache the result is made; caching it and using the cached value efficiently is somewhat difficult, and not worth the complexity when the benefits would be seen only on ancient kernels which have numerous other limitations and bugs anyway.
*	simplify errno implementation	Rich Felker	2014-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the motivation for the errno_ptr field in the thread structure, which this commit removes, was to allow the main thread's errno to keep its address when lazy thread pointer initialization was used. &errno was evaluated prior to setting up the thread pointer and stored in errno_ptr for the main thread; subsequently created threads would have errno_ptr pointing to their own errno_val in the thread structure. since lazy initialization was removed, there is no need for this extra level of indirection; __errno_location can simply return the address of the thread's errno_val directly. this does cause &errno to change, but the change happens before entry to application code, and thus is not observable.
*	fix multiple bugs in SIGEV_THREAD timers	Rich Felker	2013-08-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. the thread result field was reused for storing a kernel timer id, but would be overwritten if the application code exited or cancelled the thread. 2. low pointer values were used as the indicator that the timer id is a kernel timer id rather than a thread id. this is not portable, as mmap may return low pointers under some conditions. instead, use the fact that pointers must be aligned and kernel timer ids must be non-negative to map pointers into the negative integer space. 3. signals were not blocked until after the timer thread started, so a race condition could allow a signal handler to run in the timer thread when it's not supposed to exist. this is mainly problematic if the calling thread was the only thread where the signal was unblocked and the signal handler assumes it runs in that thread.
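Item 2 can be made concrete with a small sketch. The exact encoding is not given in the message, so the one shown here is an assumption, and the function names are hypothetical: because an aligned pointer has a zero low bit, it can be shifted right by one without losing information, which frees the top bit to serve as a sign bit. Non-negative values then remain available for kernel timer ids.

```c
#include <stdint.h>

/* Map an aligned pointer to a negative integer: drop the (zero) low bit
 * and set the sign bit. Assumes the pointer is at least 2-byte aligned. */
static intptr_t encode_thread_ptr(void *p)
{
    return (intptr_t)((uintptr_t)INTPTR_MIN | ((uintptr_t)p >> 1));
}

/* Negative values are thread pointers; non-negative ones are kernel ids. */
static int is_thread_ptr(intptr_t id)
{
    return id < 0;
}

/* Undo the encoding: shifting left discards the sign bit and restores
 * the original (zero) low bit. */
static void *decode_thread_ptr(intptr_t id)
{
    return (void *)((uintptr_t)id << 1);
}
```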
*	transition to using functions for internal signal blocking/restoring	Rich Felker	2013-04-26	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	there are several reasons for this change. one is getting rid of the repetition of the syscall signature all over the place. another is sharing the constant masks without costly GOT accesses in PIC. the main motivation, however, is accurately representing whether we want to block signals that might be handled by the application, or all signals.
*	implement pthread_getattr_np	Rich Felker	2013-03-31	1	-0/+2
\| \| \| \| \| \|	this function is mainly (purely?) for obtaining stack address information, but we also provide the detach state since it's easy to do anyway.
*	remove __SYSCALL_SSLEN arch macro in favor of using public _NSIG	Rich Felker	2013-03-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	the issue at hand is that many syscalls require as an argument the kernel-ABI size of sigset_t, intended to allow the kernel to switch to a larger sigset_t in the future. previously, each arch was defining this size in syscall_arch.h, which was redundant with the definition of _NSIG in bits/signal.h. as it's used in some not-quite-portable application code as well, _NSIG is much more likely to be recognized and understood immediately by someone reading the code, and it's also shorter and less cluttered. note that _NSIG is actually 65/129, not 64/128, but the division takes care of throwing away the off-by-one part.
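The arithmetic in the final note can be checked directly. In this sketch the helper name `kernel_sigset_bytes` is hypothetical, and the values are passed as plain integers rather than taken from any header: integer division by 8 discards the remainder, so 65 and 129 yield the same byte counts as 64 and 128.

```c
/* Size in bytes of the kernel signal mask for an _NSIG-style count.
 * Integer division truncates, so the off-by-one in 65 or 129 vanishes. */
static int kernel_sigset_bytes(int nsig)
{
    return nsig / 8;
}
```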
*	replace __wake function with macro that performs direct syscall	Rich Felker	2013-02-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \|	this should generate faster and smaller code, especially with inline syscalls. the conditional with cnt is ugly, but thankfully cnt is always a constant anyway so it gets evaluated at compile time. it may be preferable to make separate __wake and __wakeall macros without a count argument. priv flag is not used yet; private futex support still needs to be done at some point in the future.
*	add support for thread scheduling (POSIX TPS option)	Rich Felker	2012-11-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	linux's sched_* syscalls actually implement the TPS (thread scheduling) functionality, not the PS (process scheduling) functionality which the sched_* functions are supposed to have. omitting support for the PS option (and having the sched_* interfaces fail with ENOSYS rather than omitting them, since some broken software assumes they exist) seems to be the only conforming way to do this on linux.
*	clean up sloppy nested inclusion from pthread_impl.h	Rich Felker	2012-11-08	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	this mirrors the stdio_impl.h cleanup. one header which is not strictly needed, errno.h, is left in pthread_impl.h, because since pthread functions return their error codes rather than using errno, nearly every single pthread function needs the errno constants. in a few places, rather than bringing in string.h to use memset, the memset was replaced by direct assignment. this seems to generate much better code anyway, and makes many functions which were previously non-leaf functions into leaf functions (possibly eliminating a great deal of bloat on some platforms where non-leaf functions require ugly prologue and/or epilogue).
*	support for TLS in dynamic-loaded (dlopen) modules	Rich Felker	2012-10-05	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unlike other implementations, this one reserves memory for new TLS in all pre-existing threads at dlopen-time, and dlopen will fail with no resources consumed and no new libraries loaded if memory is not available. memory is not immediately distributed to running threads; that would be too complex and too costly. instead, assurances are made that threads needing the new TLS can obtain it in an async-signal-safe way from a buffer belonging to the dynamic linker/new module (via atomic fetch-and-add based allocator). I've re-appropriated the lock that was previously used for __synccall (synchronizing set*id() syscalls between threads) as a general pthread_create lock. it's a "backwards" rwlock where the "read" operation is safe atomic modification of the live thread count, which multiple threads can perform at the same time, and the "write" operation is making sure the count does not increase during an operation that depends on it remaining bounded (__synccall or dlopen). in static-linked programs that don't use __synccall, this lock is a no-op and has no cost.
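The "atomic fetch-and-add based allocator" mentioned here can be sketched as a bump allocator over a pre-reserved buffer. This is an illustration under assumptions, not the dynamic linker's code: `struct tls_pool` and `tls_pool_take` are hypothetical names, and C11 `<stdatomic.h>` stands in for musl's internal atomics. Because the only shared state is one counter advanced by a single atomic operation, the function takes no locks and calls nothing that is unsafe in a signal handler.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical pool: a buffer reserved up front plus a bump offset. */
struct tls_pool {
    unsigned char *base;
    size_t size;
    atomic_size_t used;
};

/* Claim n bytes by atomically advancing the offset. Returns NULL if the
 * claim does not fit; a failed claim still advances the offset, which is
 * acceptable only if the pool was sized so that valid requests always fit. */
static void *tls_pool_take(struct tls_pool *p, size_t n)
{
    size_t off = atomic_fetch_add(&p->used, n);
    if (off > p->size || n > p->size - off)
        return NULL;
    return p->base + off;
}
```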
*	beginnings of full TLS support in shared libraries	Rich Felker	2012-10-04	1	-1/+1
\| \| \| \| \| \|	this code will not work yet because the necessary relocations are not supported, and cannot be supported without some internal changes to how relocation processing works (coming soon).
*	fix (hopefully) all hard-coded 8's for kernel sigset_t size	Rich Felker	2012-08-09	1	-2/+5
\| \| \| \| \| \| \| \| \| \|	some minor changes to how hard-coded sets for thread-related purposes are handled were also needed, since the old object sizes were not necessarily sufficient. things have gotten a bit ugly in this area, and i think a cleanup is in order at some point, but for now the goal is just to get the code working on all supported archs including mips, which was badly broken by linux rejecting syscalls with the wrong sigset_t size.
*	fix several locks that weren't updated right for new futex-based __lock	Rich Felker	2012-07-12	1	-3/+3
\| \| \| \| \| \|	these could have caused memory corruption due to invalid accesses to the next field. all should be fixed now; I found the errors with fgrep -r '__lock(&', which is bogus since the argument should be an array.
*	add pthread_attr_setstack interface (and get)	Rich Felker	2012-06-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	i originally omitted these (optional, per POSIX) interfaces because i considered them backwards implementation details. however, someone later brought to my attention a fairly legitimate use case: allocating thread stacks in memory that's setup for sharing and/or fast transfer between CPU and GPU so that the thread can move data to a GPU directly from automatic-storage buffers without having to go through additional buffer copies. perhaps there are other situations in which these interfaces are useful too.
*	increase default thread stack size to 80k	Rich Felker	2012-06-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've been looking for data that would suggest a good default, and since little has shown up, i'm doing this based on the limited data I have. the value 80k is chosen to accommodate 64k of application data (which happens to be the size of the buffer in git that made it crash without a patch to call pthread_attr_setstacksize) plus the max stack usage of most libc functions (with a few exceptions like crypt, which will be fixed soon to avoid excessive stack usage, and [n]ftw, which inherently uses a fair bit in recursive directory searching). if further evidence emerges suggesting that the default should be larger, I'll consider changing it again, but I'd like to avoid it getting too large to avoid the issues of large commit charge and rapid address space exhaustion on 32-bit machines.
*	remove cruft from pthread structure (old cancellation stuff)	Rich Felker	2012-05-25	1	-2/+0
\|
*	fix out-of-bounds array access in pthread barriers on 64-bit	Rich Felker	2012-05-21	1	-1/+1
\| \| \| \| \|	it's ok to overlap with integer slot 3 on 32-bit because only slots 0-2 are used on process-local barriers.
*	overhaul SSP support to use a real canary	Rich Felker	2012-05-03	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	pthread structure has been adjusted to match the glibc/GCC abi for where the canary is stored on i386 and x86_64. it will need variants for other archs to provide the added security of the canary's entropy, but even without that it still works as well as the old "minimal" ssp support. eventually such changes will be made anyway, since they are also needed for GCC/C11 thread-local storage support (not yet implemented). care is taken not to attempt initializing the thread pointer unless the program actually uses SSP (by reference to __stack_chk_fail).
*	synchronize cond var destruction with exiting waits	Rich Felker	2011-10-02	1	-0/+1
\|
*	improve pshared barriers	Rich Felker	2011-09-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	eliminate the sequence number field and instead use the counter as the futex. because of the way the lock is held, sequence numbers are completely useless, and this frees up a field in the barrier structure to be used as a waiter count for the count futex, which lets us avoid some syscalls in the best case. as of now, self-synchronized destruction and unmapping should be fully safe. before any thread can return from the barrier, all threads in the barrier have obtained the vm lock, and each holds a shared lock on the barrier. the barrier memory is not inspected after the shared lock count reaches 0, nor after the vm lock is released.
*	process-shared barrier support, based on discussion with bdonlan	Rich Felker	2011-09-27	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	this implementation is rather heavy-weight, but it's the first solution i've found that's actually correct. all waiters actually wait twice at the barrier so that they can synchronize exit, and they hold a "vm lock" that prevents changes to virtual memory mappings (and blocks pthread_barrier_destroy) until all waiters are finished inspecting the barrier. thus, it is safe for any thread to destroy and/or unmap the barrier's memory as soon as pthread_barrier_wait returns, without further synchronization.
*	fix lost signals in cond vars	Rich Felker	2011-09-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	due to moving waiters from the cond var to the mutex in bcast, these waiters upon wakeup would steal slots in the count from newer waiters that had not yet been signaled, preventing the signal function from taking any action. to solve the problem, we simply use two separate waiter counts, so that the original "total" waiters count is undisturbed by broadcast and still available for signal.
*	redo cond vars again, use sequence numbers	Rich Felker	2011-09-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	testing revealed that the old implementation, while correct, was giving way too many spurious wakeups due to races changing the value of the condition futex. in a test program with 5 threads receiving broadcast signals, the number of returns from pthread_cond_wait was roughly 3 times what it should have been (2 spurious wakeups for every legitimate wakeup). moreover, the magnitude of this effect seems to grow with the number of threads. the old implementation may also have had some nasty race conditions with reuse of the cond var with a new mutex. the new implementation is based on incrementing a sequence number with each signal event. this sequence number has nothing to do with the number of threads intended to be woken; it's only used to provide a value for the futex wait to avoid deadlock. in theory there is a danger of race conditions due to the value wrapping around after 2^32 signals. it would be nice to eliminate that, if there's a way. testing showed no spurious wakeups (though they are of course possible) with the new implementation, as well as slightly improved performance.
*	new futex-requeue-based pthread_cond_broadcast implementation	Rich Felker	2011-09-25	1	-3/+6
\| \| \| \| \| \|	this avoids the "stampede effect" where pthread_cond_broadcast would result in all waiters waking up simultaneously, only to immediately contend for the mutex and go back to sleep.
*	fix deadlock in condition wait whenever there are multiple waiters	Rich Felker	2011-09-22	1	-0/+1
\| \| \| \| \|	it's amazing none of the conformance tests i've run even bothered to check whether something so basic works...
*	overhaul clone syscall wrapping	Rich Felker	2011-09-18	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	several things are changed. first, i have removed the old __uniclone function signature and replaced it with the "standard" linux __clone/clone signature. this was necessary to expose clone to applications anyway, and it makes it easier to port __clone to new archs, since it's now testable independently of pthread_create. secondly, i have removed all references to the ugly ldt descriptor structure (i386 only) from the c code and pthread structure. in places where it is needed, it is now created on the stack just when it's needed, in assembly code. thus, the i386 __clone function takes the desired thread pointer as its argument, rather than an ldt descriptor pointer, just like on all other sane archs. this should not affect applications since there is really no way an application can use clone with threads/tls in a way that doesn't horribly conflict with and clobber the underlying implementation's use. applications are expected to use clone only for creating actual processes, possibly with new namespace features and whatnot.
*	pthread and synccall cleanup, new __synccall_wait op	Rich Felker	2011-08-12	1	-0/+1
\| \| \| \| \| \| \| \| \|	fix up clone signature to match the actual behavior. the new __synccall_wait function allows a __synccall callback to wait for other threads to continue without returning, so that it can resume action after the caller finishes. this interface could be made significantly more general/powerful with minimal effort, but i'll wait to do that until it's actually useful for something.
*	overhaul rwlocks to address several issues	Rich Felker	2011-08-03	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	like mutexes and semaphores, rwlocks suffered from a race condition where the unlock operation could access the lock memory after another thread successfully obtained the lock (and possibly destroyed or unmapped the object). this has been fixed in the same way it was fixed for other lock types. in addition, the previous implementation favored writers over readers. in the absence of other considerations, that is the best behavior for rwlocks, and posix explicitly allows it. however posix also requires read locks to be recursive. if writers are favored, any attempt to obtain a read lock while a writer is waiting for the lock will fail, causing "recursive" read locks to deadlock. this can be avoided by keeping track of which threads already hold read locks, but doing so requires unbounded memory usage, and there must be a fallback case that favors readers in case memory allocation failed. and all of this must be synchronized. the cost, complexity, and risk of errors in getting it right is too great, so we simply favor readers. tracking of the owner of write locks has been removed, as it was not useful for anything. it could allow deadlock detection, but it's not clear to me that returning EDEADLK (which a buggy program is likely to ignore) is better than deadlocking; at least the latter behavior prevents further data corruption. a correct program cannot invoke this situation anyway. the reader count and write lock state, as well as the "last minute" waiter flag have all been combined into a single atomic lock. this means all state transitions for the lock are atomic compare-and-swap operations. this makes establishing correctness much easier and may improve performance. finally, some code duplication has been cleaned up. more is called for, especially the standard __timedwait idiom repeated in all locks.
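Keeping the whole lock state in one word, so that every transition is a compare-and-swap, can be sketched as follows. This is a simplified illustration, not the committed code: the function names and the `WRITE_LOCKED` encoding are assumptions, the "last minute" waiter flag is omitted, and only the non-blocking paths are shown. A value of 0 means unlocked, `WRITE_LOCKED` means held by a writer, and any other value is the number of readers.

```c
#include <stdatomic.h>

#define WRITE_LOCKED 0x7fffffff   /* assumed sentinel for "held by a writer" */

/* Add one reader unless a writer holds the lock or the count is saturated. */
static int rw_tryrdlock(atomic_int *l)
{
    int v = atomic_load(l);
    do {
        if (v == WRITE_LOCKED || v == WRITE_LOCKED - 1)
            return -1;
    } while (!atomic_compare_exchange_weak(l, &v, v + 1));
    return 0;
}

/* A writer may enter only from the fully unlocked state. */
static int rw_trywrlock(atomic_int *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(l, &expected, WRITE_LOCKED) ? 0 : -1;
}

/* Release either kind of hold: a writer resets to 0, a reader decrements. */
static void rw_unlock(atomic_int *l)
{
    int v = atomic_load(l);
    int next;
    do {
        next = (v == WRITE_LOCKED) ? 0 : v - 1;
    } while (!atomic_compare_exchange_weak(l, &v, next));
}
```

Since readers are admitted whenever no writer holds the lock, a thread that already holds a read lock can always take another, which matches the reader-favoring policy the message settles on.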
*	unify and overhaul timed futex waits	Rich Felker	2011-08-02	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	new features: - FUTEX_WAIT_BITSET op will be used for timed waits if available. this saves a call to clock_gettime. - error checking for the timespec struct is now inside __timedwait so it doesn't need to be duplicated everywhere. cond_timedwait still needs to duplicate it to avoid unlocking the mutex, though. - pushing and popping the cancellation handler is delegated to __timedwait, and cancellable/non-cancellable waits are unified.
*	add proper futex-based locking for stdio	Rich Felker	2011-07-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	previously, stdio used spinlocks, which would be unacceptable if we ever add support for thread priorities, and which yielded pathologically bad performance if an application attempted to use flockfile on a key file as a major/primary locking mechanism. i had held off on making this change for fear that it would hurt performance in the non-threaded case, but actually support for recursive locking had already inflicted that cost. by having the internal locking functions store a flag indicating whether they need to perform unlocking, rather than using the actual recursive lock counter, i was able to combine the conditionals at unlock time, eliminating any additional cost, and also avoid a nasty corner case where a huge number of calls to ftrylockfile could cause deadlock later at the point of internal locking. this commit also fixes some issues with usage of pthread_self conflicting with __attribute__((const)) which resulted in crashes with some compiler versions/optimizations, mainly in flockfile prior to pthread_create.
*	new attempt at making set*id() safe and robust	Rich Felker	2011-07-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	changing credentials in a multi-threaded program is extremely difficult on linux because it requires synchronizing the change between all threads, which have their own thread-local credentials on the kernel side. this is further complicated by the fact that changing the real uid can fail due to exceeding RLIMIT_NPROC, making it possible that the syscall will succeed in some threads but fail in others. the old __rsyscall approach being replaced was robust in that it would report failure if any one thread failed, but in this case, the program would be left in an inconsistent state where individual threads might have different uid. (this was not as bad as glibc, which would sometimes even fail to report the failure entirely!) the new approach being committed refuses to change real user id when it cannot temporarily set the rlimit to infinity. this is completely POSIX conformant since POSIX does not require an implementation to allow real-user-id changes for non-privileged processes whatsoever. still, setting the real uid can fail due to memory allocation in the kernel, but this can only happen if there is not already a cached object for the target user. thus, we forcibly serialize the syscalls attempts, and fail the entire operation on the first failure. this should lead to an all-or-nothing success/failure result, but it's still fragile and highly dependent on kernel developers not breaking things worse than they're already broken. ideally linux will eventually add a CLONE_USERCRED flag that would give POSIX conformant credential changes without any hacks from userspace, and all of this code would become redundant and could be removed ~10 years down the line when everyone has abandoned the old broken kernels. i'm not holding my breath...
*	fix race condition in pthread_kill	Rich Felker	2011-06-14	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	if thread id was reused by the kernel between the time pthread_kill read it from the userspace pthread_t object and the time of the tgkill syscall, a signal could be sent to the wrong thread. the tgkill syscall was supposed to prevent this race (versus the old tkill syscall) but it can't; it can only help in the case where the tid is reused in a different process, but not when the tid is reused in the same process. the only solution i can see is an extra lock to prevent threads from exiting while another thread is trying to pthread_kill them. it should be very very cheap in the non-contended case.
*	fix sigset macro for 64-bit systems (<< was overflowing due to wrong type)	Rich Felker	2011-06-13	1	-1/+1
\|
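The class of bug named in this subject line is a shift performed in a type narrower than the mask word. A minimal sketch of the corrected form, assuming a hypothetical helper `sig_bit` rather than the actual macro: the constant being shifted must already have the width of the result, otherwise shifting by 32 or more overflows.

```c
/* Bit for signal number sig (1-based) within a 64-bit mask word.
 * The constant is 1ULL, not plain 1, so the shift happens in 64 bits. */
static unsigned long long sig_bit(int sig)
{
    return 1ULL << (sig - 1);
}
```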
*	implement uselocale function (minimal)	Rich Felker	2011-05-30	1	-0/+2
\|
*	optimize compound-literal sigset_t's not to contain useless hurd bits	Rich Felker	2011-05-07	1	-2/+4
\|
*	overhaul implementation-internal signal protections	Rich Felker	2011-05-07	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the new approach relies on the fact that the only ways to create sigset_t objects without invoking UB are to use the sig*set() functions, or from the masks returned by sigprocmask, sigaction, etc. or in the ucontext_t argument to a signal handler. thus, as long as sigfillset and sigaddset avoid adding the "protected" signals, there is no way the application will ever obtain a sigset_t including these bits, and thus no need to add the overhead of checking/clearing them when sigprocmask or sigaction is called. note that the old code actually failed to remove the bits from sa_mask when sigaction was called. the new implementations are also significantly smaller, simpler, and faster due to ignoring the useless "GNU HURD signals" 65-1024, which are not used and, if there's any sanity in the world, never will be used.
*	completely new barrier implementation, addressing major correctness issues	Rich Felker	2011-05-06	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the previous implementation had at least 2 problems: 1. the case where additional threads reached the barrier before the first wave was finished leaving the barrier was untested and seemed not to be working. 2. threads leaving the barrier continued to access memory within the barrier object after other threads had successfully returned from pthread_barrier_wait. this could lead to memory corruption or crashes if the barrier object had automatic storage in one of the waiting threads and went out of scope before all threads finished returning, or if one thread unmapped the memory in which the barrier object lived. the new implementation avoids both problems by making the barrier state essentially local to the first thread which enters the barrier wait, and forces that thread to be the last to return.
*	overhaul pthread cancellation	Rich Felker	2011-04-17	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	this patch improves the correctness, simplicity, and size of cancellation-related code. modulo any small errors, it should now be completely conformant, safe, and resource-leak free. the notion of entering and exiting cancellation-point context has been completely eliminated and replaced with alternative syscall assembly code for cancellable syscalls. the assembly is responsible for setting up execution context information (stack pointer and address of the syscall instruction) which the cancellation signal handler can use to determine whether the interrupted code was in a cancellable state. these changes eliminate race conditions in the previous generation of cancellation handling code (whereby a cancellation request received just prior to the syscall would not be processed, leaving the syscall to block, potentially indefinitely), and remedy an issue where non-cancellable syscalls made from signal handlers became cancellable if the signal handler interrupted a cancellation point. x86_64 asm is untested and may need a second try to get it right.
*	use a separate signal from SIGCANCEL for SIGEV_THREAD timers	Rich Felker	2011-04-14	1	-0/+1
\| \| \| \| \| \|	otherwise we cannot support an application's desire to use asynchronous cancellation within the callback function. this change also slightly debloats pthread_create.c.
*	greatly improve SIGEV_THREAD timers	Rich Felker	2011-04-09	1	-0/+1
\| \| \| \| \|	calling pthread_exit from, or pthread_cancel on, the timer callback thread will no longer destroy the timer.
*	move rsyscall out of pthread_create module	Rich Felker	2011-04-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	this is something of a tradeoff, as now set*id() functions, rather than pthread_create, are what pull in the code overhead for dealing with linux's refusal to implement proper POSIX thread-vs-process semantics. my motivations are: 1. it's cleaner this way, especially cleaner to optimize out the rsyscall locking overhead from pthread_create when it's not needed. 2. it's expected that only a tiny number of core system programs will ever use set*id() functions, whereas many programs may want to use threads, and making thread overhead tiny is an incentive for "light" programs to try threads.
*	optimize timer creation and possibly protect against some minor races	Rich Felker	2011-03-30	1	-2/+0
\| \| \| \| \| \| \| \| \|	the major idea of this patch is not to depend on having the timer pointer delivered to the signal handler, and instead use the thread pointer to get the callback function address and argument. this way, the parent thread can make the timer_create syscall while the child thread is starting, and it should never have to block waiting for the barrier.
*	major improvements to cancellation handling	Rich Felker	2011-03-29	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	- there is no longer any risk of spoofing cancellation requests, since the cancel flag is set in pthread_cancel rather than in the signal handler. - cancellation signal is no longer unblocked when running the cancellation handlers. instead, pthread_create will cause any new threads created from a cancellation handler to unblock their own cancellation signal. - various tweaks in preparation for POSIX timer support.
*	some preliminaries for adding POSIX timers	Rich Felker	2011-03-29	1	-0/+4
\|
*	remove useless field in pthread struct (wasted a good bit of space)	Rich Felker	2011-03-28	1	-1/+0
\|