path: root/src/internal
Commit message  Author  Age  Files  Lines
* overhaul SSP support to use a real canaryRich Felker2012-05-031-0/+4
| | | | | | | | | | | | | pthread structure has been adjusted to match the glibc/GCC abi for where the canary is stored on i386 and x86_64. it will need variants for other archs to provide the added security of the canary's entropy, but even without that it still works as well as the old "minimal" ssp support. eventually such changes will be made anyway, since they are also needed for GCC/C11 thread-local storage support (not yet implemented). care is taken not to attempt initializing the thread pointer unless the program actually uses SSP (by reference to __stack_chk_fail).
* fix off-by-one error that caused uninitialized memory read in floatscanRich Felker2012-04-301-1/+1
| | | | | | this caused misreading of certain floating point values that are exact multiples of large powers of ten, unpredictable depending on prior stack contents.
* ditch the priority inheritance locks; use malloc's version of lockRich Felker2012-04-242-3/+3
| | | | | | | | | | | | | | | | | | | i did some testing trying to switch malloc to use the new internal lock with priority inheritance, and my malloc contention test got 20-100 times slower. if priority inheritance futexes are this slow, it's simply too high a price to pay for avoiding priority inversion. maybe we can consider them somewhere down the road once the kernel folks get their act together on this (and preferably don't link it to glibc's inefficient lock API)... as such, i've switched __lock to use malloc's implementation of lightweight locks, and updated all the users of the code to use an array with a waiter count for their locks. this should give optimal performance in the vast majority of cases, and it's simple. malloc is still using its own internal copy of the lock code because it seems to yield measurably better performance with -O3 when it's inlined (20% or more difference in the contention stress test).
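A minimal sketch of the "array with a waiter count" layout described above, assuming C11 atomics. The names are hypothetical, and `sched_yield` stands in for the futex wait and wake that a real implementation would use, so this shows the data layout and fast path only, not musl's actual code.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical layout: l[0] is the lock word, l[1] counts waiters. */
void sketch_lock(atomic_int *l)
{
    /* Fast path: one atomic swap takes an uncontended lock. */
    while (atomic_exchange(&l[0], 1)) {
        atomic_fetch_add(&l[1], 1);   /* register as a waiter */
        while (atomic_load(&l[0]))
            sched_yield();            /* stand-in for a futex wait */
        atomic_fetch_sub(&l[1], 1);
    }
}

/* Returns nonzero when waiters exist, i.e. when a wake would be required. */
int sketch_unlock(atomic_int *l)
{
    atomic_store(&l[0], 0);
    return atomic_load(&l[1]) != 0;
}
```

The waiter count is what lets the unlock path skip the wake syscall entirely when nobody is waiting.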
* new internal locking primitive; drop spinlocksRich Felker2012-04-241-1/+2
| | | | | | we use priority inheritance futexes if possible so that the library cannot hit internal priority inversion deadlocks in the presence of realtime priority scheduling (full support to be added later).
* remove redundant (unmaintained) check in floatscanRich Felker2012-04-221-3/+3
| | | | also be extra careful to avoid wrapping the circular buffer early
* make floatscan correctly set errno for overflow/underflowRich Felker2012-04-211-4/+16
| | | | | | | | | | | care is taken that the setting of errno correctly reflects underflow condition. scanning exact denormal values does not result in ERANGE, nor does scanning values (such as the usual string definition of FLT_MIN) which are actually less than the smallest normal number but which round to a normal result. only the decimal case is handled so far; hex floats require a separate fix to come later.
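From the caller's side, the errno behavior can be observed with the standard `strtod` interface. The helper name below is hypothetical; only `strtod`, `errno`, `ERANGE`, and `HUGE_VAL` come from the C library.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Parse s as a double; return 1 if the conversion reported a range error. */
int sketch_range_error(const char *s, double *out)
{
    errno = 0;                /* strtod never clears errno on success */
    *out = strtod(s, NULL);
    return errno == ERANGE;
}
```

An overflowing input such as "1e999" should report a range error and yield `HUGE_VAL`, while an ordinary value such as "1.5" should not touch errno.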
* skip leading zeros even after decimal point in floatscanRich Felker2012-04-211-4/+9
| | | | | | in principle this should just be an optimization, but it happens to also fix a nasty bug where values like 0.00000000001 were getting caught by the early zero detection path and wrongly scanned as zero.
* fix overread (consuming an extra byte) scanning NANRich Felker2012-04-211-1/+1
| | | | bug detected by glib test suite
* fix really bad breakage in strtol, etc.: failure to accept leading spacesRich Felker2012-04-193-4/+5
|
* fix typo in exponent reading code for floatsRich Felker2012-04-181-1/+1
| | | | | this was basically harmless, but could have resulted in misreading inputs with more than a few gigabytes worth of digits..
* fix failure to read infinity in scanfRich Felker2012-04-171-3/+4
| | | | | | this code worked in strtod, but not in scanf. more evidence that i should design a better interface for discarding multiple tail characters than just calling unget repeatedly...
* fix failure of int parser to unget an initial mismatching characterRich Felker2012-04-171-0/+1
|
* use the new integer parser (FILE/shgetc based) for strtol, wcstol, etc.Rich Felker2012-04-162-127/+0
|
* new scanf implementation and corresponding integer parser/converterRich Felker2012-04-163-0/+107
| advantages over the old code:
| - correct results for floating point (old code was bogus)
| - wide/regular scanf separated so scanf does not pull in wide code
| - well-defined behavior on integers that overflow dest type
| - support for %[a-b] ranges with %[ (impl-defined but widely used)
| - no intermediate conversion of fmt string to wide string
| - cleaner, easier to share code with strto* functions
| - better standards conformance for corner cases
|
| the old code remains in the source tree, as the wide versions of the scanf-family functions are still using it. it will be removed when no longer needed.
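The scanset range support mentioned in the list can be exercised through the standard `sscanf` call. This is a usage sketch with a hypothetical helper name; the meaning of `-` inside `%[` is implementation-defined in ISO C, so the example assumes a libc that treats it as a range.

```c
#include <stdio.h>

/* Split input like "hello42" into a lowercase word and a trailing integer.
 * Returns the number of fields converted (2 on full success). */
int sketch_split(const char *s, char word[16], int *num)
{
    /* %15[a-z] caps the word at 15 characters plus the terminating NUL. */
    return sscanf(s, "%15[a-z]%d", word, num);
}
```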
* fix buggy limiter handling in shgetcRich Felker2012-04-161-4/+3
| | | | this is needed for upcoming new scanf
* fix broken shgetc limiter logic (wasn't working)Rich Felker2012-04-162-2/+5
|
* floatscan: fix incorrect count of leading nonzero digitsRich Felker2012-04-161-1/+1
| | | | | | | this off-by-one error was causing values with just one digit past the decimal point to be treated by the integer case. in many cases it would yield the correct result, but if expressions are evaluated in excess precision, double rounding may occur.
* use fast version of the int reading code for the high-order digits tooRich Felker2012-04-131-3/+13
| | | | | this increases code size slightly, but it's considerably faster, especially for power-of-2 bases.
* use macros instead of inline functions in shgetc.hRich Felker2012-04-131-20/+4
| | | | | | at -Os optimization level, gcc refuses to inline these functions even though the inlined code would be roughly the same size as the function call, and much faster. the easy solution is to make them into macros.
* fix spurious overflows in strtoull with small basesRich Felker2012-04-131-7/+3
| | | | | | | whenever the base was small enough that more than one digit could still fit after UINTMAX_MAX/36-1 was reached, only the first would be allowed; subsequent digits would trigger spurious overflow, making it impossible to read the largest values in low bases.
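One way to avoid this class of spurious overflow is to test each digit exactly rather than against a fixed threshold. The following is a sketch of that idea with a hypothetical helper, not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulate digits (0-9, a-z, A-Z) of the given base from s.
 * On overflow, sets *overflow to 1 and returns UINTMAX_MAX. */
uintmax_t sketch_parse_uint(const char *s, unsigned base, int *overflow)
{
    uintmax_t x = 0;
    assert(base >= 2 && base <= 36);
    *overflow = 0;
    for (; *s; s++) {
        unsigned d;
        if (*s >= '0' && *s <= '9') d = *s - '0';
        else if (*s >= 'a' && *s <= 'z') d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'Z') d = *s - 'A' + 10;
        else break;
        if (d >= base) break;
        /* x*base + d fits exactly when x <= (UINTMAX_MAX - d) / base. */
        if (x > (UINTMAX_MAX - d) / base) {
            *overflow = 1;
            return UINTMAX_MAX;
        }
        x = x * base + d;
    }
    return x;
}
```

Because the bound is recomputed for every digit, the largest representable value parses cleanly in any base, including base 2.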
* remove magic numbers from floatscanRich Felker2012-04-121-5/+5
|
* optimize more integer cases in floatscan; comment the whole procedureRich Felker2012-04-121-8/+27
|
* revert invalid optimization in floatscanRich Felker2012-04-111-2/+2
|
* fix stupid typo in floatscan that caused excess rounding of some valuesRich Felker2012-04-111-1/+1
|
* optimize floatscan downscaler to skip results that won't be neededRich Felker2012-04-111-2/+3
| | | | | | | | | | | | | | when upscaling, even the very last digit is needed in cases where the input is exact; no digits can be discarded. but when downscaling, any digits less significant than the mantissa bits are destined for the great bitbucket; the only influence they can have is their presence (being nonzero). thus, we simply throw them away early. the result is nearly a 4x performance improvement for processing huge values. the particular threshold LD_B1B_DIG+3 is not chosen sharply; it's simply a "safe" distance past the significant bits. it would be nice to replace it with a sharp bound, but i suspect performance will be comparable (within a few percent) anyway.
* simplify/debloat radix point alignment code in floatscanRich Felker2012-04-111-9/+4
| | | | | | | | | | | | now that this is the first operation, it can rely on the circular buffer contents not being wrapped when it begins. we limit the number of digits read slightly in the initial parsing loops too so that this code does not have to consider the case where it might cause the circular buffer to wrap; this is perfectly fine because KMAX is chosen as a power of two for circular-buffer purposes and is much larger than it otherwise needs to be, anyway. these changes should not affect performance at all.
* optimize floatscan: avoid excessive upscalingRich Felker2012-04-111-27/+27
| | | | | | | | | | upscaling by even one step too much creates 3-29 extra iterations for the next loop. this is still suboptimal since it always goes by 2^29 rather than using a smaller upscale factor when nearing the target, but performance on common, small-magnitude, few-digit values has already more than doubled with this change. more optimizations on the way...
* fix incorrect initial count in shgetc when data is already bufferedRich Felker2012-04-111-1/+1
|
* fix bug parsing lone zero followed by junk, and hex float over-readingRich Felker2012-04-111-6/+5
|
* fix float scanning of certain values ending in zerosRich Felker2012-04-101-1/+3
| | | | | | | for example, "1000000000" was being read as "1" due to this loop exiting early. it's necessary to actually update z and zero the entries so that the subsequent rounding code does not get confused; before i did that, spurious inexact exceptions were being raised.
* fix potential overflow in exponent readingRich Felker2012-04-101-1/+1
| | | | | | | note that there's no need for a precise cutoff, because exponents this large will always result in overflow or underflow (it's impossible to read enough digits to compensate for the exponent magnitude; even at a few nanoseconds per digit it would take hundreds of years).
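A sketch of exponent reading with an imprecise cutoff, as the note above allows. The helper is hypothetical: it simply stops accumulating once the value gets close to `INT_MAX/10`, which is enough to rule out signed overflow.

```c
#include <limits.h>

/* Read an optionally signed decimal exponent from s. Once the accumulator
 * reaches INT_MAX/10, further digits are consumed but ignored. */
int sketch_read_exp(const char *s)
{
    int neg = 0, x = 0;
    if (*s == '+' || *s == '-')
        neg = (*s++ == '-');
    for (; *s >= '0' && *s <= '9'; s++)
        if (x < INT_MAX / 10)         /* guarantees 10*x + 9 <= INT_MAX */
            x = 10 * x + (*s - '0');
    return neg ? -x : x;
}
```

The saturated value is not the true exponent, but any exponent that large already guarantees overflow or underflow of the final result.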
* set errno properly when parsing floating pointRich Felker2012-04-101-4/+21
|
* add "scan helper getc" and rework strtod, etc. to use itRich Felker2012-04-105-73/+111
| | | | | | | | | | | | | | | | | | the immediate benefit is a significant debloating of the float parsing code by moving the responsibility for keeping track of the number of characters read to a different module. by linking shgetc with the stdio buffer logic, counting logic is deferred to buffer refill time, keeping the calls to shgetc fast and light. in the future, shgetc will also be useful for integrating the new float code with scanf, which needs to not only count the characters consumed, but also limit the number of characters read based on field width specifiers. shgetc may also become a useful tool for simplifying the integer parsing code.
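The idea of keeping the per-character path free of counting can be sketched with a string-backed reader. All names here are hypothetical and do not mirror musl's shgetc internals; the field-width limit is applied once at setup, and the count is derived from pointers on demand.

```c
#include <stddef.h>

struct sketch_reader {
    const char *start;  /* beginning of the readable window */
    const char *pos;    /* next unread character */
    const char *end;    /* one past the last character that may be read */
};

/* limit == 0 means "no field-width limit". */
void sketch_init(struct sketch_reader *r, const char *s, size_t len, size_t limit)
{
    r->start = r->pos = s;
    r->end = s + ((limit && limit < len) ? limit : len);
}

/* Hot path: one comparison and one increment, no counter update. */
int sketch_getc(struct sketch_reader *r)
{
    return r->pos < r->end ? (unsigned char)*r->pos++ : -1;
}

/* Characters consumed so far, computed only when asked for. */
size_t sketch_count(const struct sketch_reader *r)
{
    return (size_t)(r->pos - r->start);
}
```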
* new floating point parser/converterRich Felker2012-04-102-0/+446
| | | | | | | | | | | | | | | | | this version is intended to be fully conformant to the ISO C, POSIX, and IEEE standards for conversion of decimal/hex floating point strings to float, double, and long double (ld64 or ld80 only at present) values. in particular, all results are intended to be rounded correctly according to the current rounding mode. further, this implementation aims to set the floating point underflow, overflow, and inexact flags to reflect the conversion performed. a moderate amount of testing has been performed (by nsz and myself) prior to integration of the code in musl, but it still may have bugs. so far, only strto(d|ld|f) use the new code. scanf integration will be done as a separate commit, and i will add implementations of the wide character functions later.
* add creal/cimag macros in complex.h (and use them in the functions defs)Rich Felker2012-03-221-8/+0
|
* don't inline __rem_pio2l so the code size is smallernsz2012-03-191-0/+1
|
* fix loads of missing const in new libm, and some global vars (?!) in powlRich Felker2012-03-181-2/+2
|
* fix namespace issues for lgamma, etc.Rich Felker2012-03-161-0/+2
| | | | standard functions cannot depend on nonstandard symbols
* first commit of the new libm!Rich Felker2012-03-132-0/+323
| | | | | | | | | | | | | | | | thanks to the hard work of Szabolcs Nagy (nsz), identifying the best (from correctness and license standpoint) implementations from freebsd and openbsd and cleaning them up! musl should now fully support c99 float and long double math functions, and has near-complete complex math support. tgmath should also work (fully on gcc-compatible compilers, and mostly on any c99 compiler). based largely on commit 0376d44a890fea261506f1fc63833e7a686dca19 from nsz's libm git repo, with some additions (dummy versions of a few missing long double complex functions, etc.) by me. various cleanups still need to be made, including re-adding (if they're correct) some asm functions that were dropped.
* fix obscure bug in strtoull reading the highest 16 possible valuesRich Felker2012-03-021-1/+1
|
* new attempt at working around the gcc 3 visibility bugRich Felker2012-02-242-0/+7
| | | | | since gcc is failing to generate the necessary ".hidden" directive in the output asm, generate it explicitly with an __asm__ statement...
* remove useless attribute visibility from definitionsRich Felker2012-02-241-1/+1
| | | | | | this was a failed attempt at working around the gcc 3 visibility bug affecting x86_64. subsequent patch will address it with an ugly but working hack.
* cleanup and work around visibility bug in gcc 3 that affects x86_64Rich Felker2012-02-232-6/+11
| | | | | | | | | | | | | | in gcc 3, the visibility attribute must be placed on both the declaration and on the definition. if it's omitted from the definition, the compiler fails to emit the ".hidden" directive in the assembly, and the linker will either generate textrels (if supported, such as on i386) or refuse to link (on targets where certain types of textrels are forbidden or impossible without further assumptions about memory layout, such as on x86_64). this patch also unifies the decision about when to use visibility into libc.h and makes the visibility in the utf-8 state machine tables based on libc.h rather than a duplicate test.
* synchronize cond var destruction with exiting waitsRich Felker2011-10-021-0/+1
|
* improve pshared barriersRich Felker2011-09-281-1/+1
| | | | | | | | | | | | | | eliminate the sequence number field and instead use the counter as the futex. because of the way the lock is held, sequence numbers are completely useless, and this frees up a field in the barrier structure to be used as a waiter count for the count futex, which lets us avoid some syscalls in the best case. as of now, self-synchronized destruction and unmapping should be fully safe. before any thread can return from the barrier, all threads in the barrier have obtained the vm lock, and each holds a shared lock on the barrier. the barrier memory is not inspected after the shared lock count reaches 0, nor after the vm lock is released.
* process-shared barrier support, based on discussion with bdonlanRich Felker2011-09-271-3/+5
| | | | | | | | | | | | | this implementation is rather heavy-weight, but it's the first solution i've found that's actually correct. all waiters actually wait twice at the barrier so that they can synchronize exit, and they hold a "vm lock" that prevents changes to virtual memory mappings (and blocks pthread_barrier_destroy) until all waiters are finished inspecting the barrier. thus, it is safe for any thread to destroy and/or unmap the barrier's memory as soon as pthread_barrier_wait returns, without further synchronization.
* fix lost signals in cond varsRich Felker2011-09-261-0/+1
| | | | | | | | | | | due to moving waiters from the cond var to the mutex in bcast, these waiters upon wakeup would steal slots in the count from newer waiters that had not yet been signaled, preventing the signal function from taking any action. to solve the problem, we simply use two separate waiter counts, so that the original "total" waiters count is undisturbed by broadcast and still available for signal.
* cleanup various minor issues reported by nszRich Felker2011-09-261-3/+3
| | | | | | | | | the changes to syscall_ret are mostly no-ops in the generated code, just cleanup of type issues and removal of some implementation-defined behavior. the one exception is the change in the comparison value, which is fixed so that 0xf...f000 (which in principle could be a valid return value for mmap, although probably never in reality) is not treated as an error return.
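The comparison described above can be sketched as follows, assuming the Linux convention that raw syscall returns in the range -4095..-1 encode errno values. The function name is hypothetical and the real syscall_ret may differ in detail.

```c
#include <errno.h>

/* Map a raw syscall return value to the C convention (-1 plus errno). */
long sketch_syscall_ret(unsigned long r)
{
    /* Strictly greater than -4096UL (0xf...f000), so that value itself
     * is passed through as an ordinary result rather than an error. */
    if (r > -4096UL) {
        errno = (int)-r;
        return -1;
    }
    return (long)r;
}
```

Doing the test on an unsigned value avoids relying on implementation-defined behavior of signed conversions in the comparison itself.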
* redo cond vars again, use sequence numbersRich Felker2011-09-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | testing revealed that the old implementation, while correct, was giving way too many spurious wakeups due to races changing the value of the condition futex. in a test program with 5 threads receiving broadcast signals, the number of returns from pthread_cond_wait was roughly 3 times what it should have been (2 spurious wakeups for every legitimate wakeup). moreover, the magnitude of this effect seems to grow with the number of threads. the old implementation may also have had some nasty race conditions with reuse of the cond var with a new mutex. the new implementation is based on incrementing a sequence number with each signal event. this sequence number has nothing to do with the number of threads intended to be woken; it's only used to provide a value for the futex wait to avoid deadlock. in theory there is a danger of race conditions due to the value wrapping around after 2^32 signals. it would be nice to eliminate that, if there's a way. testing showed no spurious wakeups (though they are of course possible) with the new implementation, as well as slightly improved performance.
* new futex-requeue-based pthread_cond_broadcast implementationRich Felker2011-09-251-3/+6
| | | | | | this avoids the "stampede effect" where pthread_cond_broadcast would result in all waiters waking up simultaneously, only to immediately contend for the mutex and go back to sleep.