Commit message | Author | Age | Files | Lines
* s390x: drop SO_ definitions from bits/socket.hSzabolcs Nagy2019-07-011-28/+0
| | | | the s390x definitions matched the generic ones in sys/socket.h.
* netinet/in.h: add IPV6_ROUTER_ALERT_ISOLATE from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | restricts router alert packets received by the socket to the socket's namespace only. see linux commit 9036b2fe092a107856edd1a3bad48b83f2b45000 net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE
* sys/prctl.h: add PR_SPEC_DISABLE_NOEXEC from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | allows specifying that the speculative store bypass disable bit should be cleared on exec. see linux commit 71368af9027f18fe5d1c6f372cfdff7e4bde8b48 x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
* fcntl.h: add F_SEAL_FUTURE_WRITE from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | | needed for android so it can migrate from its ashmem to memfd. allows making the memfd readonly for future users while keeping a writable mmap of it. see linux commit ab3948f58ff841e51feb845720624665ef5b7ef3 mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd
* sys/fanotify.h: update for linux v5.1Szabolcs Nagy2019-07-011-1/+33
includes changes from linux v5.1
linux commit 235328d1fa4251c6dcb32351219bb553a58838d2
  fanotify: add support for create/attrib/move/delete events
linux commit 5e469c830fdb5a1ebaa69b375b87f583326fd296
  fanotify: copy event fid info to user
linux commit e9e0c8903009477b630e37a8b6364b26a00720da
  fanotify: encode file identifier for FAN_REPORT_FID
as well as earlier changes that were missed.
sys/statfs.h is included for fsid_t.
* fix deadlock in synccall after threaded forkSamuel Holland2019-07-011-0/+1
| | | | | | | | | | | | | | | synccall may be called by AS-safe functions such as setuid/setgid after fork. although fork() resets libc.threads_minus_one, causing synccall to take the single-threaded path, synccall still takes the thread list lock. This lock may be held by another thread if for example fork() races with pthread_create(). After fork(), the value of the lock is meaningless, so clear it. maintainer's note: commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af and e4235d70672d9751d7718ddc2b52d0b426430768 introduced this regression. the state protected by this lock is the linked list, which is entirely replaced in the child path of fork (next=prev=self), so resetting it is semantically sound.
* cap getdents length argument to INT_MAXRich Felker2019-06-281-0/+2
| | | | | | | | the linux syscall treats this argument as having type int, so passing extremely long buffer sizes would be misinterpreted by the kernel. since "short reads" are always acceptable, just cap it down. patch based on report and suggested change by Florian Weimer.
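The capping described above can be sketched as follows. This is a minimal illustration and not the musl source: the helper name `cap_len` is hypothetical, and in the real wrapper the clamp would be applied to the length argument just before the syscall is made.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: clamp a buffer length to what fits in an int.
 * The kernel reads the getdents length argument as int, so a larger
 * size_t value would be misinterpreted. Returning fewer entries than
 * the buffer could hold is always acceptable (a "short read"). */
static size_t cap_len(size_t len)
{
	if (len > INT_MAX) len = INT_MAX;
	return len;
}
```

Lengths that already fit in an int pass through unchanged, so ordinary callers see no difference.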
* remove unnecessary and problematic _Noreturn from crt/ldso startupRich Felker2019-06-253-5/+5
| | | | | | | | | | | | | | | | | | | after commit a48ccc159a5fa061a18419296100ee48a1cd6cc9 removed the use of _Noreturn on the stage3_func type (which only worked due to it being defined to the "GNU C" attribute in C99 mode), GCC could no longer assume that the ends of __dls2 and __dls2b are unreachable, and produced a warning that a function marked _Noreturn returns. also, since commit 4390383b32250a941ec616e8bff6f568a801b1c0, the _Noreturn declaration for __libc_start_main in crt1/rcrt1 has been not only inconsistent with the definition, but wrong. formally, __libc_start_main does return, via a (hopefully) tail call to a helper function after the barrier. incorrect usage of _Noreturn in the declaration was probably formal UB. the _Noreturn specifiers were not useful in any of these places, so remove them all. now, the only remaining usage of _Noreturn is in public interfaces where _Noreturn is part of their contract.
* allow fmemopen with zero sizeRich Felker2019-06-251-1/+1
| | | | | | | previously, POSIX erroneously required this to fail with EINVAL despite the traditional glibc implementation, on which the POSIX interface was based, allowing it. the resolution of Austin Group issue 818 removes the requirement to fail.
* do not use _Noreturn for a function pointer in dynamic linkerMatthew Maurer2019-06-211-1/+1
| | | | | _Noreturn is a C11 construct, and may only be used at the site of a function definition.
* remove implicit include of sys/sysmacros.h from sys/types.hRich Felker2019-06-211-1/+0
| | | | | | | | | | this reverts commit f552c792c7ce5a560f214e1104d93ee5b0833967, which exposed the sysmacros.h macros (device major/minor calculations) for BSD and GNU profiles to mimic an unintentional glibc behavior some code depended on. glibc has deprecated and since removed them as the resolution to bug #19239, so it makes no sense for us to keep this behavior. affected code should all have been fixed by now, and if it's not yet fixed it needs to be for use with modern glibc anyway.
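For code affected by this change, the fix is to include sys/sysmacros.h explicitly. A minimal sketch, assuming a Linux target where that header provides `makedev`, `major` and `minor`; the helper `roundtrip_ok` is hypothetical and only exists to exercise the macros.

```c
#include <sys/types.h>
#include <sys/sysmacros.h> /* no longer pulled in by sys/types.h */

/* Hypothetical helper: pack a major/minor pair into a dev_t and
 * check that unpacking it gives the same pair back. */
static int roundtrip_ok(unsigned maj, unsigned min)
{
	dev_t dev = makedev(maj, min);
	return major(dev) == maj && minor(dev) == min;
}
```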
* add riscv64 architecture supportRich Felker2019-06-1451-0/+1362
Author: Alex Suykov <alex.suykov@gmail.com>
Author: Aric Belsito <lluixhi@gmail.com>
Author: Drew DeVault <sir@cmpwn.com>
Author: Michael Clark <mjc@sifive.com>
Author: Michael Forney <mforney@mforney.org>
Author: Stefan O'Rear <sorear2@gmail.com>
This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file.
The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review:
- a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments)
- a_cas[_p] is fixed to be a memory barrier
- the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other.
- DTP_OFFSET is defined correctly so that local-dynamic TLS works
- reloc.h LDSO_ARCH logic is simplified and made explicit.
- unused, non-functional crti/n asm files are removed.
- an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable.
- indentation style errors in some asm files are fixed.
* optimize aarch64 dynamic tlsdesc function to spill fewer registersRich Felker2019-05-261-10/+7
| | | | | | | | | | | | | | with the glibc generation counter model for reusing dynamic tls slots after dlclose, it's really not possible to get away with fewer than 4 working registers. for us however it's always been possible, but tricky, and only became apparent after the switch to installing new dynamic tls at dlopen time. by merging the negated thread pointer into the addend early, the register holding the thread pointer can immediately be reused, bringing the working register count down to three. this allows saving/restoring via a single stp/ldp pair, since the return register x0 does not need to be saved. net reduction of 3 instructions, 2 of which were push/pop.
* make powerpc64 vrregset_t logical layout match expected APIRich Felker2019-05-221-1/+4
| | | | | | | | | between v2 and v3 of the powerpc64 port patch, the change was made from a 32x4 array of 32-bit unsigned ints for vrregs[] to a 32-element array of __int128. this mismatches the API applications working with mcontext_t expect from glibc, and seems to have been motivated by a misinterpretation of a comment on how aarch64 did things as a suggestion to do the same on powerpc64.
* fix vrregset_t layout and member naming on powerpc64Rich Felker2019-05-221-4/+8
| | | | | | | | | | | | | | | | | | the mistaken layout seems to have been adapted from 32-bit powerpc, where vscr and vrsave are packed into the same 128-bit slot in a way that looks like it relies on non-overlapping-ness of the value bits in big endian. the powerpc64 port accounted for the fact that the 64-bit ABI puts each in its own 128-bit slot, but ordered them incorrectly (matching the bit order used on the 32-bit ABI), and failed to account for vscr being padded according to endianness so that it can be accessed via vector moves. in addition to ABI layout, our definition used different logical member layout/naming from glibc, where vscr is a structure to facilitate access as a 32-bit word or a 128-bit vector. the inconsistency here was unintentional, so fix it.
* fix tls offsets when p_vaddr%p_align != 0 on TLS_ABOVE_TP targetsSzabolcs Nagy2019-05-162-4/+6
| | | | | | | | | | | | | | | | currently the bfd linker does not seem to create tls segments where p_vaddr%p_align != 0, but this is valid in ELF and then the runtime computed tls offset must satisfy offset%p_align == (base+p_vaddr)%p_align and in case of local exec tls (main executable) the smallest such offset must be used (otherwise it is incompatible with the offset computed by the static linker). the !TLS_ABOVE_TP case is handled correctly (the offset is negative then in the formula). the ldso code for TLS_ABOVE_TP is changed so the static tls offset of each module satisfies the formula.
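The congruence above can be made concrete with a small sketch. The function and parameter names are hypothetical: `start` stands for the first offset that is free above the thread pointer, `addr` for base+p_vaddr, and `align` for p_align, which is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest offset >= start such that
 *     offset % align == addr % align
 * Adding (addr - start) mod align to start gives a value congruent
 * to addr, and the added amount is always less than align, so no
 * smaller offset at or above start can satisfy the formula. */
static size_t tls_offset_for(size_t start, uintptr_t addr, size_t align)
{
	return start + (((size_t)addr - start) & (align - 1));
}
```

With start=16, addr=0x1008 and align=16 this yields 24, since 24%16 and 0x1008%16 are both 8; rounding start up to a multiple of align would instead break the congruence.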
* fix static tls offsets of shared libs on TLS_ABOVE_TP targetsSzabolcs Nagy2019-05-161-4/+2
tls_offset should always point to the end of the allocated static tls area, but this was not handled correctly on "tls variant 1" targets in the dynamic linker: after application tls was allocated, tls_offset was aligned up, potentially wasting tls space. (alignment may be needed at the beginning of the tls area, not at the end, but that will be fixed separately as it is unlikely to affect real binaries.) when static tls was allocated for a shared library, tls_offset was only updated with the size of the tls segment which does not include alignment gaps, which can easily happen if the tls size update for one library leaves tls_offset misaligned for the next one. this can cause oob access in __copy_tls or arbitrary breakage at tls access. (the issue was observed on aarch64 with rust binaries)
* fix format strings for uid/gid values in putpwent/putgrentRich Felker2019-05-162-2/+2
| | | | | | commit 648c3b4e18b2ce2b6af7d44783e42ca267ea49f5 omitted this change, which is needed to be able to use uid/gid values greater than INT_MAX with these interfaces. it fixes alpine linux bug #10460.
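The effect of the format string on large ids can be sketched as below. This is an illustration only, not the putpwent/putgrent code: the helper `format_uid` is hypothetical, and uid_t is assumed to be a 32-bit unsigned type as on Linux.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: print a uid so that values above INT_MAX
 * survive. Converting to unsigned and using %u avoids the negative
 * number that a signed conversion would print for such values. */
static int format_uid(char *buf, size_t n, uid_t uid)
{
	return snprintf(buf, n, "%u", (unsigned)uid);
}
```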
* remove unused struct dso members from dynlink.cFangrui Song2019-05-121-1/+0
| | | | | maintainer's note: commit 9d44b6460ab603487dab4d916342d9ba4467e6b9 removed their use.
* improve i386 inline syscall asm on non-broken compilersRich Felker2019-05-112-1/+34
| | | | | | | | | | | | | | | | | we have to avoid using ebx unconditionally in asm constraints for i386, because gcc 3 and 4 and possibly other simplistic compilers (pcc?) implement PIC via making ebx a fixed-use register, and disallow its use for anything else. rather than hard-coding knowledge of which compilers work (at least gcc 5+ and clang), perform a configure test; this should give us the good codegen on any new compilers we don't yet know about. swapping ebx and edx is kept for 1- and 2-arg syscalls because it avoids having any spills/stack-frame at all in small functions. for 6-arg, if ebx is directly usable, the complex shuffling introduced in commit c8798ef974d21c338a7d8d874a402978ffc6168e can be avoided, and ebp can be loaded the same way ebx is in 5-arg syscalls for compilers that don't support direct use of ebx.
* fix regression in i386 inline syscall asm producing invalid codeRich Felker2019-05-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df inlined the i386 syscall mechanism, but wrongly assumed memory operands to the 5- and 6-argument syscall asm would be esp-based. however, nothing in the constraints prevented them from being ebx- or ebp-based, and in those cases, ebx and ebp could be clobbered before use of the memory operand was complete. in the 6-argument case, this prevented restoration of the original register values before the end of the asm block, breaking the asm contract since ebx and ebp are not marked as clobbered. (they can't be, because lots of compilers don't accept these registers in constraints or clobbers if PIC or frame pointer is enabled). doing this right is complicated by the fact that, after a single push, no operands which might be memory operands are usable. if they are esp-based, the value of esp has changed, rendering them invalid. introduce some new dances to load the registers. for the 5-arg case, push the operand that may be a memory operand first, and after that, it doesn't matter if the operand is invalid, since we'll just use the newly pushed value. for the 6-arg case, we need to put both operands in memory to begin with, like the old non-inline code prior to commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df accepted, so that there's only one potentially memory-based operand to the asm. this can then be saved with a single push, and after that the values can be read off into the registers they're needed in. there's some size overhead, but still a lot less execution overhead than the old out-of-line code. doing it better depends on a modern compiler that lets you use ebx and ebp in asm constraints without restriction. the failure modes on compilers where this doesn't work are inconsistent and dangerous (on at least some gcc versions 4.x and earlier, wrong codegen!), so this is a delicate matter. it can be addressed later if needed.
* make fgetwc set error indicator for stream on encoding errorsRich Felker2019-05-051-2/+8
| | | | | | | | | this is a requirement in POSIX that's omitted, and seemed potentially non-conforming, in the C standard. as such it was omitted here. however, as part of Austin Group issue #1170, the discrepancy was raised with WG14 and determined to be unintended; future versions of the C standard will require the error indicator to be set, as POSIX does.
* fix broken posix_fadvise on mips due to missing 7-arg syscall supportRich Felker2019-05-051-0/+25
| | | | | | | | commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the breakage at build time by removing support for 7-argument syscalls; however, the external __syscall function provided for mips before did not pass a 7th argument from the stack, so the behavior was just silently broken.
* allow archs to provide a 7-argument syscall if neededRich Felker2019-05-051-0/+1
| | | | | | | | commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 noted that we could add this if needed, and in fact it is needed, but not for one of the archs documented as having a 7th syscall arg register. rather, it's needed for mips (o32), where all but the first 4 arguments are passed on the stack, and the stack can accommodate a 7th.
* fix build regression on mips n32 due to typo in new inline syscallRich Felker2019-05-051-1/+1
| | | | | commit 1bcdaeee6e659f1d856717c9aa562a068f2f3bd4 introduced the regression.
* fix passing of 64-bit syscall arguments on microblazeRich Felker2019-05-051-1/+1
| | | | | | | | | | | | | | | this has been wrong since the beginning of the microblaze port: the syscall ABI for microblaze does not align 64-bit arguments on even register boundaries. commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the problem by introducing references to a nonexistent __syscall7. the ABI is not documented well anywhere, but I was able to confirm against both strace source and glibc source that microblaze is not using the alignment. per the syscall(2) man page, posix_fadvise, ftruncate, pread, pwrite, readahead, sync_file_range, and truncate were all affected and either did not work at all, or only worked by chance, e.g. when the affected argument slots were all zero.
* fix regression in s390x SO_PEERSEC definitionRich Felker2019-04-231-0/+1
| | | | | | | analogous to commit efda534b212f713fe2b92a62b06e45f656b763ce for powerpc. commit 587f5a53bc3a68d80b239ba515d583df690a96df moved the definition of SO_PEERSEC to bits/socket.h for archs where the SO_* macros differ.
* make new math code compatible with unused variable warning/errorRich Felker2019-04-201-3/+6
commit b50d315fd23f0fbc4c11e2583801dd123d933745 introduced fp_force_eval implemented by default with a dead store to a volatile variable. unfortunately this introduces warnings with -Wunused-variable and breaks the ability to use -Werror with the default warning options set by configure when warnings are enabled. we could just call fp_barrier instead, but that results in a spurious load after the store due to volatile semantics. the fix committed here avoids the load. it will still produce warnings without -Wno-unused-but-set-variable, but that's part of our default warning profile, and there are already other locations in the source where an unused variable warning will occur without it.
* math: new powSzabolcs Nagy2019-04-174-303/+521
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
The underflow exception is signaled if the result is in the subnormal range even if the result is exact.
code size change: +3421 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  pow rthruput: 102.96 ns/call  33.38 ns/call  3.08x
  pow latency:  144.37 ns/call  54.75 ns/call  2.64x
-O3:
  pow rthruput:  98.91 ns/call  32.79 ns/call  3.02x
  pow latency:  138.74 ns/call  53.78 ns/call  2.58x
* math: new exp and exp2Szabolcs Nagy2019-04-174-480/+434
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
TOINT_INTRINSICS and EXP_USE_TOINT_NARROW cases are unused.
The underflow exception is signaled if the result is in the subnormal range even if the result is exact (e.g. exp2(-1023.0)).
code size change: -1672 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  exp rthruput:   12.73 ns/call   6.68 ns/call  1.91x
  exp latency:    45.78 ns/call  21.79 ns/call  2.1x
  exp2 rthruput:   6.35 ns/call   5.26 ns/call  1.21x
  exp2 latency:   26.00 ns/call  16.58 ns/call  1.57x
-O3:
  exp rthruput:   12.75 ns/call   6.73 ns/call  1.89x
  exp latency:    45.91 ns/call  21.80 ns/call  2.11x
  exp2 rthruput:   6.47 ns/call   5.40 ns/call  1.2x
  exp2 latency:   26.03 ns/call  16.54 ns/call  1.57x
* math: new log2Szabolcs Nagy2019-04-173-106/+335
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
code size change: +2458 bytes (+1524 bytes with fma).
benchmark on x86_64 before, after, speedup:
-Os:
  log2 rthruput: 16.08 ns/call  10.49 ns/call  1.53x
  log2 latency:  44.54 ns/call  25.55 ns/call  1.74x
-O3:
  log2 rthruput: 15.92 ns/call  10.11 ns/call  1.58x
  log2 latency:  44.66 ns/call  26.16 ns/call  1.71x
* math: new logSzabolcs Nagy2019-04-173-104/+454
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
Assume __FP_FAST_FMA implies __builtin_fma is inlined as a single instruction.
code size change: +4588 bytes (+2540 bytes with fma).
benchmark on x86_64 before, after, speedup:
-Os:
  log rthruput: 12.61 ns/call   7.95 ns/call  1.59x
  log latency:  41.64 ns/call  23.38 ns/call  1.78x
-O3:
  log rthruput: 12.51 ns/call   7.75 ns/call  1.61x
  log latency:  41.82 ns/call  23.55 ns/call  1.78x
* math: new powfSzabolcs Nagy2019-04-174-240/+232
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target.
SNaN is not supported, it would require an issignalingf implementation.
code size change: -816 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  powf rthruput:  95.14 ns/call  20.04 ns/call  4.75x
  powf latency:  137.00 ns/call  34.98 ns/call  3.92x
-O3:
  powf rthruput:  92.48 ns/call  13.67 ns/call  6.77x
  powf latency:  131.11 ns/call  35.15 ns/call  3.73x
* math: new exp2f and expfSzabolcs Nagy2019-04-175-179/+193
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
In expf TOINT_INTRINSICS is kept, but is unused, it would require support for __builtin_round and __builtin_lround as single instruction.
code size change: +94 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  expf rthruput:   9.19 ns/call   8.11 ns/call  1.13x
  expf latency:   34.19 ns/call  18.77 ns/call  1.82x
  exp2f rthruput:  5.59 ns/call   6.52 ns/call  0.86x
  exp2f latency:  17.93 ns/call  16.70 ns/call  1.07x
-O3:
  expf rthruput:   9.12 ns/call   4.92 ns/call  1.85x
  expf latency:   34.44 ns/call  18.99 ns/call  1.81x
  exp2f rthruput:  5.58 ns/call   4.49 ns/call  1.24x
  exp2f latency:  17.95 ns/call  16.94 ns/call  1.06x
* math: new log2fSzabolcs Nagy2019-04-173-58/+108
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc
code size change: +177 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  log2f rthruput: 11.38 ns/call   5.99 ns/call  1.9x
  log2f latency:  35.01 ns/call  22.57 ns/call  1.55x
-O3:
  log2f rthruput: 10.82 ns/call   5.58 ns/call  1.94x
  log2f latency:  35.13 ns/call  21.04 ns/call  1.67x
* math: new logfSzabolcs Nagy2019-04-173-54/+109
from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc, with minor changes to better fit into musl.
code size change: +289 bytes.
benchmark on x86_64 before, after, speedup:
-Os:
  logf rthruput:  8.40 ns/call   6.14 ns/call  1.37x
  logf latency:  31.79 ns/call  24.33 ns/call  1.31x
-O3:
  logf rthruput:  8.43 ns/call   5.58 ns/call  1.51x
  logf latency:  32.04 ns/call  20.88 ns/call  1.53x
* math: add configuration macrosSzabolcs Nagy2019-04-171-0/+5
| | | | | | | Musl currently aims to support non-nearest rounding mode and does not support SNaNs. These macros allow marking relevant code paths in case these decisions are changed later (they also help documenting the corner cases involved).
* math: add macros for static branch prediction hintsSzabolcs Nagy2019-04-171-0/+9
These don't have an effect with -Os, so with default settings they are not useful other than for documenting the expectation. With --enable-optimize=internal,malloc,string,math the libc.so code size increases by 18K on x86_64 and performance varies in -2% .. +10%.
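Such hint macros are commonly built on `__builtin_expect`, a GCC/Clang builtin. The sketch below is an assumption about their general shape, not a copy of the musl header: the macro names and the `safe_div` example are hypothetical.

```c
/* Assumed macro names. With GCC-compatible compilers the hint is
 * passed to __builtin_expect; elsewhere the macros are no-ops. */
#ifdef __GNUC__
#define predict_true(x)  __builtin_expect(!!(x), 1)
#define predict_false(x) __builtin_expect(!!(x), 0)
#else
#define predict_true(x)  (x)
#define predict_false(x) (x)
#endif

/* Hypothetical use: the special case is marked as unlikely, which
 * documents the expectation even when the compiler ignores it. */
static double safe_div(double num, double den)
{
	if (predict_false(den == 0.0))
		return 0.0; /* rare path */
	return num / den;
}
```

The hints never change the result of the condition, only how the compiler may lay out the two branches.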
* math: add double precision error handling functionsSzabolcs Nagy2019-04-176-0/+35
* math: add single precision error handling functionsSzabolcs Nagy2019-04-176-0/+37
| | | | | | | | | | These are supposed to be used in tail call positions when handling special cases in new code. (fp exceptions may be raised "naturally" by the common code path if special casing is more effort.) This implements the error handling apis used in https://github.com/ARM-software/optimized-routines without errno setting.
* math: add eval_as_float and eval_as_doubleSzabolcs Nagy2019-04-171-0/+17
| | | | | | | | | | | | | | | | | | Previously type casts or assignments were used for handling excess precision, which assumed standard C99 semantics, but since it's a rarely needed obscure detail, it's better to use explicit helper functions to document where we rely on this. It also helps if the code is used outside of the libc in non-C99 compilation mode: with the default excess precision handling of gcc, explicit inline asm barriers are needed for narrowing on FLT_EVAL_METHOD!=0 targets. I plan to use this in new code with the existing style that uses double_t and float_t as much as possible. One ugliness is that it is required for almost every return statement since that does not drop excess precision (the standard changed this in C11 annex F, but that does not help in non-standard compilation modes or with old compilers).
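A minimal sketch of what such helpers can look like under standard C99 semantics, where an assignment to a narrower type drops excess precision. The function names come from the commit title; the bodies and the `half` example are assumptions, and the inline asm barriers mentioned above for non-C99 modes are not shown.

```c
#include <math.h> /* double_t */

/* Round a value that may carry excess precision to float.
 * Under C99 rules the assignment performs the narrowing. */
static inline float eval_as_float(float x)
{
	float y = x;
	return y;
}

/* Same idea for double. */
static inline double eval_as_double(double x)
{
	double y = x;
	return y;
}

/* Hypothetical use: compute in double_t, narrow explicitly on return
 * so the reliance on narrowing is visible at the return statement. */
static double half(double x)
{
	double_t t = x * 0.5;
	return eval_as_double(t);
}
```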
* math: add fp_arch.h with fp_barrier and fp_force_evalSzabolcs Nagy2019-04-173-6/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C99 has ways to support fenv access, but compilers don't implement it and assume nearest rounding mode and no fp status flag access. (gcc has -frounding-math and then it does not assume nearest rounding mode, but it still assumes the compiled code itself does not change the mode. Even if the C99 mechanism was implemented it is not ideal: it requires all code in the library to be compiled with FENV_ACCESS "on" to make it usable in non-nearest rounding mode, but that limits optimizations more than necessary.) The math functions should give reasonable results in all rounding modes (but the quality may be degraded in non-nearest rounding modes) and the fp status flag settings should follow the spec, so fenv side-effects are important and code transformations that break them should be prevented. Unfortunately compilers don't give any help with this, the best we can do is to add fp barriers to the code using volatile local variables (they create a stack frame and undesirable memory accesses to it) or inline asm (gcc specific, requires target specific fp reg constraints, often creates unnecessary reg moves and multiple barriers are needed to express that an operation has side-effects) or extern call (only useful in tail-call position to avoid stack-frame creation and does not work with lto). We assume that in a math function if an operation depends on the input and the output depends on it, then the operation will be evaluated at runtime when the function is called, producing all the expected fenv side-effects (this is not true in case of lto and in case the operation is evaluated with excess precision that is not rounded away). 
So fp barriers are needed (1) to prevent the move of an operation within a function (in case it may be moved from an unevaluated code path into an evaluated one or if it may be moved across a fenv access), (2) to force the evaluation of an operation for its side-effect when it has no input dependency (may be constant folded) or (3) when its output is unused. I believe that fp_barrier and fp_force_eval can take care of these and they should not be needed in hot code paths.
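The volatile-variable approach mentioned above can be sketched as follows for double. The function names come from the commit title, but the bodies are an assumption about the generic fallback rather than the committed code, and the `tiny_case` example is hypothetical. Compilers may warn that `y` is set but not used in fp_force_eval.

```c
/* Pass a value through a volatile local so the compiler cannot
 * constant-fold it or move the computation that uses the result. */
static inline double fp_barrier(double x)
{
	volatile double y = x;
	return y;
}

/* Force evaluation of the argument for its side effects (fp status
 * flags); the value is stored to a volatile and then discarded. */
static inline void fp_force_eval(double x)
{
	volatile double y;
	y = x;
}

/* Hypothetical use: return x unchanged for a tiny input while still
 * performing an addition that is inexact for such inputs. */
static double tiny_case(double x)
{
	fp_force_eval(x + 0x1p120);
	return x;
}
```

As the message notes, the volatile local costs a stack slot and memory accesses, which is why such barriers are meant for special-case paths only.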
* math: remove sun copyright from libm.hSzabolcs Nagy2019-04-171-23/+0
| | | | | | | Nothing is left from the original fdlibm header nor from the bsd modifications to it other than some internal api declarations. Comments are dropped that may be copyrightable content.
* math: add asuint, asuint64, asfloat and asdoubleSzabolcs Nagy2019-04-171-33/+15
| | | | | Code generation for SET_HIGH_WORD slightly changes, but it only affects pow, otherwise the generated code is unchanged.
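Helpers of this kind reinterpret the bits of a floating-point value as an integer and back. The sketch below uses inline functions with unions and assumes IEEE 754 binary32/binary64 representations; the names come from the commit title, while the bodies are an assumption and may differ from the form used in libm.h.

```c
#include <stdint.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
static inline uint32_t asuint(float f)
{
	union { float f; uint32_t i; } u = { f };
	return u.i;
}

/* Reinterpret a 32-bit unsigned integer as a float. */
static inline float asfloat(uint32_t i)
{
	union { uint32_t i; float f; } u = { i };
	return u.f;
}

/* Reinterpret the bits of a double as a 64-bit unsigned integer. */
static inline uint64_t asuint64(double f)
{
	union { double f; uint64_t i; } u = { f };
	return u.i;
}

/* Reinterpret a 64-bit unsigned integer as a double. */
static inline double asdouble(uint64_t i)
{
	union { uint64_t i; double f; } u = { i };
	return u.f;
}
```

Reading a union member other than the one last written is permitted in C99 and later, which is what makes this form of type punning usable here.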
* math: move complex math out of libm.hSzabolcs Nagy2019-04-1767-80/+87
| | | | | | This makes it easier to build musl math code with a compiler that does not support complex types (tcc) and in general more sensible factorization of the internal headers.
* define FP_FAST_FMA* when fma* can be inlinedSzabolcs Nagy2019-04-171-0/+12
| | | | | | | | | | | | | | | | | | | FP_FAST_FMA can be defined if "the fma function generally executes about as fast as, or faster than, a multiply and an add of double operands", which can only be true if the fma call is inlined as an instruction. gcc sets __FP_FAST_FMA if __builtin_fma is inlined as an instruction, but that does not mean an fma call will be inlined (e.g. it is defined with -fno-builtin-fma), other compilers (clang) don't even have such macro, but this is the closest we can get. (even if the libc fma implementation is a single instruction, the extern call overhead is already too big when the macro is used to decide between x*y+z and fma(x,y,z) so it cannot be based on libc only, defining the macro unconditionally on targets which have fma in the base isa is also incorrect: the compiler might not inline fma anyway.) this solution works with gcc unless fma inlining is explicitly turned off.
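The kind of decision this macro is meant to support can be sketched as below. The helper `mul_add` is hypothetical, and the sketch tests the compiler's own `__FP_FAST_FMA` (a GCC predefined macro) directly, so that it does not depend on which libc provides math.h.

```c
/* Hypothetical helper: use a fused multiply-add only when the
 * compiler reports that it is inlined as a single instruction;
 * otherwise fall back to a separate multiply and add. */
static double mul_add(double x, double y, double z)
{
#ifdef __FP_FAST_FMA
	return __builtin_fma(x, y, z);
#else
	return x*y + z;
#endif
}
```

The two paths can differ in the last bit for inexact products, because the fused form rounds once; for small integer inputs both give the same result.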
* fcntl.h: define O_TTY_INIT to 0A. Wilcox2019-04-101-2/+3
| | | | | | | | | | POSIX: "[If] either O_TTY_INIT is set in oflag or O_TTY_INIT has the value zero, open() shall set any non-standard termios structure terminal parameters to a state that provides conforming behavior." The Linux kernel tty drivers always perform initialisation on their devices to set known good termios values during the open(2) call. This means that setting O_TTY_INIT to zero is conforming.
* remove external __syscall function and last remaining usersRich Felker2019-04-1018-264/+2
the weak version of __syscall_cp_c was using a tail call to __syscall to avoid duplicating the 6-argument syscall code inline in small static-linked programs, but now that __syscall no longer exists, the inline expansion is no longer duplication. the syscall.h machinery supported up to 7 syscall arguments, only via an external __syscall function, but we presently have no syscall call points that actually make use of that many, and the kernel only defines 7-argument calling conventions for arm, powerpc (32-bit), and sh. if it turns out we need them in the future, they can easily be added.
* implement inline 5- and 6-argument syscalls for mipsn32 and mips64Rich Felker2019-04-102-29/+68
| | | | | | | | | | n32 and n64 ABIs add new argument registers vs o32, so that passing on the stack is not necessary, so it's not clear why the 5- and 6-argument versions were special-cased to begin with; it seems to have been pattern-copying from arch/mips (o32). i've treated the new argument registers like the first 4 in terms of clobber status (non-clobbered). hopefully this is correct.
* cleanup mips64 syscall_arch functionsRich Felker2019-04-101-18/+9