about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
* fix restrict violations in internal use of several functionsSamuel Holland2019-07-103-10/+10
| | | | | | | The old/new parameters to pthread_sigmask, sigprocmask, and setitimer are marked restrict, so passing the same address to both is prohibited. Modify callers of these functions to use a separate object for each argument.
* mention mips64 n32 ABI support in INSTALL docRich Felker2019-07-091-1/+1
|
* document riscv64 support in INSTALL documentRich Felker2019-07-091-0/+5
|
* prevent dup2 action for posix_spawn internal pipe fdRich Felker2019-07-081-0/+4
| | | | | | | | | | | | | as reported by Tavian Barnes, a dup2 file action for the internal pipe fd used by posix_spawn could cause it to remain open after execve and allow the child to write an artificial error into it, confusing the parent. POSIX allows internal use of file descriptors by the implementation, with undefined behavior for poking at them, so this is not a conformance problem, but it seems preferable to diagnose and prevent the error when we can do so easily. catch attempts to apply a dup2 action to the internal pipe fd and emulate EBADF for it instead.
* fix inadvertent use of uninitialized variable in dladdrRich Felker2019-07-061-1/+1
| | | | | | | | | commit c8b49b2fbc7faa8bf065220f11963d76c8a2eb93 introduced code that checked bestsym to determine whether a matching symbol was found, but bestsym is uninitialized if not. instead use best, consistent with use in the rest of the function. simplified from bug report and patch by Cheng Liu.
* remove spurious MAP_32BIT definition from riscv64 archRich Felker2019-07-041-1/+0
| | | | | | this was apparently copied from x86_64; it's not part of the kernel API for riscv64. this change eliminates the need for a riscv64-specific bits header and lets it use the generic one.
* configure: make AR and RANLIB customizableFangrui Song2019-07-041-0/+4
|
* remove stray .end directives from powerpc[64] asmFangrui Song2019-07-022-2/+0
| | | | | maintainer's note: these are not meaningful/correct/needed and the clang integrated assembler errors out upon seeing them.
* add new syscall numbers from linux v5.1Szabolcs Nagy2019-07-0115-0/+327
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syscall numbers are now synced up across targets (starting from 403 the numbers are the same on all targets other than an arch specific offset) IPC syscalls sem*, shm*, msg* got added where they were missing (except for semop: only semtimedop got added), the new semctl, shmctl, msgctl imply IPC_64, see linux commit 0d6040d4681735dfc47565de288525de405a5c99 arch: add split IPC system calls where needed new 64bit time_t syscall variants got added on 32bit targets, see linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a y2038: add 64-bit time_t syscalls to all 32-bit architectures new async io syscalls got added, see linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e Add io_uring IO interface linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb io_uring: add support for pre-mapped user IO buffers a new syscall got added that uses the fd of /proc/<pid> as a stable handle for processes: allows sending signals without pid reuse issues, intended to eventually replace rt_sigqueueinfo, kill, tgkill and rt_tgsigqueueinfo, see linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad signal: add pidfd_send_signal() syscall on some targets (arm, m68k, s390x, sh) some previously missing syscall numbers got added as well.
* ipc: prefer SYS_ipc when it is definedSzabolcs Nagy2019-07-0112-12/+12
| | | | | | | | | Linux v5.1 introduced ipc syscalls on targets where previously only SYS_ipc was available, change the logic such that the ipc code keeps using SYS_ipc which works backward compatibly on older kernels. This changes behaviour on microblaze which had both mechanisms, now SYS_ipc will be used instead of separate syscalls.
* mips64: fix syscall numbers of io_pgetevents and rseqSzabolcs Nagy2019-07-011-2/+2
| | | | | | | | | the numbers added in commit d149e69c02eb558114f20ea718810e95538a3b2f add io_pgetevents and rseq syscall numbers from linux v4.18 were incorrect.
* elf.h: add NT_ARM_PAC{A,G}_KEYS from linux v5.1Szabolcs Nagy2019-07-011-0/+2
| | | | | | | to request or change pointer auth keys for criu via ptrace, new in linux commit d0a060be573bfbf8753a15dca35497db5e968bb0 arm64: add ptrace regsets for ptrauth key management
* netinet/in.h: add INADDR_ALLSNOOPERS_GROUP from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | RFC 4286: "The IPv4 multicast address for All-Snoopers is 224.0.0.106." from linux commit 4effd28c1245303dce7fd290c501ac2c11052114 bridge: join all-snoopers multicast address
* sys/socket.h: add SO_BINDTOIFINDEX from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | | SO_BINDTOIFINDEX behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. see linux commit f5dd3d0c9638a9d9a02b5964c4ad636f06cf7e2c net: introduce SO_BINDTOIFINDEX sockopt
* s390x: drop SO_ definitions from bits/socket.hSzabolcs Nagy2019-07-011-28/+0
| | | | the s390x definitions matched the generic ones in sys/socket.h.
* netinet/in.h: add IPV6_ROUTER_ALERT_ISOLATE from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | restricts router alert packets received by the socket to the socket's namespace only. see linux commit 9036b2fe092a107856edd1a3bad48b83f2b45000 net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE
* sys/prctl.h: add PR_SPEC_DISABLE_NOEXEC from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | allows specifying that the speculative store bypass disable bit should be cleared on exec. see linux commit 71368af9027f18fe5d1c6f372cfdff7e4bde8b48 x86/speculation: Add PR_SPEC_DISABLE_NOEXEC
* fcntl.h: add F_SEAL_FUTURE_WRITE from linux v5.1Szabolcs Nagy2019-07-011-0/+1
| | | | | | | | | needed for android so it can migrate from its ashmem to memfd. allows making the memfd readonly for future users while keeping a writable mmap of it. see linux commit ab3948f58ff841e51feb845720624665ef5b7ef3 mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd
* sys/fanotify.h: update for linux v5.1Szabolcs Nagy2019-07-011-1/+33
| | | | | | | | | | | | | | | | | includes changes from linux v5.1 linux commit 235328d1fa4251c6dcb32351219bb553a58838d2 fanotify: add support for create/attrib/move/delete events linux commit 5e469c830fdb5a1ebaa69b375b87f583326fd296 fanotify: copy event fid info to user linux commit e9e0c8903009477b630e37a8b6364b26a00720da fanotify: encode file identifier for FAN_REPORT_FID as well as earlier changes that were missed. sys/statfs.h is included for fsid_t.
* fix deadlock in synccall after threaded forkSamuel Holland2019-07-011-0/+1
| | | | | | | | | | | | | | | synccall may be called by AS-safe functions such as setuid/setgid after fork. although fork() resets libc.threads_minus_one, causing synccall to take the single-threaded path, synccall still takes the thread list lock. This lock may be held by another thread if for example fork() races with pthread_create(). After fork(), the value of the lock is meaningless, so clear it. maintainer's note: commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af and e4235d70672d9751d7718ddc2b52d0b426430768 introduced this regression. the state protected by this lock is the linked list, which is entirely replaced in the child path of fork (next=prev=self), so resetting it is semantically sound.
* cap getdents length argument to INT_MAXRich Felker2019-06-281-0/+2
| | | | | | | | the linux syscall treats this argument as having type int, so passing extremely long buffer sizes would be misinterpreted by the kernel. since "short reads" are always acceptable, just cap it down. patch based on report and suggested change by Florian Weimer.
* remove unnecessary and problematic _Noreturn from crt/ldso startupRich Felker2019-06-253-5/+5
| | | | | | | | | | | | | | | | | | | after commit a48ccc159a5fa061a18419296100ee48a1cd6cc9 removed the use of _Noreturn on the stage3_func type (which only worked due to it being defined to the "GNU C" attribute in C99 mode), GCC could no longer assume that the ends of __dls2 and __dls2b are unreachable, and produced a warning that a function marked _Noreturn returns. also, since commit 4390383b32250a941ec616e8bff6f568a801b1c0, the _Noreturn declaration for __libc_start_main in crt1/rcrt1 has been not only inconsistent with the definition, but wrong. formally, __libc_start_main does return, via a (hopefully) tail call to a helper function after the barrier. incorrect usage of _Noreturn in the declaration was probably formal UB. the _Noreturn specifiers were not useful in any of these places, so remove them all. now, the only remaining usage of _Noreturn is in public interfaces where _Noreturn is part of their contract.
* allow fmemopen with zero sizeRich Felker2019-06-251-1/+1
| | | | | | | previously, POSIX erroneously required this to fail with EINVAL despite the traditional glibc implementation, on which the POSIX interface was based, allowing it. the resolution of Austin Group issue 818 removes the requirement to fail.
* do not use _Noreturn for a function pointer in dynamic linkerMatthew Maurer2019-06-211-1/+1
| | | | | _Noreturn is a C11 construct, and may only be used at the site of a function definition.
* remove implicit include of sys/sysmacros.h from sys/types.hRich Felker2019-06-211-1/+0
| | | | | | | | | | this reverts commit f552c792c7ce5a560f214e1104d93ee5b0833967, which exposed the sysmacros.h macros (device major/minor calculations) for BSD and GNU profiles to mimic an unintentional glibc behavior some code depended on. glibc has deprecated and since removed them as the resolution to bug #19239, so it makes no sense for us to keep this behavior. affected code should all have been fixed by now, and if it's not yet fixed it needs to be for use with modern glibc anyway.
* add riscv64 architecture supportRich Felker2019-06-1451-0/+1362
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Author: Alex Suykov <alex.suykov@gmail.com> Author: Aric Belsito <lluixhi@gmail.com> Author: Drew DeVault <sir@cmpwn.com> Author: Michael Clark <mjc@sifive.com> Author: Michael Forney <mforney@mforney.org> Author: Stefan O'Rear <sorear2@gmail.com> This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file. The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review: - a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments) - a_cas[_p] is fixed to be a memory barrier - the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other. - DTP_OFFSET is defined correctly so that local-dynamic TLS works - reloc.h LDSO_ARCH logic is simplified and made explicit. - unused, non-functional crti/n asm files are removed. - an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable. - indentation style errors in some asm files are fixed.
* optimize aarch64 dynamic tlsdesc function to spill fewer registersRich Felker2019-05-261-10/+7
| | | | | | | | | | | | | | with the glibc generation counter model for reusing dynamic tls slots after dlclose, it's really not possible to get away with fewer than 4 working registers. for us however it's always been possible, but tricky, and only became apparent after the switch to installing new dynamic tls at dlopen time. by merging the negated thread pointer into the addend early, the register holding the thread pointer can immediately be reused, bringing the working register count down to three. this allows saving/restoring via a single stp/ldp pair, since the return register x0 does not need to be saved. net reduction of 3 instructions, 2 of which were push/pop.
* make powerpc64 vrregset_t logical layout match expected APIRich Felker2019-05-221-1/+4
| | | | | | | | | between v2 and v3 of the powerpc64 port patch, the change was made from a 32x4 array of 32-bit unsigned ints for vrregs[] to a 32-element array of __int128. this mismatches the API applications working with mcontext_t expect from glibc, and seems to have been motivated by a misinterpretation of a comment on how aarch64 did things as a suggestion to do the same on powerpc64.
* fix vrregset_t layout and member naming on powerpc64Rich Felker2019-05-221-4/+8
| | | | | | | | | | | | | | | | | | the mistaken layout seems to have been adapted from 32-bit powerpc, where vscr and vrsave are packed into the same 128-bit slot in a way that looks like it relies on non-overlapping-ness of the value bits in big endian. the powerpc64 port accounted for the fact that the 64-bit ABI puts each in its own 128-bit slot, but ordered them incorrectly (matching the bit order used on the 32-bit ABI), and failed to account for vscr being padded according to endianness so that it can be accessed via vector moves. in addition to ABI layout, our definition used different logical member layout/naming from glibc, where vscr is a structure to facilitate access as a 32-bit word or a 128-bit vector. the inconsistency here was unintentional, so fix it.
* fix tls offsets when p_vaddr%p_align != 0 on TLS_ABOVE_TP targetsSzabolcs Nagy2019-05-162-4/+6
| | | | | | | | | | | | | | | | currently the bfd linker does not seem to create tls segments where p_vaddr%p_align != 0, but this is valid in ELF and then the runtime computed tls offset must satisfy offset%p_align == (base+p_vaddr)%p_align and in case of local exec tls (main executable) the smallest such offset must be used (otherwise it is incompatible with the offset computed by the static linker). the !TLS_ABOVE_TP case is handled correctly (the offset is negative then in the formula). the ldso code for TLS_ABOVE_TP is changed so the static tls offset of each module satisfies the formula.
* fix static tls offsets of shared libs on TLS_ABOVE_TP targetsSzabolcs Nagy2019-05-161-4/+2
| | | | | | | | | | | | | | | | | | tls_offset should always point to the end of the allocated static tls area, but this was not handled correctly on "tls variant 1" targets in the dynamic linker: after application tls was allocated, tls_offset was aligned up, potentially wasting tls space. (alignment may be needed at the begining of the tls area, not at the end, but that will be fixed separately as it is unlikely to affect real binaries.) when static tls was allocated for a shared library, tls_offset was only updated with the size of the tls segment which does not include alignment gaps, which can easily happen if the tls size update for one library leaves tls_offset misaligned for the next one. this can cause oob access in __copy_tls or arbitrary breakage at tls access. (the issue was observed on aarch64 with rust binaries)
* fix format strings for uid/gid values in putpwent/putgrentRich Felker2019-05-162-2/+2
| | | | | | commit 648c3b4e18b2ce2b6af7d44783e42ca267ea49f5 omitted this change, which is needed to be able to use uid/gid values greater than INT_MAX with these interfaces. it fixes alpine linux bug #10460.
* remove unused struct dso members from dynlink.cFangrui Song2019-05-121-1/+0
| | | | | maintainer's note: commit 9d44b6460ab603487dab4d916342d9ba4467e6b9 removed their use.
* improve i386 inline syscall asm on non-broken compilersRich Felker2019-05-112-1/+34
| | | | | | | | | | | | | | | | | we have to avoid using ebx unconditionally in asm constraints for i386, because gcc 3 and 4 and possibly other simplistic compilers (pcc?) implement PIC via making ebx a fixed-use register, and disallow its use for anything else. rather than hard-coding knowledge of which compilers work (at least gcc 5+ and clang), perform a configure test; this should give us the good codegen on any new compilers we don't yet know about. swapping ebx and edx is kept for 1- and 2-arg syscalls because it avoids having any spills/stack-frame at all in small functions. for 6-arg, if ebx is directly usable, the complex shuffling introduced in commit c8798ef974d21c338a7d8d874a402978ffc6168e can be avoided, and ebp can be loaded the same way ebx is in 5-arg syscalls for compilers that don't support direct use of ebx.
* fix regression in i386 inline syscall asm producing invalid codeRich Felker2019-05-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df inlined the i386 syscall mechanism, but wrongly assumed memory operands to the 5- and 6-argument syscall asm would be esp-based. however, nothing in the constraints prevented them from being ebx- or ebp-based, and in those cases, ebx and ebp could be clobbered before use of the memory operand was complete. in the 6-argument case, this prevented restoration of the original register values before the end of the asm block, breaking the asm contract since ebx and ebp are not marked as clobbered. (they can't be, because lots of compilers don't accept these registers in constraints or clobbers if PIC or frame pointer is enabled). doing this right is complicated by the fact that, after a single push, no operands which might be memory operands are usable. if they are esp-based, the value of esp has changed, rendering them invalid. introduce some new dances to load the registers. for the 5-arg case, push the operand that may be a memory operand first, and after that, it doesn't matter if the operand is invalid, since we'll just use the newly pushed value. for the 6-arg case, we need to put both operands in memory to begin with, like the old non-inline code prior to commit 22e5bbd0deadcbd767864bd714e890b70e1fe1df accepted, so that there's only one potentially memory-based operand to the asm. this can then be saved with a single push, and after that the values can be read off into the registers they're needed in. there's some size overhead, but still a lot less execution overhead than the old out-of-line code. doing it better depends on a modern compiler that lets you use ebx and ebp in asm constraints without restriction. the failure modes on compilers where this doesn't work are inconsistent and dangerous (on at least some gcc versions 4.x and earlier, wrong codegen!), so this is a delicate matter. it can be addressed later if needed.
* make fgetwc set error indicator for stream on encoding errorsRich Felker2019-05-051-2/+8
| | | | | | | | | this is a requirement in POSIX that's omitted, and seemed potentially non-conforming, in the C standard. as such it was omitted here. however, as part of Austin Group issue #1170, the discrepancy was raised with WG14 and determined to be unintended; future versions of the C standard will require the error indicator to be set, as POSIX does.
* fix broken posix_fadvise on mips due to missing 7-arg syscall supportRich Felker2019-05-051-0/+25
| | | | | | | | commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the breakage at build time by removing support for 7-argument syscalls; however, the external __syscall function provided for mips before did not pass a 7th argument from the stack, so the behavior was just silently broken.
* allow archs to provide a 7-argument syscall if neededRich Felker2019-05-051-0/+1
| | | | | | | | commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 noted that we could add this if needed, and in fact it is needed, but not for one of the archs documented as having a 7th syscall arg register. rather, it's needed for mips (o32), where all but the first 4 arguments are passed on the stack, and the stack can accommodate a 7th.
* fix build regression on mips n32 due to typo in new inline syscallRich Felker2019-05-051-1/+1
| | | | | commit 1bcdaeee6e659f1d856717c9aa562a068f2f3bd4 introduced the regression.
* fix passing of 64-bit syscall arguments on microblazeRich Felker2019-05-051-1/+1
| | | | | | | | | | | | | | | this has been wrong since the beginning of the microblaze port: the syscall ABI for microblaze does not align 64-bit arguments on even register boundaries. commit 788d5e24ca19c6291cebd8d1ad5b5ed6abf42665 exposed the problem by introducing references to a nonexistent __syscall7. the ABI is not documented well anywhere, but I was able to confirm against both strace source and glibc source that microblaze is not using the alignment. per the syscall(2) man page, posix_fadvise, ftruncate, pread, pwrite, readahead, sync_file_range, and truncate were all affected and either did not work at all, or only worked by chance, e.g. when the affected argument slots were all zero.
* fix regression in s390x SO_PEERSEC definitionRich Felker2019-04-231-0/+1
| | | | | | | analogous to commit efda534b212f713fe2b92a62b06e45f656b763ce for powerpc. commit 587f5a53bc3a68d80b239ba515d583df690a96df moved the definition of SO_PEERSEC to bits/socket.h for archs where the SO_* macros differ.
* make new math code compatible with unused variable warning/errorRich Felker2019-04-201-3/+6
| | | | | | | | | | | | | | | | commit b50d315fd23f0fbc4c11e2583801dd123d933745 introduced fp_force_eval implemented by default with a dead store to a volatile variable. unfortunately introduces warnings with -Wunused-variable and breaks the ability to use -Werror with the default warning options set by configure when warnings are enabled. we could just call fp_barrier instead, but that results in a spurious load after the store due to volatile semantics. the fix committed here avoids the load. it will still produce warnings without -Wno-unused-but-set-variable, but that's part of our default warning profile, and there are already other locations in the source where an unused variable warning will occur without it.
* math: new powSzabolcs Nagy2019-04-174-303/+521
| | | | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc The underflow exception is signaled if the result is in the subnormal range even if the result is exact. code size change: +3421 bytes. benchmark on x86_64 before, after, speedup: -Os: pow rthruput: 102.96 ns/call 33.38 ns/call 3.08x pow latency: 144.37 ns/call 54.75 ns/call 2.64x -O3: pow rthruput: 98.91 ns/call 32.79 ns/call 3.02x pow latency: 138.74 ns/call 53.78 ns/call 2.58x
* math: new exp and exp2Szabolcs Nagy2019-04-174-480/+434
| | | | | | | | | | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc TOINT_INTRINSICS and EXP_USE_TOINT_NARROW cases are unused. The underflow exception is signaled if the result is in the subnormal range even if the result is exact (e.g. exp2(-1023.0)). code size change: -1672 bytes. benchmark on x86_64 before, after, speedup: -Os: exp rthruput: 12.73 ns/call 6.68 ns/call 1.91x exp latency: 45.78 ns/call 21.79 ns/call 2.1x exp2 rthruput: 6.35 ns/call 5.26 ns/call 1.21x exp2 latency: 26.00 ns/call 16.58 ns/call 1.57x -O3: exp rthruput: 12.75 ns/call 6.73 ns/call 1.89x exp latency: 45.91 ns/call 21.80 ns/call 2.11x exp2 rthruput: 6.47 ns/call 5.40 ns/call 1.2x exp2 latency: 26.03 ns/call 16.54 ns/call 1.57x
* math: new log2Szabolcs Nagy2019-04-173-106/+335
| | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +2458 bytes (+1524 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log2 rthruput: 16.08 ns/call 10.49 ns/call 1.53x log2 latency: 44.54 ns/call 25.55 ns/call 1.74x -O3: log2 rthruput: 15.92 ns/call 10.11 ns/call 1.58x log2 latency: 44.66 ns/call 26.16 ns/call 1.71x
* math: new logSzabolcs Nagy2019-04-173-104/+454
| | | | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc Assume __FP_FAST_FMA implies __builtin_fma is inlined as a single instruction. code size change: +4588 bytes (+2540 bytes with fma). benchmark on x86_64 before, after, speedup: -Os: log rthruput: 12.61 ns/call 7.95 ns/call 1.59x log latency: 41.64 ns/call 23.38 ns/call 1.78x -O3: log rthruput: 12.51 ns/call 7.75 ns/call 1.61x log latency: 41.82 ns/call 23.55 ns/call 1.78x
* math: new powfSzabolcs Nagy2019-04-174-240/+232
| | | | | | | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc POWF_SCALE != 1.0 case only matters if TOINT_INTRINSICS is set, which is currently not supported for any target. SNaN is not supported, it would require an issignalingf implementation. code size change: -816 bytes. benchmark on x86_64 before, after, speedup: -Os: powf rthruput: 95.14 ns/call 20.04 ns/call 4.75x powf latency: 137.00 ns/call 34.98 ns/call 3.92x -O3: powf rthruput: 92.48 ns/call 13.67 ns/call 6.77x powf latency: 131.11 ns/call 35.15 ns/call 3.73x
* math: new exp2f and expfSzabolcs Nagy2019-04-175-179/+193
| | | | | | | | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc In expf TOINT_INTRINSICS is kept, but is unused, it would require support for __builtin_round and __builtin_lround as single instruction. code size change: +94 bytes. benchmark on x86_64 before, after, speedup: -Os: expf rthruput: 9.19 ns/call 8.11 ns/call 1.13x expf latency: 34.19 ns/call 18.77 ns/call 1.82x exp2f rthruput: 5.59 ns/call 6.52 ns/call 0.86x exp2f latency: 17.93 ns/call 16.70 ns/call 1.07x -O3: expf rthruput: 9.12 ns/call 4.92 ns/call 1.85x expf latency: 34.44 ns/call 18.99 ns/call 1.81x exp2f rthruput: 5.58 ns/call 4.49 ns/call 1.24x exp2f latency: 17.95 ns/call 16.94 ns/call 1.06x
* math: new log2fSzabolcs Nagy2019-04-173-58/+108
| | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc code size change: +177 bytes. benchmark on x86_64 before, after, speedup: -Os: log2f rthruput: 11.38 ns/call 5.99 ns/call 1.9x log2f latency: 35.01 ns/call 22.57 ns/call 1.55x -O3: log2f rthruput: 10.82 ns/call 5.58 ns/call 1.94x log2f latency: 35.13 ns/call 21.04 ns/call 1.67x
* math: new logfSzabolcs Nagy2019-04-173-54/+109
| | | | | | | | | | | | | | | | from https://github.com/ARM-software/optimized-routines, commit 04884bd04eac4b251da4026900010ea7d8850edc, with minor changes to better fit into musl. code size change: +289 bytes. benchmark on x86_64 before, after, speedup: -Os: logf rthruput: 8.40 ns/call 6.14 ns/call 1.37x logf latency: 31.79 ns/call 24.33 ns/call 1.31x -O3: logf rthruput: 8.43 ns/call 5.58 ns/call 1.51x logf latency: 32.04 ns/call 20.88 ns/call 1.53x