| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
with this patch, the malloc in libc.so built with -Os is nearly the
same speed as the one built with -O3. thus it solves the performance
regression that resulted from removing the forced -O3 when building
libc.so; now libc.so can be both small and fast.
|
|
|
|
|
|
|
|
|
|
|
| |
I originally added -O3 for shared libraries to counteract very bad
behavior by GCC when building PIC code: it insists on reloading the
GOT register in static functions that need it, even if the address of
the function is never leaked from the translation unit and all local
callers of the function have already loaded the GOT register. this
measurably degrades performance in a few key areas like malloc. the
inlining done at -O3 avoids the issue, but that's really not a good
reason for overriding the user's choice of optimization level.
|
|
|
|
|
|
| |
vfork is implemented as the fork syscall (with no atfork handlers run)
on archs where it is not available, so this change does not introduce
any change in behavior or regression for such archs.
|
| |
|
|
|
|
|
|
|
| |
I'm not 100% sure that Linux's O_PATH meets the POSIX requirements for
O_SEARCH, but it seems very close if not perfect. and old kernels
ignore it, so O_SEARCH will still work as desired as long as the
caller has read permissions to the directory.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
by using the "ir" constraint (immediate or register) and the carefully
constructed instruction addu $2,$0,%2 which can take either an
immediate or a register for %2, the new inline asm admits maximal
optimization with no register spillage to the stack when the compiler
successfully performs constant propagration, but still works by
allocating a register when the syscall number cannot be recognized as
a constant. in the case of syscalls with 0-3 arguments it barely
matters, but for 4-argument syscalls, using an immediate for the
syscall number avoids creating a stack frame for the syscall wrapper
function.
|
|
|
|
|
|
|
|
|
| |
all past and current kernel versions have done so, but there seems to
be no reason it's necessary and the sentiment from everyone I've asked
has been that we should not rely on it. instead, use r7 (an argument
register) which will necessarily be preserved upon syscall restart.
however this only works for 0-3 argument syscalls, and we have to
resort to the function call for 4-argument syscalls.
|
|
|
|
|
|
|
| |
for the sake of simplicity, I've only used rep movsb rather than
breaking up the copy for using rep movsd/q. on all modern cpus, this
seems to be fine, but if there are performance problems, there might
be a need to go back and add support for rep movsd/q.
|
| |
|
|
|
|
|
|
|
|
|
| |
before restrict was added, memove called memcpy for forward copies and
used a byte-at-a-time loop for reverse copies. this was changed to
avoid invoking UB now that memcpy has an undefined copying order,
making memmove considerably slower.
performance is still rather bad, so I'll be adding asm soon.
|
| |
|
|
|
|
|
|
|
|
|
| |
this should both fix the issue with ARM needing -lgcc_eh (although
that's really a bug in the libgcc build process that's causing
considerable bloat, which should be fixed) and make it easier to build
musl using clang/llvm in place of gcc. unfortunately I don't know a
good way to detect and support pcc's -lpcc since it's not in pcc's
default library search path...
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
no syscalls actually use that many arguments; the issue is that some
syscalls with 64-bit arguments have them ordered badly so that
breaking them into aligned 32-bit half-arguments wastes slots with
padding, and a 7th slot is needed for the last argument.
|
|
|
|
|
|
| |
most pure-syscall-wrapper functions compile to the smallest/simplest
code possible (save r7 ; load syscall # ; svc 0 ; restore r7 ; tail
call to __syscall_ret).
|
|
|
|
|
|
|
| |
this drastically reduces the size of some functions which are purely
syscall wrappers.
disabled for clang due to known bugs satisfying register constraints.
|
| |
|
|
|
|
|
|
|
| |
this code was using $10 to save the syscall number, but $10 is not
necessarily preserved by the kernel across syscalls. only mattered for
syscalls that got interrupted by a signal and restarted. as far as i
can tell, $25 is preserved by the kernel across syscalls.
|
|
|
|
|
|
|
| |
something is wrong with the logic for the argument layout, resulting
in compile errors on mips due to too many args to syscall... further
information on how it's supposed to work will be needed before it can
be reactivated.
|
|
|
|
|
|
|
|
|
|
|
|
| |
now public syscall.h only exposes __NR_* and SYS_* constants and the
variadic syscall function. no macros or inline functions, no
__syscall_ret or other internal details, no 16-/32-bit legacy syscall
renaming, etc. this logic has all been moved to src/internal/syscall.h
with the arch-specific parts in arch/$(ARCH)/syscall_arch.h, and the
amount of arch-specific stuff has been reduced to a minimum.
changes still need to be reviewed/double-checked. minimal testing on
i386 and mips has already been performed.
|
| |
|
|
|
|
| |
based on patch by Justin Cormack
|
| |
|
| |
|
|
|
|
| |
not sure why this was missed in the earlier commit.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
this is equivalent to posix_fallocate except that it has an extra
mode/flags argument to control its behavior, and stores the error in
errno rather than returning an error code.
|
|
|
|
|
| |
the syscall takes an extra flag argument which should be zero to meet
the POSIX requirements.
|
| |
|
|
|
|
| |
features.h contains the fallback logic for pre-C11 compilers
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
the old behavior of exposing nothing except plain ISO C can be
obtained by defining __STRICT_ANSI__ or using a compiler option (such
as -std=c99) that predefines it. the new default featureset is POSIX
with XSI plus _BSD_SOURCE. any explicit feature test macros will
inhibit the default.
installation docs have also been updated to reflect this change.
|
|
|
|
|
|
|
|
|
|
| |
clang does not presently support the "v" constraint we want to use to
get the result from $3, and trying to use register...__asm__("$3") to
do the same invokes serious compiler bugs. so for now, i'm working
around the issue with an extra temp register and putting $3 in the
clobber list instead of using it as output. when the bugs in clang are
fixed, this issue should be revisited to generate smaller/faster code
like what gcc gets.
|
|
|
|
|
|
|
|
|
|
|
|
| |
previously, it was pretty much random which one of these trees a given
function appeared in. they have now been organized into:
src/linux: non-POSIX linux syscalls (possibly shard with other nixen)
src/legacy: various obsolete/legacy functions, mostly wrappers
src/misc: still mostly uncategorized; some misc POSIX, some nonstd
src/crypt: crypt hash functions
further cleanup will be done later.
|
|
|
|
| |
void* does not implicitly convert to function pointer types.
|
|
|
|
|
|
|
|
|
|
|
| |
so far, this is the only actual use of loff_t i've found. some
software, including glib, assumes loff_t must exist if splice exists;
this is a reasonable assumption since the official prototype for
splice uses loff_t, as it always works with 64-bit offsets regardless
of the selected libc off_t size. i'm using #define for now rather than
a typedef to make it easy to define in other headers if necessary
(like the LFS64 ugliness), but it may be necessary to add it to
alltypes.h eventually if other functions end up needing it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
note that POSIX does not specify these functions as _Noreturn, because
POSIX is aligned with C99, not the new C11 standard. when POSIX is
eventually updated to C11, it will almost surely give these functions
the _Noreturn attribute. for now, the actual _Noreturn keyword is not
used anyway when compiling with a c99 compiler, which is what POSIX
requires; the GCC __attribute__ is used instead if it's available,
however.
in a few places, I've added infinite for loops at the end of _Noreturn
functions to silence compiler warnings. presumably
__buildin_unreachable could achieve the same thing, but it would only
work on newer GCCs and would not be portable. the loops should have
near-zero code size cost anyway.
like the previous _Noreturn commit, this one is based on patches
contributed by philomath.
|
| |
|