| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
| |
these macros are supported by more compilers
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
|
|
|
|
|
|
|
|
| |
this is actually a rather subtle issue: do arrays decay to pointers
when used as inline asm args? gcc says yes, but currently pcc says no.
hopefully this discrepency in pcc will be fixed, but since the
behavior is not clearly defined anywhere I can find, I'm using an
explicit operation to cause the decay to occur.
|
|
|
|
|
|
| |
this doubles the performance of the fastest syscalls on the atom I
tested it on; improvement is reportedly much more dramatic on
worst-case cpus. cannot be used for cancellable syscalls.
|
| |
|
|
|
|
|
|
|
|
| |
currently, only i386 is tested. x86_64 and arm should probably work.
the necessary relocation types for mips and microblaze have not been
added because I don't understand how they're supposed to work, and I'm
not even sure if it's defined yet on microblaze. I may be able to
reverse engineer the requirements out of gcc/binutils output.
|
|
|
|
|
|
| |
based on initial work by rdp, with heavy modifications. some features
including threads are untested because qemu app-level emulation seems
to be broken and I do not have a proper system image for testing.
|
| |
|
|
|
|
|
| |
not tested on mips and arm; they may still be broken. x86_64 should be
ok now.
|
|
|
|
|
| |
the linux O_PATH mode provides the necessary semantics for both the
O_SEARCH and O_EXEC modes defined and required by POSIX 2008.
|
|
|
|
|
| |
no problems were detected so far, but the constraints seem to have
been invalid just like the mips ones.
|
|
|
|
|
|
|
|
| |
if same register is used for input/output, the compiler must be told.
otherwise is generates random junk code that clobbers the result. in
pure syscall-wrapper functions, nothing went wrong, but in more
complex functions where register allocation is non-trivial, things
broke badly.
|
|
|
|
|
|
|
| |
I'm not 100% sure that Linux's O_PATH meets the POSIX requirements for
O_SEARCH, but it seems very close if not perfect. and old kernels
ignore it, so O_SEARCH will still work as desired as long as the
caller has read permissions to the directory.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
by using the "ir" constraint (immediate or register) and the carefully
constructed instruction addu $2,$0,%2 which can take either an
immediate or a register for %2, the new inline asm admits maximal
optimization with no register spillage to the stack when the compiler
successfully performs constant propagration, but still works by
allocating a register when the syscall number cannot be recognized as
a constant. in the case of syscalls with 0-3 arguments it barely
matters, but for 4-argument syscalls, using an immediate for the
syscall number avoids creating a stack frame for the syscall wrapper
function.
|
|
|
|
|
|
|
|
|
| |
all past and current kernel versions have done so, but there seems to
be no reason it's necessary and the sentiment from everyone I've asked
has been that we should not rely on it. instead, use r7 (an argument
register) which will necessarily be preserved upon syscall restart.
however this only works for 0-3 argument syscalls, and we have to
resort to the function call for 4-argument syscalls.
|
|
|
|
|
|
| |
most pure-syscall-wrapper functions compile to the smallest/simplest
code possible (save r7 ; load syscall # ; svc 0 ; restore r7 ; tail
call to __syscall_ret).
|
|
|
|
|
|
|
| |
this drastically reduces the size of some functions which are purely
syscall wrappers.
disabled for clang due to known bugs satisfying register constraints.
|
|
|
|
|
|
|
|
|
|
|
|
| |
now public syscall.h only exposes __NR_* and SYS_* constants and the
variadic syscall function. no macros or inline functions, no
__syscall_ret or other internal details, no 16-/32-bit legacy syscall
renaming, etc. this logic has all been moved to src/internal/syscall.h
with the arch-specific parts in arch/$(ARCH)/syscall_arch.h, and the
amount of arch-specific stuff has been reduced to a minimum.
changes still need to be reviewed/double-checked. minimal testing on
i386 and mips has already been performed.
|
|
|
|
| |
based on patch by Justin Cormack
|
|
|
|
|
|
|
|
|
|
| |
clang does not presently support the "v" constraint we want to use to
get the result from $3, and trying to use register...__asm__("$3") to
do the same invokes serious compiler bugs. so for now, i'm working
around the issue with an extra temp register and putting $3 in the
clobber list instead of using it as output. when the bugs in clang are
fixed, this issue should be revisited to generate smaller/faster code
like what gcc gets.
|
|
|
|
|
|
|
|
|
| |
while musl itself requires a c99 compiler, some applications insist on
being compiled with c89 compilers, and use of "inline" in the headers
was breaking them. much of this had been avoided already by just
skipping the inline keyword in pre-c99 compilers or modes, but this
new unified solution is cleaner and may/should result in better code
generation in the default gcc configuration.
|
|
|
|
|
|
| |
linux guarantees ll/sc are always available. on mips1, they will be
emulated by the kernel. thus they are part of the linux mips1 abi and
safe to use.
|
|
|
|
|
|
|
|
| |
this is needed to match the underlying "ABI" standards. it's not
really an ABI issue since the binary representations are the same, but
having the wrong type can lead to errors when the type arising from a
difference-of-pointers expression does not match the defined type of
ptrdiff_t. most of the problems affect C++, not C.
|
| |
|
|
|
|
| |
yet another gratuitous mips incompatibility...
|
|
|
|
| |
untested; hopefully it's right now
|
| |
|
|
|
|
|
| |
why does mips have to be gratuitously incompatible in every possible
imaginable way?
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
not heavily tested, but the basics are working. the basic concept is
that the dynamic linker entry point code invokes a pure-PIC (no global
accesses) C function in reloc.h to perform the early GOT relocations
needed to make the dynamic linker itself functional, then invokes
__dynlink like on other archs. since mips uses some ugly arch-specific
hacks to optimize relocating the GOT (rather than just using the
normal DT_REL[A] tables like on other archs), the dynamic linker has
been modified slightly to support calling arch-specific relocation
code in reloc.h.
most of the actual mips-specific behavior was developed by reading the
output of readelf on libc.so and simple executable files. i could not
find good reference information on which relocation types need to be
supported or their semantics, so it's possible that some legitimate
usage cases will not work yet.
|
|
|
|
|
|
|
| |
also fix the alignment of jmp_buf to meet the abi. linux always
emulates fpu on mips if it's not present, so enabling this code
unconditionally is "safe" but may be slow. in the long term it may be
preferable to find a way to disable it on soft float builds.
|
|
|
|
| |
sc was overwriting the result
|
|
|
|
|
|
| |
the fields in the mcontext_t are long long (for no good reason) even
on 32-bit mips, so the offset of the instruction pointer (as a word)
varies depending on endianness.
|
|
|
|
|
|
|
|
|
| |
the kernel wrongly expects the cmsg length field to be size_t instead
of socklen_t. in order to work around the issue, we have to impose a
length limit and copy to a local buffer. the length limit should be
more than sufficient for any real-world use; these headers are only
used for passing file descriptors and permissions between processes
over unix sockets.
|
|
|
|
| |
this fix is easier than trying to reorder the header stuff
|
|
|
|
| |
signal handling was very broken because of this
|
|
|
|
|
| |
like arm, mips requires 64-bit arguments to be "aligned" on an even
register boundary.
|
|
|
|
|
| |
otherwise offs in ucontext_t will be wrong, and break code that
inspects or modifies the signal makes (including cancellation code).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
basically, this version of the code was obtained by starting with
rdp's work from his ellcc source tree, adapting it to musl's build
system and coding style, auditing the bits headers for discrepencies
with kernel definitions or glibc/LSB ABI or large file issues, fixing
up incompatibility with the old binutils from aboriginal linux, and
adding some new special cases to deal with the oddities of sigaction
and pipe syscall interfaces on mips.
at present, minimal test programs work, but some interfaces are broken
or missing. threaded programs probably will not link.
|
|
|
|
| |
apparently somebody wants this for something... and it doesn't hurt.
|
| |
|
|
|
|
| |
no need to pass zero for unused arguments; just omit them.
|
|
|
|
| |
this hidden endian dependency had left big endian arm badly broken.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on arm, the location of the saved-signal-mask flag and mask were off
by one between sigsetjmp and siglongjmp, causing incorrect behavior
restoring the signal mask. this is because the siglongjmp code assumed
an extra slot was in the non-sig jmp_buf for the flag, but arm did not
have this. now, the extra slot is removed for all archs since it was
useless.
also, arm eabi requires jmp_buf to have 8-byte alignment. we achieve
that using long long as the type rather than with non-portable gcc
attribute tags.
|
|
|
|
| |
patch submitted by Kristian L. <email@thexception.net>
|