about summary refs log tree commit diff
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
...
* have sh/fdpic entry point set fdpic personality if neededRich Felker2015-09-221-0/+12
| | | | | | | | | | | | | | | | | the entry point code supports being loaded by a loader which is not fdpic-aware (in practice, either kernel with mmu or qemu without fdpic support). this mostly just works, but signal handling will wrongly use a function descriptor address as a code address if the personality is not adjusted to fdpic. ideally this code could be placed with sigaction so that it's not needed except if/when a signal handler is installed. however, personality is incorrectly maintained per-thread by the kernel, rather than per-process, so it's necessary to correct the personality before any threads are started. also, in order to skip the personality syscall when an fdpic-aware loader is used, we need to be able to detect how the program was loaded, and this information is only readily available at the entry point.
* add real fdpic loading of shared librariesRich Felker2015-09-221-0/+1
| | | | | | previously, the normal ELF library loading code was used even for fdpic, so only the kernel-loaded dynamic linker and main app could benefit from separate placement of segments and shared text.
* size-optimize sh/fdpic dynamic entry pointRich Felker2015-09-221-0/+4
| | | | | | the __fdpic_fixup code is not needed for ET_DYN executables, which instead use reloctions, so we can omit it from the dynamic linker and static-pie entry point and save some code size.
* work around breakage in sh/fdpic __unmapself functionRich Felker2015-09-221-0/+5
| | | | | | | | | | | | the C implementation of __unmapself used for potentially-nommu sh assumed CRTJMP takes a function descriptor rather than a code address; however, the actual dynamic linker needs a code address, and so commit 7a9669e977e5f750cf72ccbd2614f8b72ce02c4c changed the definition of the macro in reloc.h. this commit puts the old macro back in a place where it only affects __unmapself. this is an ugly workaround and should be cleaned up at some point, but at least it's well isolated.
* add general fdpic support in dynamic linker and arch support for shRich Felker2015-09-222-3/+12
| | | | | | | | | | | | | | | | | | at this point not all functionality is complete. the dynamic linker itself, and main app if it is also loaded by the kernel, take advantage of fdpic and do not need constant displacement between segments, but additional libraries loaded by the dynamic linker follow normal ELF semantics for mapping still. this fully works, but does not admit shared text on nommu. in terms of actual functional correctness, dlsym's results are presently incorrect for function symbols, RTLD_NEXT fails to identify the caller correctly, and dladdr fails almost entirely. with the dynamic linker entry point working, support for static pie is automatically included, but linking the main application as ET_DYN (pie) probably does not make sense for fdpic anyway. ET_EXEC is equally relocatable but more efficient at representing relocations.
* new dlstart stage-2 chaining for x86_64 and x32Rich Felker2015-09-172-0/+10
|
* new dlstart stage-2 chaining for powerpcRich Felker2015-09-171-0/+9
|
* new dlstart stage-2 chaining for or1kRich Felker2015-09-171-0/+9
|
* new dlstart stage-2 chaining for mipsRich Felker2015-09-171-0/+15
|
* new dlstart stage-2 chaining for microblazeRich Felker2015-09-171-0/+7
|
* introduce new symbol-lookup-free rcrt1/dlstart stage chainingRich Felker2015-09-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | previously, the call into stage 2 was made by looking up the symbol name "__dls2" (which was chosen short to be easy to look up) from the dynamic symbol table. this was no problem for the dynamic linker, since it always exports all its symbols. in the case of the static pie entry point, however, the dynamic symbol table does not contain the necessary symbol unless -rdynamic/-E was used when linking. this linking requirement is a major obstacle both to practical use of static-pie as a nommu binary format (since it greatly enlarges the file) and to upstream toolchain support for static-pie (adding -E to default linking specs is not reasonable). this patch replaces the runtime symbolic lookup with a link-time lookup via an inline asm fragment, which reloc.h is responsible for providing. in this initial commit, the asm is provided only for i386, and the old lookup code is left in place as a fallback for archs that have not yet transitioned. modifying crt_arch.h to pass the stage-2 function pointer as an argument was considered as an alternative, but such an approach would not be compatible with fdpic, where it's impossible to compute function pointers without already having performed relocations. it was also deemed desirable to keep crt_arch.h as simple/minimal as possible. in principle, archs with pc-relative or got-relative addressing of static variables could instead load the stage-2 function pointer from a static volatile object. that does not work for fdpic, and is not safe against reordering on mips-like archs that use got slots even for static functions, but it's a valid on i386 and many others, and could provide a reasonable default implementation in the future.
* reindent powerpc's bits/termios.h to be consistent with other archsFelix Janda2015-09-151-140/+138
|
* fix namespace violations in aarch64/bits/termios.hFelix Janda2015-09-151-7/+7
| | | | in analogy with commit a627eb35864d5c29a3c3300dfe83745ab1e7a00f
* add sh fdpic subarch variantsRich Felker2015-09-121-1/+16
| | | | | | | | | with this commit it should be possible to produce a working static-linked fdpic libc and application binaries for sh. the changes in reloc.h are largely unused at this point since dynamic linking is not supported, but the CRTJMP macro is used one place outside of dynamic linking, in __unmapself.
* add fdpic version of entry point code for shRich Felker2015-09-121-0/+29
| | | | | | | this version of the entry point is only suitable for static linking in ET_EXEC form. neither dynamic linking nor pie is supported yet. at some point in the future the fdpic and non-fdpic versions of this code may be unified but for now it's easiest to work with them separately.
* make sh clone asm fdpic-compatibleRich Felker2015-09-121-0/+5
| | | | | | | | | | | clone calls back to a function pointer provided by the caller, which will actually be a pointer to a function descriptor on fdpic. the obvious solution is to have a separate version of clone for fdpic, but I have taken a simpler approach to go around the problem. instead of calling the pointed-to function from asm, a direct call is made to an internal C function which then calls the pointed-to function. this lets the C compiler generate the appropriate calling convention for an indirect call with no need for ABI-specific assembly.
* fix missing earlyclobber flag in i386 a_ctz_64 asmRich Felker2015-09-091-1/+1
| | | | | | | | | this error was only found by reading the code, but it seems to have been causing gcc to produce wrong code in malloc: the same register was used for the output and the high word of the input. in principle this could have caused an infinite loop searching for an available bin, but in practice most x86 models seem to implement the "undefined" result of the bsf instruction as "unchanged".
* implement arm eabi mem* functionsTimo Teräs2015-08-314-0/+36
| | | | | | | these functions are part of the ARM EABI, meaning compilers may generate references to them. known versions of gcc do not use them, but llvm does. they are not provided by libgcc, and the de facto standard seems to be that libc provides them.
* mitigate performance regression in libc-internal locks on x86_64Rich Felker2015-08-162-2/+2
| | | | | | | | | commit 3c43c0761e1725fd5f89a9c028cbf43250abb913 fixed missing synchronization in the atomic store operation for i386 and x86_64, but opted to use mfence for the barrier on x86_64 where it's always available. however, in practice mfence is significantly slower than the barrier approach used on i386 (a nop-like lock orl operation). this commit changes x86_64 (and x32) to use the faster barrier.
* aarch64: fix 64-bit syscall argument passingSzabolcs Nagy2015-08-111-4/+2
| | | | | | | | | On 32bit systems long long arguments are passed in a special way to some syscalls; this accidentally got copied to the AArch64 port. The following interfaces were broken: fallocate, fanotify, ftruncate, posix_fadvise, posix_fallocate, pread, pwrite, readahead, sync_file_range, truncate.
* fix missing synchronization in atomic store on i386 and x86_64Rich Felker2015-07-283-3/+3
| | | | | | | | | | | | | | | | | despite being strongly ordered, the x86 memory model does not preclude reordering of loads across earlier stores. while a plain store suffices as a release barrier, we actually need a full barrier, since users of a_store subsequently load a waiter count to determine whether to issue a futex wait, and using a stale count will result in soft (fail-to-wake) deadlocks. these deadlocks were observed in malloc and possible with stdio locks and other libc-internal locking. on i386, an atomic operation on the caller's stack is used as the barrier rather than performing the store itself using xchg; this avoids the need to read the cache line on which the store is being performed. mfence is used on x86_64 where it's always available, and could be used on i386 with the appropriate cpu model checks if it's shown to perform better.
* socket.h: cleanup/reorder mips and powerpc bits/socket.hRoman Yeryomin2015-07-212-18/+20
| | | | | | ....to be somewhat consistent and easily comparable with asm/socket.h Signed-off-by: Roman Yeryomin <roman@ubnt.com>
* socket.h: fix SO_* for mipsRoman Yeryomin2015-07-212-1/+6
| | | | Signed-off-by: Roman Yeryomin <roman@ubnt.com>
* mips: fix mcontext_t register array field nameFelix Fietkau2015-07-211-1/+1
| | | | | | glibc and uclibc use gregs instead of regs Signed-off-by: Felix Fietkau <nbd@openwrt.org>
* fix local-dynamic model TLS on mips and powerpcRich Felker2015-06-252-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | the TLS ABI spec for mips, powerpc, and some other (presently unsupported) RISC archs has the return value of __tls_get_addr offset by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I had previously assumed this part of the ABI was actually just an implementation detail, since the adjustments cancel out. however, when the local dynamic model is used for accessing TLS that's known to be in the same DSO, either of the following may happen: 1. the -0x8000 offset may already be applied to the argument structure passed to __tls_get_addr at ld time, without any opportunity for runtime relocations. 2. __tls_get_addr may be used with a zero offset argument to obtain a base address for the module's TLS, to which the caller then applies immediate offsets for individual objects accessed using the local dynamic model. since the immediate offsets have the -0x8000 adjustment applied to them, the base address they use needs to include the +0x8000 offset. it would be possible, but more complex, to store the pointers in the dtv[] array with the +0x8000 offset pre-applied, to avoid the runtime cost of adding 0x8000 on each call to __tls_get_addr. this change could be made later if measurements show that it would help.
* switch to using trap number 31 for syscalls on shRich Felker2015-06-161-1/+1
| | | | | | | | | | | | | | | | | | | nominally the low bits of the trap number on sh are the number of syscall arguments, but they have never been used by the kernel, and some code making syscalls does not even know the number of arguments and needs to pass an arbitrary high number anyway. sh3/sh4 traditionally used the trap range 16-31 for syscalls, but part of this range overlapped with hardware exceptions/interrupts on sh2 hardware, so an incompatible range 32-47 was chosen for sh2. using trap number 31 everywhere, since it's in the existing sh3/sh4 range and does not conflict with sh2 hardware, is a proposed unification of the kernel syscall convention that will allow binaries to be shared between sh2 and sh3/sh4. if this is not accepted into the kernel, we can refit the sh2 target with runtime selection mechanisms for the trap number, but doing so would be invasive and would entail non-trivial overhead.
* switch sh port's __unmapself to generic version when running on sh2/nommuRich Felker2015-06-161-0/+19
| | | | | | | | | | | | | due to the way the interrupt and syscall trap mechanism works, userspace on sh2 must never set the stack pointer to an invalid value. thus, the approach used on most archs, where __unmapself executes with no stack for the interval between SYS_munmap and SYS_exit, is not viable on sh2. in order not to pessimize sh3/sh4, the sh asm version of __unmapself is not removed. instead it's renamed and redirected through code that calls either the generic (safe) __unmapself or the sh3/sh4 asm, depending on compile-time and run-time conditions.
* add support for sh2 interrupt-masking-based atomics to sh portRich Felker2015-06-163-8/+113
| | | | | | | | | | | | | | | | | | | the sh2 target is being considered an ISA subset of sh3/sh4, in the sense that binaries built for sh2 are intended to be usable on later cpu models/kernels with mmu support. so rather than hard-coding sh2-specific atomics, the runtime atomic selection mechanisms that was already in place has been extended to add sh2 atomics. at this time, the sh2 atomics are not SMP-compatible; since the ISA lacks actual atomic operations, the new code instead masks interrupts for the duration of the atomic operation, producing an atomic result on single-core. this is only possible because the kernel/hardware does not impose protections against userspace doing so. additional changes will be needed to support future SMP systems. care has been taken to avoid producing significant additional code size in the case where it's known at compile-time that the target is not sh2 and does not need sh2-specific code.
* arm: add vdso supportSzabolcs Nagy2015-06-141-0/+4
| | | | | vdso will be available on arm in linux v4.2, the user-space code for it is in kernel commit 8512287a8165592466cb9cb347ba94892e9c56a5
* fix stack alignment code in mips crt_arch.hRich Felker2015-05-241-2/+2
| | | | | | | | | | | | | | the instruction used to align the stack, "and $sp, $sp, -8", does not actually exist; it's expanded to 2 instructions using the 'at' (assembler temporary) register, and thus cannot be used in a branch delay slot. since alignment mod 16 commutes with subtracting 8, simply swapping these two operations fixes the problem. crt1.o was not affected because it's still being generated from a dedicated asm source file. dlstart.lo was not affected because the stack pointer it receives is already aligned by the kernel. but Scrt1.o was affected in cases where the dynamic linker gave it a misaligned stack pointer.
* add .text section directive to all crt_arch.h files missing itRich Felker2015-05-227-0/+7
| | | | | | | | i386 and x86_64 versions already had the .text directive; other archs did not. normally, top-level (file scope) __asm__ starts in the .text section anyway, but problems were reported with some versions of clang, and it seems preferable to set it explicitly anyway, at least for the sake of consistency between archs.
* fix inconsistency in a_and and a_or argument types on x86[_64]Rich Felker2015-05-203-12/+12
| | | | | | conceptually, and on other archs, these functions take a pointer to int, but in the i386, x86_64, and x32 versions of atomic.h, they took a pointer to void instead.
* inline llsc atomics when building for sh4aBobby Bingham2015-05-192-90/+128
| | | | | | | If we're building for sh4a, the compiler is already free to use instructions only available on sh4a, so we can do the same and inline the llsc atomics. If we're building for an older processor, we still do the same runtime atomics selection as before.
* make arm reloc.h CRTJMP macro compatible with thumbRich Felker2015-05-141-0/+5
| | | | | | | | | | | | | compilers targeting armv7 may be configured to produce thumb2 code instead of arm code by default, and in the future we may wish to support targets where only the thumb instruction set is available. the instructions this patch omits in thumb mode are needed only for non-thumb versions of armv4 or earlier, which are not supported by any current compilers/toolchains and thus rather pointless to have. at some point these compatibility return sequences may be removed from all asm source files, and in that case it would make sense to remove them here too and remove the ifdef.
* make arm crt_arch.h compatible with thumb code generationRich Felker2015-05-141-4/+6
| | | | | | | | | | | | | | | | | | compilers targeting armv7 may be configured to produce thumb2 code instead of arm code by default, and in the future we may wish to support targets where only the thumb instruction set is available. the changes made here avoid operating directly on the sp register, which is not possible in thumb code, and address an issue with the way the address of _DYNAMIC is computed. previously, the relative address of _DYNAMIC was stored with an additional offset of -8 versus the pc-relative add instruction, since on arm the pc register evaluates to ".+8". in thumb code, it instead evaluates to ".+4". both are two (normal-size) instructions beyond "." in the current execution mode, so the numbered label 2 used in the relative address expression is simply moved two instructions ahead to be compatible with both instruction sets.
* fix stack protector crashes on x32 & powerpc due to misplaced TLS canaryRich Felker2015-05-062-0/+3
| | | | | | | | | | | | | | | | | | | i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64. on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be. in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.
* fix broken cancellation on x32 due to incorrect saved-PC offsetRich Felker2015-05-021-1/+1
|
* fix dangling pointers in x32 syscall timespec fixup codeRich Felker2015-05-012-10/+23
| | | | | | | the lifetime of compound literals is the block in which they appear. the temporary struct __timespec_kernel objects created as compound literals no longer existed at the time their addresses were passed to the kernel.
* fix __syscall declaration with wrong visibility in syscall_arch.hSzabolcs Nagy2015-04-305-8/+3
| | | | | remove __syscall declaration where it is not needed (aarch64, arm, microblaze, or1k) and add the hidden attribute where it is (mips).
* aarch64: fix CRTJMP in reloc.hSzabolcs Nagy2015-04-301-1/+1
| | | | | commit f3ddd173806fd5c60b3f034528ca24542aecc5b9 broke the build by using "bx" instead of "br".
* fix sh jmp_buf size to match ABIRich Felker2015-04-271-1/+1
| | | | | | | | | | | | | | | | | while the sh port is still experimental and subject to ABI instability, this is not actually an application/libc boundary ABI change. it only affects third-party APIs where jmp_buf is used in a shared structure at the ABI boundary, because nothing anywhere near the end of the jmp_buf object (which includes the oversized sigset_t) is accessed by libc. both glibc and uclibc have 15-slot jmp_buf for sh. presumably the smaller version was used in musl because the slots for fpu status register and thread pointer register (gbr) were incorrect and must not be restored by longjmp, but the size should have been preserved, as it's generally treated as a libc-agnostic ABI property for the arch, and having extra slots free in case we ever need them for something is useful anyway.
* fix ldso name for sh-nofpu subarchRich Felker2015-04-241-1/+7
| | | | | | | | | | | | | | | previously it was using the same name as the default ABI with hard float (floating point args and return value in registers). the test __SH_FPU_ANY__ || __SH4__ matches what's used in the configure script already, and seems correct under casual review against gcc's config/sh.h, but may need tweaks. the logic for predefined macros for sh, and what they all mean, is very complex. eventually this should be documented in comments here. configure already rejects "half-hard" configurations on sh where double=float since these do not conform to Annex F and are not suitable for musl, so these do not need to be considered here.
* fix failure of sh reloc.h to properly detect endianness for ldso nameRich Felker2015-04-241-0/+2
| | | | | versions of reloc.h that rely on endian macros much include endian.h to ensure they are available.
* fix breakage in x32 dynamic linker due to mismatching register sizeRich Felker2015-04-201-1/+1
| | | | | | the jmp instruction requires a 64-bit register, so cast the desired PC address up to uint64_t, going through uintptr_t to ensure that it's zero-extended rather than possibly sign-extended.
* add execveat syscall number to microblazeSzabolcs Nagy2015-04-171-0/+2
| | | | | syscall number was reserved in linux v4.0, kernel commit add4b1b02da7e7ec35c34dd04d351ac53f3f0dd8
* fix missing quotation mark in mips crt_arch.h that broke buildRich Felker2015-04-171-1/+1
|
* consistently use hidden visibility for cancellable syscall internalsRich Felker2015-04-141-0/+7
| | | | | | | | | | in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while these is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden reduces some bloat anyway. in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.
* use hidden visibility for i386 asm-internal __vsyscall symbolRich Felker2015-04-141-7/+7
| | | | | otherwise the call instruction in the inline syscall asm results in textrels without ld-time binding.
* dynamic linker bootstrap overhaulRich Felker2015-04-1321-465/+285
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this overhaul further reduces the amount of arch-specific code needed by the dynamic linker and removes a number of assumptions, including: - that symbolic function references inside libc are bound at link time via the linker option -Bsymbolic-functions. - that libc functions used by the dynamic linker do not require access to data symbols. - that static/internal function calls and data accesses can be made without performing any relocations, or that arch-specific startup code handled any such relocations needed. removing these assumptions paves the way for allowing libc.so itself to be built with stack protector (among other things), and is achieved by a three-stage bootstrap process: 1. relative relocations are processed with a flat function. 2. symbolic relocations are processed with no external calls/data. 3. main program and dependency libs are processed with a fully-functional libc/ldso. reduction in arch-specific code is achived through the following: - crt_arch.h, used for generating crt1.o, now provides the entry point for the dynamic linker too. - asm is no longer responsible for skipping the beginning of argv[] when ldso is invoked as a command. - the functionality previously provided by __reloc_self for heavily GOT-dependent RISC archs is now the arch-agnostic stage-1. - arch-specific relocation type codes are mapped directly as macros rather than via an inline translation function/switch statement.
* fix possible clobbering of syscall return values on mipsRich Felker2015-04-071-3/+6
| | | | | | | | | | | | depending on the compiler's interpretation of __asm__ register names for register class objects, it may be possible for the return value in r2 to be clobbered by the function call to __stat_fix. I have not observed any such breakage in normal builds and suspect it only happens with -O0 or other unusual build options, but since there's an ambiguity as to the semantics of this feature, it's best to use an explicit temporary to avoid the issue. based on reporting and patch by Eugene.