path: root/arch
Commit message | Author | Age | Files | Lines
* fix stack alignment code in mips crt_arch.h | Rich Felker | 2015-05-24 | 1 | -2/+2
  the instruction used to align the stack, "and $sp, $sp, -8", does not actually exist; it's expanded to 2 instructions using the 'at' (assembler temporary) register, and thus cannot be used in a branch delay slot. since alignment mod 16 commutes with subtracting 8, simply swapping these two operations fixes the problem.

  crt1.o was not affected because it's still being generated from a dedicated asm source file. dlstart.lo was not affected because the stack pointer it receives is already aligned by the kernel. but Scrt1.o was affected in cases where the dynamic linker gave it a misaligned stack pointer.
* add .text section directive to all crt_arch.h files missing it | Rich Felker | 2015-05-22 | 7 | -0/+7
  i386 and x86_64 versions already had the .text directive; other archs did not. normally, top-level (file scope) __asm__ starts in the .text section anyway, but problems were reported with some versions of clang, and it seems preferable to set it explicitly anyway, at least for the sake of consistency between archs.
* fix inconsistency in a_and and a_or argument types on x86[_64] | Rich Felker | 2015-05-20 | 3 | -12/+12
  conceptually, and on other archs, these functions take a pointer to int, but in the i386, x86_64, and x32 versions of atomic.h, they took a pointer to void instead.
* inline llsc atomics when building for sh4a | Bobby Bingham | 2015-05-19 | 2 | -90/+128
  If we're building for sh4a, the compiler is already free to use instructions only available on sh4a, so we can do the same and inline the llsc atomics. If we're building for an older processor, we still do the same runtime atomics selection as before.
* make arm reloc.h CRTJMP macro compatible with thumb | Rich Felker | 2015-05-14 | 1 | -0/+5
  compilers targeting armv7 may be configured to produce thumb2 code instead of arm code by default, and in the future we may wish to support targets where only the thumb instruction set is available.

  the instructions this patch omits in thumb mode are needed only for non-thumb versions of armv4 or earlier, which are not supported by any current compilers/toolchains and thus rather pointless to have. at some point these compatibility return sequences may be removed from all asm source files, and in that case it would make sense to remove them here too and remove the ifdef.
* make arm crt_arch.h compatible with thumb code generation | Rich Felker | 2015-05-14 | 1 | -4/+6
  compilers targeting armv7 may be configured to produce thumb2 code instead of arm code by default, and in the future we may wish to support targets where only the thumb instruction set is available.

  the changes made here avoid operating directly on the sp register, which is not possible in thumb code, and address an issue with the way the address of _DYNAMIC is computed.

  previously, the relative address of _DYNAMIC was stored with an additional offset of -8 versus the pc-relative add instruction, since on arm the pc register evaluates to ".+8". in thumb code, it instead evaluates to ".+4". both are two (normal-size) instructions beyond "." in the current execution mode, so the numbered label 2 used in the relative address expression is simply moved two instructions ahead to be compatible with both instruction sets.
* fix stack protector crashes on x32 & powerpc due to misplaced TLS canary | Rich Felker | 2015-05-06 | 2 | -0/+3
  i386, x86_64, x32, and powerpc all use TLS for stack protector canary values in the default stack protector ABI, but the location only matched the ABI on i386 and x86_64.

  on x32, the expected location for the canary contained the tid, thus producing spurious mismatches (resulting in process termination) upon fork. on powerpc, the expected location contained the stdio_locks list head, so returning from a function after calling flockfile produced spurious mismatches. in both cases, the random canary was not present, and a predictable value was used instead, making the stack protector hardening much less effective than it should be.

  in the current fix, the thread structure has been expanded to have canary fields at all three possible locations, and archs that use a non-default location must define a macro in pthread_arch.h to choose which location is used. for most archs (which lack TLS canary ABI) the choice does not matter.
* fix broken cancellation on x32 due to incorrect saved-PC offset | Rich Felker | 2015-05-02 | 1 | -1/+1
* fix dangling pointers in x32 syscall timespec fixup code | Rich Felker | 2015-05-01 | 2 | -10/+23
  the lifetime of compound literals is the block in which they appear. the temporary struct __timespec_kernel objects created as compound literals no longer existed at the time their addresses were passed to the kernel.
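The lifetime rule above can be illustrated with a minimal C sketch. The names `struct ts_kernel`, `consume`, and `fixed_call` are hypothetical and do not come from musl; `consume` merely stands in for the syscall that reads the structure. The broken pattern appears only in a comment, and the function shows one way to keep the temporary alive, which is not necessarily how the actual patch does it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel-format timespec; not musl's type. */
struct ts_kernel { long long tv_sec; long long tv_nsec; };

/* Hypothetical consumer standing in for the syscall that reads the struct. */
static long long consume(const struct ts_kernel *ts)
{
    return ts ? ts->tv_sec * 1000000000LL + ts->tv_nsec : -1;
}

/*
 * Broken pattern (comment only):
 *
 *     const struct ts_kernel *p = NULL;
 *     if (have_ts) {
 *         p = &(struct ts_kernel){ sec, nsec };
 *     }                      <- the compound literal's lifetime ends here
 *     return consume(p);     <- p dangles
 *
 * Fixed pattern: the temporary is a named object in the enclosing scope,
 * so it is still alive when its address is used.
 */
long long fixed_call(int have_ts, long sec, long nsec)
{
    struct ts_kernel tmp;
    const struct ts_kernel *p = NULL;
    if (have_ts) {
        tmp = (struct ts_kernel){ sec, nsec };
        p = &tmp;
    }
    return consume(p);
}
```

Calling `fixed_call(1, 2, 5)` passes a pointer to a live object; with `have_ts` equal to 0 the consumer receives a null pointer instead.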
* fix __syscall declaration with wrong visibility in syscall_arch.h | Szabolcs Nagy | 2015-04-30 | 5 | -8/+3
  remove __syscall declaration where it is not needed (aarch64, arm, microblaze, or1k) and add the hidden attribute where it is (mips).
* aarch64: fix CRTJMP in reloc.h | Szabolcs Nagy | 2015-04-30 | 1 | -1/+1
  commit f3ddd173806fd5c60b3f034528ca24542aecc5b9 broke the build by using "bx" instead of "br".
* fix sh jmp_buf size to match ABI | Rich Felker | 2015-04-27 | 1 | -1/+1
  while the sh port is still experimental and subject to ABI instability, this is not actually an application/libc boundary ABI change. it only affects third-party APIs where jmp_buf is used in a shared structure at the ABI boundary, because nothing anywhere near the end of the jmp_buf object (which includes the oversized sigset_t) is accessed by libc.

  both glibc and uclibc have 15-slot jmp_buf for sh. presumably the smaller version was used in musl because the slots for fpu status register and thread pointer register (gbr) were incorrect and must not be restored by longjmp, but the size should have been preserved, as it's generally treated as a libc-agnostic ABI property for the arch, and having extra slots free in case we ever need them for something is useful anyway.
* fix ldso name for sh-nofpu subarch | Rich Felker | 2015-04-24 | 1 | -1/+7
  previously it was using the same name as the default ABI with hard float (floating point args and return value in registers).

  the test __SH_FPU_ANY__ || __SH4__ matches what's used in the configure script already, and seems correct under casual review against gcc's config/sh.h, but may need tweaks. the logic for predefined macros for sh, and what they all mean, is very complex. eventually this should be documented in comments here.

  configure already rejects "half-hard" configurations on sh where double=float since these do not conform to Annex F and are not suitable for musl, so these do not need to be considered here.
* fix failure of sh reloc.h to properly detect endianness for ldso name | Rich Felker | 2015-04-24 | 1 | -0/+2
  versions of reloc.h that rely on endian macros must include endian.h to ensure they are available.
* fix breakage in x32 dynamic linker due to mismatching register size | Rich Felker | 2015-04-20 | 1 | -1/+1
  the jmp instruction requires a 64-bit register, so cast the desired PC address up to uint64_t, going through uintptr_t to ensure that it's zero-extended rather than possibly sign-extended.
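A small C sketch of why the intermediate unsigned cast matters. On x32, uintptr_t is 32 bits wide; to keep the example meaningful on any host, `uint32_t` is used here as a stand-in for it, and a signed 32-bit value stands in for an address that might otherwise be widened with sign extension. Both function names are hypothetical.

```c
#include <stdint.h>

/* Widening a signed 32-bit value directly: the high 32 bits copy the sign. */
uint64_t widen_sign_extended(int32_t addr)
{
    return (uint64_t)addr;
}

/* Going through an unsigned 32-bit type first (the role uintptr_t plays on
 * x32) guarantees the high 32 bits of the result are zero. */
uint64_t widen_zero_extended(int32_t addr)
{
    return (uint64_t)(uint32_t)addr;
}
```

For an address with the top bit set, the first helper yields a value with all upper bits set, which would be the wrong jump target; the second yields the intended 32-bit address.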
* add execveat syscall number to microblaze | Szabolcs Nagy | 2015-04-17 | 1 | -0/+2
  syscall number was reserved in linux v4.0, kernel commit add4b1b02da7e7ec35c34dd04d351ac53f3f0dd8
* fix missing quotation mark in mips crt_arch.h that broke build | Rich Felker | 2015-04-17 | 1 | -1/+1
* consistently use hidden visibility for cancellable syscall internals | Rich Felker | 2015-04-14 | 1 | -0/+7
  in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while there is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden reduces some bloat anyway.

  in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.
* use hidden visibility for i386 asm-internal __vsyscall symbol | Rich Felker | 2015-04-14 | 1 | -7/+7
  otherwise the call instruction in the inline syscall asm results in textrels without ld-time binding.
* dynamic linker bootstrap overhaul | Rich Felker | 2015-04-13 | 21 | -465/+285
  this overhaul further reduces the amount of arch-specific code needed by the dynamic linker and removes a number of assumptions, including:

  - that symbolic function references inside libc are bound at link time via the linker option -Bsymbolic-functions.
  - that libc functions used by the dynamic linker do not require access to data symbols.
  - that static/internal function calls and data accesses can be made without performing any relocations, or that arch-specific startup code handled any such relocations needed.

  removing these assumptions paves the way for allowing libc.so itself to be built with stack protector (among other things), and is achieved by a three-stage bootstrap process:

  1. relative relocations are processed with a flat function.
  2. symbolic relocations are processed with no external calls/data.
  3. main program and dependency libs are processed with a fully-functional libc/ldso.

  reduction in arch-specific code is achieved through the following:

  - crt_arch.h, used for generating crt1.o, now provides the entry point for the dynamic linker too.
  - asm is no longer responsible for skipping the beginning of argv[] when ldso is invoked as a command.
  - the functionality previously provided by __reloc_self for heavily GOT-dependent RISC archs is now the arch-agnostic stage-1.
  - arch-specific relocation type codes are mapped directly as macros rather than via an inline translation function/switch statement.
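As a rough illustration of stage 1, the following C sketch applies relative relocations with a single flat loop that makes no function calls and touches no global data. It is not musl's code: `struct rel_entry`, `REL_RELATIVE_SKETCH`, and `apply_relative_relocs` are hypothetical names, and the record layout is only loosely modeled on ELF relocation entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record, loosely modeled on ELF "rel" entries. */
struct rel_entry {
    size_t offset; /* location to patch, relative to the load base */
    int type;      /* relocation type code */
};

/* Hypothetical type code for this sketch; not a real ELF constant. */
enum { REL_RELATIVE_SKETCH = 1 };

/* Flat stage-1 style loop: for every relative relocation, add the load base
 * to the word stored at base+offset. Other types are skipped, since they
 * would need symbol lookup. Returns the number of entries applied. */
size_t apply_relative_relocs(unsigned char *base,
                             const struct rel_entry *rel, size_t n)
{
    size_t applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (rel[i].type != REL_RELATIVE_SKETCH)
            continue;
        uintptr_t *slot = (uintptr_t *)(base + rel[i].offset);
        *slot += (uintptr_t)base;
        applied++;
    }
    return applied;
}
```

The sketch assumes every offset is suitably aligned for a `uintptr_t` store, which holds when the image is itself an array of `uintptr_t` words.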
* fix possible clobbering of syscall return values on mips | Rich Felker | 2015-04-07 | 1 | -3/+6
  depending on the compiler's interpretation of __asm__ register names for register class objects, it may be possible for the return value in r2 to be clobbered by the function call to __stat_fix. I have not observed any such breakage in normal builds and suspect it only happens with -O0 or other unusual build options, but since there's an ambiguity as to the semantics of this feature, it's best to use an explicit temporary to avoid the issue.

  based on reporting and patch by Eugene.
* move O_PATH definition back to arch bits | Rich Felker | 2015-04-01 | 9 | -0/+9
  while it's the same for all presently supported archs, it differs at least on sparc, and conceptually it's no less arch-specific than the other O_* macros. O_SEARCH and O_EXEC are still defined in terms of O_PATH in the main fcntl.h.
* aarch64: remove duplicate macro definitions in bits/fcntl.h | Rich Felker | 2015-04-01 | 1 | -3/+0
* aarch64: fix definition of sem_nsems in semid_ds structure | Rich Felker | 2015-04-01 | 1 | -1/+7
  POSIX requires the sem_nsems member to have type unsigned short. we have to work around the incorrect kernel type using matching endian-specific padding.
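The padding technique can be sketched in C as follows, assuming (this is not stated in the commit message) that the kernel field is an unsigned long and that the compiler provides the GCC/Clang predefined `__BYTE_ORDER__` macros. The struct and function names are hypothetical, not musl's.

```c
#include <string.h>

/* Hypothetical slot occupying the same bytes as a kernel "unsigned long"
 * field, exposing an unsigned short at the position of its low-order bytes. */
struct nsems_slot {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    unsigned short sem_nsems;                /* low-order bytes come first */
    char pad[sizeof(long) - sizeof(short)];
#else
    char pad[sizeof(long) - sizeof(short)];
    unsigned short sem_nsems;                /* low-order bytes come last */
#endif
};

_Static_assert(sizeof(struct nsems_slot) == sizeof(unsigned long),
               "slot must be exactly as large as the kernel field");

/* Reinterpret a kernel-written unsigned long through the padded layout. */
unsigned short read_nsems(unsigned long kernel_value)
{
    struct nsems_slot s;
    memcpy(&s, &kernel_value, sizeof s);
    return s.sem_nsems;
}
```

Any value that fits in an unsigned short reads back unchanged on either byte order, which is what lets the userspace type differ from the kernel's without a copy or conversion step.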
* aarch64: fix namespace pollution in bits/shm.h | Szabolcs Nagy | 2015-04-01 | 1 | -2/+2
  The shm_info struct is a gnu extension and some of its members do not have shm* prefix. This is worked around in sys/shm.h by macros, but aarch64 didn't use those.
* fix missing max_align_t definition on aarch64 | Rich Felker | 2015-03-20 | 1 | -0/+2
* fix MINSIGSTKSZ values for archs with large signal contexts | Rich Felker | 2015-03-18 | 10 | -0/+50
  the previous values (2k min and 8k default) were too small for some archs. aarch64 reserves 4k in the signal context for future extensions and requires about 4.5k total, and powerpc reportedly uses over 2k. the new minimums are chosen to fit the saved context and also allow a minimal signal handler to run.

  since the default (SIGSTKSZ) has always been 6k larger than the minimum, it is also increased to maintain the 6k usable by the signal handler. this happens to be able to store one pathname buffer and should be sufficient for calling any function in libc that doesn't involve conversion between floating point and decimal representations.

  x86 (both 32-bit and 64-bit variants) may also need a larger minimum (around 2.5k) in the future to support avx-512, but the values on these archs are left alone for now pending further analysis.

  the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ at this time. this is so as not to preclude applications from using extremely small thread stacks when they know they will not be handling signals. unfortunately cancellation and multi-threaded set*id() use signals as an implementation detail and therefore require a stack large enough for a signal context, so applications which use extremely small thread stacks may still need to avoid using these features.
* aarch64: fix typo in bits/ioctl.h | Szabolcs Nagy | 2015-03-14 | 1 | -1/+1
* aarch64: add struct _aarch64_ctx to signal.h | Szabolcs Nagy | 2015-03-14 | 1 | -0/+17
  The unwind code in libgcc uses this type for unwinding across signal handlers. On aarch64 the kernel may place a sequence of structs on the signal stack on top of the ucontext to provide additional information. The unwinder only needs the header, but this commit adds all the types the kernel currently defines for this mechanism because they are part of the uapi.
* align x32 pthread type sizes to be common with 32-bit archs | Rich Felker | 2015-03-12 | 1 | -4/+4
  previously, commit e7b9887e8b65253087ab0b209dc8dd85c9f09614 aligned the sizes with the glibc ABI. subsequent discussion during the merge of the aarch64 port reached a conclusion that we should reject larger arch-specific sizes, which have significant cost and no benefit, and stick with the existing common 32-bit sizes for all 32-bit/ILP32 archs and the x86_64 sizes for 64-bit archs.

  one peculiarity of this change is that x32 pthread_attr_t is now larger in musl than in the glibc x32 ABI, making it unsafe to call pthread_attr_init from x32 code that was compiled against glibc. with all the ABI issues of x32, it's not clear that ABI compatibility will ever work, but if it's needed, pthread_attr_init and related functions could be modified not to write to the last slot of the object.

  this is not a regression versus previous releases, since on previous releases the x32 pthread type sizes were all severely oversized already (due to incorrectly using the x86_64 LP64 definitions). moreover, x32 is still considered experimental and not ABI-stable.
* add aarch64 port | Szabolcs Nagy | 2015-03-11 | 33 | -0/+1814
  This adds complete aarch64 target support including bigendian subarch.

  Some of the long double math functions are known to be broken; otherwise interfaces should be fully functional, but at this point consider this port experimental.

  Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
* fix FLT_ROUNDS to reflect the current rounding mode | Szabolcs Nagy | 2015-03-07 | 9 | -9/+0
  Implemented as a wrapper around fegetround, introducing a new function to the ABI: __flt_rounds. (fegetround cannot be used directly from float.h)
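A sketch of what such a wrapper has to do: translate the `FE_*` rounding-mode macros from fenv.h into the values the C standard assigns to FLT_ROUNDS (0 toward zero, 1 to nearest, 2 toward positive infinity, 3 toward negative infinity, -1 indeterminable). The helper name `flt_rounds_from_mode` is hypothetical, and it takes the mode as a parameter instead of calling fegetround itself so the mapping can be shown in isolation; this is not the body of `__flt_rounds`.

```c
#include <fenv.h>

/* Map a rounding mode, as fegetround() would return it, to the value
 * FLT_ROUNDS is specified to have. Modes the target does not define are
 * left out by the preprocessor. */
int flt_rounds_from_mode(int fe_mode)
{
    switch (fe_mode) {
#ifdef FE_TOWARDZERO
    case FE_TOWARDZERO: return 0;
#endif
#ifdef FE_TONEAREST
    case FE_TONEAREST: return 1;
#endif
#ifdef FE_UPWARD
    case FE_UPWARD: return 2;
#endif
#ifdef FE_DOWNWARD
    case FE_DOWNWARD: return 3;
#endif
    }
    return -1; /* indeterminable */
}
```

In use, one would pass it the result of `fegetround()`, so that a change made through `fesetround()` is reflected the next time the value is queried.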
* fix POLLWRNORM and POLLWRBAND on mips | Trutz Behn | 2015-03-04 | 9 | -0/+2
  these macros have the same distinct definition on blackfin, frv, m68k, mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally differ on sparc.
* fix x32 pthread type definitions | Rich Felker | 2015-03-04 | 1 | -7/+7
  the previous definitions were copied from x86_64. not only did they fail to match the ABI sizes; they also wrongly encoded an assumption that long/pointer types are twice as large as int.
* make all objects used with atomic operations volatile | Rich Felker | 2015-03-03 | 9 | -63/+63
  the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form

      tmp=*p;a_cas(p,tmp,f(tmp));

  could be transformed to

      a_cas(p,*p,f(*p));

  where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible.

  with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics.

  in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
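The cas construct discussed above can be sketched in C with a volatile-qualified object. `cas_sketch` is a stand-in built on the GCC/Clang builtin `__sync_val_compare_and_swap`; it is not musl's a_cas, and `fetch_and_apply` and `twice` are hypothetical names introduced for the example.

```c
/* Stand-in for an a_cas-style primitive: returns the value found in *p,
 * which equals 'expected' exactly when the swap took place. */
static int cas_sketch(volatile int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Example update function used with fetch_and_apply below. */
int twice(int v)
{
    return 2 * v;
}

/* Atomically replace *p with f(*p) and return the previous value.
 * Because p points to a volatile int, "old = *p" is performed as a single
 * load per iteration; the compiler may not re-read *p when evaluating
 * f(old) or the comparison value. */
int fetch_and_apply(volatile int *p, int (*f)(int))
{
    int old;
    do {
        old = *p;
    } while (cas_sketch(p, old, f(old)) != old);
    return old;
}
```

If another thread changes `*p` between the load and the swap, the comparison fails and the loop retries with the fresh value, so the update is never computed from one load and checked against another.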
* add syscall numbers for the new execveat syscall | Szabolcs Nagy | 2015-02-09 | 7 | -4/+19
  this syscall allows fexecve to be implemented without /proc. it is new in linux v3.19, added in commit 51f39a1f0cea1cacf8c787f652f26dfee9611874 (sh and microblaze do not have allocated syscall numbers yet).

  added an x32 fix as well: the io_setup and io_submit syscalls are no longer common with x86_64, so use the x32 specific numbers.
* remove cruft from x86_64 syscall.h | Szabolcs Nagy | 2015-02-07 | 1 | -23/+0
  x86_64 syscall.h defined some musl internal syscall names and made them public. These defines were already moved to src/internal/syscall.h (except for SYS_fadvise which is added now) so the cruft in x86_64 syscall.h is not needed.
* fix typo in x86_64/x32 user_fpregs_struct | Felix Janda | 2015-02-01 | 2 | -2/+2
  mxcs_mask should be mxcr_mask
* move MREMAP_MAYMOVE and MREMAP_FIXED out of bits | Trutz Behn | 2015-01-30 | 9 | -27/+0
  the definitions are generic for all kernel archs. exposure of these macros now only occurs on the same feature test as for the function accepting them, which is believed to be more correct.
* remove mips-only EINIT and EREMDEV errnos | Trutz Behn | 2015-01-30 | 1 | -2/+0
  the errno values are unused by the kernel and the macro definitions were never exposed by glibc.
* add new syscall numbers for bpf and kexec_file_load | Szabolcs Nagy | 2014-12-23 | 8 | -0/+20
  these syscalls are new in linux v3.18. bpf is present on all supported archs except sh; kexec_file_load is so far only allocated for x86_64 and x32.

  bpf was added in linux commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60

  kexec_file_load syscall number was allocated in commit f0895685c7fd8c938c91a9d8a6f7c11f22df58d2
* move wint_t definition to the shared part of alltypes.h.in | Rich Felker | 2014-12-21 | 9 | -9/+0
* add arm private syscall numbers | Timo Teräs | 2014-12-03 | 1 | -0/+5
  it is part of kernel uapi, and some programs (e.g. nodejs) do use them
* unify non-inline version of syscall code across archs | Rich Felker | 2014-11-22 | 3 | -104/+6
  except powerpc, which still lacks inline syscalls simply because nobody has written the code, these are all fallbacks used to work around a clang bug that probably does not exist in versions of clang that can compile musl. however, it's useful to have the generic non-inline code anyway, as it eases the task of porting to new archs: writing inline syscall code is now optional. this approach could also help support compilers which don't understand inline asm or lack support for the needed register constraints.

  mips could not be unified because it has special fixup code for broken layout of the kernel's struct stat.
* inline 5- and 6-argument syscalls on arm | Rich Felker | 2014-11-22 | 1 | -2/+15
* remove old clang workarounds from arm syscall implementation | Rich Felker | 2014-11-22 | 1 | -31/+0
  the register constraints in the non-clang case were tested to work on clang back to 3.2, and earlier versions of clang have known bugs that preclude building musl.

  there may be other reasons to prefer not to use inline syscalls, but if so the function-call-based implementations should be added back in a unified way for all archs.
* fix __aeabi_read_tp oversight in arm atomics/tls overhaul | Rich Felker | 2014-11-22 | 2 | -6/+5
  calls to __aeabi_read_tp may be generated by the compiler to access TLS on pre-v6 targets. previously, this function was hard-coded to call the kuser helper, which would crash on kernels with kuser helper removed.

  to fix the problem most efficiently, the definition of __aeabi_read_tp is moved so that it's an alias for the new __a_gettp. however, on v7+ targets, code to initialize the runtime choice of thread-pointer loading code is not even compiled, meaning that defining __aeabi_read_tp would have caused an immediate crash due to using the default implementation of __a_gettp with a HCF instruction.

  fortunately there is an elegant solution which reduces overall code size: putting the native thread-pointer loading instruction in the default code path for __a_gettp, so that separate default/native code paths are not needed. this function should never be called before __set_thread_area anyway, and if it is called early on pre-v6 hardware, the old behavior (crashing) is maintained.

  ideally __aeabi_read_tp would not be called at all on v7+ targets anyway -- in fact, prior to the overhaul, the same problem existed, but it was never caught by users building for v7+ with kuser disabled. however, it's possible for calls to __aeabi_read_tp to end up in a v7+ binary if some of the object files were built for pre-v7 targets, e.g. in the case of static libraries that were built separately, so this case needs to be handled.
* overhaul ARM atomics/tls for performance and compatibility | Rich Felker | 2014-11-19 | 5 | -44/+330
  previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models.

  this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost.

  for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.
* fix 64-bit syscall argument passing on or1k | Rich Felker | 2014-11-05 | 1 | -1/+1
  the kernel syscall interface for or1k does not expect 64-bit arguments to be aligned to "even" register boundaries. this incorrect alignment broke truncate/ftruncate as well as a few less-common syscalls.
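To make the "even register boundary" issue concrete, here is a hypothetical C sketch of packing a 64-bit argument into a list of 32-bit argument slots, with and without the alignment padding. `push_arg64` is an invented helper, not musl code, and placing the high word first is an assumption made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a 64-bit value to an array of 32-bit argument slots.
 * n is the number of slots already used; the new count is returned.
 * With align_even set, a padding slot is inserted when n is odd so that
 * the pair starts on an even index. */
size_t push_arg64(uint32_t *regs, size_t n, uint64_t val, int align_even)
{
    if (align_even && (n & 1))
        regs[n++] = 0;                  /* padding slot */
    regs[n++] = (uint32_t)(val >> 32);  /* high word (assumed order) */
    regs[n++] = (uint32_t)val;          /* low word */
    return n;
}
```

For a call shaped like ftruncate(fd, length), the descriptor takes slot 0. Without padding the length lands in slots 1 and 2; with padding it lands in slots 2 and 3, so a kernel that reads slots 1 and 2 would see the padding and the high word in place of the real length.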
* add explicit barrier operation to internal atomic.h API | Rich Felker | 2014-10-10 | 9 | -6/+33