path: root/arch/sh
Commit log (each entry lists the commit message, author, date, files changed, and lines -removed/+added):
...
* add MCL_ONFAULT and MLOCK_ONFAULT mlockall and mlock2 flags (Szabolcs Nagy, 2016-01-26, 1 file, -0/+1)
  they lock faulted pages into memory (useful when a small part of a large mapped file needs efficient access). new in linux v4.4, commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1. MLOCK_* is not in the POSIX reserved namespace for sys/mman.h.
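  As a usage illustration only (not from the commit): with these flags a process can ask for its address space to be locked lazily, so pages are pinned as they are first faulted in rather than all at once.

      /* hypothetical example; requires linux >= 4.4 */
      #define _GNU_SOURCE
      #include <sys/mman.h>
      #include <stdio.h>

      int main(void)
      {
          /* lock current and future mappings, but only as pages are first faulted in */
          if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) < 0) {
              perror("mlockall");
              return 1;
          }
          return 0;
      }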
* remove sh port's __fpscr_values source file (Rich Felker, 2016-01-22, 1 file, -5/+0)
  commit f3ddd173806fd5c60b3f034528ca24542aecc5b9, the dynamic linker bootstrap overhaul, silently disabled the definition of __fpscr_values in this file, since libc.so's copy of __fpscr_values now comes from crt_arch.h, the same place the public definition in the main program's crt1.o ultimately comes from. remove this file, which is no longer in use.
* move sh port's __shcall internal function from arch/sh/src to src tree (Rich Felker, 2016-01-22, 1 file, -5/+0)
* move sh __unmapself code from arch/sh/src to main src tree (Rich Felker, 2016-01-22, 1 file, -24/+0)
* overhaul sh atomics for new atomics framework, add j-core cas.l backend (Rich Felker, 2016-01-21, 4 files, -287/+30)
  sh needs runtime-selected atomic backends since there are a number of supported models that use non-forwards-compatible (non-smp-compatible) atomic mechanisms. previously, the code paths for this were highly inefficient since they involved C function calls with multiple branches in the callee and heavy spills in the caller. the new code calls the runtime-selected asm fragment from inline asm with extremely minimal clobbers, rather than using a function call. for the sh4a case where the atomic mechanism is known and there is no forward-compatibility issue, the movli.l and movco.l instructions are provided as a_ll and a_sc, allowing the new shared atomic.h to generate efficient inline versions of all the basic atomic operations without needing a cas loop.
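  A sketch of what exposing movli.l/movco.l as ll/sc primitives can look like (close in spirit to, but not necessarily identical to, the real arch/sh/atomic_arch.h; the "z" constraint pins r0, which these instructions require):

      static inline int a_ll(volatile int *p)
      {
          int v;
          /* load-linked: begin an atomic sequence on *p */
          __asm__ __volatile__ ("movli.l @%1, %0" : "=z"(v) : "r"(p), "m"(*p));
          return v;
      }

      static inline int a_sc(volatile int *p, int v)
      {
          int r;
          /* store-conditional: succeeds (r=1) only if *p was untouched since a_ll */
          __asm__ __volatile__ ("movco.l %2, @%3 ; movt %0"
              : "=r"(r), "=m"(*p) : "z"(v), "r"(p) : "memory");
          return r;
      }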
* refactor internal atomic.h (Rich Felker, 2016-01-21, 1 file, -72/+0)
  rather than having each arch provide its own atomic.h, there is a new shared atomic.h in src/internal which pulls arch-specific definitions from arch/$(ARCH)/atomic_arch.h. the latter can be extremely minimal, defining only a_cas or new ll/sc type primitives which the shared atomic.h will use to construct everything else. this commit avoids making heavy changes to the individual archs' atomic implementations. definitions which are identical or near-identical to what the new shared atomic.h would produce have been removed, but otherwise the changes made are just hooking up the arch-specific files to the new infrastructure. major changes to take advantage of the new system will come in subsequent commits.
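  As an illustration of the construction (a simplified sketch, not the exact shared atomic.h), an arch that defines only a_cas can have the remaining operations derived for it:

      /* stand-in for the arch-provided primitive, for illustration only */
      static inline int a_cas(volatile int *p, int t, int s)
      {
          return __sync_val_compare_and_swap(p, t, s);
      }

      #ifndef a_swap
      #define a_swap a_swap
      static inline int a_swap(volatile int *p, int v)
      {
          int old;
          do old = *p;
          while (a_cas(p, old, v) != old);   /* retry until *p is unchanged since the load */
          return old;
      }
      #endif

      #ifndef a_fetch_add
      #define a_fetch_add a_fetch_add
      static inline int a_fetch_add(volatile int *p, int v)
      {
          int old;
          do old = *p;
          while (a_cas(p, old, old+v) != old);
          return old;
      }
      #endif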
* fix dynamic loader library mapping for nommu systems (Rich Felker, 2015-11-11, 1 file, -0/+2)
  on linux/nommu, non-writable private mappings of files may actually use memory shared with other processes or the fs cache. the old nommu loader code (used when mmap with MAP_FIXED fails) simply wrote over top of the original file mapping, possibly clobbering this shared memory. no such breakage was observed in practice, but it should have been possible. the new code starts by mapping anonymous writable memory on archs that might support nommu, then maps load segments over top of it, falling back to read if MAP_FIXED fails. we use an anonymous map rather than a writable file map to avoid reading more data from disk than needed. since pages cannot be loaded lazily on fault, in case of large data/bss, mapping the full file may read a lot of data that will subsequently be thrown away when processing additional LOAD segments. as a result, we cannot skip the first LOAD segment when operating in this mode. these changes affect only non-FDPIC nommu support.
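  A rough sketch of the placement order described above (hypothetical helper, error handling trimmed): the whole span is reserved as anonymous writable memory first, each LOAD segment is then mapped over it, and a plain read is the nommu fallback.

      #include <sys/mman.h>
      #include <unistd.h>

      static int place_segment(unsigned char *base, size_t map_off, size_t len,
                               int fd, off_t file_off, int prot)
      {
          /* try to map the file page-for-page over the anonymous reservation */
          if (mmap(base+map_off, len, prot, MAP_PRIVATE|MAP_FIXED, fd, file_off)
              != MAP_FAILED)
              return 0;
          /* nommu fallback: copy the segment contents into the reservation */
          return pread(fd, base+map_off, len, file_off) == (ssize_t)len ? 0 : -1;
      }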
* generalize sh entry point asm not to assume call dests fit in 12 bits (Rich Felker, 2015-11-02, 1 file, -5/+12)
  this assumption is borderline-unsafe to begin with, and fails badly with -ffunction-sections since the linker can move the callee arbitrarily far away when it lies in a different section.
* properly access mcontext_t program counter in cancellation handler (Rich Felker, 2015-11-02, 1 file, -1/+1)
  using the actual mcontext_t definition rather than an overlaid pointer array both improves correctness/readability and eliminates some ugly hacks for archs with 64-bit registers but a 32-bit program counter. also fix UB due to comparison of pointers not in a common array object.
* fix signal return for sh/fdpic (Rich Felker, 2015-09-23, 1 file, -0/+8)
  the restorer function pointer provided in the kernel sigaction structure is interpreted by the kernel as a raw code address, not a function descriptor. this commit moves the declarations of the __restore and __restore_rt symbols to ksigaction.h so that arch versions of the file can override them, and introduces a version for sh which declares them as objects rather than functions. an alternate solution would have been defining SA_RESTORER to 0 so that the functions are not used, but this both requires executable stack (since the sh kernel does not have a vdso page with permanent restorer functions) and crashes on qemu user-level emulation.
* have sh/fdpic entry point set fdpic personality if needed (Rich Felker, 2015-09-22, 1 file, -0/+12)
  the entry point code supports being loaded by a loader which is not fdpic-aware (in practice, either a kernel with mmu or qemu without fdpic support). this mostly just works, but signal handling will wrongly use a function descriptor address as a code address if the personality is not adjusted to fdpic. ideally this code could be placed with sigaction so that it's not needed except if/when a signal handler is installed. however, personality is incorrectly maintained per-thread by the kernel, rather than per-process, so it's necessary to correct the personality before any threads are started. also, in order to skip the personality syscall when an fdpic-aware loader is used, we need to be able to detect how the program was loaded, and this information is only readily available at the entry point.
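  A sketch of the idea only (the detection flag below is hypothetical; the real check lives in the sh/fdpic entry point): if the loader was not fdpic-aware, the personality is switched before any threads exist so the kernel treats signal handler addresses as function descriptors.

      #include <sys/personality.h>

      static void fixup_fdpic_personality(int loaded_by_fdpic_loader)
      {
          if (!loaded_by_fdpic_loader)          /* hypothetical detection result */
              personality(PER_LINUX_FDPIC);     /* enable fdpic function-descriptor semantics */
      }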
* add real fdpic loading of shared libraries (Rich Felker, 2015-09-22, 1 file, -0/+1)
  previously, the normal ELF library loading code was used even for fdpic, so only the kernel-loaded dynamic linker and main app could benefit from separate placement of segments and shared text.
* size-optimize sh/fdpic dynamic entry point (Rich Felker, 2015-09-22, 1 file, -0/+4)
  the __fdpic_fixup code is not needed for ET_DYN executables, which instead use relocations, so we can omit it from the dynamic linker and static-pie entry point and save some code size.
* work around breakage in sh/fdpic __unmapself function (Rich Felker, 2015-09-22, 1 file, -0/+5)
  the C implementation of __unmapself used for potentially-nommu sh assumed CRTJMP takes a function descriptor rather than a code address; however, the actual dynamic linker needs a code address, and so commit 7a9669e977e5f750cf72ccbd2614f8b72ce02c4c changed the definition of the macro in reloc.h. this commit puts the old macro back in a place where it only affects __unmapself. this is an ugly workaround and should be cleaned up at some point, but at least it's well isolated.
* add general fdpic support in dynamic linker and arch support for sh (Rich Felker, 2015-09-22, 2 files, -3/+12)
  at this point not all functionality is complete. the dynamic linker itself, and the main app if it is also loaded by the kernel, take advantage of fdpic and do not need constant displacement between segments, but additional libraries loaded by the dynamic linker still follow normal ELF semantics for mapping. this fully works, but does not admit shared text on nommu. in terms of actual functional correctness, dlsym's results are presently incorrect for function symbols, RTLD_NEXT fails to identify the caller correctly, and dladdr fails almost entirely. with the dynamic linker entry point working, support for static pie is automatically included, but linking the main application as ET_DYN (pie) probably does not make sense for fdpic anyway. ET_EXEC is equally relocatable but more efficient at representing relocations.
* add sh fdpic subarch variants (Rich Felker, 2015-09-12, 1 file, -1/+16)
  with this commit it should be possible to produce a working static-linked fdpic libc and application binaries for sh. the changes in reloc.h are largely unused at this point since dynamic linking is not supported, but the CRTJMP macro is used in one place outside of dynamic linking, in __unmapself.
* add fdpic version of entry point code for sh (Rich Felker, 2015-09-12, 1 file, -0/+29)
  this version of the entry point is only suitable for static linking in ET_EXEC form. neither dynamic linking nor pie is supported yet. at some point in the future the fdpic and non-fdpic versions of this code may be unified, but for now it's easiest to work with them separately.
* make sh clone asm fdpic-compatible (Rich Felker, 2015-09-12, 1 file, -0/+5)
  clone calls back to a function pointer provided by the caller, which will actually be a pointer to a function descriptor on fdpic. the obvious solution is to have a separate version of clone for fdpic, but I have taken a simpler approach to go around the problem. instead of calling the pointed-to function from asm, a direct call is made to an internal C function which then calls the pointed-to function. this lets the C compiler generate the appropriate calling convention for an indirect call with no need for ABI-specific assembly.
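  A minimal sketch of that kind of trampoline (a later entry in this log names it __shcall; this shows the general shape rather than the exact source): the clone asm makes a direct call here, and the compiler emits whatever an indirect call means on the target ABI, function descriptor or plain code address.

      /* called from the clone asm with the child's argument and start function */
      int __shcall(void *arg, int (*func)(void *))
      {
          return func(arg);
      }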
* switch to using trap number 31 for syscalls on sh (Rich Felker, 2015-06-16, 1 file, -1/+1)
  nominally the low bits of the trap number on sh are the number of syscall arguments, but they have never been used by the kernel, and some code making syscalls does not even know the number of arguments and needs to pass an arbitrary high number anyway. sh3/sh4 traditionally used the trap range 16-31 for syscalls, but part of this range overlapped with hardware exceptions/interrupts on sh2 hardware, so an incompatible range 32-47 was chosen for sh2. using trap number 31 everywhere, since it's in the existing sh3/sh4 range and does not conflict with sh2 hardware, is a proposed unification of the kernel syscall convention that will allow binaries to be shared between sh2 and sh3/sh4. if this is not accepted into the kernel, we can refit the sh2 target with runtime selection mechanisms for the trap number, but doing so would be invasive and would entail non-trivial overhead.
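  For illustration, a one-argument syscall using the unified trap number might look like the sketch below (a simplified stand-in, not the port's actual syscall_arch.h; register assignments follow the linux/sh convention of syscall number in r3, arguments from r4, result in r0):

      static inline long sh_syscall1(long n, long a)
      {
          register long r3 __asm__("r3") = n;   /* syscall number */
          register long r4 __asm__("r4") = a;   /* first argument */
          register long r0 __asm__("r0");       /* kernel return value */
          __asm__ __volatile__ ("trapa #31"
              : "=r"(r0)
              : "r"(r3), "r"(r4)
              : "memory");
          return r0;
      }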
* switch sh port's __unmapself to generic version when running on sh2/nommu (Rich Felker, 2015-06-16, 1 file, -0/+19)
  due to the way the interrupt and syscall trap mechanism works, userspace on sh2 must never set the stack pointer to an invalid value. thus, the approach used on most archs, where __unmapself executes with no stack for the interval between SYS_munmap and SYS_exit, is not viable on sh2. in order not to pessimize sh3/sh4, the sh asm version of __unmapself is not removed. instead it's renamed and redirected through code that calls either the generic (safe) __unmapself or the sh3/sh4 asm, depending on compile-time and run-time conditions.
* add support for sh2 interrupt-masking-based atomics to sh port (Rich Felker, 2015-06-16, 3 files, -8/+113)
  the sh2 target is being considered an ISA subset of sh3/sh4, in the sense that binaries built for sh2 are intended to be usable on later cpu models/kernels with mmu support. so rather than hard-coding sh2-specific atomics, the runtime atomic selection mechanism that was already in place has been extended to add sh2 atomics. at this time, the sh2 atomics are not SMP-compatible; since the ISA lacks actual atomic operations, the new code instead masks interrupts for the duration of the atomic operation, producing an atomic result on single-core. this is only possible because the kernel/hardware does not impose protections against userspace doing so. additional changes will be needed to support future SMP systems. care has been taken to avoid producing significant additional code size in the case where it's known at compile-time that the target is not sh2 and does not need sh2-specific code.
* add .text section directive to all crt_arch.h files missing it (Rich Felker, 2015-05-22, 1 file, -0/+1)
  i386 and x86_64 versions already had the .text directive; other archs did not. normally, top-level (file scope) __asm__ starts in the .text section anyway, but problems were reported with some versions of clang, and it seems preferable to set it explicitly anyway, at least for the sake of consistency between archs.
* inline llsc atomics when building for sh4a (Bobby Bingham, 2015-05-19, 2 files, -90/+128)
  If we're building for sh4a, the compiler is already free to use instructions only available on sh4a, so we can do the same and inline the llsc atomics. If we're building for an older processor, we still do the same runtime atomics selection as before.
* fix sh jmp_buf size to match ABI (Rich Felker, 2015-04-27, 1 file, -1/+1)
  while the sh port is still experimental and subject to ABI instability, this is not actually an application/libc boundary ABI change. it only affects third-party APIs where jmp_buf is used in a shared structure at the ABI boundary, because nothing anywhere near the end of the jmp_buf object (which includes the oversized sigset_t) is accessed by libc. both glibc and uclibc have 15-slot jmp_buf for sh. presumably the smaller version was used in musl because the slots for the fpu status register and thread pointer register (gbr) were incorrect and must not be restored by longjmp, but the size should have been preserved, as it's generally treated as a libc-agnostic ABI property for the arch, and having extra slots free in case we ever need them for something is useful anyway.
* fix ldso name for sh-nofpu subarch (Rich Felker, 2015-04-24, 1 file, -1/+7)
  previously it was using the same name as the default ABI with hard float (floating point args and return value in registers). the test __SH_FPU_ANY__ || __SH4__ matches what's used in the configure script already, and seems correct under casual review against gcc's config/sh.h, but may need tweaks. the logic for predefined macros for sh, and what they all mean, is very complex. eventually this should be documented in comments here. configure already rejects "half-hard" configurations on sh where double=float since these do not conform to Annex F and are not suitable for musl, so these do not need to be considered here.
* fix failure of sh reloc.h to properly detect endianness for ldso name (Rich Felker, 2015-04-24, 1 file, -0/+2)
  versions of reloc.h that rely on endian macros must include endian.h to ensure they are available.
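  A sketch of how the ldso name can be assembled from the pieces these two commits describe (illustrative only; the exact suffix spellings are assumptions, not taken from the source):

      #include <endian.h>

      #if __BYTE_ORDER == __BIG_ENDIAN
      #define ENDIAN_SUFFIX "eb"
      #else
      #define ENDIAN_SUFFIX ""
      #endif

      #if defined(__SH_FPU_ANY__) || defined(__SH4__)
      #define FP_SUFFIX ""
      #else
      #define FP_SUFFIX "-nofpu"
      #endif

      #define LDSO_ARCH "sh" ENDIAN_SUFFIX FP_SUFFIX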
* dynamic linker bootstrap overhaul (Rich Felker, 2015-04-13, 3 files, -34/+32)
  this overhaul further reduces the amount of arch-specific code needed by the dynamic linker and removes a number of assumptions, including:
  - that symbolic function references inside libc are bound at link time via the linker option -Bsymbolic-functions.
  - that libc functions used by the dynamic linker do not require access to data symbols.
  - that static/internal function calls and data accesses can be made without performing any relocations, or that arch-specific startup code handled any such relocations needed.
  removing these assumptions paves the way for allowing libc.so itself to be built with stack protector (among other things), and is achieved by a three-stage bootstrap process:
  1. relative relocations are processed with a flat function.
  2. symbolic relocations are processed with no external calls/data.
  3. main program and dependency libs are processed with a fully-functional libc/ldso.
  reduction in arch-specific code is achieved through the following:
  - crt_arch.h, used for generating crt1.o, now provides the entry point for the dynamic linker too.
  - asm is no longer responsible for skipping the beginning of argv[] when ldso is invoked as a command.
  - the functionality previously provided by __reloc_self for heavily GOT-dependent RISC archs is now the arch-agnostic stage-1.
  - arch-specific relocation type codes are mapped directly as macros rather than via an inline translation function/switch statement.
* move O_PATH definition back to arch bits (Rich Felker, 2015-04-01, 1 file, -0/+1)
  while it's the same for all presently supported archs, it differs at least on sparc, and conceptually it's no less arch-specific than the other O_* macros. O_SEARCH and O_EXEC are still defined in terms of O_PATH in the main fcntl.h.
* fix MINSIGSTKSZ values for archs with large signal contexts (Rich Felker, 2015-03-18, 1 file, -0/+5)
  the previous values (2k min and 8k default) were too small for some archs. aarch64 reserves 4k in the signal context for future extensions and requires about 4.5k total, and powerpc reportedly uses over 2k. the new minimums are chosen to fit the saved context and also allow a minimal signal handler to run. since the default (SIGSTKSZ) has always been 6k larger than the minimum, it is also increased to maintain the 6k usable by the signal handler. this happens to be able to store one pathname buffer and should be sufficient for calling any function in libc that doesn't involve conversion between floating point and decimal representations. x86 (both 32-bit and 64-bit variants) may also need a larger minimum (around 2.5k) in the future to support avx-512, but the values on these archs are left alone for now pending further analysis. the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ at this time. this is so as not to preclude applications from using extremely small thread stacks when they know they will not be handling signals. unfortunately cancellation and multi-threaded set*id() use signals as an implementation detail and therefore require a stack large enough for a signal context, so applications which use extremely small thread stacks may still need to avoid using these features.
* fix FLT_ROUNDS to reflect the current rounding mode (Szabolcs Nagy, 2015-03-07, 1 file, -1/+0)
  Implemented as a wrapper around fegetround, introducing a new function to the ABI: __flt_rounds. (fegetround cannot be used directly from float.h.)
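  A sketch of such a wrapper (the FE_* cases are guarded because not every arch defines every rounding mode; float.h can then define FLT_ROUNDS as a call to this function so the macro tracks the current mode):

      #include <fenv.h>

      int __flt_rounds(void)
      {
          switch (fegetround()) {
      #ifdef FE_TOWARDZERO
          case FE_TOWARDZERO: return 0;
      #endif
          case FE_TONEAREST:  return 1;
      #ifdef FE_UPWARD
          case FE_UPWARD:     return 2;
      #endif
      #ifdef FE_DOWNWARD
          case FE_DOWNWARD:   return 3;
      #endif
          }
          return -1;  /* indeterminable */
      }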
* fix POLLWRNORM and POLLWRBAND on mips (Trutz Behn, 2015-03-04, 1 file, -0/+0)
  these macros have the same distinct definition on blackfin, frv, m68k, mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally differ on sparc.
* make all objects used with atomic operations volatile (Rich Felker, 2015-03-03, 1 file, -7/+7)
  the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
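  To make the cas concern concrete, a sketch of the safe pattern (reusing the a_cas stand-in from the atomic.h sketch earlier in this log; f is any pure update function): the volatile-qualified pointer forces exactly one load per iteration, so the "expected" value handed to a_cas is the same value f was applied to.

      static int atomic_apply(volatile int *p, int (*f)(int))
      {
          int tmp;
          do tmp = *p;                          /* single, non-elidable load */
          while (a_cas(p, tmp, f(tmp)) != tmp); /* retry if another thread raced us */
          return tmp;
      }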
* move MREMAP_MAYMOVE and MREMAP_FIXED out of bits (Trutz Behn, 2015-01-30, 1 file, -3/+0)
  the definitions are generic for all kernel archs. exposure of these macros now only occurs on the same feature test as for the function accepting them, which is believed to be more correct.
* move wint_t definition to the shared part of alltypes.h.in (Rich Felker, 2014-12-21, 1 file, -1/+0)
* add explicit barrier operation to internal atomic.h API (Rich Felker, 2014-10-10, 1 file, -1/+3)
* add threads.h and needed per-arch types for mtx_t and cnd_t (Rich Felker, 2014-09-06, 1 file, -0/+2)
  based on patch by Jens Gustedt. mtx_t and cnd_t are defined in such a way that they are formally "compatible types" with pthread_mutex_t and pthread_cond_t, respectively, when accessed from a different translation unit. this makes it possible to implement the C11 functions using the pthread functions (which will dereference them with the pthread types) without having to use the same types, which would necessitate either namespace violations (exposing pthread type names in threads.h) or incompatible changes to the C++ name mangling ABI for the pthread types. for the rest of the types, things are much simpler; using identical types is possible without any namespace considerations.
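  A sketch of the compatible-type trick (array sizes here are illustrative, not the real ABI values): two tag-less struct types with identical member layout, seen from different translation units, are compatible types in C, so the C11 implementation can hand a mtx_t to code that only knows pthread_mutex_t.

      /* as seen by the pthread implementation (one translation unit) */
      typedef struct { union { int __i[6]; volatile int __vi[6];
                               volatile void *volatile __p[6]; } __u; } pthread_mutex_t;

      /* as seen through threads.h (another translation unit): same layout, new name */
      typedef struct { union { int __i[6]; volatile int __vi[6];
                               volatile void *volatile __p[6]; } __u; } mtx_t;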
* add working a_spin() atomic for non-x86 targets (Rich Felker, 2014-08-25, 1 file, -0/+1)
  conceptually, a_spin needs to be at least a compiler barrier, so the compiler will not optimize out loops (and the load on each iteration) while spinning. it should also be a memory barrier, or the spinning thread might keep spinning without noticing stores from other threads, thus delaying for longer than it should. ideally, an optimal a_spin implementation that avoids unnecessary cache/memory contention should be chosen for each arch, but for now, the easiest thing is to perform a useless a_cas on the calling thread's stack.
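  A sketch of that fallback (again assuming an a_cas primitive as in the earlier sketches): the cas never changes the value, but it still provides the compiler- and memory-barrier effects a spin loop needs.

      static inline void a_spin(void)
      {
          volatile int junk = 0;
          a_cas(&junk, 0, 0);   /* value is irrelevant; only the barrier matters */
      }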
* add max_align_t definition for C11 and C++11 (Rich Felker, 2014-08-20, 1 file, -0/+2)
  unfortunately this needs to be able to vary by arch, because of a huge mess GCC made: the GCC definition, which became the ABI, depends on quirks in GCC's definition of __alignof__, which does not match the formal alignment of the type. GCC's __alignof__ unexpectedly exposes an implementation detail, its "preferred alignment" for the type, rather than the formal/ABI alignment of the type, which it only actually uses in structures. on most archs the two values are the same, but on some (at least i386) the preferred alignment is greater than the ABI alignment. I considered using _Alignas(8) unconditionally, but on at least one arch (or1k), the alignment of max_align_t with GCC's definition is only 4 (even the "preferred alignment" for these types is only 4).
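  For the common case where the preferred and ABI alignments already agree, a definition in the spirit described above can be as simple as the sketch below (illustrative; archs like i386 need an explicit alignment attribute instead):

      typedef struct {
          long long __ll;     /* widest integer alignment */
          long double __ld;   /* widest floating-point alignment */
      } max_align_t;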
* make pointers used in robust list volatile (Rich Felker, 2014-08-17, 1 file, -1/+1)
  when manipulating the robust list, the order of stores matters, because the code may be asynchronously interrupted by a fatal signal and the kernel will then access the robust list in what is essentially an async-signal context. previously, aliasing considerations made it seem unlikely that a compiler could reorder the stores, but proving that they could not be reordered incorrectly would have been extremely difficult. instead I've opted to make all the pointers used as part of the robust list, including those in the robust list head and in the individual mutexes, volatile. in addition, the format of the robust list has been changed to point back to the head at the end, rather than ending with a null pointer. this is to match the documented kernel robust list ABI. the null pointer, which was previously used, only worked because faults during access terminate the robust list processing.
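  A sketch of the resulting shape (field names are illustrative, not the exact internals): every pointer the kernel may walk after an asynchronous death is volatile, and the list is circular, ending back at the head as the kernel ABI documents.

      struct robust_list_head_sketch {
          volatile void *volatile head;     /* first held mutex, or &head when empty */
          long off;                         /* offset from a list node to its futex word */
          volatile void *volatile pending;  /* mutex currently being locked/unlocked */
      };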
* fix terminal control ioctl constants for sh (Rich Felker, 2014-07-29, 1 file, -4/+8)
  this commit changes the names to match the kernel names, exposing under the normal names the "old" versions which work with a smaller termios structure compatible with the userspace structure, and renaming the "new" versions with "2" on the end like the kernel has. this fixes spurious warnings "Unsupported ioctl: cmd=0x802c542a" from qemu-sh4 and should be more correct anyway, since our userspace termios structure does not have meaningful information in the part which the kernel would be interpreting as speeds with the new ioctl.
* clean up unused and inconsistent atomics in arch dirs (Rich Felker, 2014-07-27, 1 file, -5/+0)
  the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were probably used a long time ago when only i386 and x86_64 were supported. as other archs were added, support for them was inconsistent, and they are obviously not in use at present. having them around potentially confuses readers working on new ports, and the type-punning hacks and inconsistent use of types in their definitions is not a style I wish to perpetuate in the source tree, so removing them seems appropriate.
* fix insufficient synchronization in sh atomic asm (Rich Felker, 2014-07-27, 1 file, -1/+2)
  while other usage I've seen only has the synco instruction after the atomic operation, I cannot find any documentation indicating that this is correct. certainly all stores before the atomic need to have been synchronized before the atomic operation takes place.
* refactor to remove arch-specific relocation code from dynamic linker (Rich Felker, 2014-06-18, 1 file, -25/+13)
  this was one of the main instances of ugly code duplication: all archs use basically the same types of relocations, but roughly equivalent logic was duplicated for each arch to account for the different naming and numbering of relocation types and variation in whether REL or RELA records are used. as an added bonus, both REL and RELA are now supported on all archs, regardless of which is used by the standard toolchain.
* multiple fixes to sh (superh) dynamic linker relocations (Rich Felker, 2014-06-17, 1 file, -10/+8)
  the following issues are fixed:
  - R_SH_REL32 was adding the load address of the module being relocated to the result. this seems to have been a mistake in the original port, since it does not match other dynamic linker implementations and since adding a difference between two addresses (the symbol value and the relocation address) to a load address does not make sense.
  - R_SH_TLS_DTPMOD32 was wrongly accepting an inline addend (i.e. using += rather than = on *reloc_addr), which makes no sense; addition is not an operation that's defined on module ids.
  - R_SH_TLS_DTPOFF32 and R_SH_TLS_TPOFF32 were wrongly using inline addends rather than the RELA-provided addends.
  in addition, handling of R_SH_GLOB_DAT, R_SH_JMP_SLOT, and R_SH_DIR32 is merged so that all of them honor the addend. the first two should not need it for correct usage generated by toolchains, but other dynamic linkers allow addends here, and it simplifies the code anyway. these issues were spotted while reviewing the code for the purpose of refactoring this part of the dynamic linker. no testing was performed.
* dynamic linker: permit error returns from arch-specific reloc function (Rich Felker, 2014-06-16, 1 file, -1/+2)
  the immediate motivation is supporting TLSDESC relocations, which require allocation and thus may fail (unless we pre-allocate), but this mechanism should also be used for throwing an error on unsupported or invalid relocation types, and perhaps in certain cases, for reporting when a relocation is not satisfiable.
* fix RLIMIT_ constants for mips (Szabolcs Nagy, 2014-04-15, 1 file, -0/+0)
  The mips arch is special in that it uses different RLIMIT_ numbers than other archs, so allow bits/resource.h to override the default RLIMIT_ numbers (empty on all archs except mips). Reported by orc.
* fix signal.h breakage from moving stack_t to arch-specific bits (Rich Felker, 2014-03-18, 1 file, -6/+6)
  in the previous changes, I missed the fact that both the prototype of the sigaltstack function and the definition of ucontext_t depend on stack_t.
* move signal.h definition of stack_t to arch-specific bits (Rich Felker, 2014-03-18, 1 file, -0/+6)
  it's different at least on mips. the mips version will be fixed in a separate commit to show the change.
* fix typo in filename used in sh port (Rich Felker, 2014-03-18, 1 file, -0/+0)
* superh: fix dynamic linking of __fpscr_values (Bobby Bingham, 2014-03-16, 2 files, -1/+7)
  Applications ended up with copy relocations for this array, which resulted in libc's references to this array pointing to the application's copy. The dynamic linker, however, can require this array before the application is relocated, and therefore before the application's copy of this array is initialized. This resulted in garbage being loaded into FPSCR before executing main, which violated the ABI. We fix this by putting the array in crt1 and making the libc copy private. This prevents libc's reference to the array from pointing to an uninitialized copy in the application.