about summary refs log tree commit diff
path: root/sysdeps/aarch64
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Update libm-test-ulpsSzabolcs Nagy2017-03-271-80/+88
| | | | * sysdeps/aarch64/libm-test-ulps: Update.
* Add ifunc support for aarch64.Steve Ellcey2017-03-152-0/+19
| | | | | | | | | | | * sysdeps/aarch64/dl-machine.h: Include cpu-features.c. (DL_PLATFORM_INIT): New define. (dl_platform_init): New function. * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise. * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Likewise.
* New pthread rwlock that is more scalable.Torvald Riegel2017-01-101-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the pthread rwlock with a new implementation that uses a more scalable algorithm (primarily through not using a critical section anymore to make state changes). The fast path for rdlock acquisition and release is now basically a single atomic read-modify write or CAS and a few branches. See nptl/pthread_rwlock_common.c for details. * nptl/DESIGN-rwlock.txt: Remove. * nptl/lowlevelrwlock.sym: Remove. * nptl/Makefile: Add new tests. * nptl/pthread_rwlock_common.c: New file. Contains the new rwlock. * nptl/pthreadP.h (PTHREAD_RWLOCK_PREFER_READER_P): Remove. (PTHREAD_RWLOCK_WRPHASE, PTHREAD_RWLOCK_WRLOCKED, PTHREAD_RWLOCK_RWAITING, PTHREAD_RWLOCK_READER_SHIFT, PTHREAD_RWLOCK_READER_OVERFLOW, PTHREAD_RWLOCK_WRHANDOVER, PTHREAD_RWLOCK_FUTEX_USED): New. * nptl/pthread_rwlock_init.c (__pthread_rwlock_init): Adapt to new implementation. * nptl/pthread_rwlock_rdlock.c (__pthread_rwlock_rdlock_slow): Remove. (__pthread_rwlock_rdlock): Adapt. * nptl/pthread_rwlock_timedrdlock.c (pthread_rwlock_timedrdlock): Adapt. * nptl/pthread_rwlock_timedwrlock.c (pthread_rwlock_timedwrlock): Adapt. * nptl/pthread_rwlock_trywrlock.c (pthread_rwlock_trywrlock): Adapt. * nptl/pthread_rwlock_tryrdlock.c (pthread_rwlock_tryrdlock): Adapt. * nptl/pthread_rwlock_unlock.c (pthread_rwlock_unlock): Adapt. * nptl/pthread_rwlock_wrlock.c (__pthread_rwlock_wrlock_slow): Remove. (__pthread_rwlock_wrlock): Adapt. * nptl/tst-rwlock10.c: Adapt. * nptl/tst-rwlock11.c: Adapt. * nptl/tst-rwlock17.c: New file. * nptl/tst-rwlock18.c: New file. * nptl/tst-rwlock19.c: New file. * nptl/tst-rwlock2b.c: New file. * nptl/tst-rwlock8.c: Adapt. * nptl/tst-rwlock9.c: Adapt. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/hppa/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/sparc/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * sysdeps/x86/bits/pthreadtypes.h (pthread_rwlock_t): Adapt. * nptl/nptl-printers.py (): Adapt. * nptl/nptl_lock_constants.pysym: Adapt. * nptl/test-rwlock-printers.py: Adapt. * nptl/test-rwlockattr-printers.c: Adapt. * nptl/test-rwlockattr-printers.py: Adapt.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2017-01-01107-107/+107
|
* New condvar implementation that provides stronger ordering guarantees.Torvald Riegel2016-12-311-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a new implementation for condition variables, required after http://austingroupbugs.net/view.php?id=609 to fix bug 13165. In essence, we need to be stricter in which waiters a signal or broadcast is required to wake up; this couldn't be solved using the old algorithm. ISO C++ made a similar clarification, so this also fixes a bug in current libstdc++, for example. We can't use the old algorithm anymore because futexes do not guarantee to wake in FIFO order. Thus, when we wake, we can't simply let any waiter grab a signal, but we need to ensure that one of the waiters happening before the signal is woken up. This is something the previous algorithm violated (see bug 13165). There's another issue specific to condvars: ABA issues on the underlying futexes. Unlike mutexes that have just three states, or semaphores that have no tokens or a limited number of them, the state of a condvar is the *order* of the waiters. A waiter on a semaphore can grab a token whenever one is available; a condvar waiter must only consume a signal if it is eligible to do so as determined by the relative order of the waiter and the signal. Therefore, this new algorithm maintains two groups of waiters: Those eligible to consume signals (G1), and those that have to wait until previous waiters have consumed signals (G2). Once G1 is empty, G2 becomes the new G1. 64b counters are used to avoid ABA issues. This condvar doesn't yet use a requeue optimization (ie, on a broadcast, waking just one thread and requeueing all others on the futex of the mutex supplied by the program). I don't think doing the requeue is necessarily the right approach (but I haven't done real measurements yet): * If a program expects to wake many threads at the same time and make that scalable, a condvar isn't great anyway because of how it requires waiters to operate mutually exclusive (due to the mutex usage). Thus, a thundering herd problem is a scalability problem with or without the optimization. Using something like a semaphore might be more appropriate in such a case. * The scalability problem is actually at the mutex side; the condvar could help (and it tries to with the requeue optimization), but it should be the mutex who decides how that is done, and whether it is done at all. * Forcing all but one waiter into the kernel-side wait queue of the mutex prevents/avoids the use of lock elision on the mutex. Thus, it prevents the only cure against the underlying scalability problem inherent to condvars. * If condvars use short critical sections (ie, hold the mutex just to check a binary flag or such), which they should do ideally, then forcing all those waiter to proceed serially with kernel-based hand-off (ie, futex ops in the mutex' contended state, via the futex wait queues) will be less efficient than just letting a scalable mutex implementation take care of it. Our current mutex impl doesn't employ spinning at all, but if critical sections are short, spinning can be much better. * Doing the requeue stuff requires all waiters to always drive the mutex into the contended state. This leads to each waiter having to call futex_wake after lock release, even if this wouldn't be necessary. [BZ #13165] * nptl/pthread_cond_broadcast.c (__pthread_cond_broadcast): Rewrite to use new algorithm. * nptl/pthread_cond_destroy.c (__pthread_cond_destroy): Likewise. * nptl/pthread_cond_init.c (__pthread_cond_init): Likewise. * nptl/pthread_cond_signal.c (__pthread_cond_signal): Likewise. * nptl/pthread_cond_wait.c (__pthread_cond_wait): Likewise. (__pthread_cond_timedwait): Move here from pthread_cond_timedwait.c. (__condvar_confirm_wakeup, __condvar_cancel_waiting, __condvar_cleanup_waiting, __condvar_dec_grefs, __pthread_cond_wait_common): New. (__condvar_cleanup): Remove. * npt/pthread_condattr_getclock.c (pthread_condattr_getclock): Adapt. * npt/pthread_condattr_setclock.c (pthread_condattr_setclock): Likewise. * npt/pthread_condattr_getpshared.c (pthread_condattr_getpshared): Likewise. * npt/pthread_condattr_init.c (pthread_condattr_init): Likewise. * nptl/tst-cond1.c: Add comment. * nptl/tst-cond20.c (do_test): Adapt. * nptl/tst-cond22.c (do_test): Likewise. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_cond_t): Adapt structure. * sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/x86/bits/pthreadtypes.h (pthread_cond_t): Likewise. * sysdeps/nptl/internaltypes.h (COND_NWAITERS_SHIFT): Remove. (COND_CLOCK_BITS): Adapt. * sysdeps/nptl/pthread.h (PTHREAD_COND_INITIALIZER): Adapt. * nptl/pthreadP.h (__PTHREAD_COND_CLOCK_MONOTONIC_MASK, __PTHREAD_COND_SHARED_MASK): New. * nptl/nptl-printers.py (CLOCK_IDS): Remove. (ConditionVariablePrinter, ConditionVariableAttributesPrinter): Adapt. * nptl/nptl_lock_constants.pysym: Adapt. * nptl/test-cond-printers.py: Adapt. * sysdeps/unix/sysv/linux/hppa/internaltypes.h (cond_compat_clear, cond_compat_check_and_clear): Adapt. * sysdeps/unix/sysv/linux/hppa/pthread_cond_timedwait.c: Remove file ... * sysdeps/unix/sysv/linux/hppa/pthread_cond_wait.c (__pthread_cond_timedwait): ... and move here. * nptl/DESIGN-condvar.txt: Remove file. * nptl/lowlevelcond.sym: Likewise. * nptl/pthread_cond_timedwait.c: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i486/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i586/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/i386/i686/pthread_cond_wait.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: Likewise.
* Refactor long double information into bits/long-double.h.Joseph Myers2016-12-141-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Information about whether the ABI of long double is the same as that of double is split between bits/mathdef.h and bits/wordsize.h. When the ABIs are the same, bits/mathdef.h defines __NO_LONG_DOUBLE_MATH. In addition, in the case where the same glibc binary supports both -mlong-double-64 and -mlong-double-128, bits/wordsize.h defines __LONG_DOUBLE_MATH_OPTIONAL, along with __NO_LONG_DOUBLE_MATH if this particular compilation is with -mlong-double-64. As part of the refactoring I proposed in <https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>, this patch puts all that information in a single header, bits/long-double.h. It is included from sys/cdefs.h alongside the include of bits/wordsize.h, so other headers generally do not need to include bits/long-double.h directly. Previously, various bits/mathdef.h headers and bits/wordsize.h headers had this long double information (including implicitly in some bits/mathdef.h headers through not having the defines present in the default version). After the patch, it's all in six bits/long-double.h headers. Furthermore, most of those new headers are not architecture-specific. Architectures with optional long double all use the ldbl-opt sysdeps directory, either in the order (ldbl-64-128, ldbl-opt, ldbl-128) or (ldbl-128ibm, ldbl-opt). Thus a generic header for the case where long double = double, and headers in ldbl-128, ldbl-96 and ldbl-opt, suffices to cover every architecture except for cases where long double properties vary between different ABIs sharing a set of installed headers; fortunately all the ldbl-opt cases share a single compiler-predefined macro __LONG_DOUBLE_128__ that can be used to tell whether this compilation is -mlong-double-64 or -mlong-double-128. The two cases where a set of headers is shared between ABIs with different long double properties, MIPS (o32 has long double = double, other ABIs use ldbl-128) and SPARC (32-bit has optional long double, 64-bit has required long double), need their own bits/long-double.h headers. As with bits/wordsize.h, multiple-include protection for this header is generally implicit through the include guards on sys/cdefs.h, and multiple inclusion is harmless in any case. There is one subtlety: the header must not define __LONG_DOUBLE_MATH_OPTIONAL if __NO_LONG_DOUBLE_MATH was defined before its inclusion, because doing so breaks how sysdeps/ieee754/ldbl-opt/nldbl-compat.h defines __NO_LONG_DOUBLE_MATH itself before including system headers. Subject to keeping that working, it would be reasonable to move these macros from defined/undefined #ifdef to always-defined 1/0 #if semantics, but this patch does not attempt to do so, just rearranges where the macros are defined. After this patch, the only use of bits/mathdef.h is the alpha one for modifying complex function ABIs for old GCC. Thus, all versions of the header other than the default and alpha versions are removed, as is the include from math.h. Tested for x86_64 and x86. Also did compilation-only testing with build-many-glibcs.py. * bits/long-double.h: New file. * sysdeps/ieee754/ldbl-128/bits/long-double.h: Likewise. * sysdeps/ieee754/ldbl-96/bits/long-double.h: Likewise. * sysdeps/ieee754/ldbl-opt/bits/long-double.h: Likewise. * sysdeps/mips/bits/long-double.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/long-double.h: Likewise. * math/Makefile (headers): Add bits/long-double.h. * misc/sys/cdefs.h: Include <bits/long-double.h>. * stdlib/strtold.c: Include <bits/long-double.h> instead of <bits/wordsize.h>. * bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion. [!__NO_LONG_DOUBLE_MATH]: Remove conditional code. * math/math.h: Do not include <bits/mathdef.h>. * sysdeps/aarch64/bits/mathdef.h: Remove file. * sysdeps/alpha/bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion. * sysdeps/ia64/bits/mathdef.h: Remove file. * sysdeps/m68k/m680x0/bits/mathdef.h: Likewise. * sysdeps/mips/bits/mathdef.h: Likewise. * sysdeps/powerpc/bits/mathdef.h: Likewise. * sysdeps/s390/bits/mathdef.h: Likewise. * sysdeps/sparc/bits/mathdef.h: Likewise. * sysdeps/x86/bits/mathdef.h: Likewise. * sysdeps/s390/s390-32/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Remove conditional code. * sysdeps/s390/s390-64/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/alpha/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/wordsize.h [!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Likewise.
* aarch64: Use explicit offsets in _dl_tlsdesc_dynamicFlorian Weimer2016-12-022-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 389d1f1b232b3d6b9d73ee2c50e543ace6675621 (“Partial ILP32 support for aarch64”) broke dynamic TLS support because a load offset changed: 0000000000000030 <_dl_tlsdesc_dynamic>: 30: a9bc7bfd stp x29, x30, [sp,#-64]! 34: 910003fd mov x29, sp 38: a9020be1 stp x1, x2, [sp,#32] 3c: a90313e3 stp x3, x4, [sp,#48] 40: d53bd044 mrs x4, tpidr_el0 44: c8dffc1f ldar xzr, [x0] 48: f9400401 ldr x1, [x0,#8] 4c: f9400080 ldr x0, [x4] 50: f9400823 ldr x3, [x1,#16] 54: f9400002 ldr x2, [x0] 58: eb02007f cmp x3, x2 5c: 540001a8 b.hi 90 <_dl_tlsdesc_dynamic+0x60> 60: f9400022 ldr x2, [x1] 64: 8b021000 add x0, x0, x2, lsl #4 68: f9400000 ldr x0, [x0] 6c: b100041f cmn x0, #0x1 70: 54000100 b.eq 90 <_dl_tlsdesc_dynamic+0x60> - 74: f9400421 ldr x1, [x1,#8] + 74: f9400821 ldr x1, [x1,#16] 78: 8b010000 add x0, x0, x1 … This commit introduces explicit struct offsets, generated from the C headers, fixing the regression.
* Refactor FP_ILOGB* out of bits/mathdef.h.Joseph Myers2016-12-011-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the refactoring of bits/mathdef.h, this patch stops it defining FP_ILOGB0 and FP_ILOGBNAN, moving the required information to a new header bits/fp-logb.h. There are only two possible values of each of those macros permitted by ISO C. TS 18661-1 adds corresponding macros for llogb, and their values are required to correspond to those of the ilogb macros in the obvious way. Thus two boolean values - for which the same choices are correct for most architectures - suffice to determine the value of all these macros, and by defining macros for those boolean values in bits/fp-logb.h we can then define the public FP_* macros in math.h and avoid the present duplication of the associated feature test macro logic. This patch duly moves to bits/fp-logb.h defining __FP_LOGB0_IS_MIN and __FP_LOGBNAN_IS_MIN. Default definitions of those to 0 are correct for both architectures, while ia64, m68k and x86 get their own versions of bits/fp-logb.h to reflect their use of values different from the defaults. The patch renders many copies of bits/mathdef.h trivial (needed only to avoid the default __NO_LONG_DOUBLE_MATH). I'll revise <https://sourceware.org/ml/libc-alpha/2016-11/msg00865.html> accordingly so that it removes all bits/mathdef.h headers except the default one and the alpha one, and arranges for the header to be included only by complex.h as the only remaining use at that point will be for the alpha ABI issues there. Tested for x86_64 and x86. Also did compile-only testing with build-many-glibcs.py (using glibc sources from before the commit that introduced many build failures with undefined __GI___sigsetjmp). * bits/fp-logb.h: New file. * sysdeps/ia64/bits/fp-logb.h: Likewise. * sysdeps/m68k/m680x0/bits/fp-logb.h: Likewise. * sysdeps/x86/bits/fp-logb.h: Likewise. * math/Makefile (headers): Add bits/fp-logb.h. * math/math.h: Include <bits/fp-logb.h>. [__USE_ISOC99] (FP_ILOGB0): Define based on __FP_LOGB0_IS_MIN. [__USE_ISOC99] (FP_ILOGBNAN): Define based on __FP_LOGBNAN_IS_MIN. * bits/mathdef.h (FP_ILOGB0): Remove. (FP_ILOGBNAN): Likewise. * sysdeps/aarch64/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/alpha/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/ia64/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/m68k/m680x0/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/mips/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/powerpc/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/s390/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/sparc/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise. * sysdeps/x86/bits/mathdef.h (FP_ILOGB0): Likewise. (FP_ILOGBNAN): Likewise.
* Refactor FP_FAST_* into bits/fp-fast.h.Joseph Myers2016-11-292-3/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the refactoring of bits/mathdef.h, this patch moves the FP_FAST_* definitions into a new bits/fp-fast.h header. Currently this is only for FP_FAST_FMA*, but in future it would be the appropriate place for the FP_FAST_* macros from TS 18661-1 as well. The generic bits/mathdef.h header defines these macros based on whether the compiler defines __FP_FAST_*. Most architecture-specific headers, however, fail to do so, meaning that if the architecture (or some particular processors) does in fact have fused operations, and GCC knows to use them inline, the FP_FAST_* macros will still not be defined. By refactoring, this patch causes the generic version (based on __FP_FAST_*) to be used in more cases, and so the macro definitions to be more accurate. Architectures that already defined some or all of these macros other than based on the predefines have their own versions of fp-fast.h, which are arranged so they define FP_FAST_* if either the architecture-specific conditions are true or __FP_FAST_* are defined. After this refactoring, various bits/mathdef.h headers for architectures with long double = double are semantically identical to the generic version. The patch removes those headers that are redundant. (In fact two of the four removed were already redundant before this patch because they did use __FP_FAST_*.) Tested for x86_64 and x86, and compilation-only with build-many-glibcs.py. * bits/fp-fast.h: New file. * sysdeps/aarch64/bits/fp-fast.h: Likewise. * sysdeps/powerpc/bits/fp-fast.h: Likewise. * math/Makefile (headers): Add bits/fp-fast.h. * math/math.h: Include <bits/fp-fast.h>. * bits/mathdef.h (FP_FAST_FMA): Remove. (FP_FAST_FMAF): Likewise. (FP_FAST_FMAL): Likewise. * sysdeps/aarch64/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. * sysdeps/powerpc/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. * sysdeps/x86/bits/mathdef.h (FP_FAST_FMA): Likewise. (FP_FAST_FMAF): Likewise. (FP_FAST_FMAL): Likewise. * sysdeps/arm/bits/mathdef.h: Remove file. * sysdeps/hppa/fpu/bits/mathdef.h: Likewise. * sysdeps/sh/sh4/bits/mathdef.h: Likewise. * sysdeps/tile/bits/mathdef.h: Likewise.
* Partial ILP32 support for aarch64.Steve Ellcey2016-11-2823-156/+266
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/aarch64/crti.S: Add include of sysdep.h. (call_weak_fn): Use PTR_REG to get correct reg name in ILP32. * sysdeps/aarch64/dl-irel.h: Add include of sysdep.h. (elf_irela): Use AARCH64_R macro to get correct relocation in ILP32. * sysdeps/aarch64/dl-machine.h: Add include of sysdep.h. (elf_machine_load_address, RTLD_START, RTLD_START_1, RTLD_START, elf_machine_type_class, ELF_MACHINE_JMP_SLOT, elf_machine_rela, elf_machine_lazy_rel): Add ifdef's for ILP32 support. * sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return, _dl_tlsdesc_return_lazy, _dl_tlsdesc_dynamic, _dl_tlsdesc_resolve_hold): Extend pointers in ILP32, use PTR_REG to get correct reg name for ILP32. * sysdeps/aarch64/dl-trampoline.S (ip01): New Macro. (RELA_SIZE): New Macro. (_dl_runtime_resolve, _dl_runtime_profile): Use new macros and PTR_REG to support ILP32. * sysdeps/aarch64/jmpbuf-unwind.h (_JMPBUF_CFA_UNWINDS_ADJ): Add cast for ILP32 mode. * sysdeps/aarch64/memcmp.S (memcmp): Extend arg pointers for ILP32 mode. * sysdeps/aarch64/memcpy.S (memmove, memcpy): Ditto. * sysdeps/aarch64/memset.S (__memset): Ditto. * sysdeps/aarch64/strchr.S (strchr): Ditto. * sysdeps/aarch64/strchrnul.S (__strchrnul): Ditto. * sysdeps/aarch64/strcmp.S (strcmp): Ditto. * sysdeps/aarch64/strcpy.S (strcpy): Ditto. * sysdeps/aarch64/strlen.S (__strlen): Ditto. * sysdeps/aarch64/strncmp.S (strncmp): Ditto. * sysdeps/aarch64/strnlen.S (strnlen): Ditto. * sysdeps/aarch64/strrchr.S (strrchr): Ditto. * sysdeps/unix/sysv/linux/aarch64/clone.S: Ditto. * sysdeps/unix/sysv/linux/aarch64/setcontext.S (__setcontext): Ditto. * sysdeps/unix/sysv/linux/aarch64/swapcontext.S (__swapcontext): Ditto. * sysdeps/aarch64/__longjmp.S (__longjmp): Extend pointers in ILP32, change PTR_MANGLE call to use register numbers instead of names. * sysdeps/unix/sysv/linux/aarch64/getcontext.S (__getcontext): Ditto. * sysdeps/aarch64/setjmp.S (__sigsetjmp): Extend arg pointers for ILP32 mode, change PTR_MANGLE calls to use register numbers. * sysdeps/aarch64/start.S (_start): Ditto. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (__PTHREAD_RWLOCK_INT_FLAGS_SHARED): New define. (__SIZEOF_PTHREAD_ATTR_T, __SIZEOF_PTHREAD_MUTEX_T, __SIZEOF_PTHREAD_MUTEXATTR_T, __SIZEOF_PTHREAD_COND_T, __SIZEOF_PTHREAD_COND_COMPAT_T, __SIZEOF_PTHREAD_CONDATTR_T, __SIZEOF_PTHREAD_RWLOCK_T, __SIZEOF_PTHREAD_RWLOCKATTR_T, __SIZEOF_PTHREAD_BARRIER_T, __SIZEOF_PTHREAD_BARRIERATTR_T): Make defined values dependent on __ILP32__. * sysdeps/aarch64/nptl/bits/semaphore.h (__SIZEOF_SEM_T): Change define. (sem_t): Change __align type. * sysdeps/aarch64/sysdep.h (AARCH64_R, PTR_REG, PTR_LOG_SIZE, DELOUSE, PTR_SIZE): New Macros. (LDST_PCREL, LDST_GLOBAL) Update to use PTR_REG. * sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h (O_LARGEFILE): Set when in ILP32 mode. (F_GETLK64, F_SETLK64, F_SETLKW64): Only set in LP64 mode. * sysdeps/unix/sysv/linux/aarch64/dl-cache.h (DL_CACHE_DEFAULT_ID): Set elf flags for ILP32. (add_system_dir): Set ILP32 library directories. * sysdeps/unix/sysv/linux/aarch64/init-first.c (_libc_vdso_platform_setup): Set minimum kernel version for ILP32. * sysdeps/unix/sysv/linux/aarch64/ldconfig.h (SYSDEP_KNOWN_INTERPRETER_NAMES): Add ILP32 names. * sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (GET_PC, SET_PC): New Macros. * sysdeps/unix/sysv/linux/aarch64/sysdep.h: Handle ILP32 pointers.
* Remove cached PID/TID in cloneAdhemerval Zanella2016-11-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch remove the PID cache and usage in current GLIBC code. Current usage is mainly used a performance optimization to avoid the syscall, however it adds some issues: - The exposed clone syscall will try to set pid/tid to make the new thread somewhat compatible with current GLIBC assumptions. This cause a set of issue with new workloads and usecases (such as BZ#17214 and [1]) as well for new internal usage of clone to optimize other algorithms (such as clone plus CLONE_VM for posix_spawn, BZ#19957). - The caching complexity also added some bugs in the past [2] [3] and requires more effort of each port to handle such requirements (for both clone and vfork implementation). - Caching performance gain in mainly on getpid and some specific code paths. The getpid performance leverage is questionable [4], either by the idea of getpid being a hotspot as for the getpid implementation itself (if it is indeed a justifiable hotspot a vDSO symbol could let to a much more simpler solution). Other usage is mainly for non usual code paths, such as pthread cancellation signal and handling. For thread creation (on stack allocation) the code simplification in fact adds some performance gain due the no need of transverse the stack cache and invalidate each element pid. Other thread usages will require a direct getpid syscall, such as cancellation/setxid signal, thread cancellation, thread fail path (at create_thread), and thread signal (pthread_kill and pthread_sigqueue). However these are hardly usual hotspots and I think adding a syscall is justifiable. It also simplifies both the clone and vfork arch-specific implementation. And by review each fork implementation there are some discrepancies that this patch also solves: - microblaze clone/vfork does not set/reset the pid/tid field - hppa uses the default vfork implementation that fallback to fork. Since vfork is deprecated I do not think we should bother with it. The patch also removes the TID caching in clone. My understanding for such semantic is try provide some pthread usage after a user program issue clone directly (as done by thread creation with CLONE_PARENT_SETTID and pthread tid member). However, as stated before in multiple discussions threads, GLIBC provides clone syscalls without further supporting all this semantics. I ran a full make check on x86_64, x32, i686, armhf, aarch64, and powerpc64le. For sparc32, sparc64, and mips I ran the basic fork and vfork tests from posix/ folder (on a qemu system). So it would require further testing on alpha, hppa, ia64, m68k, nios2, s390, sh, and tile (I excluded microblaze because it is already implementing the patch semantic regarding clone/vfork). [1] https://codereview.chromium.org/800183004/ [2] https://sourceware.org/ml/libc-alpha/2006-07/msg00123.html [3] https://sourceware.org/bugzilla/show_bug.cgi?id=15368 [4] http://yarchive.net/comp/linux/getpid_caching.html * sysdeps/nptl/fork.c (__libc_fork): Remove pid cache setting. * nptl/allocatestack.c (allocate_stack): Likewise. (__reclaim_stacks): Likewise. (setxid_signal_thread): Obtain pid through syscall. * nptl/nptl-init.c (sigcancel_handler): Likewise. (sighandle_setxid): Likewise. * nptl/pthread_cancel.c (pthread_cancel): Likewise. * sysdeps/unix/sysv/linux/pthread_kill.c (__pthread_kill): Likewise. * sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue): Likewise. * sysdeps/unix/sysv/linux/createthread.c (create_thread): Likewise. * sysdeps/unix/sysv/linux/getpid.c: Remove file. * nptl/descr.h (struct pthread): Change comment about pid value. * nptl/pthread_getattr_np.c (pthread_getattr_np): Remove thread pid assert. * sysdeps/unix/sysv/linux/pthread-pids.h (__pthread_initialize_pids): Do not set pid value. * nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove thread pid cache check. * nptl_db/td_thr_validate.c (td_thr_validate): Likewise. * sysdeps/aarch64/nptl/tcb-offsets.sym: Remove pid offset. * sysdeps/alpha/nptl/tcb-offsets.sym: Likewise. * sysdeps/arm/nptl/tcb-offsets.sym: Likewise. * sysdeps/hppa/nptl/tcb-offsets.sym: Likewise. * sysdeps/i386/nptl/tcb-offsets.sym: Likewise. * sysdeps/ia64/nptl/tcb-offsets.sym: Likewise. * sysdeps/m68k/nptl/tcb-offsets.sym: Likewise. * sysdeps/microblaze/nptl/tcb-offsets.sym: Likewise. * sysdeps/mips/nptl/tcb-offsets.sym: Likewise. * sysdeps/nios2/nptl/tcb-offsets.sym: Likewise. * sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise. * sysdeps/s390/nptl/tcb-offsets.sym: Likewise. * sysdeps/sh/nptl/tcb-offsets.sym: Likewise. * sysdeps/sparc/nptl/tcb-offsets.sym: Likewise. * sysdeps/tile/nptl/tcb-offsets.sym: Likewise. * sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise. * sysdeps/unix/sysv/linux/aarch64/clone.S: Remove pid and tid caching. * sysdeps/unix/sysv/linux/alpha/clone.S: Likewise. * sysdeps/unix/sysv/linux/arm/clone.S: Likewise. * sysdeps/unix/sysv/linux/hppa/clone.S: Likewise. * sysdeps/unix/sysv/linux/i386/clone.S: Likewise. * sysdeps/unix/sysv/linux/ia64/clone2.S: Likewise. * sysdeps/unix/sysv/linux/mips/clone.S: Likewise. * sysdeps/unix/sysv/linux/nios2/clone.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise. * sysdeps/unix/sysv/linux/sh/clone.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise. * sysdeps/unix/sysv/linux/tile/clone.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise. * sysdeps/unix/sysv/linux/aarch64/vfork.S: Remove pid set and reset. * sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise. * sysdeps/unix/sysv/linux/arm/vfork.S: Likewise. * sysdeps/unix/sysv/linux/i386/vfork.S: Likewise. * sysdeps/unix/sysv/linux/ia64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/m68k/clone.S: Likewise. * sysdeps/unix/sysv/linux/m68k/vfork.S: Likewise. * sysdeps/unix/sysv/linux/mips/vfork.S: Likewise. * sysdeps/unix/sysv/linux/nios2/vfork.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/vfork.S: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sh/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/tile/vfork.S: Likewise. * sysdeps/unix/sysv/linux/x86_64/vfork.S: Likewise. * sysdeps/unix/sysv/linux/tst-clone2.c (f): Remove direct pthread struct access. (clone_test): Remove function. (do_test): Rewrite to take in consideration pid is not cached anymore.
* Refactor float_t, double_t information into bits/flt-eval-method.h.Joseph Myers2016-11-241-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At present, definitions of float_t and double_t are split among many bits/mathdef.h headers. For all but three architectures, these types are float and double. Furthermore, if you assume __FLT_EVAL_METHOD__ to be defined, that provides a more generic way of determining the correct values of these typedefs. Defining these typedefs more generally based on __FLT_EVAL_METHOD__ was previously proposed by Paul Eggert in <https://sourceware.org/ml/libc-alpha/2012-02/msg00002.html>. This patch refactors things in the way I proposed in <https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>. A new header bits/flt-eval-method.h defines a single macro, __GLIBC_FLT_EVAL_METHOD, which is then used by math.h to define float_t and double_t. The default is based on __FLT_EVAL_METHOD__ (although actually a default to 0 would have the same effect for current ports, because ports where values other than 0 or 16 are possible all have their own headers). To avoid changing the existing semantics in any case, including for compilers not defining __FLT_EVAL_METHOD__, architecture-specific files are then added for m68k, s390, x86 which replicate the existing semantics. At least with __FLT_EVAL_METHOD__ values possible with GCC, there should be no change to the choices of float_t and double_t for any supported configuration. Architecture maintainer notes: * m68k: sysdeps/m68k/m680x0/bits/flt-eval-method.h always defines __GLIBC_FLT_EVAL_METHOD to 2 to replicate the existing logic. But actually GCC defines __FLT_EVAL_METHOD__ to 0 if TARGET_68040. It might make sense to make the header prefer to base things on __FLT_EVAL_METHOD__ if defined, like the x86 version, and so make the choices of these types more accurate (with a NEWS entry as for the other changes to these types on particular architectures). * s390: sysdeps/s390/bits/flt-eval-method.h always defines __GLIBC_FLT_EVAL_METHOD to 1 to replicate the existing logic. As previously discussed, it might make sense in coordination with GCC to eliminate the historic mistake, avoid excess precision in the -fexcess-precision=standard case and make the typedefs match (with a NEWS entry, again). Tested for x86-64 and x86. Also did compilation-only testing with build-many-glibcs.py. * bits/flt-eval-method.h: New file. * sysdeps/m68k/m680x0/bits/flt-eval-method.h: Likewise. * sysdeps/s390/bits/flt-eval-method.h: Likewise. * sysdeps/x86/bits/flt-eval-method.h: Likewise. * math/Makefile (headers): Add bits/flt-eval-method.h. * math/math.h: Include <bits/flt-eval-method.h>. [__USE_ISOC99] (float_t): Define based on __GLIBC_FLT_EVAL_METHOD. [__USE_ISOC99] (double_t): Likewise. * bits/mathdef.h (float_t): Remove. (double_t): Likewise. * sysdeps/aarch64/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/alpha/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/arm/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/hppa/fpu/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/ia64/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/m68k/m680x0/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/mips/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/powerpc/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/s390/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/sh/sh4/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/sparc/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/tile/bits/mathdef.h (float_t): Likewise. (double_t): Likewise. * sysdeps/x86/bits/mathdef.h (float_t): Likewise. (double_t): Likewise.
* Regenerate ULPs for aarch64Siddhesh Poyarekar2016-11-101-2/+2
| | | | * sysdeps/aarch64/libm-test-ulps: Regenerated.
* nptl: Document the reason why __kind in pthread_mutex_t is part of the ABIFlorian Weimer2016-11-071-0/+2
|
* Do not hardcode platform names in manual/libm-err-tab.pl (bug 14139).Joseph Myers2016-11-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | manual/libm-err-tab.pl hardcodes a list of names for particular platforms (mapping from sysdeps directory name to friendly name for the manual). This goes against the principle of keeping information about individual platforms in their corresponding sysdeps directory, and the list is also very out-of-date regarding supported platforms and their corresponding sysdeps directories. This patch fixes this by adding a libm-test-ulps-name file alongside each libm-test-ulps file. The script then gets the friendly name from that file, which is required to exist, so it no longer needs to allow for the mapping being missing. Tested for x86_64. [BZ #14139] * manual/libm-err-tab.pl (%pplatforms): Initialize to empty. (find_files): Obtain platform name from libm-test-ulps-name and store in %pplatforms. (canonicalize_platform): Remove. (print_platforms): Use $pplatforms directly. (by_platforms): Do not allow for platforms missing from %pplatforms. * sysdeps/aarch64/libm-test-ulps-name: New file. * sysdeps/alpha/fpu/libm-test-ulps-name: Likewise. * sysdeps/arm/libm-test-ulps-name: Likewise. * sysdeps/generic/libm-test-ulps-name: Likewise. * sysdeps/hppa/fpu/libm-test-ulps-name: Likewise. * sysdeps/i386/fpu/libm-test-ulps-name: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps-name: Likewise. * sysdeps/ia64/fpu/libm-test-ulps-name: Likewise. * sysdeps/m68k/coldfire/fpu/libm-test-ulps-name: Likewise. * sysdeps/m68k/m680x0/fpu/libm-test-ulps-name: Likewise. * sysdeps/microblaze/libm-test-ulps-name: Likewise. * sysdeps/mips/mips32/libm-test-ulps-name: Likewise. * sysdeps/mips/mips64/libm-test-ulps-name: Likewise. * sysdeps/nios2/libm-test-ulps-name: Likewise. * sysdeps/powerpc/fpu/libm-test-ulps-name: Likewise. * sysdeps/powerpc/nofpu/libm-test-ulps-name: Likewise. * sysdeps/s390/fpu/libm-test-ulps-name: Likewise. * sysdeps/sh/libm-test-ulps-name: Likewise. * sysdeps/sparc/fpu/libm-test-ulps-name: Likewise. * sysdeps/tile/libm-test-ulps-name: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps-name: Likewise.
* Define wordsize.h macros everywhereSteve Ellcey2016-11-041-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * bits/wordsize.h: Add documentation. * sysdeps/aarch64/bits/wordsize.h : New file * sysdeps/generic/stdint.h (PTRDIFF_MIN, PTRDIFF_MAX): Update definitions. (SIZE_MAX): Change ifdef to if in __WORDSIZE32_SIZE_ULONG check. * sysdeps/gnu/bits/utmp.h (__WORDSIZE_TIME64_COMPAT32): Check with #if instead of #ifdef. * sysdeps/gnu/bits/utmpx.h (__WORDSIZE_TIME64_COMPAT32): Ditto. * sysdeps/mips/bits/wordsize.h (__WORDSIZE32_SIZE_ULONG, __WORDSIZE32_PTRDIFF_LONG, __WORDSIZE_TIME64_COMPAT32): Add or change defines. * sysdeps/powerpc/powerpc32/bits/wordsize.h: Likewise. * sysdeps/powerpc/powerpc64/bits/wordsize.h: Likewise. * sysdeps/s390/s390-32/bits/wordsize.h: Likewise. * sysdeps/s390/s390-64/bits/wordsize.h: Likewise. * sysdeps/sparc/sparc32/bits/wordsize.h: Likewise. * sysdeps/sparc/sparc64/bits/wordsize.h: Likewise. * sysdeps/tile/tilegx/bits/wordsize.h: Likewise. * sysdeps/tile/tilepro/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/alpha/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h: Likewise. * sysdeps/unix/sysv/linux/sparc/bits/wordsize.h: Likewise. * sysdeps/wordsize-32/bits/wordsize.h: Likewise. * sysdeps/wordsize-64/bits/wordsize.h: Likewise. * sysdeps/x86/bits/wordsize.h: Likewise.
* An optimized memchr was missing for AArch64. This version is similar toWilco Dijkstra2016-11-041-0/+157
| | | | | | | | | strchr and is significantly faster than the C version. 2016-11-04 Wilco Dijkstra <wdijkstr@arm.com> Kevin Petit <kevin.petit@arm.com> * sysdeps/aarch64/memchr.S (__memchr): New file.
* Add femode_t functions: aarch64.Joseph Myers2016-09-072-0/+61
| | | | | | | | This patch adds AArch64 versions of fegetmode and fesetmode. Untested. * sysdeps/aarch64/fpu/fegetmode.c: New file. * sysdeps/aarch64/fpu/fesetmode.c: Likewise.
* Add femode_t functions.Joseph Myers2016-09-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TS 18661-1 defines a type femode_t to represent the set of dynamic floating-point control modes (such as the rounding mode and trap enablement modes), and functions fegetmode and fesetmode to manipulate those modes (without affecting other state such as the raised exception flags) and a corresponding macro FE_DFL_MODE. This patch series implements those interfaces for glibc. This first patch adds the architecture-independent pieces, the x86 and x86_64 implementations, and the <bits/fenv.h> and ABI baseline updates for all architectures so glibc keeps building and passing the ABI tests on all architectures. Subsequent patches add the fegetmode and fesetmode implementations for other architectures. femode_t is generally an integer type - the same type as fenv_t, or as the single element of fenv_t where fenv_t is a structure containing a single integer (or the single relevant element, where it has elements for both status and control registers) - except where architecture properties or consistency with the fenv_t implementation indicate otherwise. FE_DFL_MODE follows FE_DFL_ENV in whether it's a magic pointer value (-1 cast to const femode_t *), a value that can be distinguished from valid pointers by its high bits but otherwise contains a representation of the desired register contents, or a pointer to a constant variable (the powerpc case; __fe_dfl_mode is added as an exported constant object, an alias to __fe_dfl_env). Note that where architectures (that share a register between control and status bits) gain definitions of new floating-point control or status bits in future, the implementations of fesetmode for those architectures may need updating (depending on whether the new bits are control or status bits and what the implementation does with previously unknown bits), just like existing implementations of <fenv.h> functions that take care not to touch reserved bits may need updating when the set of reserved bits changes. (As any new bits are outside the scope of ISO C, that's just a quality-of-implementation issue for supporting them, not a conformance issue.) As with fenv_t, femode_t should properly include any software DFP rounding mode (and for both fenv_t and femode_t I'd consider that fragment of DFP support appropriate for inclusion in glibc even in the absence of the rest of libdfp; hardware DFP rounding modes should already be included if the definitions of which bits are status / control bits are correct). Tested for x86_64, x86, mips64 (hard float, and soft float to test the fallback version), arm (hard float) and powerpc (hard float, soft float and e500). Other architecture versions are untested. * math/fegetmode.c: New file. * math/fesetmode.c: Likewise. * sysdeps/i386/fpu/fegetmode.c: Likewise. * sysdeps/i386/fpu/fesetmode.c: Likewise. * sysdeps/x86_64/fpu/fegetmode.c: Likewise. * sysdeps/x86_64/fpu/fesetmode.c: Likewise. * math/fenv.h: Update comment on inclusion of <bits/fenv.h>. [__GLIBC_USE (IEC_60559_BFP_EXT)] (fegetmode): New function declaration. [__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetmode): Likewise. * bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/m68k/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (__fe_dfl_mode): New variable declaration. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/tile/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New typedef. [__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro. * manual/arith.texi (FE_DFL_MODE): Document macro. (fegetmode): Document function. (fesetmode): Likewise. * math/Versions (fegetmode): New libm symbol at version GLIBC_2.25. (fesetmode): Likewise. * math/Makefile (libm-support): Add fegetmode and fesetmode. (tests): Add test-femode and test-femode-traps. * math/test-femode-traps.c: New file. * math/test-femode.c: Likewise. * sysdeps/powerpc/fpu/fenv_const.c (__fe_dfl_mode): Declare as alias for __fe_dfl_env. * sysdeps/powerpc/nofpu/fenv_const.c (__fe_dfl_mode): Likewise. * sysdeps/powerpc/powerpc32/e500/nofpu/fenv_const.c (__fe_dfl_mode): Likewise. * sysdeps/powerpc/Versions (__fe_dfl_mode): New libm symbol at version GLIBC_2.25. * sysdeps/nacl/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* Make common fmax implementation generic.Paul E. Murphy2016-09-013-3/+3
| | | | | Also update aarch64 to ensure the correct s_fmin.c is included. The include order favors including the generated copy.
* Add fesetexcept: aarch64.Joseph Myers2016-08-161-0/+34
| | | | | | This patch adds an AArch64 version of fesetexcept. Untested. * sysdeps/aarch64/fpu/fesetexcept.c: New file.
* [AArch64] Update libm-test-ulpsSzabolcs Nagy2016-07-211-4/+4
| | | | | | | | This partly reverts commit f8238ae3c7701dbd9c04028861916de64e578114 that regenerated the ulps, to make the max ulps good for gcc-5, gcc-6 and gcc-trunk as well. * sysdeps/aarch64/libm-test-ulps: Updated.
* [AArch64] Regenerate libm-test-ulpsSzabolcs Nagy2016-07-181-10/+10
| | | | * sysdeps/aarch64/libm-test-ulps: Regenerated.
* [AArch64] Fix libc internal asm profiling codeSzabolcs Nagy2016-07-112-2/+35
| | | | | | | | | | | | | | | | | | | | | When glibc is built with --enable-profile, the ENTRY of asm functions includes CALL_MCOUNT for profiling. (matters for binaries static linked against libc_p.a.) CALL_MCOUNT did not save/restore argument registers around the _mcount call so it clobbered them. (it is enough to only save/restore the arguments passed to a given asm function, but that would be too many asm changes so it is simpler to always save all argument registers in this macro.) float args are not saved: mcount does not clobber the float regs and currently no asm function takes float arguments anyway. [BZ #18707] * sysdeps/aarch64/Makefile (CFLAGS-mcount.c): Add -mgeneral-regs-only. * sysdeps/aarch64/sysdep.h (CALL_MCOUNT): Save argument registers.
* Remove atomic_compare_and_exchange_bool_rel.Torvald Riegel2016-06-241-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | atomic_compare_and_exchange_bool_rel and catomic_compare_and_exchange_bool_rel are removed and replaced with the new C11-like atomic_compare_exchange_weak_release. The concurrent code in nscd/cache.c has not been reviewed yet, so this patch does not add detailed comments. * nscd/cache.c (cache_add): Use new C11-like atomic operation instead of atomic_compare_and_exchange_bool_rel. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise. * include/atomic.h (atomic_compare_and_exchange_bool_rel, catomic_compare_and_exchange_bool_rel): Remove. * sysdeps/aarch64/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/alpha/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/arm/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/mips/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/tile/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise.
* This patch further tunes memcpy - avoid one branch for sizes 1-3,Wilco Dijkstra2016-06-221-24/+32
| | | | | | | add a prefetch and improve small copies that are exact powers of 2. * sysdeps/aarch64/memcpy.S (memcpy): Further tuning for performance.
* Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as itWilco Dijkstra2016-06-202-2/+45
| | | | | | | | is the fastest way to search for '\0'. Otherwise use memchr with an infinite size. This is 3x faster on benchtests for large sizes. Passes GLIBC tests. * sysdeps/aarch64/rawmemchr.S (__rawmemchr): New file. * sysdeps/aarch64/strlen.S (__strlen): Change to __strlen to avoid PLT.
* This is an optimized memcpy/memmove for AArch64. Copies are split into 3 mainWilco Dijkstra2016-06-202-452/+209
| | | | | | | | | | | | | | | cases: small copies of up to 16 bytes, medium copies of 17..96 bytes which are fully unrolled. Large copies of more than 96 bytes align the destination and use an unrolled loop processing 64 bytes per iteration. In order to share code with memmove, small and medium copies read all data before writing, allowing any kind of overlap. All memmoves except for the large backwards case fall into memcpy for optimal performance. On a random copy test memcpy/memmove are 40% faster on Cortex-A57 and 28% on Cortex-A53. * sysdeps/aarch64/memcpy.S (memcpy): Rewrite of optimized memcpy and memmove. * sysdeps/aarch64/memmove.S (memmove): Remove memmove code (merged into memcpy.S).
* elf: Consolidate machine-agnostic DTV definitions in <dl-dtv.h>Florian Weimer2016-06-202-14/+1
| | | | | Identical definitions of dtv_t and TLS_DTV_UNALLOCATED were repeated for all architectures using DTVs.
* This is an optimized memset for AArch64. Memset is split into 4 main cases:Wilco Dijkstra2016-05-121-198/+161
| | | | | | | | | | | | | | small sets of up to 16 bytes, medium of 16..96 bytes which are fully unrolled. Large memsets of more than 96 bytes align the destination and use an unrolled loop processing 64 bytes per iteration. Memsets of zero of more than 256 use the dc zva instruction, and there are faster versions for the common ZVA sizes 64 or 128. STP of Q registers is used to reduce codesize without loss of performance. The speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53. * sysdeps/aarch64/memset.S (__memset): Rewrite of optimized memset.
* Add _STRING_INLINE_unaligned and string_private.hH.J. Lu2016-02-182-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in https://sourceware.org/ml/libc-alpha/2015-10/msg00403.html the setting of _STRING_ARCH_unaligned currently controls the external GLIBC ABI as well as selecting the use of unaligned accesses withing GLIBC. Since _STRING_ARCH_unaligned was recently changed for AArch64, this would potentially break the ABI in GLIBC 2.23, so split the uses and add _STRING_INLINE_unaligned to select the string ABI. This setting must be fixed for each target, while _STRING_ARCH_unaligned may be changed from release to release. _STRING_ARCH_unaligned is used unconditionally in glibc. But <bits/string.h>, which defines _STRING_ARCH_unaligned, isn't included with -Os. Since _STRING_ARCH_unaligned is internal to glibc and may change between glibc releases, it should be made private to glibc. _STRING_ARCH_unaligned should defined in the new string_private.h heade file which is included unconditionally from internal <string.h> for glibc build. [BZ #19462] * bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * include/string.h: Include <string_private.h>. * string/bits/string2.h: Replace _STRING_ARCH_unaligned with _STRING_INLINE_unaligned. * sysdeps/aarch64/bits/string.h (_STRING_ARCH_unaligned): Removed. (_STRING_INLINE_unaligned): New. * sysdeps/aarch64/string_private.h: New file. * sysdeps/generic/string_private.h: Likewise. * sysdeps/m68k/m680x0/m68020/string_private.h: Likewise. * sysdeps/s390/string_private.h: Likewise. * sysdeps/x86/string_private.h: Likewise. * sysdeps/m68k/m680x0/m68020/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/s390/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/sparc/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This. * sysdeps/x86/bits/string.h (_STRING_ARCH_unaligned): Renamed to ... (_STRING_INLINE_unaligned): This.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2016-01-04101-101/+101
|
* [AArch64] Regenerate libm-test-ulpsSzabolcs Nagy2015-12-011-50/+54
| | | | * sysdeps/aarch64/libm-test-ulps: Regenerated.
* Enable _STRING_ARCH_unaligned on AArch64.Wilco Dijkstra2015-11-101-0/+24
| | | | | * sysdeps/aarch64/bits/string.h: New file. (_STRING_ARCH_unaligned): Define.
* Regenerate aarch64 libm-test-ulpsSzabolcs Nagy2015-09-241-204/+230
| | | | * sysdeps/aarch64/libm-test-ulps: Regenerated.
* Move bits/atomic.h to atomic-machine.h (bug 14912).Joseph Myers2015-09-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noted in <https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the bits/*.h naming scheme should only be used for installed headers. This patch renames bits/atomic.h to atomic-machine.h to follow that convention. This is the only change in this series that needs to change the filename rather than simply removing a directory level (because both atomic.h and bits/atomic.h exist at present). Tested for x86_64 (testsuite, and that installed stripped shared libraries are unchanged by the patch). [BZ #14912] * sysdeps/aarch64/bits/atomic.h: Move to ... * sysdeps/aarch64/atomic-machine.h: ...here. (_AARCH64_BITS_ATOMIC_H): Rename macro to _AARCH64_ATOMIC_MACHINE_H. * sysdeps/alpha/bits/atomic.h: Move to ... * sysdeps/alpha/atomic-machine.h: ...here. * sysdeps/arm/bits/atomic.h: Move to ... * sysdeps/arm/atomic-machine.h: ...here. Update comments. * bits/atomic.h: Move to ... * sysdeps/generic/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/i386/bits/atomic.h: Move to ... * sysdeps/i386/atomic-machine.h: ...here. * sysdeps/ia64/bits/atomic.h: Move to ... * sysdeps/ia64/atomic-machine.h: ...here. * sysdeps/m68k/coldfire/bits/atomic.h: Move to ... * sysdeps/m68k/coldfire/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/m68k/m680x0/m68020/bits/atomic.h: Move to ... * sysdeps/m68k/m680x0/m68020/atomic-machine.h: ...here. * sysdeps/microblaze/bits/atomic.h: Move to ... * sysdeps/microblaze/atomic-machine.h: ...here. * sysdeps/mips/bits/atomic.h: Move to ... * sysdeps/mips/atomic-machine.h: ...here. (_MIPS_BITS_ATOMIC_H): Rename macro to _MIPS_ATOMIC_MACHINE_H. * sysdeps/powerpc/bits/atomic.h: Move to ... * sysdeps/powerpc/atomic-machine.h: ...here. Update comments. * sysdeps/powerpc/powerpc32/bits/atomic.h: Move to ... * sysdeps/powerpc/powerpc32/atomic-machine.h: ...here. Update comments. Include <atomic-machine.h> instead of <bits/atomic.h>. * sysdeps/powerpc/powerpc64/bits/atomic.h: Move to ... * sysdeps/powerpc/powerpc64/atomic-machine.h: ...here. Include <atomic-machine.h> instead of <bits/atomic.h>. * sysdeps/s390/bits/atomic.h: Move to ... * sysdeps/s390/atomic-machine.h: ...here. * sysdeps/sparc/sparc32/bits/atomic.h: Move to ... * sysdeps/sparc/sparc32/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/sparc/sparc32/sparcv9/bits/atomic.h: Move to ... * sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: ...here. * sysdeps/sparc/sparc64/bits/atomic.h: Move to ... * sysdeps/sparc/sparc64/atomic-machine.h: ...here. * sysdeps/tile/bits/atomic.h: Move to ... * sysdeps/tile/atomic-machine.h: ...here. * sysdeps/tile/tilegx/bits/atomic.h: Move to ... * sysdeps/tile/tilegx/atomic-machine.h: ...here. Include <sysdeps/tile/atomic-machine.h> instead of <sysdeps/tile/bits/atomic.h>. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/tile/tilepro/bits/atomic.h: Move to ... * sysdeps/tile/tilepro/atomic-machine.h: ...here. Include <sysdeps/tile/atomic-machine.h> instead of <sysdeps/tile/bits/atomic.h>. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/arm/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/arm/atomic-machine.h: ...here. Include <sysdeps/arm/atomic-machine.h> instead of <sysdeps/arm/bits/atomic.h>. * sysdeps/unix/sysv/linux/hppa/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/hppa/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/m68k/coldfire/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/nios2/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/nios2/atomic-machine.h: ...here. (_NIOS2_BITS_ATOMIC_H): Rename macro to _NIOS2_ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/sh/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/sh/atomic-machine.h: ...here. * sysdeps/x86_64/bits/atomic.h: Move to ... * sysdeps/x86_64/atomic-machine.h: ...here. * include/atomic.h: Include <atomic-machine.h> instead of <bits/atomic.h>.
* Rename bits/linkmap.h to linkmap.h (bug 14912).Joseph Myers2015-09-041-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noted in <https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the bits/*.h naming scheme should only be used for installed headers. This patch renames bits/linkmap.h to plain linkmap.h to follow that convention. Tested for x86_64 (testsuite, and that installed stripped shared libraries are unchanged by the patch). [BZ #14912] * bits/linkmap.h: Move to ... * sysdeps/generic/linkmap.h: ...here. * sysdeps/aarch64/bits/linkmap.h: Move to ... * sysdeps/aarch64/linkmap.h: ...here. * sysdeps/arm/bits/linkmap.h: Move to ... * sysdeps/arm/linkmap.h: ...here. * sysdeps/hppa/bits/linkmap.h: Move to ... * sysdeps/hppa/linkmap.h: ...here. * sysdeps/ia64/bits/linkmap.h: Move to ... * sysdeps/ia64/linkmap.h: ...here. * sysdeps/mips/bits/linkmap.h: Move to ... * sysdeps/mips/linkmap.h: ...here. * sysdeps/s390/bits/linkmap.h: Move to ... * sysdeps/s390/linkmap.h: ...here. * sysdeps/sh/bits/linkmap.h: Move to ... * sysdeps/sh/linkmap.h: ...here. * sysdeps/x86/bits/linkmap.h: Move to ... * sysdeps/x86/linkmap.h: ...here. * include/link.h: Include <linkmap.h> instead of <bits/linkmap.h>.
* 2015-08-24 Wilco Dijkstra <wdijkstr@arm.com>Wilco Dijkstra2015-08-241-27/+0
| | | | * sysdeps/aarch64/bzero.S (__bzero): Remove.
* 2015-08-24 Wilco Dijkstra <wdijkstr@arm.com>Wilco Dijkstra2015-08-241-8/+4
| | | | | | * sysdeps/aarch64/fpu/math_private.h (libc_feholdsetround_aarch64_ctx): Unconditionally set __fpcr to avoid uninialized warning. (libc_feholdsetround_noex_aarch64_ctx): Likewise.
* Improve feenableexcept performance - avoid an unnecessary FPCR read in caseWilco Dijkstra2015-08-051-9/+7
| | | | the FPCR does not change. Also improve the logic of the return value.
* Improve fesetenv performance by avoiding unnecessary FPSR/FPCR reads/writes.Wilco Dijkstra2015-08-051-17/+23
| | | | | | It uses the same logic as the ARM version. The common case removes 1 FPSR and 1 FPCR read. For FE_DFL_ENV and FE_NOMASK_ENV a FPCR read is avoided in case the FPCR does not change.
* [AArch64][BZ #17711] Fix extern protected data handlingSzabolcs Nagy2015-07-242-1/+4
| | | | | | | | | Fixes elf/tst-protected1a and elf/tst-protected1b tests. Depends on a gcc patch that makes protected visibility data non-local: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html and on a binutils patch so R_*_GLOB_DAT relocs are used for it: https://sourceware.org/ml/binutils/2015-07/msg00246.html
* Add AArch64 versions of math_opt_barrier and math_force_eval that avoid ↵Wilco Dijkstra2015-07-131-0/+5
| | | | going via memory.
* Optimize the strlen implementation by using a page cross check and a fast checkWilco Dijkstra2015-07-131-65/+165
| | | | | | for nul bytes which reverts to separate loop when a non-ASCII char is encountered. Speedup on test-strlen is ~10%, long ASCII strings are processed ~60% faster, and on random tests it is ~80% better.
* Inline __ieee754_sqrt and __ieee754_sqrtf. Also add external definitions.Wilco Dijkstra2015-07-063-0/+72
|
* Regenerate aarch64 libm-test-ulpsSzabolcs Nagy2015-07-021-32/+128
| | | | * sysdeps/aarch64/libm-test-ulps: Regenerated.
* [AArch64] Fix cfi_adjust_cfa_offset usage in dl-tlsdesc.SSzabolcs Nagy2015-06-171-5/+5
| | | | | | | | | Some of the cfi annotations used incorrect sign. * sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return_lazy): Fix cfi_adjust_cfa_offset argument. (_dl_tlsdesc_undefweak, _dl_tlsdesc_dynamic): Likewise. (_dl_tlsdesc_resolve_rela, _dl_tlsdesc_resolve_hold): Likewise.
* [BZ 18034][AArch64] Lazy TLSDESC relocation data race fixSzabolcs Nagy2015-06-173-12/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lazy TLSDESC initialization needs to be synchronized with concurrent TLS accesses. The TLS descriptor contains a function pointer (entry) and an argument that is accessed from the entry function. With lazy initialization the first call to the entry function updates the entry and the argument to their final value. A final entry function must make sure that it accesses an initialized argument, this needs synchronization on systems with weak memory ordering otherwise the writes of the first call can be observed out of order. There are at least two issues with the current code: tlsdesc.c (i386, x86_64, arm, aarch64) uses volatile memory accesses on the write side (in the initial entry function) instead of C11 atomics. And on systems with weak memory ordering (arm, aarch64) the read side synchronization is missing from the final entry functions (dl-tlsdesc.S). This patch only deals with aarch64. * Write side: Volatile accesses were replaced with C11 relaxed atomics, and a release store was used for the initialization of entry so the read side can synchronize with it. * Read side: TLS access generated by the compiler and an entry function code is roughly ldr x1, [x0] // load the entry blr x1 // call it entryfunc: ldr x0, [x0,#8] // load the arg ret Various alternatives were considered to force the ordering in the entry function between the two loads: (1) barrier entryfunc: dmb ishld ldr x0, [x0,#8] (2) address dependency (if the address of the second load depends on the result of the first one the ordering is guaranteed): entryfunc: ldr x1,[x0] and x1,x1,#8 orr x1,x1,#8 ldr x0,[x0,x1] (3) load-acquire (ARMv8 instruction that is ordered before subsequent loads and stores) entryfunc: ldar xzr,[x0] ldr x0,[x0,#8] Option (1) is the simplest but slowest (note: this runs at every TLS access), options (2) and (3) do one extra load from [x0] (same address loads are ordered so it happens-after the load on the call site), option (2) clobbers x1 which is problematic because existing gcc does not expect that, so approach (3) was chosen. A new _dl_tlsdesc_return_lazy entry function was introduced for lazily relocated static TLS, so non-lazy static TLS can avoid the synchronization cost. [BZ #18034] * sysdeps/aarch64/dl-tlsdesc.h (_dl_tlsdesc_return_lazy): Declare. * sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return_lazy): Define. (_dl_tlsdesc_undefweak): Guarantee TLSDESC entry and argument load-load ordering using ldar. (_dl_tlsdesc_dynamic): Likewise. (_dl_tlsdesc_return_lazy): Likewise. * sysdeps/aarch64/tlsdesc.c (_dl_tlsdesc_resolve_rela_fixup): Use relaxed atomics instead of volatile and synchronize with release store. (_dl_tlsdesc_resolve_hold_fixup): Use relaxed atomics instead of volatile. * elf/tlsdeschtab.h (_dl_tlsdesc_resolve_early_return_p): Likewise.
* Use libc_hidden_proto / libc_hidden_def with __strnlen.Joseph Myers2015-06-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various code in glibc uses __strnlen instead of strnlen for namespace reasons. However, __strnlen does not use libc_hidden_proto / libc_hidden_def (as is normally done for any function defined and called within the same library, whether or not exported from the library and whatever namespace it is in), so the compiler does not know that those calls are to a function within libc. This patch uses libc_hidden_proto / libc_hidden_def with __strnlen. On x86_64, it makes no difference to the installed stripped shared libraries. On 32-bit x86, it causes __strnlen calls to go to the same place as strnlen calls (the fallback strnlen implementation), rather than through a PLT entry for the strnlen IFUNC; I'm not sure of the logic behind when calls from within libc should use IFUNCs versus when they should go direct to a particular function implementation, but clearly it doesn't make sense for strnlen and __strnlen to be handled differently in this regard. Tested for x86_64 and x86 (testsuite, and comparison of installed shared libraries as described above). * string/strnlen.c [!STRNLEN] (__strnlen): Use libc_hidden_def. * include/string.h (__strnlen): Use libc_hidden_proto. * sysdeps/aarch64/strnlen.S (__strnlen): Use libc_hidden_def. * sysdeps/i386/i686/multiarch/strnlen-c.c [SHARED] (libc_hidden_def): Define __GI___strnlen as well as __GI_strnlen. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-power7.S (libc_hidden_def): Undefine and redefine. * sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c [SHARED] (libc_hidden_def): Define __GI___strnlen as well as __GI_strnlen. * sysdeps/powerpc/powerpc32/power7/strnlen.S (__strnlen): Use libc_hidden_def. * sysdeps/tile/tilegx/strnlen.c (__strnlen): Likewise.
* 2015-06-02 Szabolcs Nagy <szabolcs.nagy@arm.com>Wilco Dijkstra2015-06-021-26/+26
| | | | * sysdeps/aarch64/libm-test-ulps: Update.