| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RETURN_ADDRESS is used at several places in glibc to mean a valid
code address of the call site, but with pac-ret it may contain a
pointer authentication code (PAC), so its definition is adjusted.
This is gcc PR target/94891: __builtin_return_address should not
expose signed pointers to user code where it can cause ABI issues.
In glibc RETURN_ADDRESS is only changed if it is built with pac-ret.
There is no detection for the specific gcc issue because it is
hard to test and the additional xpac does not cause problems.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently gcc -pg -mbranch-protection=pac-ret passes signed return
address to _mcount, so _mcount now has to always strip pac from the
frompc since that's from user code that may be built with pac-ret.
This is gcc PR target/94791: signed pointers should not escape and get
passed across extern call boundaries, since that's an ABI break, but
because existing gcc has this issue we work it around in glibc until
that is resolved. This is compatible with a fixed gcc and it is a nop
on systems without PAuth support. The bug was introduced in gcc-7 with
-msign-return-address=non-leaf|all support which in gcc-9 got renamed
to -mbranch-protection=pac-ret|pac-ret+leaf|standard.
strip_pac uses inline asm instead of __builtin_aarch64_xpaclri since
that is not a documented api and not available in all supported gccs.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use return address signing in assembly files for functions that save
LR when pac-ret is enabled in the compiler.
The GNU property note for PAC-RET is not meaningful to the dynamic
linker so it is not strictly required, but it may be used to track
the security property of binaries. (The PAC-RET property is only set
if BTI is set too because BTI implies working GNU property support.)
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return address signing requires unwinder support, which is
present in libgcc since >=gcc-7, however due to bugs the
support may be broken in <gcc-10 (and similarly there may
be issues in custom unwinders), so pac-ret is not always
safe to use. So in assembly code glibc should only use
pac-ret if the compiler uses it too. Unfortunately there
is no predefined feature macro for it set by the compiler
so pac-ret is inferred from the code generation.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When glibc is built with branch protection (i.e. with a gcc configured
with --enable-standard-branch-protection), all glibc binaries should
be BTI compatible and marked as such.
It is easy to link BTI incompatible objects by accident and this is
silent currently which is usually not the expectation, so this is
changed into a link error. (There is no linker flag for failing on
BTI incompatible inputs so all warnings are turned into fatal errors
outside the test system when building glibc with branch protection.)
Unfortunately, outlined atomic functions are not BTI compatible in
libgcc (PR libgcc/96001), so to build glibc with current gcc use
'CC=gcc -mno-outline-atomics', this should be fixed in libgcc soon
and then glibc can be built and tested without such workarounds.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Binaries can opt-in to using BTI via an ELF object file marking.
The dynamic linker has to then mprotect the executable segments
with PROT_BTI. In case of static linked executables or in case
of the dynamic linker itself, PROT_BTI protection is done by the
operating system.
On AArch64 glibc uses PT_GNU_PROPERTY instead of PT_NOTE to check
the properties of a binary because PT_NOTE can be unreliable with
old linkers (old linkers just append the notes of input objects
together and add them to the output without checking them for
consistency which means multiple incompatible GNU property notes
can be present in PT_NOTE).
BTI property is handled in the loader even if glibc is not built
with BTI support, so in theory user code can be BTI protected
independently of glibc. In practice though user binaries are not
marked with the BTI property if glibc has no support because the
static linked libc objects (crt files, libc_nonshared.a) are
unmarked.
This patch relies on Linux userspace API that is not yet in a
linux release but in v5.8-rc1 so scheduled to be in Linux 5.8.
Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tailcalls must use x16 or x17 for the indirect branch instruction
to be compatible with code that uses BTI c at function entries.
(Other forms of indirect branches can only land on BTI j.)
Also added a BTI c at the ELF entry point of rtld, this is not
strictly necessary since the kernel does not use indirect branch
to get there, but it seems safest once building glibc itself with
BTI is supported.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To enable building glibc with branch protection, assembly code
needs BTI landing pads and ELF object file markings in the form
of a GNU property note.
The landing pads are unconditionally added to all functions that
may be indirectly called. When the code segment is not mapped
with PROT_BTI these instructions are nops. They are kept in the
code when BTI is not supported so that the layout of performance
critical code is unchanged across configurations.
The GNU property notes are only added when there is support for
BTI in the toolchain, because old binutils does not handle the
notes right. (Does not know how to merge them nor to put them in
PT_GNU_PROPERTY segment instead of PT_NOTE, and some versions
of binutils emit warnings about the unknown GNU property. In
such cases the produced libc binaries would not have valid
ELF marking so BTI would not be enabled.)
Note: functions using ENTRY or ENTRY_ALIGN now start with an
additional BTI c, so alignment of the following code changes,
but ENTRY_ALIGN_AND_PAD was fixed so there is no change to the
existing code layout. Some string functions may need to be
tuned for optimal performance after this commit.
Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
| |
The compiler can add required elf markings based on CFLAGS
but the assembler cannot, so using C code for empty files
creates less of a maintenance problem.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Check BTI support in the compiler and linker. The check also
requires READELF that understands the BTI GNU property note.
It is expected to succeed with gcc >=gcc-9 configured with
--enable-standard-branch-protection and binutils >=binutils-2.33.
Note: passing -mbranch-protection=bti in CFLAGS when building glibc
may not be enough to get a glibc that supports BTI because crtbegin*
and crtend* provided by the compiler needs to be BTI compatible too.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to strncmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to strcmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to strrchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to memrchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to memchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for MTE to strcpy. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.
The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
| |
The -fno-math-errno is already added by default and the minimum
required GCC to build glibc (6.2) make the -ffinite-math-only
superflous.
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
| |
The define is already set on the math-use-builtins-ceil.h, the patch
just removes the implementations (it was missed on c9feb1be93).
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
Each symbol definitions are moved on a separated file and it
cover all symbol type definitions (float, double, long double,
and float128).
It allows to set support for architectures without the boiler
place of copying default values.
Checked with a build on the affected ABIs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce an Arm MTE compatible strlen implementation.
The existing implementation assumes that any access to the pages in
which the string resides is safe. This assumption is not true when
MTE is enabled. This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance on modern cores. On cores with less
efficient Advanced SIMD implementation such as Cortex-A53 it can
be slower.
Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce an Arm MTE compatible strchr implementation.
The existing implementation assumes that any access to the pages in
which the string resides is safe. This assumption is not true when
MTE is enabled. This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.
Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce an Arm MTE compatible strchrnul implementation.
The existing implementation assumes that any access to the pages in
which the string resides is safe. This assumption is not true when
MTE is enabled. This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.
Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Falkor's memcpy and memmove share some implementation details,
therefore, the two routines are moved to a single source file
for code reuse.
The two routines now share code for small and medium copies
(up to and including 128 bytes). Large copies in memcpy do not
handle overlap correctly, consequently, the loops for
moving/copying more than 128 bytes stay separate for memcpy
and memmove.
To increase code reuse a number of small modifications were made:
1. The old implementation of memcpy copied the first 16-bytes as
soon as the size of data was determined to be greater than 32 bytes.
For memcpy code to also work when copying small/medium overlapping
data, the first load and store was moved to the large copy case.
2. Medium memcpy case no longer assumes that 16 bytes were already
copied and uses 8 registers to copy up to 128 bytes.
3. Small case for memmove was enlarged to that of memcpy, which is
less than or equal to 32 bytes.
4. Medium case for memmove was enlarged to that of memcpy, which is
less than or equal to 128 bytes.
Other changes include:
1. Improve alignment of existing loop bodies.
2. 'Delouse' memmove and memcpy input arguments. Make sure that
upper 32-bits of input registers are zeroed if unused.
3. Do one more iteration in memmove loops and reduce the number of
copies made from the start/end of the buffer, depending on
the direction of the memmove loop.
Benchmarking:
Looking at the results from bench-memcpy-random.out, we can see that
now memmove_falkor is about 5% faster than memcpy_falkor_old, while
memmove_falkor_old was more than 15% slower. The memcpy implementation
remained largely unmodified, so there is no significant performance
change.
The reason for such a significant memmove performance gain is the
increase of the upper bound on the small copy case to 32 bytes and
the increase of the upper bound on the medium copy case to 128 bytes.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
introduce sysdep header math-use-builtins.h to replace aarch64
implementations with corresponding generic ones.
- newly inroduced generic sqrt{,f}, fma{,f}
- existing floor{,f}, nearbyint{,f}, rint{,f}, round{,f}, trunc{,f}
- Note that generic copysign was already enabled (via generic
math-use-builtins.h) now thru sysdep header
Tested with build-many-glibcs for aarch64-linux-gnu
This is a non functional change and aarch64 libm before/after was
byte invariant as compared below:
| cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline
| for i in `find . -name libm-2.31.9000.so`; do
| echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ;
| echo $?;
| done
| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
| 0
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the optimized implementation of strcpy and strnlen
on a big-endian arm64 machine.
The optimized method uses neon, which can process 128bit with one
instruction. On a big-endian machine, the bit order should be reversed
for the whole 128-bits double word. But with instuction
rev64 datav.16b, datav.16b
it reverses 64bits in the two halves rather than reversing 128bits.
There is no such instruction as rev128 to reverse the 128bits, but we
can fix this by loading the data registers accordingly.
Fixes 0237b61526e7("aarch64: Optimized implementation of strcpy") and
2911cb68ed3d("aarch64: Optimized implementation of strnlen").
Signed-off-by: Lexi Shao <shaolexi@huawei.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
|
|
| |
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This supersedes the init_array sysdeps directory. It allows us to
check for ELF_INITFINI in both C and assembler code, and skip DT_INIT
and DT_FINI processing completely on newer architectures.
A new header file is needed because <dl-machine.h> is incompatible
with assembler code. <sysdep.h> is compatible with assembler code,
but it cannot be included in all assembler files because on some
architectures, it redefines register names, and some assembler files
conflict with that.
<elf-initfini.h> is replicated for legacy architectures which need
DT_INIT/DT_FINI support. New architectures follow the generic default
and disable it.
|
|
|
|
|
| |
All architectures using their own definition of struct
__pthread_rwlock_arch_t need to provide their own pthread-offsets.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol. It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes buildmanyglibc.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
| |
|
|
|
|
| |
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
| |
Rename ifunc for kunpeng to kunpeng920, and modify the corresponding
function files including IS_KUNPENG920 judgement.
Checked on aarch64-linux-gnu.
|
|
|
|
| |
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the branch prediction issue of Kunpeng processor, we found
memset_generic has poor performance on middle sizes setting, and so
we reconstructed the logic, expanded the loop by 4 times in set_long
to solve the problem, even when setting below 1K sizes have benefit.
Another change is that DZ_ZVA seems no work when setting zero, so we
discarded it and used set_long to set zero instead. Fewer branches and
predictions also make the zero case have slightly improvement.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize the strlen implementation by using vector operations and
loop unrolling in main loop.Compared to __strlen_generic,it reduces
latency of cases in bench-strlen by 7%~18% when the length of src
is greater than 128 bytes, with gains throughout the benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Considering the excellent performance of memchr.S on glibc 2.30, the
same algorithm is used to find chrin. Compared to memrchr.c, this
method with memrchr.S achieves an average performance improvement
of 58% based on benchtest and its extension cases.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize the strlen implementation by using vector operations and
loop unrooling in main loop. Compared to aarch64/strnlen.S, it
reduces latency of cases in bench-strnlen by 11%~24% when the length
of src is greater than 64 bytes, with gains throughout the benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize the strcpy implementation by using vector loads and operations
in main loop.Compared to aarch64/strcpy.S, it reduces latency of cases
in bench-strlen by 5%~18% when the length of src is greater than 64
bytes, with gains throughout the benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The loop body is expanded from a 16-byte comparison to a 64-byte
comparison, and the usage of ldp is replaced by the Post-index
mode to the Base plus offset mode. Hence, compare can faster 18%
around > 128 bytes in all.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
| |
This commit adds missing skip_ifunc checks to aarch64, arm, i386,
sparc, and x86_64. A new test case ensures that IRELATIVE IFUNC
resolvers do not run in various diagnostic modes of the dynamic
loader.
Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a default pthread-offsets.h based on default
thread definitions from struct_mutex.h and struct_rwlock.h.
The idea is to simplify new ports inclusion.
Checked with a build on affected abis.
Change-Id: I7785a9581e651feb80d1413b9e03b5ac0452668a
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports. Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).
Also the static intialization macro for pthread_rwlock_t is set to use
an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its
implementation.
The default pthread_rwlock_t layout differs from current ports with:
1. Internal layout is the same for 32 bits and 64 bits.
2. Internal flag is an unsigned short so it should not required
additional padding to align for word boundary (if it is the case
for the ABI).
Checked with a build on affected abis.
Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da16104be5) is
not really the best options for newer ports. It requires define some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicit define them (adding more boilerplate
code).
This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports. Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition. Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) requires specific
headers.
For 32 bit, the generic definitions mimic the other 32-bit ports
of using an union to define the fields uses on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes. Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.
Also the static intialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.
Checked with a build on affected abis.
Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
|
|
|
|
|
|
|
|
|
|
| |
The new rwlock implementation added by cc25c8b4c1196 (2.25) removed
support for lock-elision. This patch removes remaining the
arch-specific unused definitions.
Checked with a build against all affected ABIs.
Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch new build tests to check for internal fields offsets for
internal pthread_rwlock_t definition. Althoug the '__data.__flags'
field layout should be preserved due static initializators, the patch
also adds tests for the futexes that may be used in a shared memory
(although using different libc version in such scenario is not really
supported).
Checked with a build against all affected ABIs.
Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.elision, __data.list are not required to be constant over
the releases. Only the __data.__kind is used for static
initializers.
This patch also adds an additional size check for __data.__kind.
Checked with a build against affected ABIs.
Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.
Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.
Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html
Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S
Tested against GLIBC testsuite and randomized tests.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With only two exceptions (sys/types.h and sys/param.h, both of which
historically might have defined BYTE_ORDER) the public headers that
include <endian.h> only want to be able to test __BYTE_ORDER against
__*_ENDIAN.
This patch creates a new bits/endian.h that can be included by any
header that wants to be able to test __BYTE_ORDER and/or
__FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs
__LONG_LONG_PAIR. It only defines macros in the implementation
namespace.
The existing bits/endian.h (which could not be included independently
of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER)
is renamed to bits/endianness.h. I also took the opportunity to
canonicalize the form of this header, which we are stuck with having
one copy of per architecture. Since they are so short, this means git
doesn’t understand that they were renamed from existing headers, sigh.
endian.h itself is a nonstandard header and its only remaining use
from a standard header is guarded by __USE_MISC, so I dropped the
__USE_MISC conditionals from around all of the public-namespace things
it defines. (This means, an application that requests strict library
conformance but includes endian.h will still see the definition of
BYTE_ORDER.)
A few changes to specific bits/endian(ness).h variants deserve
mention:
- sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to
sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did
have selectable endianness, but we have assembly code in
sysdeps/ia64 that assumes it’s little-endian, so there is no reason
to treat the ia64 endianness.h as linux-specific.
- The C-SKY port does not fully support big-endian mode, the compile
will error out if __CSKYBE__ is defined.
- The PowerPC port had extra logic in its bits/endian.h to detect a
broken compiler, which strikes me as unnecessary, so I removed it.
- The only files that defined __FLOAT_WORD_ORDER always defined it to
the same value as __BYTE_ORDER, so I removed those definitions.
The SH bits/endian(ness).h had comments inconsistent with the
actual setting of __FLOAT_WORD_ORDER, which I also removed.
- I *removed* copyright boilerplate from the few bits/endian(ness).h
headers that had it; these files record a single fact in a fashion
dictated by an external spec, so I do not think they are copyrightable.
As long as I was changing every copy of ieee754.h in the tree, I
noticed that only the MIPS variant includes float.h, because it uses
LDBL_MANT_DIG to decide among three different versions of
ieee854_long_double. This patch makes it not include float.h when
GCC’s intrinsic __LDBL_MANT_DIG__ is available.
* string/endian.h: Unconditionally define LITTLE_ENDIAN,
BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping
macros only on !__ASSEMBLER__. Move the definitions of
__BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER,
and __LONG_LONG_PAIR to...
* string/bits/endian.h: ...this new file, which includes
the renamed header bits/endianness.h for the definition of
__BYTE_ORDER and possibly __FLOAT_WORD_ORDER.
* string/Makefile: Install bits/endianness.h.
* include/bits/endian.h: New wrapper.
* bits/endian.h: Rename to bits/endianness.h.
Add multiple-include guard. Rewrite the comment explaining what
the machine-specific variants of this file should do.
* sysdeps/unix/sysv/linux/ia64/bits/endian.h:
Move to sysdeps/ia64.
* sysdeps/aarch64/bits/endian.h
* sysdeps/alpha/bits/endian.h
* sysdeps/arm/bits/endian.h
* sysdeps/csky/bits/endian.h
* sysdeps/hppa/bits/endian.h
* sysdeps/ia64/bits/endian.h
* sysdeps/m68k/bits/endian.h
* sysdeps/microblaze/bits/endian.h
* sysdeps/mips/bits/endian.h
* sysdeps/nios2/bits/endian.h
* sysdeps/powerpc/bits/endian.h
* sysdeps/riscv/bits/endian.h
* sysdeps/s390/bits/endian.h
* sysdeps/sh/bits/endian.h
* sysdeps/sparc/bits/endian.h
* sysdeps/x86/bits/endian.h:
Rename to endianness.h; canonicalize form of file; remove
redundant definitions of __FLOAT_WORD_ORDER.
* sysdeps/powerpc/bits/endianness.h: Remove logic to check for
broken compilers.
* ctype/ctype.h
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
* sysdeps/csky/nptl/bits/pthreadtypes-arch.h
* sysdeps/ia64/ieee754.h
* sysdeps/ieee754/ieee754.h
* sysdeps/ieee754/ldbl-128/ieee754.h
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
* sysdeps/mips/ieee754/ieee754.h
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
* sysdeps/nptl/pthread.h
* sysdeps/riscv/nptl/bits/pthreadtypes-arch.h
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
* sysdeps/sparc/sparc32/ieee754.h
* sysdeps/unix/sysv/linux/generic/bits/stat.h
* sysdeps/unix/sysv/linux/generic/bits/statfs.h
* sysdeps/unix/sysv/linux/sys/acct.h
* wctype/bits/wctype-wchar.h:
Include bits/endian.h, not endian.h.
* sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h.
* sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__
in ifdefs, instead of LDBL_MANT_DIG. Only include float.h
when __LDBL_MANT_DIG__ is not predefined, in which case
define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
|