| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9abdae94c7454c45e02e97e4ed1eb1b1915d13d8)
|
|
|
|
|
|
|
|
|
| |
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)
|
|
|
|
|
|
|
|
|
| |
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
(cherry picked from commit 14e56bd4ce15ac2d1cc43f762eb2e6b83fec1afe)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations,
(sparc_fixup_plt/sparc64_fixup_plt). For BINDNOW, _dl_audit_symbind
will be responsible to setup the final relocation value; while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail cail elf_machine_fixup_plt (which will call
sparc64_fixup_plt).
This patch fixes by issuing the SPARC specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:
elf/tst-audit24a
elf/tst-audit24b
elf/tst-audit24c
elf/tst-audit24d
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
(cherry picked from commit dddc88587a7f48cbb361d9929ec23d790164eef8)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans up the power4 strncmp optimization for powerpc64 which
is unlikely to be used anywhere.
Tested on ppc64le with and without --disable-multi-arch flag.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
| |
Supports pcrel addressing of TLS GOT entry. Also tweak the non-pcrel
asm constraint to better reflect how the reg is used.
|
|
|
|
|
|
|
|
| |
This makes it more likely that the compiler can compute the strlen
argument in _startup_fatal at compile time, which is required to
avoid a dependency on strlen this early during process startup.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Fangrui Song <maskray@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 13 has added more _FloatN and _FloatNx versions of existing
<math.h> and <complex.h> built-in functions, for use in libstdc++-v3.
This breaks the glibc build because of how those functions are defined
as aliases to functions with the same ABI but different types. Add
appropriate -fno-builtin-* options for compiling relevant files, as
already done for the case of long double functions aliasing double
ones and based on the list of files used there.
I fixed some mistakes in that list of double files that I noticed
while implementing this fix, but there may well be more such
(harmless) cases, in this list or the new one (files that don't
actually exist or don't define the named functions as aliases so don't
need the options). I did try to exclude cases where glibc doesn't
define certain functions for _FloatN or _FloatNx types at all from the
new uses of -fno-builtin-* options. As with the options for double
files (see the commit message for commit
49348beafe9ba150c9bd48595b3f372299bddbb0, "Fix build with GCC 10 when
long double = double."), it's deliberate that the options are used
even if GCC currently doesn't have a built-in version of a given
functions, so providing some level of future-proofing against more
such built-in functions being added in future.
Tested with build-many-glibcs.py for aarch64-linux-gnu
powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers
and glibcs builds) with GCC mainline.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the future, this will result in a compilation failure if the
macros are unexpectedly undefined (due to header inclusion ordering
or header inclusion missing altogether).
Assembler sources are more difficult to convert. In many cases,
they are hand-optimized for the mangling and no-mangling variants,
which is why they are not converted.
sysdeps/s390/s390-32/__longjmp.c and sysdeps/s390/s390-64/__longjmp.c
are special: These are C sources, but most of the implementation is
in assembler, so the PTR_DEMANGLE macro has to be undefined in some
cases, to match the assembler style.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to define a generic no-op version of PTR_MANGLE and
PTR_DEMANGLE. In the future, we can use PTR_MANGLE and PTR_DEMANGLE
unconditionally in C sources, avoiding an unintended loss of hardening
due to missing include files or unlucky header inclusion ordering.
In i386 and x86_64, we can avoid a <tls.h> dependency in the C
code by using the computed constant from <tcb-offsets.h>. <sysdep.h>
no longer includes these definitions, so there is no cyclic dependency
anymore when computing the <tcb-offsets.h> constants.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Besides the option being gcc specific, this approach is still fragile
and not future proof since we do not know if this will be the only
optimization option gcc will add that transforms loops to memset
(or any libcall).
This patch adds a new header, dl-symbol-redir-ifunc.h, that can b
used to redirect the compiler generated libcalls to port the generic
memset implementation if required.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
| |
Removal of legacy hwcaps support from the dynamic loader left
no users of _dl_string_hwcap.
Signed-off-by: Javier Pello <devel@otheo.eu>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking
the installed glibc headers that assume such support is not present.
GCC mostly works around this with fixincludes, but that doesn't help
for building glibc and its tests (glibc doesn't itself contain C++
code, but there's C++ code built for tests). Update glibc's
bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13
support directly.
In general the changes match those made by fixincludes, though I think
the ones in sysdeps/powerpc/bits/floatn.h, where the header tests
__LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing
fixincludes patterns.
Some places involving special C++ handling in relation to _FloatN
support are not changed. There's no need to change the
__HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be
matched by the fixincludes fixes) because it's only used in relation
to macro definitions using features not supported for C++
(__builtin_types_compatible_p and _Generic). And there's no need to
change the inline function overloads for issignaling, iszero and
iscanonical in C++ because cases where types have the same format but
are no longer compatible types are handled automatically by the C++
overload resolution rules.
This patch also does not change the overload handling for iseqsig, and
there I think changes *are* needed, beyond those in this patch or made
by fixincludes. The way that overload is defined, via a template
parameter to a structure type, requires overloads whenever the types
are incompatible, even if they have the same format. So I think we
need to add overloads with GCC 13 for every supported _FloatN and
_FloatNx type, rather than just having one for _Float128 when it has a
different ABI to long double as at present (but for older GCC, such
overloads must not be defined for types that end up defined as
typedefs for another type).
Tested with build-many-glibcs.py: compilers build for
aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu
powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for
aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu
mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu
x86_64-linux-gnu.
|
|
|
|
|
|
|
| |
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire
since these map to the standard C11 atomic builtins.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.
This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.
This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.
This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.
getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.
The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c. It targets POWER8 and it is used on default
for LE.
On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):
POWER8
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 138.77
arc4random_buf(16) [single-thread] 174.36
arc4random_buf(32) [single-thread] 228.11
arc4random_buf(48) [single-thread] 252.31
arc4random_buf(64) [single-thread] 270.11
arc4random_buf(80) [single-thread] 278.97
arc4random_buf(96) [single-thread] 287.78
arc4random_buf(112) [single-thread] 291.92
arc4random_buf(128) [single-thread] 295.25
POWER8 MB/s
-----------------------------------------------
arc4random [single-thread] 198.06
arc4random_buf(16) [single-thread] 278.79
arc4random_buf(32) [single-thread] 448.89
arc4random_buf(48) [single-thread] 551.09
arc4random_buf(64) [single-thread] 646.12
arc4random_buf(80) [single-thread] 698.04
arc4random_buf(96) [single-thread] 756.06
arc4random_buf(112) [single-thread] 784.12
arc4random_buf(128) [single-thread] 808.04
-----------------------------------------------
Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC
redundant and fixes several targets that will write outside the array.
To avoid unnecessary large diffs, pass the maximum in the argument 'i' to
IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing
ones can be updated if desired.
Passes buildmanyglibc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__strncpy_power9 initializes VR 18 with zeroes to be used throughout the
code, including when zero-padding the destination string. However, the
v18 reference was mistakenly being used for stxv and stxvl, which take a
VSX vector as operand. The code ended up using the uninitialized VSR 18
register by mistake.
Both occurrences have been changed to use the proper VSX number for VR 18
(i.e. VSR 50).
Tested on powerpc, powerpc64 and powerpc64le.
Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both float, double, and _Float128 are assumed to be supported
(float and double already only uses builtins). Only long double
is parametrized due GCC bug 29253 which prevents its usage on
powerpc.
It allows to remove i686, ia64, x86_64, powerpc, and sparc arch
specific implementation.
On ia64 it also fixes the sNAN handling:
math/test-float64x-fabs
math/test-ldouble-fabs
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
|
|
|
|
|
|
|
| |
82a79e7d1843f9d90075a0bf2f04557040829bb0 removed the only user of
HAVE_PPC_SECURE_PLT.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using GOT
is typically position-independent as well, but using dynamic relocations
does not meet the requirement.
Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new
ports will define PI_STATIC_AND_HIDDEN. Current ports defining
PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure
default.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libgcc ifunc resolvers that access hwcap via a field in the tcb can't
be called until the thread pointer is set up. Other ifunc resolvers
might need access to at_platform. This patch sets up a fake thread
pointer early to a copy of tcbhead_t. hwcapinfo.c already had local
variables for hwcap and at_platform, replace them with an entire
tcbhead_t. It's not that large and this way we easily ensure hwcap
and at_platform are at the same relative offsets as they are in the
real thread block.
The patch also conditionally disables part of tst-tlsifunc-static,
"bar address read from IFUNC resolver is incorrect". We can't get a
proper address for a thread variable before glibc initialises tls.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The PowerPC64 linker edits medium model toc-indirect code to toc-pointer
relative:
addis r9,r2,tc_entry_for_var@toc@ha
ld r9,tc_entry_for_var@toc@l(r9)
becomes
addis r9,r2,(var-.TOC.)@ha
addi r9,r9,(var-.TOC.)@l
when "var" is known to be local to the binary. This isn't done for
small-model toc-indirect code, because "var" is almost guaranteed to
be too far away from .TOC. for a 16-bit signed offset. And, because
the analysis of which .toc entry can be removed becomes much more
complicated in objects that mix code models, they aren't removed if
any small-model toc sequence appears in an object file.
Unfortunately, glibc's build of ld.so smashes the needed objects
together in a ld -r linking stage. This means the GOT/TOC is left
with a whole lot of relative relocations which is untidy, but in
itself is not a serious problem. However, static-pie on powerpc64
bombs due to a segfault caused by one of the small-model accesses
before _dl_relocate_static_pie. (The very first one in rcrt1.o
passing start_addresses in r8 to __libc_start_main.)
So this patch makes all the toc/got accesses in assembly medium code
model, and a couple of functions hidden. By itself this is not
enough to give us working static-pie, but it is useful in isolation to
enable better linker optimisation.
There's a serious problem in libgcc too. libgcc ifuncs access the
AT_HWCAP words stored in the tcb with an offset from the thread
pointer (r13), but r13 isn't set at the time _dl_relocate_static_pie.
A followup patch will fix that.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
| |
The builtin and generic implementation from generic files are suffice.
Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '=='.
This retains compatibility with bash.
Fixes configure warnings/errors like:
```
checking if compiler warns about alias for function with incompatible types... yes
/var/tmp/portage/sys-libs/glibc-2.34-r10/work/glibc-2.34/configure: 4209: test: xyes: unexpected operator
```
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Sam James <sam@gentoo.org>
|
|
|
|
|
| |
The symbol is not present in current POSIX specification and compiler
already generates memset call.
|
|
|
|
|
| |
The symbol is not present in current POSIX specification and compiler
already generates memset call.
|
|
|
|
|
| |
The symbols is not present in current POSIX specification and compiler
already generates memmove call.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prelinked binaries and libraries still work, the dynamic tags
DT_GNU_PRELINKED, DT_GNU_LIBLIST, DT_GNU_CONFLICT just ignored
(meaning the process is reallocated as default).
The loader environment variable TRACE_PRELINKING is also removed,
since it used solely on prelink.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The audit symbind callback is not called for binaries built with
-Wl,-z,now or when LD_BIND_NOW=1 is used, nor the PLT tracking callbacks
(plt_enter and plt_exit) since this would change the expected
program semantics (where no PLT is expected) and would have performance
implications (such as for BZ#15533).
LAV_CURRENT is also bumped to indicate the audit ABI change (where
la_symbind flags are set by the loader to indicate no possible PLT
trace).
To handle powerpc64 ELFv1 function descriptor, _dl_audit_symbind
requires to know whether bind-now is used so the symbol value is
updated to function text segment instead of the OPD (for lazy binding
this is done by PPC64_LOAD_FUNCPTR on _dl_runtime_resolve).
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
powerpc64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
| |
This is required so that the checks still work if $(early-cflags)
selects a different ISA level.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trapping SIGSEGV within the process is error-prone, adds security
issues, and modern analysis design tends to happen out of the
process (either by attaching a debugger or by post-mortem analysis).
The libSegfault also has some design problems, it uses non
async-signal-safe function (backtrace) on signal handler.
There are multiple alternatives if users do want to use similar
functionality, such as sigsegv gnulib module or libsegfault.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
|
|
|
|
|
|
| |
And use machine-sp.h instead. The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
|
|
|
|
| |
Now that memusage.c uses generic types we can remove them.
|
|
|
|
|
|
|
|
|
| |
It consolidates the code required to call la_pltexit audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A local register variable is merely a compiler hint, and so not
appropriate in this context. Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.
Fixes commit 8dbeb0561eeb876f557ac9eef5721912ec074ea5
("nptl: Add <thread_pointer.h> for defining __thread_pointer").
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The generic implementation is shows only slight worse performance:
POWER10 reciprocal-throughput latency
master 8.28478 13.7253
new hypot 7.21945 13.1933
POWER9 reciprocal-throughput latency
master 13.4024 14.0967
new hypot 14.8479 15.8061
POWER8 reciprocal-throughput latency
master 15.5767 16.8885
new hypot 16.5371 18.4057
One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).
Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
has not the same alignment as THREAD_SELF. Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.
In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.
hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32. However, it defines THREAD_SELF as:
/* Return the thread descriptor for the current thread. */
# define THREAD_SELF \
({ struct pthread *__self; \
__self = __get_cr27(); \
__self - 1; \
})
So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.
Similarly, on ia64, the definitions were different. We have:
# define TLS_PRE_TCB_SIZE \
(sizeof (struct pthread) \
+ (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \
? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \
& ~(__alignof__ (struct pthread) - 1)) \
: 0))
# define THREAD_SELF \
((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE))
And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).
On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that. The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.
This fixes commit 23c77f60181eb549f11ec2f913b4270af29eee38
("nptl: Increase default TCB alignment to 32").
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
| |
These are common between most architectures. Only the x86 targets
are outliers.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
| |
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.
Only architectures for which rseq support is added are covered.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.
After this change:
$ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
LOC CFA r31 ra
000000000004b9d4 r1+0 u u
000000000004b9e4 r1+48 u u
000000000004b9e8 r1+48 c-16 u
000000000004b9fc r1+48 c-16 c+16
000000000004ba08 r1+48 c-16
000000000004ba18 r1+48 u
000000000004ba1c r1+0 u
libc.so.6: file format elf64-powerpcle
Disassembly of section .text:
000000000004b9d4 <kill>:
4b9d4: 1f 00 4c 3c addis r2,r12,31
4b9d8: 2c c3 42 38 addi r2,r2,-15572
4b9dc: 25 00 00 38 li r0,37
4b9e0: d1 ff 21 f8 stdu r1,-48(r1)
4b9e4: 20 00 e1 fb std r31,32(r1)
4b9e8: 98 8f ed eb ld r31,-28776(r13)
4b9ec: 10 00 ff 77 andis. r31,r31,16
4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38>
4b9f4: a6 02 28 7d mflr r9
4b9f8: 40 00 21 f9 std r9,64(r1)
4b9fc: 01 00 00 44 scv 0
4ba00: 40 00 21 e9 ld r9,64(r1)
4ba04: a6 03 28 7d mtlr r9
4ba08: 08 00 00 48 b 4ba10 <kill+0x3c>
4ba0c: 02 00 00 44 sc
4ba10: 00 00 bf 2e cmpdi cr5,r31,0
4ba14: 20 00 e1 eb ld r31,32(r1)
4ba18: 30 00 21 38 addi r1,r1,48
4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60>
4ba20: 01 f0 20 39 li r9,-4095
4ba24: 40 48 23 7c cmpld r3,r9
4ba28: 20 00 e0 4d bltlr+
4ba2c: d0 00 63 7c neg r3,r3
4ba30: 08 00 00 48 b 4ba38 <kill+0x64>
4ba34: 20 00 e3 4c bnslr+
4ba38: c8 32 fe 4b b 2ed00 <__syscall_error>
...
4ba44: 40 20 0c 00 .long 0xc2040
4ba48: 68 00 00 00 .long 0x68
4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3
4ba50: 6b 69 6c 6c xoris r12,r3,26987
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The @notoc usage only yields an advantage on ISA 3.1+ machine (power10)
and for ld.bfd also when it sees pcrel relocations used on the code
(generated if compiler targets ISA 3.1+). On bfd case ISA 3.1+
instruction on stubs are used iff linker also sees the new pc-relative
relocations (for instance R_PPC64_D34), otherwise it generates default
stubs (ppc64_elf_check_relocs:4700).
This patch also help on linkers that do not implement this optimization,
since building for older ISA (such as 3.0 / power9) will also trigger
power10 stubs generation in the assembly code uses the NOTOC imacro.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few places where only known numeric values are acceptable for
`asm` parameters, yet the constraint "i" is used. "i" can include
"symbolic constants whose values will be known only at assembly time or
later."
Use "n" instead of "i" where known numeric values are required.
Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
No bug.
This commit adds hidden defs for all declarations of __memcmpeq. This
enables usage of __memcmpeq without the PLT for usage internal to
GLIBC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No bug.
This commit adds support for __memcmpeq() as a new ABI for all
targets. In this commit __memcmpeq() is implemented only as an alias
to the corresponding targets memcmp() implementation. __memcmpeq() is
added as a new symbol starting with GLIBC_2.35 and defined in string.h
with comments explaining its behavior. Basic tests that it is callable
and works where added in string/tester.c
As discussed in the proposal "Add new ABI '__memcmpeq()' to libc"
__memcmpeq() is essentially a reserved namespace for bcmp(). The means
is shares the same specifications as memcmp() except the return value
for non-equal byte sequences is any non-zero value. This is less
strict than memcmp()'s return value specification and can be better
optimized when a boolean return is all that is needed.
__memcmpeq() is meant to only be called by compilers if they can prove
that the return value of a memcmp() call is only used for its boolean
value.
All tests in string/tester.c passed. As well build succeeds on
x86_64-linux-gnu target.
|