| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
|
|
|
|
| |
The powerpc is refactor to use the default implementation.
|
|
|
|
|
| |
It allows also to remove hppa specific implementation and simplify
riscv implementation a bit.
|
|
|
|
|
|
| |
The function was renamed to __atomic_wide_counter_load_relaxed
in commit 8bd336a00a5311bf7a9e99b3b0e9f01ff5faa74b ("nptl: Extract
<bits/atomic_wide_counter.h> from pthread_cond_common.c").
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Programs without dynamic dependencies and without a program
interpreter are now run via execve.
Previously, the dynamic linker either crashed while attempting to
read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocated in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.
<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
| |
On Ice Lake and Tiger Lake laptops, some test programs timeout when there
are 3 "make check -j8" runs in parallel. Add --with-timeoutfactor=NUM to
specify an integer to scale the timeout of test programs, which can be
overriden by TIMEOUTFACTOR environment variable.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
|
| |
Must use notl %edi here as lower bits are for CHAR comparisons
potentially out of range thus can be 0 without indicating mismatch.
This fixes BZ #28646.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
| |
As defined on: https://systemd.io/COREDUMP_PACKAGE_METADATA/
this note will be used starting from Fedora 36.
Signed-off-by: Luca Boccassi <bluca@debian.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
| |
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.
After this change:
$ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
LOC CFA r31 ra
000000000004b9d4 r1+0 u u
000000000004b9e4 r1+48 u u
000000000004b9e8 r1+48 c-16 u
000000000004b9fc r1+48 c-16 c+16
000000000004ba08 r1+48 c-16
000000000004ba18 r1+48 u
000000000004ba1c r1+0 u
libc.so.6: file format elf64-powerpcle
Disassembly of section .text:
000000000004b9d4 <kill>:
4b9d4: 1f 00 4c 3c addis r2,r12,31
4b9d8: 2c c3 42 38 addi r2,r2,-15572
4b9dc: 25 00 00 38 li r0,37
4b9e0: d1 ff 21 f8 stdu r1,-48(r1)
4b9e4: 20 00 e1 fb std r31,32(r1)
4b9e8: 98 8f ed eb ld r31,-28776(r13)
4b9ec: 10 00 ff 77 andis. r31,r31,16
4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38>
4b9f4: a6 02 28 7d mflr r9
4b9f8: 40 00 21 f9 std r9,64(r1)
4b9fc: 01 00 00 44 scv 0
4ba00: 40 00 21 e9 ld r9,64(r1)
4ba04: a6 03 28 7d mtlr r9
4ba08: 08 00 00 48 b 4ba10 <kill+0x3c>
4ba0c: 02 00 00 44 sc
4ba10: 00 00 bf 2e cmpdi cr5,r31,0
4ba14: 20 00 e1 eb ld r31,32(r1)
4ba18: 30 00 21 38 addi r1,r1,48
4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60>
4ba20: 01 f0 20 39 li r9,-4095
4ba24: 40 48 23 7c cmpld r3,r9
4ba28: 20 00 e0 4d bltlr+
4ba2c: d0 00 63 7c neg r3,r3
4ba30: 08 00 00 48 b 4ba38 <kill+0x64>
4ba34: 20 00 e3 4c bnslr+
4ba38: c8 32 fe 4b b 2ed00 <__syscall_error>
...
4ba44: 40 20 0c 00 .long 0xc2040
4ba48: 68 00 00 00 .long 0x68
4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3
4ba50: 6b 69 6c 6c xoris r12,r3,26987
|
|
|
|
|
|
|
|
|
| |
The syscall pipe2 was added in linux 2.6.27 and glibc requires linux
3.2.0. The patch removes the arch-specific implementation for alpha,
ia64, mips, sh, and sparc which requires a different kernel ABI
than the usual one.
Checked on x86_64-linux-gnu and with a build for the affected ABIs.
|
|
|
|
|
|
|
|
|
| |
Variadic function calls in syscalls.list does not work for all ABIs
(for instance where the argument are passed on the stack instead of
registers) and might have underlying issues depending of the variadic
type (for instance if a 64-bit argument is used).
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
The LFS prlimit64 requires a arch-specific implementation in
syscalls.list. Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.
HPPA is the only outlier which requires a different default
symbol.
Checked on x86_64-linux-gnu and with build for the affected ABIs.
|
|
|
|
| |
The test uses the bool type.
|
|
|
|
| |
The test uses standard integer types.
|
|
|
|
| |
libc.so.0.3 does not seem to need this defined any more.
|
|
|
|
|
|
|
| |
The /proc/statm fallback was removed by f13fb81ad3159 if sysfs is
not available, reinstate it.
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
|
|
| |
Passing 64-bit arguments on syscalls.list is tricky: it requires
to reimplement the expected kernel abi in each architecture. This
is way to better to represent in C code where we already have
macros for this (SYSCALL_LL64).
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
| |
For 32-bit architecture with __ASSUME_STATX there is no need to
build fstatat64_time64_stat.
Checked on i686-linux-gnu.
|
|
|
|
|
|
| |
Problem reported by Benno Schulenberg in:
https://lists.gnu.org/r/bug-gnulib/2021-10/msg00035.html
* posix/regexec.c (re_search_internal): Use better bounds check.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector sin/sinf and input files to libmvec microbenchmark.
libmvec-sin-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-sinf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector pow/powf and input files to libmvec microbenchmark.
libmvec-pow-inputs:
arg1:
90% Normal random distribution
range: (0.0, 256.0)
mean: 0.0
sigma: 32.0
10% uniform random distribution in range (0.0, 256.0)
arg2:
90% Normal random distribution
range: (-127.0, 127.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-127.0, 127.0)
libmvec-powf-inputs:
arg1:
90% Normal random distribution
range: (0.0f, 100.0f)
mean: 0.0f
sigma: 16.0f
10% uniform random distribution in range (0.0f, 100.0f)
arg2:
90% Normal random distribution
range: (-10.0f, 10.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-10.0f, 10.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector log/logf and input files to libmvec microbenchmark.
libmvec-log-inputs:
70% Normal random distribution
range: (0.0, DBL_MAX)
mean: 1.0
sigma: 50.0
30% uniform random distribution in range (0.0, DBL_MAX)
libmvec-logf-inputs:
70% Normal random distribution
range: (0.0f, FLT_MAX)
mean: 1.0f
sigma: 50.0f
30% uniform random distribution in range (0.0f, FLT_MAX)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector exp/expf and input files to libmvec microbenchmark.
libmvec-exp-inputs:
90% Normal random distribution
range: (-708.0, 709.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-expf-inputs:
90% Normal random distribution
range: (-87.0f, 88.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector cos/cosf and input files to libmvec microbenchmark.
libmvec-cos-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-cosf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that Hurd implementis both close_range and closefrom (f2c996597d),
we can make close_range() a base ABI, and make the default closefrom()
implementation on top of close_range().
The generic closefrom() implementation based on __getdtablesize() is
moved to generic close_range(). On Linux it will be overriden by
the auto-generation syscall while on Hurd it will be a system specific
implementation.
The closefrom() now calls close_range() and __closefrom_fallback().
Since on Hurd close_range() does not fail, __closefrom_fallback() is an
empty static inline function set by__ASSUME_CLOSE_RANGE.
The __ASSUME_CLOSE_RANGE also allows optimize Linux
__closefrom_fallback() implementation when --enable-kernel=5.9 or
higher is used.
Finally the Linux specific tst-close_range.c is moved to io and
enabled as default. The Linuxism and CLOSE_RANGE_UNSHARE are
guarded so it can be built for Hurd (I have not actually test it).
Checked on x86_64-linux-gnu, i686-linux-gnu, and with a i686-gnu
build.
|
|
|
|
|
|
|
|
|
|
|
|
| |
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel). This is observable with blocked pending
signals during thread exit.
Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8
("nptl: Start new threads with all signals blocked [BZ #25098]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The @notoc usage only yields an advantage on ISA 3.1+ machine (power10)
and for ld.bfd also when it sees pcrel relocations used on the code
(generated if compiler targets ISA 3.1+). On bfd case ISA 3.1+
instruction on stubs are used iff linker also sees the new pc-relative
relocations (for instance R_PPC64_D34), otherwise it generates default
stubs (ppc64_elf_check_relocs:4700).
This patch also help on linkers that do not implement this optimization,
since building for older ISA (such as 3.0 / power9) will also trigger
power10 stubs generation in the assembly code uses the NOTOC imacro.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It requires less boilerplate code for newer ports. The _Static_assert
checks from internal setjmp are moved to its own internal test since
setjmp.h is included early by multiple headers (to generate
rtld-sizes.sym).
The riscv jmp_buf-macros.h check is also redundant, it is already
done by riscv configure.ac.
Checked with a build for the affected architectures.
|
|
|
|
|
|
|
|
| |
This patch updates the kernel version in the test tst-mman-consts.py
to 5.15. (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)
Tested with build-many-glibcs.py.
|
|
|
|
|
|
|
| |
It is not possible to use interface ioctls with netlink sockets
on all Linux kernels.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
| |
It ensures that the the namespace is guaranteed to not be empty.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
| |
Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.
Tested for x86_64.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change 1e5a5866cb ("Remove malloc hooks [BZ #23328]") has broken
ports that are using GLIBC_2_35, like the new OpenRISC port I am working
on.
The libc_malloc_debug.so library used to bring in the debug
infrastructure is currently essentially empty for GLIBC_2_35 ports like
mine causing mtrace tests to fail:
cat sysdeps/unix/sysv/linux/or1k/shlib-versions
DEFAULT GLIBC_2.35
ld=ld-linux-or1k.so.1
FAIL: posix/bug-glob2-mem
FAIL: posix/bug-regex14-mem
FAIL: posix/bug-regex2-mem
FAIL: posix/bug-regex21-mem
FAIL: posix/bug-regex31-mem
FAIL: posix/bug-regex36-mem
FAIL: malloc/tst-mtrace.
The issue seems to be with the ifdefs in malloc/malloc-debug.c. The
ifdefs are currently essentially exluding all symbols for ports > 2.35.
Removing the top level SHLIB_COMPAT ifdef allows things to just work.
Fixes: 1e5a5866cb ("Remove malloc hooks [BZ #23328]")
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
| |
This will be used to deallocate memory allocated using the non-minimal
malloc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And make it an installed header. This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.
The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add python script to generate libmvec microbenchmark from the input
values for each libmvec function using skeleton benchmark template.
Creates double and float benchmarks with vector length 1, 2, 4, 8,
and 16 for each libmvec function. Vector length 1 corresponds to
scalar version of function and is included for vector function perf
comparison.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Since b05fae4d8e34, __minimal malloc code is used during static
startup before PIE self-relocation (_dl_relocate_static_pie).
So it requires the same fix done for other objects by 47618209d05a.
Checked on aarch64, x86_64, and i686 with and without static-pie.
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Use a temporary file to generate Makefile fragments for DSO sorting
tests and use -include on them.
2. Add Makefile fragments to postclean-generated so that a "make clean"
removes the autogenerated fragments and a subsequent "make" regenerates
them.
This partially fixes BZ #28550.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Put all sources for DSO sorting tests in the dso-sort-tests-src directory
and compile test relocatable objects with
$(objpfx)tst-dso-ordering1-dir/tst-dso-ordering1-a.os: $(objpfx)dso-sort-tests-src/tst-dso-ordering1-a.c
$(compile.c) $(OUTPUT_OPTION)
to avoid random $< values from $(before-compile) when compiling test
relocatable objects with
$(objpfx)%$o: $(objpfx)%.c $(before-compile); $$(compile-command.c)
compile-command.c = $(compile.c) $(OUTPUT_OPTION) $(compile-mkdep-flags)
compile.c = $(CC) $< -c $(CFLAGS) $(CPPFLAGS)
for 3 "make -j 28" parallel builds on a machine with 112 cores at the
same time.
This partially fixes BZ #28550.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
| |
No functional change.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update
commit 49302b8fdf9103b6fc0a398678668a22fa19574c
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:54:01 2021 -0800
Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
and
commit 0b82747dc48d5bf0871bdc6da8cb6eec1256355f
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:31:51 2021 -0800
Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
by moving assignment out of the CAS condition.
|
|
|
|
|
| |
Document that --enable-initfini-array is enabled by default in GCC 12,
which can be removed when GCC 12 becomes the minimum requirement.
|
|
|
|
|
|
|
| |
Currently, if the temporary file creation fails the create_tz_file
function returns NULL. The NULL pointer is then passed to setenv which
causes a SIGSEGV. Rather than failing with a SIGSEGV print a warning
and exit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CAS instruction is expensive. From the x86 CPU's point of view, getting
a cache line for writing is more expensive than reading. See Appendix
A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and cause
excessive cache line bouncing.
Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
loop if compare may fail to reduce cache line bouncing on contended locks.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
| |
Replace boolean CAS with value CAS to avoid the extra load.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
| |
Replace boolean CAS with value CAS to avoid the extra load.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|