| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
This patch updates unroll8 code so as not to degrade at the peak
performance 16KB for both FX1000 and FX700.
Inserted 2 instructions at the beginning of the unroll8 loop,
cmp and branch, are a workaround that is found heuristically.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When using LLD (LLVM linker) as the linker, configure prints a confusing
message.
*** These critical programs are missing or too old: GNU ld
LLD>=13.0.0 can build glibc --enable-static-pie. (8.0.0 needs one
workaround for -Wl,-defsym=_begin=0. 9.0.0 works with
--disable-static-pie).
XFAIL two tests sysdeps/x86/tst-ifunc-isa-* which have the BZ #28154
issue (LLD follows the PowerPC port of GNU ld for ifunc by placing
IRELATIVE relocations in .rela.dyn, triggering a glibc ifunc fragility).
The set of dynamic symbols is the same with GNU ld and LLD,
modulo unused SHN_ABS version node symbols.
For comparison, gold does not support --enable-static-pie
yet (--no-dynamic-linker is unsupported BZ #22221), yet
has 6 failures more than LLD. gold linked libc.so has
larger .dynsym differences with GNU ld and LLD
(non-default version symbols are changed to default versions
by a version script BZ #28196).
|
|
|
|
|
| |
MS_SYNC is actually 0, so we cannot test that both MS_SYNC and MS_ASYNC
are set.
|
|
|
|
| |
== has higher priority than &
|
|
|
|
|
|
|
| |
Use testl, instead of andl, to check __x86_string_control to avoid
updating __x86_string_control.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
| |
On i686, there is no multiarch memove in libc.a, don't include multiarch
memove in ifunc-impl-list.c in libc.a.
|
|
|
|
|
|
|
|
|
| |
posix/tst-spawn5 (BZ #28260)
It ensures a continuous range of file descriptor and avoid hitting
the RLIMIT_NOFILE.
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
| |
LLD doesn't support --{,no-}tls-get-addr-optimize.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
| |
The minimum GNU binutils requirement is 2.25 which supports AVX512DQ.
Remove assembler AVX512DQ check.
|
|
|
|
|
| |
The minimum GCC requirement is GCC 6.2 which supports -mavx512f. Remove
compiler -mavx512f check. Tested with GCC 6.4.1 on Linux/x86-64.
|
|
|
|
| |
This is not referenced any more and includes a non-existing file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimize loads of all bits set into ZMM register in AVX512 SVML codes
by replacing
vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX
and
vmovups .L_2il0floatpacket.13(%rip), %zmmX
with
vpternlogd $0xff, %zmmX, %zmmX, %zmmX
This fixes BZ #28252.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Autoconf documentation for the AC_CACHE_CHECK macro states:
The commands-to-set-it must have no side effects except for setting
the variable cache-id, see below.
However, the tests for support of -msahf and -mmovbe were embedded in
the commands-to-set-it for lib_cv_include_x86_isa_level. This had the
consequence that libc_cv_have_x86_lahf_sahf and libc_cv_have_x86_movbe
were not defined whenever lib_cv_include_x86_isa_level was read from
cache. These variables' being undefined meant that their unquoted use
in later test expressions led to the 'test' built-in's misparsing its
arguments and emitting errors like "test: =: unexpected operator" or
"test: =: unary operator expected", depending on the particular shell.
This commit refactors the tests for LAHF/SAHF and MOVBE instruction
support into their own AC_CACHE_CHECK macro invocations to obey the
rule that the commands-to-set-it must have no side effects other than
setting the variable named by cache-id.
Signed-off-by: Matt Whitlock <sourceware@mattwhitlock.name>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
| |
and drop reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC. &__ehdr_start is a better way to get the load address.
This is similar to commits b37b75d269883a2c553bb7019a813094eb4e2dd1
(x86-64) and 43d06ed218fc8be58987bdfd00e21e5720f0b862 (aarch64).
Reviewed-by: Joseph Myers <joseph@codesourcery.com>
|
|
|
|
|
|
|
|
|
| |
&__ehdr_start is a better way to get the load address.
This is similar to commits b37b75d269883a2c553bb7019a813094eb4e2dd1
(x86-64) and 43d06ed218fc8be58987bdfd00e21e5720f0b862 (aarch64).
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
|
|
|
|
|
|
| |
They provide TLS_GD/TLS_LD/TLS_IE/TLS_IE macros for TLS testing. Now
that we have migrated to __thread and tls_model attributes, these macros
are unused and the tls-macros.h files can retire.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
| |
and drop reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC. &__ehdr_start is a better way to get the load address.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
#28152] [BZ #28205]
elf/tls-macros.h was added for TLS testing when GCC did not support
__thread. __thread and tls_model attributes are mature now and have been
used by many newer tests.
Also delete tst-tls2.c which tests .tls_common (unused by modern GCC and
unsupported by Clang/LLD). .tls_common and .tbss definition are almost
identical after linking, so the runtime test doesn't add additional
coverage. Assembler and linker tests should be on the binutils side.
When LLD 13.0.0 is allowed in configure.ac
(https://sourceware.org/pipermail/libc-alpha/2021-August/129866.html),
`make check` result is on par with glibc built with GNU ld on aarch64
and x86_64.
As a future clean-up, TLS_GD/TLS_LD/TLS_IE/TLS_IE macros can be removed from
sysdeps/*/tls-macros.h. We can add optional -mtls-dialect={gnu2,trad}
tests to ensure coverage.
Tested on aarch64-linux-gnu, powerpc64le-linux-gnu, and x86_64-linux-gnu.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
| |
Gnumach's 0650a4ee30e3 implements support for high bits being set in the
mask parameter of vm_map. This allows to remove the fmh kludge that was
masking away the address range by mapping a dumb area there.
|
|
|
|
|
|
|
|
|
|
|
|
| |
In "mips: align stack in clone [BZ #28223]"
(commit 1f51cd9a860ee45eee8a56fb2ba925267a2a7bfe) I made a mistake: I
misbelieved one "word" was 2-byte and "doubleword" should be 4-byte.
But in MIPS ABI one "word" is defined 32-bit (4-byte), so "doubleword" is
8-byte [1], and "quadword" is 16-byte [2].
[1]: "System V Application Binary Interface: MIPS(R) RISC Processor
Supplement, 3rd edition", page 3-31
[2]: "MIPSpro(TM) 64-Bit Porting and Transition Guide", page 23
|
|
|
|
|
|
|
|
| |
The MIPS O32 ABI requires 4 byte aligned stack, and the MIPS N64 and N32
ABI require 8 byte aligned stack. Previously if the caller passed an
unaligned stack to clone the the child misbehaved.
Fixes bug 28223.
|
|
|
|
|
|
|
| |
When the memory object is read-only, the kernel would be right in
refusing max vmprot containing VM_PROT_WRITE.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
|
|
|
|
| |
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AArch64 ABI is largely platform agnostic and does not specify
_GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the
only user of _GLOBAL_OFFSET_TABLE_[0] and GNU ld defines the value
to the link-time address _DYNAMIC. [2]
In 2012, __ehdr_start was implemented in GNU ld and gold in binutils
2.23. Using adrp+add / (-mcmodel=tiny) adr to access
__ehdr_start/_DYNAMIC gives us a robust way to get the load address and
the link-time address of _DYNAMIC.
[1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
[2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the
link-time address _DYNAMIC.
LLD is widely used on aarch64 Android and ChromeOS devices. Software
just works without the need for _GLOBAL_OFFSET_TABLE_[0].
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
| |
Simplify the code for memsets smaller than L1. Improve the unroll8 and
L1_prefetch loops.
Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
|
|
|
|
|
|
| |
Remove unroll32 code since it doesn't improve performance.
Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
|
|
|
|
|
|
|
| |
Simplify handling of remaining bytes. Avoid lots of taken branches and complex
whilelo computations, instead unconditionally write vectors from the end.
Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
|
|
|
|
|
|
|
|
| |
Improve performance of large memsets. Simplify alignment code. For zero memset
use DC ZVA, which almost doubles performance. For non-zero memsets use the
unroll8 loop which is about 10% faster.
Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
|
|
|
|
|
|
|
|
| |
Improve performance of small memsets by reducing instruction counts and
improving code alignment. Bench-memset shows 35-45% performance gain for
small sizes.
Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.13 adds a PTRACE_GET_RSEQ_CONFIGURATION constant, with an
associated ptrace_rseq_configuration structure.
Add this constant to the various sys/ptrace.h headers in glibc, with
the structure in bits/ptrace-shared.h (named struct
__ptrace_rseq_configuration in glibc, as with other such structures).
Tested for x86_64, and with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Helper thread frees copied attribute on NOTIFY_REMOVED message
received from the OS kernel. Unfortunately, it fails to check whether
copied attribute actually exists (data.attr != NULL). This worked
earlier because free() checks passed pointer before actually
attempting to release corresponding memory. But
__pthread_attr_destroy assumes pointer is not NULL.
So passing NULL pointer to __pthread_attr_destroy will result in
segmentation fault. This scenario is possible if
notification->sigev_notify_attributes == NULL (which means default
thread attributes should be used).
Signed-off-by: Nikita Popov <npv1310@gmail.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
| |
We'd like to support processors without Altivec or VSX, so check
the relevant hwcap bits before selecting them.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
| |
A number of optimised memset routines assume the cacheline size is 128B,
so we better check before using them.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
We use PPC_FEATURE_HAS_VSX to select a number of POWER7 optimised
functions. These functions don't use any VSX instructions, so
PPC_FEATURE_ARCH_2_06 seems like a better fit.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__REDIRECT and __THROW are not compatible with C++ due to the ordering of the
__asm__ alias and the throw specifier. __REDIRECT_NTH has to be used
instead.
Fixes commit 8a40aff86ba5f64a3a84883e539cb67b ("io: Add time64 alias
for fcntl"), commit 82c395d91ea4f69120d453aeec398e30 ("misc: Add
time64 alias for ioctl"), commit b39ffab860cd743a82c91946619f1b8158
("Linux: Add time64 alias for prctl").
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
It turned that the generic implementation of brk() does not work
for sparc, since on failure kernel will just return the previous
input value without setting the conditional register.
This patches adds back a sparc32 and sparc64 implementation removed
by 720480934ab9107.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
|
|
|
|
| |
The generated code is unchanged.
|
|
|
|
|
|
|
| |
labellist and precedencelist could get freed a second time if there
are allocation failures, so set them to NULL to avoid a double-free.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sat Jan 25 14:19:40 2020 -0800
x86-64: Avoid rep movsb with short distance [BZ #27130]
introduced some regressions on Intel processors without Fast Short REP
MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with
short distance only on Intel processors with FSRM. bench-memmove-large
on Skylake server shows that cycles of __memmove_evex_unaligned_erms
improves for the following data size:
before after Improvement
length=4127, align1=3, align2=0: 479.38 349.25 27%
length=4223, align1=9, align2=5: 405.62 333.25 18%
length=8223, align1=3, align2=0: 786.12 496.38 37%
length=8319, align1=9, align2=5: 727.50 501.38 31%
length=16415, align1=3, align2=0: 1436.88 840.00 41%
length=16511, align1=9, align2=5: 1375.50 836.38 39%
length=32799, align1=3, align2=0: 2890.00 1860.12 36%
length=32895, align1=9, align2=5: 2891.38 1931.88 33%
|
| |
|
|
|
|
|
| |
The setitimer fork hook, fork_itimer, needs to call malloc inside
__mach_setup_tls, so we need to unlock malloc before calling it.
|
|
|
|
|
| |
These failures were caught while building glibc master for Fedora Rawhide
which is built with `-mtune=generic -msse2 -mfpmath=sse'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes
<bits/platform/x86.h>.
2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the
processor has the feature.
3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the
feature is active. There may be other preconditions, like sufficient
stack space or further setup for AMX, which must be satisfied before the
feature can be used.
This fixes BZ #27958.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove unused code and declare __libc_mallopt when !IS_IN (libc) to
allow the debug hook to build with --disable-tunables.
Also, run tst-ifunc-isa-2* tests only when tunables are enabled since
the result depends on it.
Tested on x86_64.
Reported-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
| |
84f7ce84474c ("posix: Add glob64 with 64-bit time_t support") replaced
GLOB_NO_LSTAT with defining GLOB_LSTAT and GLOB_LSTAT64, but the posix
and gnu versions of the change were missing in the commit.
|
|
|
|
|
| |
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
| |
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These deprecated functions are only safe to call from
__malloc_initialize_hook and as a result, are not useful in the
general case. Move the implementations to libc_malloc_debug so that
existing binaries that need it will now have to preload the debug DSO
to work correctly.
This also allows simplification of the core malloc implementation by
dropping all the undumping support code that was added to make
malloc_set_state work.
One known breakage is that of ancient emacs binaries that depend on
this. They will now crash when running with this libc. With
LD_BIND_NOW=1, it will terminate immediately because of not being able
to find malloc_set_state but with lazy binding it will crash in
unpredictable ways. It will need a preloaded libc_malloc_debug.so so
that its initialization hook is executed to allow its malloc
implementation to work properly.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The malloc-check debugging feature is tightly integrated into glibc
malloc, so thanks to an idea from Florian Weimer, much of the malloc
implementation has been moved into libc_malloc_debug.so to support
malloc-check. Due to this, glibc malloc and malloc-check can no
longer work together; they use altogether different (but identical)
structures for heap management. This should not make a difference
though since the malloc check hook is not disabled anywhere.
malloc_set_state does, but it does so early enough that it shouldn't
cause any problems.
The malloc check tunable is now in the debug DSO and has no effect
when the DSO is not preloaded.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
|