| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9abdae94c7454c45e02e97e4ed1eb1b1915d13d8)
|
|
|
|
|
|
|
|
|
| |
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d4da5aab936504b2d3eca3146e109630d9093c4)
|
|
|
|
|
|
|
|
| |
The sparc32 is always 32 bits.
Checked on sparcv9-linux-gnu.
(cherry picked from commit dd57f5e7b652772499cb220d78157c1038d24f06)
|
|
|
|
|
|
|
|
|
| |
Fall back to ppoll if ppoll_time64 fails with ENOSYS.
Fixes commit 370da8a121c3ba9eeb2f13da15fc0f21f4136b25 ("nptl: Fix
tst-cancel30 on sparc64").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit f4724843ada64a51d66f65d3199fe431f9d4c254)
|
|
|
|
|
|
|
|
|
| |
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
(cherry picked from commit 14e56bd4ce15ac2d1cc43f762eb2e6b83fec1afe)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old Linux kernels disable SVE after every system call. Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched. Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.
Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions. If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2e94e2f5d2bf2de124c8ad7da85463355e54ccb2)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.
The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 73c26018ed0ecd9c807bb363cc2c2ab4aca66a82)
|
|
|
|
|
|
|
|
|
| |
The .cfi_return_column directive changes the return column for the whole
FDE range. But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.
(cherry picked from commit 3f798427884fa57770e8e2291cf58d5918254bb5)
|
|
|
|
|
|
|
|
|
|
|
| |
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2f5524cc5381eb75fef55f7901bb907bd5628333)
|
|
|
|
|
|
|
|
|
| |
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3d7090f14b13312320e425b27dcf0fe72de026fd)
|
|
|
|
|
|
|
|
| |
Cleanup emag memset - merge the memset_base64.S file, remove
the unused ZVA code (since it is disabled on emag).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9627ab99b50d250c6dd3001a3355aa03692f7fe5)
|
|
|
|
|
|
|
|
|
| |
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 9fd3409842b3e2d31cff5dbd6f96066c430f0aa2)
|
|
|
|
|
|
|
|
|
| |
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2bd00179885928fd95fcabfafc50e7b5c6e660d2)
|
|
|
|
|
|
|
|
|
| |
Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to
glibc's bits/hwcap.h.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
(cherry picked from commit ff5d2abd18629e0efac41e31699cdff3be0e08fa)
|
|
|
|
|
|
|
|
|
| |
Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d2d3f3720ce627a4fe154d8dd14db716a32bcc6e)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e6b4d07303af25709051c3e96288f2d ("nptl:
Unconditionally use a 32-byte rseq area"). After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition. This commit always checks
the rseq area for CPU number information before using the other
approaches.
This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call. Most architectures that have vDSO
acceleration for getcpu also have rseq support.
Fixes: 2c6b4b272e6b4d07303af25709051c3e96288f2d
Fixes: 1d350aa06091211863e41169729cee1bca39f72f
Reviewed-by: Arjun Shankar <arjun@redhat.com>
(cherry picked from commit 7a76f218677d149d8b7875b336722108239f7ee9)
Fixes: 9da8174362860b4fc564c8ca8c11fc3c51ed1de9
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with commit 2c6b4b272e6b4d07303af25709051c3e96288f2d
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.
The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 637aac2ae3980de31a6baab236a9255fe853cc76)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with commit e57d8fc97b90127de4ed3e3a9cdf663667580935
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.
This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
(cherry picked from commit 02782fd12849b6673cb5c2728cb750e8ec295aa3)
Note: Added ia64 __clone2 call to tst-clone.c.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This restore the 2.33 semantic for arena_get2. It was changed by
11a02b035b46 to avoid arena_get2 call malloc (back when __get_nproc
was refactored to use an scratch_buffer - 903bc7dcc2acafc). The
__get_nproc was refactored over then and now it also avoid to call
malloc.
The 11a02b035b46 did not take in consideration any performance
implication, which should have been discussed properly. The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs is not acessible.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 472894d2cfee5751b44c0aaa71ed87df81c8e62e)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit 49d877a80b29d3002887b084eec6676d9f5fec18 (arm: Remove
_dl_skip_args usage) removed the _SKIP_ARGS literal, which was
previously loader to r4 on loader _start. However, the cleanup did not
remove the following 'ldr r4, [sl, r4]' on _dl_start_user, used to check
to skip the arguments after ld self-relocations.
In my testing, the kernel initially set r4 to 0, which makes the
ldr instruction just read the _GLOBAL_OFFSET_TABLE_. However, since r4
is a callee-saved register; a different runtime might not zero
initialize it and thus trigger an invalid memory access.
Checked on arm-linux-gnu.
Reported-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1e25112dc0cb2515d27d8d178b1ecce778a9d37a)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The functions were previously written in C, but were not compiled
with unwind information. The ENTRY/END macros includes .cfi_startproc
and .cfi_endproc which adds unwind information. This caused the
tests cleanup-8 and cleanup-10 in the GCC testsuite to fail.
This patch adds a version of the ENTRY/END macros without the
CFI instructions that can be used instead.
sigaction registers a restorer address that is located two instructions
before the stub function. This patch adds a two instruction padding to
avoid that the unwinder accesses the unwind information from the function
that the linker has placed right before it in memory. This fixes an issue
with pthread_cancel that caused tst-mutex8-static (and other tests) to fail.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 7bd06985c0a143cdcba2762bfe020e53514a53de)
|
|
|
|
|
|
|
|
|
|
| |
The small counts copy bytes comparsion should be unsigned (as the
memmove size argument). It fixes string/tst-memmove-overflow on
sparcv9, where the input size triggers an invalid code path.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
(cherry picked from commit 926a4bdbb5fc8955570208b5571b2d04c6ffbd1d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to sparc32 fix, remove the unwind information on the signal
return stubs. This fixes the regressions:
FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-mutexpi9
On sparc64-linux-gnu.
(cherry picked from commit 369efd817780276dbe0ecf8be6e1f354bdbc9857)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes commit a61933fe27df ("sparc: Remove bzero optimization") that
after moving code jumped to the wrong label 4.
Verfied by successfully running string/test-memset on sparc32.
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 578190b7e43305141512dee777e4a3b3e8159393)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory. Ffsll function code size is 17 bytes. Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory. When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line. When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.
Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.
This patch fixes ffsll function random performance regression.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9d94997b5f9445afd4f2bccc5fa60ff7c4361ec1)
|
|
|
|
|
|
|
|
| |
Thanks to Andreas Schwab for reporting.
Fixes: 652b9fdb77d9fd056d4dd26dad2c14142768ab49
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit 369f373057073c307938da91af16922bda3dff6a)
|
|
|
|
|
|
|
|
|
|
| |
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit 652b9fdb77d9fd056d4dd26dad2c14142768ab49)
|
|
|
|
|
|
|
|
|
| |
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit e0b712dd9183d527aae4506cd39564c14af3bb28)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So I was able to reproduce the hangs in the original source, and debug
it, and fix it. In doing so, I realized that we can't use anything
complex to trigger the thread because that "anything" might also cause
the expected segfault and force everything out of sync again.
Here's what I ended up with, and it doesn't seem to hang where the
original one hung quite often (in a tight while..end loop). The key
changes are:
1. Calls to futex are error checked, with retries, to ensure that the
futexes are actually doing what they're supposed to be doing. In the
original code, nearly every futex call returned an error.
2. The main loop has checks for whether the thread ran or not, and
"unlocks" the thread if it didn't (this is how the original source
hangs).
Note: the usleep() is not for timing purposes, but just to give the
kernel an excuse to run the other thread at that time. The test will
not hang without it, but is more likely to test the right bugfix
if the usleep() is present.
(cherry picked from commit 088136aa02de6fa13061ef6f754071a5652fdabd)
|
|
|
|
|
|
|
| |
When __resolv_context_get returns NULL due to out of memory, translate it
to a return value of EAI_MEMORY.
(cherry picked from commit 5eabdb6a6ac1599d23dd5966a37417215950245f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:
_dl_tlsdesc_undefweak:
_CET_ENDBR
movq 8(%rax), %rax
subq %fs:0, %rax
ret
_dl_tlsdesc_dynamic:
...
subq %fs:0, %rax
movq -8(%rsp), %rdi
ret
Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64. This
fixes BZ #31185.
(cherry picked from commit 81be2a61dafc168327c1639e97b6dae128c7ccf3)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On x32, I got
FAIL: elf/tst-tlsgap
$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so
Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108 movq (%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)
This is caused by
_dl_tlsdesc_dynamic:
_CET_ENDBR
/* Preserve call-clobbered registers that we modify.
We need two scratch regs anyway. */
movq %rsi, -16(%rsp)
movq %fs:DTV_OFFSET, %rsi
Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit. Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.
(cherry picked from commit 3502440397bbb840e2f7223734aa5cc2cc0e29b6)
|
|
|
|
|
|
| |
This reverts commit a7e34a667585f675143563569688756f4f4a6e47.
Reason for revert: Incompatibility with existing applications.
|
|
|
|
|
|
|
| |
Linux dilfridge-amd64-stable 6.1.41-gentoo-dist #1 SMP PREEMPT_DYNAMIC Tue Jul 25 09:26:34 -00 2023 x86_64 AMD Ryzen 7 3700X 8-Core Processor AuthenticAMD GNU/Linux
32bit build on x86-64
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes a very recently added leak in getaddrinfo.
This was assigned CVE-2023-5156.
Resolves: BZ #30884
Related: BZ #30842
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ec6b95c3303c700eb89eebeda2d7264cc184a796)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D'
set to Zero, thus resulting in zeroed-out computed cache values.
This patch reintroduces the old way of cache computation as a
fail-safe option to handle these exceptions.
Fixed 'level4_cache_size' value through handle_amd().
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit dcad5c8578130dec7f35fd5b0885304b59f9f543)
|
|
|
|
|
|
| |
Also replace an unreachable assert with __builtin_unreachable.
(cherry picked from commit 856bab7717ef6d1033fd7cbf7cfb2ddefbfffb07)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.
The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query. For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup. In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name. This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.
Fix this by copying h_name over and freeing it at the end. This
resolves BZ #30843, which is assigned CVE-2023-4806.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 973fe93a5675c42798b2161c6f29c01b0e243994)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors. Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles. (The force_first handling in
_dl_sort_maps is not effective for dlclose.) After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.
A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.
In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated. (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.
After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order. These destructors
can call dlopen. Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them. Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)
After _dl_init_called_list traversal, two more loops follow. The
processing order changes to the original link map order in the
namespace. Previously, dependency order was used. The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.
The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list. The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.
tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications. The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.
The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.
There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.
Fixes commit 1df71d32fe5f5905ffd5d100e5e9ca8ad62 ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 6985865bc3ad5b23147ee73466583dd7fdf65892)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5f828ff824e3b7cd1 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.
Fix that by also checking that define, restoring the pre
4d0fe291aed3a476a commit values:
Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7
With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14
At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.
Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.
Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 434bf72a94de68f0cc7fbf3c44bf38c1911b70cb)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The:
```
if (shared_per_thread > 0 && threads > 0)
shared_per_thread /= threads;
```
Code was accidentally moved to inside the else scope. This doesn't
match how it was previously (before af992e7abd).
This patch fixes that by putting the division after the `else` block.
(cherry picked from commit 084fb31bc2c5f95ae0b9e6df4d3cf0ff43471ede)
|
|
|
|
|
|
|
|
|
|
|
| |
On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8b9a0af8ca012217bf90d1dc0694f85b49ae09da)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After:
```
commit af992e7abdc9049714da76cae1e5e18bc4838fb8
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Jun 7 13:18:01 2023 -0500
x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
```
Split `shared` (cumulative cache size) from `shared_per_thread` (cache
size per socket), the `shared_per_thread` *can* be slightly off from
the previous calculation.
Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.
This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 47f747217811db35854ea06741be3685e8bbd44d)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 /
ncores_per_socket'. This patch updates that value to roughly
'sizeof_L3 / 4`
The original value (specifically dividing the `ncores_per_socket`) was
done to limit the amount of other threads' data a `memcpy`/`memset`
could evict.
Dividing by 'ncores_per_socket', however leads to exceedingly low
non-temporal thresholds and leads to using non-temporal stores in
cases where REP MOVSB is multiple times faster.
Furthermore, non-temporal stores are written directly to main memory
so using it at a size much smaller than L3 can place soon to be
accessed data much further away than it otherwise could be. As well,
modern machines are able to detect streaming patterns (especially if
REP MOVSB is used) and provide LRU hints to the memory subsystem. This
in affect caps the total amount of eviction at 1/cache_associativity,
far below meaningfully thrashing the entire cache.
As best I can tell, the benchmarks that lead this small threshold
where done comparing non-temporal stores versus standard cacheable
stores. A better comparison (linked below) is to be REP MOVSB which,
on the measure systems, is nearly 2x faster than non-temporal stores
at the low-end of the previous threshold, and within 10% for over
100MB copies (well past even the current threshold). In cases with a
low number of threads competing for bandwidth, REP MOVSB is ~2x faster
up to `sizeof_L3`.
The divisor of `4` is a somewhat arbitrary value. From benchmarks it
seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs
such as Broadwell prefer something closer to `8`. This patch is meant
to be followed up by another one to make the divisor cpu-specific, but
in the meantime (and for easier backporting), this patch settles on
`4` as a middle-ground.
Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable
stores where done using:
https://github.com/goldsteinn/memcpy-nt-benchmarks
Sheets results (also available in pdf on the github):
https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit af992e7abdc9049714da76cae1e5e18bc4838fb8)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations,
(sparc_fixup_plt/sparc64_fixup_plt). For BINDNOW, _dl_audit_symbind
will be responsible to setup the final relocation value; while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail cail elf_machine_fixup_plt (which will call
sparc64_fixup_plt).
This patch fixes by issuing the SPARC specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:
elf/tst-audit24a
elf/tst-audit24b
elf/tst-audit24c
elf/tst-audit24d
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
(cherry picked from commit dddc88587a7f48cbb361d9929ec23d790164eef8)
|
|
|
|
|
|
|
|
| |
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause, it is not supported (ENOSYS). Always use ppoll or the
64 bit time_t variant instead.
(cherry picked from commit 370da8a121c3ba9eeb2f13da15fc0f21f4136b25)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Different than other 64 bit architectures, powerpc64 defines the
LFS POSIX lock constants with values similar to 32 ABI, which
are meant to be used with fcntl64 syscall. Since powerpc64 kABI
does not have fcntl, the constants are adjusted with the
FCNTL_ADJUST_CMD macro.
The 4d0fe291aed3a476a changed the logic of generic constants
LFS value are equal to the default values; which is now wrong
for powerpc64.
Fix the value by explicit define the previous glibc constants
(powerpc64 does not need to use the 32 kABI value, but it simplifies
the FCNTL_ADJUST_CMD which should be kept as compatibility).
Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
(cherry picked from commit 5f828ff824e3b7cd133ef905b8ae25ab8a8f3d66)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(BZ#30477)
For architecture with default 64 bit time_t support, the kernel
does not provide LFS and non-LFS values for F_GETLK, F_GETLK, and
F_GETLK (the default value used for 64 bit architecture are used).
This is might be considered an ABI break, but the currenct exported
values is bogus anyway.
The POSIX lockf is not affected since it is aliased to lockf64,
which already uses the LFS values.
Checked on i686-linux-gnu and the new tests on a riscv32.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4d0fe291aed3a476a3b59c4ecfae9d35ac0f15e8)
|