about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* x86: Move strchr SSE2 implementation to multiarch/strchr-sse2.SNoah Goldstein2022-07-136-183/+213
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move strrchr SSE2 implementation to multiarch/strrchr-sse2.SNoah Goldstein2022-07-134-377/+366
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move memrchr SSE2 implementation to multiarch/memrchr-sse2.SNoah Goldstein2022-07-132-334/+334
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move strcpy SSE2 implementation to multiarch/strcpy-sse2.SNoah Goldstein2022-07-135-155/+156
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move strlen SSE2 implementation to multiarch/strlen-sse2.SNoah Goldstein2022-07-139-286/+306
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move strcmp SSE42 implementation to multiarch/strcmp-sse4_2.SNoah Goldstein2022-07-135-1792/+1766
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move wcscmp SSE2 implementation to multiarch/wcscmp-sse2.SNoah Goldstein2022-07-132-934/+934
| | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Move strcmp SSE2 implementation to multiarch/strcmp-sse2.SNoah Goldstein2022-07-1311-2178/+2264
| | | | | | | | | | | | This commit doesn't affect libc.so.6, its just housekeeping to prepare for adding explicit ISA level support. Because strcmp-sse2.S implements so many functions (more from avx2/evex/sse42) add a new file 'strcmp-naming.h' to assist in getting the correct symbol name for all the function across multiarch/non-multiarch builds. Tested build on x86_64 and x86_32 with/without multiarch.
* x86: Rename STRCASECMP_NONASCII macro to STRCASECMP_L_NONASCIINoah Goldstein2022-07-132-6/+6
| | | | | | The previous macro name can be confusing given that both `__strcasecmp_l_nonascii` and `__strcasecmp_nonascii` are functions and we use the `_l` version.
* x86: Remove __mmask intrinsics in strstr-avx512.cNoah Goldstein2022-07-121-6/+10
| | | | | | | | | | | | The intrinsics are not available before GCC7 and using standard operators generates code of equivalent or better quality. Removed: _cvtmask64_u64 _kshiftri_mask64 _kand_mask64 Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958
* x86: Remove generic strncat, strncpy, and stpncpy implementationsNoah Goldstein2022-07-1210-92/+56
| | | | | | | | | | | | | | | | | | These functions all have optimized versions: __strncat_sse2_unaligned, __strncpy_sse2_unaligned, and stpncpy_sse2_unaligned which are faster than their respective generic implementations. Since the sse2 versions can run on baseline x86_64, we should use these as the baseline implementation and can remove the generic implementations. Geometric mean of N=20 runs of the entire benchmark suite on: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake) __strncat_sse2_unaligned / __strncat_generic: .944 __strncpy_sse2_unaligned / __strncpy_generic: .726 __stpncpy_sse2_unaligned / __stpncpy_generic: .650 Tested build with and without multiarch and full check with multiarch.
* i386: Remove -Wa,-mtune=i686Fangrui Song2022-07-121-10/+0
| | | | | | | gas -mtune= may change NOP generating patterns but -mtune=i686 has no difference from the default by inspecting .o and .os files. Note: Clang doesn't support -Wa,-mtune=i686.
* x86-64: Remove redundant strcspn-generic/strpbrk-generic/strspn-genericH.J. Lu2022-07-081-3/+0
| | | | | | | | | | | | | Remove redundant strcspn-generic, strpbrk-generic and strspn-generic from sysdep_routines in sysdeps/x86_64/multiarch/Makefile added by commit c69f960b017b2cdf39335739009526a72fb20379 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jul 3 21:28:07 2022 -0700 x86: Add support for building str{c|p}{brk|spn} with explicit ISA level since they have been added to sysdep_routines in sysdeps/x86_64/Makefile.
* x86-64: Don't mark symbols as hidden in strcmp-XXX.SH.J. Lu2022-07-073-3/+0
| | | | | Don't mark symbols as hidden in strcmp-avx2.S, strcmp-evex.S and strcmp-sse42.S since they are marked as hidden in the IFUNC selectors.
* stdlib: Implement mbrtoc8, c8rtomb, and the char8_t typedef.Tom Honermann2022-07-0634-0/+68
| | | | | | | | | | | | | | | | | This change provides implementations for the mbrtoc8 and c8rtomb functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14 N2653. It also provides the char8_t typedef from WG14 N2653. The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined. The char8_t typedef is declared in uchar.h in C2X mode or when the _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Optimize string functions with shrn instructionDanila Kutenin2022-07-066-102/+59
| | | | | | | | | | | | | We found that string functions were using AND+ADDP to find the nibble/syndrome mask but there is an easier opportunity through `SHRN dst.8b, src.8h, 4` (shift right every 2 bytes by 4 and narrow to 1 byte) and has same latency on all SIMD ARMv8 targets as ADDP. There are also possible gaps for memcmp but that's for another patch. We see 10-20% savings for small-mid size cases (<=128) which are primary cases for general workloads.
* x86: Add support for building {w}memcmp{eq} with explicit ISA levelNoah Goldstein2022-07-0520-661/+790
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
* x86: Add support for building {w}memset{_chk} with explicit ISA levelNoah Goldstein2022-07-0510-203/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memset sse2 from memset.S to multiarch/memset-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memset-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memset-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memset.S that just include the non-multiarch memset.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
* x86: Add support for building {w}memmove{_chk} with explicit ISA levelNoah Goldstein2022-07-0511-272/+403
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memmove sse2 from memmove.S to multiarch/memmove-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memmove-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memmove-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memmove.S that just include the non-multiarch memmove.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch. isa raising memmove
* x86: Add support for building str{c|p}{brk|spn} with explicit ISA levelNoah Goldstein2022-07-0518-13/+247
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The changes for these functions are different than the others because the best implementation (sse4_2) requires the generic implementation as a fallback to be built as well. Changes are: 1. Add non-multiarch functions for str{c|p}{brk|spn}.c to statically select the best implementation based on the configured ISA build level. 2. Add stubs for str{c|p}{brk|spn}-generic and varshift.c to in the sysdeps/x86_64 directory so that the the sse4 implementation will have all of its dependencies for the non-multiarch / rtld build when ISA level >= 2. 3. Add new multiarch/rtld-strcspn.c that just include the non-multiarch strcspn.c which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.
* x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2Noah Goldstein2022-07-051-0/+6
| | | | | Just for clarities sake and so that if a future implementation is added we remember to add the check.
* Replace __libc_multiple_threads with __libc_single_threadedAdhemerval Zanella2022-07-0517-24/+20
| | | | | | | | | | | And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL, since header inclusion single-thread.h is in the wrong order, the define needs to come before including sysdeps/unix/sysdep.h. The macro is now moved to a per-arch single-threade.h header. The SINGLE_THREAD_P is used on some more places. Checked on aarch64-linux-gnu and x86_64-linux-gnu.
* linux: Add mount_setattrAdhemerval Zanella2022-07-0538-4/+87
| | | | | | | | | | It was added on Linux 5.12 (2a1867219c7b27f928e2545782b86daaf9ad50bd) to allow change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add tst-mount to check for Linux new mount APIAdhemerval Zanella2022-07-052-0/+96
| | | | | | | | | | | | The new mount API was added on Linux 5.2 with six new syscalls: fsopen, fsconfig, fsmount, move_mount, fspick, and open_tree. The new test verifies minimal functionality along with error paths for specific arguments and their corner cases. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add open_treeAdhemerval Zanella2022-07-0537-1/+46
| | | | | | | It was added on Linux 5.2 (a07b20004793d8926f78d63eb5980559f7813404) to return a O_PATH-opened file descriptor to an existing mountpoint. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add fspickAdhemerval Zanella2022-07-0536-0/+47
| | | | | | | | | It was added on Linux 5.2 (cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb) that can be used to pick an existing mountpoint into an filesystem context which can thereafter be used to reconfigure a superblock with fsconfig syscall. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add fsconfigAdhemerval Zanella2022-07-0537-0/+63
| | | | | | | | | | | | | It was added on Linux 5.2 (ecdab150fddb42fe6a739335257949220033b782) as a way to a configure filesystem creation context and trigger actions upon it, to be used in conjunction with fsopen, fspick and fsmount. The fsconfig_command commands are currently only defined as an enum, so they can't be checked on tst-mount-consts.py with current test support. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* AArch64: Reset HWCAP2_AFP bits in FPCR for default fenvTejas Belagod2022-07-051-1/+1
| | | | | | | | | | | | The AFP feature (Alternate floating-point behavior) was added in armv8.7 and introduced new FPCR bits. Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is set to default environment. This is a deviation from standard behaviour. Clear these bits when setting the fenv to default. There is no libc API to modify the new FPCR bits. Restoring those bits matters if the user changed them directly.
* Fix hurd namespace issues for internal signal functionsAdhemerval Zanella2022-07-041-3/+3
| | | | | | | It was introduced by "Refactor internal-signals.h (a1bdd81664aa681364d)". Use the internal symbols instead. Checked with a build for i686-gnu.
* Refactor internal-signals.hAdhemerval Zanella2022-06-307-67/+152
| | | | | | | | | | | | | | | | | | The main drive is to optimize the internal usage and required size when sigset_t is embedded in other data structures. On Linux, the current supported signal set requires up to 8 bytes (16 on mips), was lower than the user defined sigset_t (128 bytes). A new internal type internal_sigset_t is added, along with the functions to operate on it similar to the ones for sigset_t. The internal-signals.h is also refactored to remove unused functions Besides small stack usage on some functions (posix_spawn, abort) it lower the struct pthread by about 120 bytes (112 on mips). Checked on x86_64-linux-gnu. Reviewed-by: Arjun Shankar <arjun@redhat.com>
* riscv: Use memcpy to handle unaligned access when fixing R_RISCV_RELATIVEKito Cheng2022-06-301-1/+4
| | | | | | | | | | | | | Although RISC-V Linux will enable the unaligned memory access handler by default, that is quite expensive in general, using memcpy will be much cheaper - just break down that into several load/store byte instructions. ARM and MIPS has similar issue: ARM: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51456 MIPS: https://gcc.gnu.org/legacy-ml/gcc-help/2005-07/msg00325.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* AArch64: Add asymmetric faulting mode for tag violations in mem.tagging tunableTejas Belagod2022-06-301-1/+7
| | | | | | | | | The new asymmetric mode is available when HWCAP2_MTE3 is set (support is available), bit2 is set in the tunable (user request per application), and the system is configured such that the asymmetric mode is preferred over sync or async (per-cpu system-wide setting). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304)Adhemerval Zanella2022-06-301-1/+1
| | | | | | | | On success, mq_receive() and mq_timedreceive() return the number of bytes in the received message, so it requires to check if the value is larger than 0. Checked on i686-linux-gnu.
* x86: Add missing IS_IN (libc) check to strncmp-sse4_2.SNoah Goldstein2022-06-291-3/+5
| | | | | | | | | | | | | | | | Was missing to for the multiarch build rtld-strncmp-sse4_2.os was being built and exporting symbols: build/glibc/string/rtld-strncmp-sse4_2.os: 0000000000000000 T __strncmp_sse42 Introduced in: commit 11ffcacb64a939c10cfc713746b8ec88837f5c4a Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 21 12:10:50 2017 -0700 x86-64: Implement strcmp family IFUNC selectors in C
* x86: Add missing IS_IN (libc) check to strcspn-sse4.cNoah Goldstein2022-06-292-19/+25
| | | | | | | | | | | | | | | | | | | | | | Was missing to for the multiarch build rtld-strcspn-sse4.os was being built and exporting symbols: build/glibc/string/rtld-strcspn-sse4.os: U ___m128i_shift_right U __strcspn_generic 0000000000000000 T __strcspn_sse42 U strlen build/glibc/string/rtld-varshift.os: 0000000000000000 R ___m128i_shift_right Introduced in: commit 06e51c8f3de38761f8855700841bc49cf495c8c0 Author: H.J. Lu <hongjiu.lu@intel.com> Date: Fri Jul 3 02:48:56 2009 -0700 Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
* x86: Add missing IS_IN (libc) check to memmove-ssse3.SNoah Goldstein2022-06-291-16/+44
| | | | | | | | | | | | | | | | | | | | | | | Was missing to for the multiarch build rtld-memmove-ssse3.os was being built and exporting symbols: >$ nm string/rtld-memmove-ssse3.os U __GI___chk_fail 0000000000000020 T __memcpy_chk_ssse3 0000000000000040 T __memcpy_ssse3 0000000000000020 T __memmove_chk_ssse3 0000000000000040 T __memmove_ssse3 0000000000000000 T __mempcpy_chk_ssse3 0000000000000010 T __mempcpy_ssse3 U __x86_shared_cache_size_half Introduced after 2.35 in: commit 26b2478322db94edc9e0e8f577b2f71d291e5acb Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Thu Apr 14 11:47:40 2022 -0500 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
* x86-64: Properly indent X86_IFUNC_IMPL_ADD_VN argumentsH.J. Lu2022-06-291-48/+51
| | | | | | | Properly indent X86_IFUNC_IMPL_ADD_VN arguments for memchr, rawmemchr and wmemchr. Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Small improvements to dl-trampoline.SNoah Goldstein2022-06-292-56/+61
| | | | | | | | | | | | | | | 1. Remove sse2 instructions when using the avx512 or avx version. 2. Fixup some format nits in how the address offsets where aligned. 3. Use more space efficient instructions in the conditional AVX restoral. - vpcmpeqq -> vpcmpeqb - cmp imm32, r; jz -> inc r; jz 4. Use `rep movsb` instead of `rep movsq`. The former is guranteed to be fast with the ERMS flags, the latter is not. The latter also wastes an instruction in size setup.
* x86: Move mem{p}{mov|cpy}_{chk_}erms to its own fileNoah Goldstein2022-06-293-50/+73
| | | | | | The primary memmove_{impl}_unaligned_erms implementations don't interact with this function. Putting them in same file both wastes space and unnecessarily bloats a hot code section.
* x86: Move and slightly improve memset_ermsNoah Goldstein2022-06-293-31/+45
| | | | | | | | | | | | | | Implementation wise: 1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not use the L(stosb) label that was previously defined. 2. Don't give the hotpath (fallthrough) to zero size. Code positioning wise: Move memset_{chk}_erms to its own file. Leaving it in between the memset_{impl}_unaligned both adds unnecessary complexity to the file and wastes space in a relatively hot cache section.
* x86: Add definition for __wmemset_chk AVX2 RTM in ifunc impl listNoah Goldstein2022-06-291-0/+4
| | | | This was simply missing and meant we weren't testing it properly.
* linux: Remove unnecessary nice.c and signal.cArjun Shankar2022-06-302-4/+0
| | | | | | | | | These files simply include the sysdeps/posix implementations which would be used even in the absence of the files. They have been unnecessary since 7b17aeda0c5e when nice and signal were removed from the syscalls.list file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Linux: Forward declaration of struct iovec for process_madviseFlorian Weimer2022-06-291-5/+2
| | | | | | | | | | | | | This maintains compatibility between <sys/mman.h> and <linux/uio.h>. Before that, the addition of process_madvise made those two header files incompatible. This has been observed resulting in a build failure in LLDB's Process/Linux/NativeRegisterContextLinux_s390x.cpp source file. Fixes commit d19ee3473d68ca0e794f3a8b7677a0983ae1342e ("linux: Add process_madvise"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Add more feature definitions to isa-level.hNoah Goldstein2022-06-281-0/+15
| | | | | This commit doesn't change anything in itself. It is just to add definitions that will be needed by future patches.
* x86-64: Only define used SSE/AVX/AVX512 run-time resolversH.J. Lu2022-06-273-31/+42
| | | | | | | When glibc is built with x86-64 ISA level v3, SSE run-time resolvers aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers are unused. Check the minimum x86-64 ISA level to exclude the unused run-time resolvers.
* x86: Move CPU_FEATURE{S}_{USABLE|ARCH}_P to isa-level.hH.J. Lu2022-06-272-27/+24
| | | | | Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to where MINIMUM_X86_ISA_LEVEL and XXX_X86_ISA_LEVEL are defined.
* x86: Fix backwards Prefer_No_VZEROUPPER check in ifunc-evex.hNoah Goldstein2022-06-273-26/+34
| | | | | | | | | Add third argument to X86_ISA_CPU_FEATURES_ARCH_P macro so the runtime CPU_FEATURES_ARCH_P check can be inverted if the MINIMUM_X86_ISA_LEVEL is not high enough to constantly evaluate the check. Use this new macro to correct the backwards check in ifunc-evex.h
* x86: Rename strstr_sse2 to strstr_generic as it uses string/strstr.cNoah Goldstein2022-06-273-6/+6
| | | | This is in accordance with other files in the multiarch directory.
* x86: Remove unused file wmemcmp-sse4Noah Goldstein2022-06-271-4/+0
| | | | | | | | | | | | The memcmp-sse4 was removed in: commit 7cbc03d03091d5664060924789afe46d30a5477e Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Fri Apr 15 12:28:00 2022 -0500 x86: Remove memcmp-sse4.S so this file does nothing.
* x86: Put wcs{n}len-sse4.1 in the sse4.1 text sectionNoah Goldstein2022-06-273-1/+7
| | | | | Previously was missing but the two implementations shouldn't get in the sse2 (generic) text section.