about summary refs log tree commit diff
path: root/sysdeps/aarch64
Commit message (Collapse)AuthorAgeFilesLines
* aarch64: Remove duplicate memchr/strlen in libc.a (BZ 31777)Adhemerval Zanella2024-05-232-0/+6
| | | | | | The generic version provides weak definitions of memchr/strlen, which are already provided by the ifunc resolvers. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* aarch64/fpu: Add vector variants of powJoe Ramsay2024-05-2120-12/+2231
| | | | | | | Plus a small amount of moving includes around in order to be able to remove duplicate definition of asuint64. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Update ulpsAdhemerval Zanella2024-05-201-0/+20
| | | | For the log2p1 implementation.
* aarch64/fpu: Add vector variants of cbrtJoe Ramsay2024-05-1613-0/+521
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of hypotJoe Ramsay2024-05-1613-0/+324
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Fix AdvSIMD libmvec routines for big-endianJoe Ramsay2024-05-1417-85/+119
| | | | | | | | | | | | | | | | | | | | Previously many routines used * to load from vector types stored in the data table. This is emitted as ldr, which byte-swaps the entire vector register, and causes bugs for big-endian when not all lanes contain the same value. When a vector is to be used this way, it has been replaced with an array and the load with an explicit ld1 intrinsic, which byte-swaps only within lanes. As well, many routines previously used non-standard GCC syntax for vector operations such as indexing into vectors types with [] and assembling vectors using {}. This syntax should not be mixed with ACLE, as the former does not respect endianness whereas the latter does. Such examples have been replaced with, for instance, vcombine_* and vgetq_lane* intrinsics. Helpers which only use the GCC syntax, such as the v_call helpers, do not need changing as they do not use intrinsics. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* elf: Only process multiple tunable once (BZ 31686)Adhemerval Zanella2024-05-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | The 680c597e9c3 commit made loader reject ill-formatted strings by first tracking all set tunables and then applying them. However, it does not take into consideration if the same tunable is set multiple times, where parse_tunables_string appends the found tunable without checking if it was already in the list. It leads to a stack-based buffer overflow if the tunable is specified more than the total number of tunables. For instance: GLIBC_TUNABLES=glibc.malloc.check=2:... (repeat over the number of total support for different tunable). Instead, use the index of the tunable list to get the expected tunable entry. Since now the initial list is zero-initialized, the compiler might emit an extra memset and this requires some minor adjustment on some ports. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reported-by: Yuto Maeda <maeda@cyberdefense.jp> Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* AArch64: Remove unused defines of CPU namesWilco Dijkstra2024-04-301-7/+0
| | | | | | Remove unused defines of CPU names in cpu-features.h. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Enhanced CPU diagnostics for ld.soFlorian Weimer2024-04-081-0/+84
| | | | | | | This prints some information from struct cpu_features, and the midr_el1 and dczid_el0 system register contents on every CPU. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Remove ld.so __tls_get_addr plt usageAdhemerval Zanella2024-04-041-1/+2
| | | | | | Use the hidden alias instead. Checked on aarch64-linux-gnu.
* aarch64/fpu: Add vector variants of erfcJoe Ramsay2024-04-0416-1/+4892
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of tanhJoe Ramsay2024-04-0413-1/+374
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of sinhJoe Ramsay2024-04-0415-0/+567
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of atanhJoe Ramsay2024-04-0413-0/+283
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of asinhJoe Ramsay2024-04-0413-0/+484
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of acoshJoe Ramsay2024-04-0418-0/+648
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of coshJoe Ramsay2024-04-0417-1/+643
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of erfJoe Ramsay2024-04-0418-1/+4526
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* AArch64: Check kernel version for SVE ifuncsWilco Dijkstra2024-03-214-2/+5
| | | | | | | | | | | | | | | | Old Linux kernels disable SVE after every system call. Calling the SVE-optimized memcpy afterwards will then cause a trap to reenable SVE. As a result, applications with a high use of syscalls may run slower with the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0, except for 5.14.0 which was patched. Avoid this by checking the kernel version and selecting the SVE ifunc on modern kernels. Parse the kernel version reported by uname() into a 24-bit kernel.major.minor value without calling any library functions. If uname() is not supported or if the version format is not recognized, assume the kernel is modern. Tested-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* elf: Enable TLS descriptor tests on aarch64Adhemerval Zanella2024-03-191-0/+1
| | | | | | | | | | | | The aarch64 uses 'trad' for traditional tls and 'desc' for tls descriptors, but unlike other targets it defaults to 'desc'. The gnutls2 configure check does not set aarch64 as an ABI that uses TLS descriptors, which then disable somes stests. Also rename the internal machinery fron gnu2 to tls descriptors. Checked on aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* aarch64: fix check for SVE support in assemblerSzabolcs Nagy2024-03-142-4/+6
| | | | | | | | | | | | | Due to GCC bug 110901 -mcpu can override -march setting when compiling asm code and thus a compiler targetting a specific cpu can fail the configure check even when binutils gas supports SVE. The workaround is that explicit .arch directive overrides both -mcpu and -march, and since that's what the actual SVE memcpy uses the configure check should use that too even if the GCC issue is fixed independently. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* aarch64/fpu: Sync libmvec routines from 2.39 and before with AORJoe Ramsay2024-02-2618-105/+111
| | | | | | | This includes a fix for big-endian in AdvSIMD log, some cosmetic changes, and numerous small optimisations mainly around inlining and using indexed variants of MLA intrinsics. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* string: Use builtins for ffs and ffsllAdhemerval Zanella Netto2024-02-011-0/+2
| | | | | | | It allows to remove a lot of arch-specific implementations. Checked on x86_64, aarch64, powerpc64. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Refer to C23 in place of C2X in glibcJoseph Myers2024-02-011-1/+1
| | | | | | | | | | | | | | | WG14 decided to use the name C23 as the informal name of the next revision of the C standard (notwithstanding the publication date in 2024). Update references to C2X in glibc to use the C23 name. This is intended to update everything *except* where it involves renaming files (the changes involving renaming tests are intended to be done separately). In the case of the _ISOC2X_SOURCE feature test macro - the only user-visible interface involved - support for that macro is kept for backwards compatibility, while adding _ISOC23_SOURCE. Tested for x86_64.
* aarch64: Make cpu-features definitions not Linux-specificSergey Bugaev2024-01-042-0/+110
| | | | | | | | | These describe generic AArch64 CPU features, and are not tied to a kernel-specific way of determining them. We can share them between the Linux and Hurd AArch64 ports. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240103171502.1358371-13-bugaevc@gmail.com>
* aarch64: Add longjmp test for SMESzabolcs Nagy2024-01-022-0/+283
| | | | | | | | | | Includes test for setcontext too. The test directly checks after longjmp if ZA got disabled and the ZA contents got saved following the lazy saving scheme. It does not use ACLE code to verify that gcc can interoperate with glibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Add longjmp support for SMESzabolcs Nagy2024-01-021-0/+22
| | | | | | | | | | For the ZA lazy saving scheme to work, longjmp has to call __libc_arm_za_disable. In ld.so we assume ZA is not used so longjmp does not need special support there. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Add SME runtime supportSzabolcs Nagy2024-01-023-3/+129
| | | | | | | | | | | | | | | | | | | The runtime support routines for the call ABI of the Scalable Matrix Extension (SME) are mostly in libgcc. Since libc.so cannot depend on libgcc_s.so have an implementation of __arm_za_disable in libc for libc internal use in longjmp and similar APIs. __libc_arm_za_disable follows the same PCS rules as __arm_za_disable, but it's a hidden symbol so it does not need variant PCS marking. Using __libc_fatal instead of abort because it can print a message and works in ld.so too. But for now we don't need SME routines in ld.so. To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2 object is accessed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-01223-223/+223
|
* aarch64: Add SIMD attributes to math functions with vector versionsJoe Ramsay2023-12-202-0/+113
| | | | | | | Added annotations for autovec by GCC and GFortran - this enables GCC >= 9 to autovectorise math calls at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add half-width versions of AdvSIMD f32 libmvec routinesJoe Ramsay2023-12-2018-14/+108
| | | | | | | | | | | Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version, however this is expected to be rare at Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: correct CFI in rawmemchr (bug 31113)Andreas Schwab2023-12-051-1/+1
| | | | | | | The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive.
* aarch64: fix tested ifunc variantsSzabolcs Nagy2023-12-041-3/+3
| | | | | Don't test a64fx string functions when BTI is enabled since they are not BTI compatible.
* aarch64: Improve special-case handling in AdvSIMD double-precision libmvec ↵Joe Ramsay2023-11-291-1/+7
| | | | | | | routines Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.
* aarch64: Fix libmvec benchmarksJoe Ramsay2023-11-222-49/+81
| | | | | | | | These were broken by the new atan2 functions, as they were only set up for univariate functions. Arity is now detected from the input file - this revealed a mistake that the double-precision inputs were being used for both single- and double-precision routines, which is now remedied.
* elf: Remove LD_PROFILE for static binariesAdhemerval Zanella2023-11-212-2/+4
| | | | | | | | | | | The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows to trim down libc.a of profile support. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* aarch64: Add vector implementations of expm1 routinesJoe Ramsay2023-11-2012-0/+458
| | | | May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.
* AArch64: Remove Falkor memcpyWilco Dijkstra2023-11-135-324/+0
| | | | | | | | | | The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* AArch64: Add memset_zva64Wilco Dijkstra2023-11-136-68/+38
| | | | | | | | Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* AArch64: Cleanup emag memsetWilco Dijkstra2023-11-134-197/+90
| | | | | | | Cleanup emag memset - merge the memset_base64.S file, remove the unused ZVA code (since it is disabled on emag). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* aarch64: Add vector implementations of log1p routinesJoe Ramsay2023-11-1012-0/+496
| | | | May discard sign of zero.
* aarch64: Add vector implementations of atan2 routinesJoe Ramsay2023-11-1014-0/+531
|
* aarch64: Add vector implementations of atan routinesJoe Ramsay2023-11-1012-0/+403
|
* aarch64: Add vector implementations of acos routinesJoe Ramsay2023-11-1012-1/+436
|
* aarch64: Add vector implementations of asin routinesJoe Ramsay2023-11-1012-1/+403
|
* AArch64: Cleanup ifuncsWilco Dijkstra2023-11-0118-125/+41
| | | | | | | | Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename strlen_mte to strlen_generic. Remove rtld-memset. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* AArch64: Add support for MOPS memcpy/memmove/memsetWilco Dijkstra2023-10-249-1/+137
| | | | | | | | Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for memcpy, memmove and memset (use .inst for now so it works with all binutils versions without needing complex configure and conditional compilation). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of exp10 routinesJoe Ramsay2023-10-2312-0/+524
| | | | | Double-precision routines either reuse the exp table (AdvSIMD) or use SVE FEXPA intruction.
* aarch64: Add vector implementations of log10 routinesJoe Ramsay2023-10-2314-1/+580
| | | | A table is also added, which is shared between AdvSIMD and SVE log10.
* aarch64: Add vector implementations of log2 routinesJoe Ramsay2023-10-2314-1/+545
| | | | A table is also added, which is shared between AdvSIMD and SVE log2.