about summary refs log tree commit diff
path: root/sysdeps/aarch64/fpu
Commit message (Collapse)AuthorAgeFilesLines
* aarch64/fpu: Add vector variants of powJoe Ramsay2024-05-2119-12/+2223
| | | | | | | Plus a small amount of moving includes around in order to be able to remove duplicate definition of asuint64. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of cbrtJoe Ramsay2024-05-1612-0/+513
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of hypotJoe Ramsay2024-05-1612-0/+316
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Fix AdvSIMD libmvec routines for big-endianJoe Ramsay2024-05-1417-85/+119
| | | | | | | | | | | | | | | | | | | | Previously many routines used * to load from vector types stored in the data table. This is emitted as ldr, which byte-swaps the entire vector register, and causes bugs for big-endian when not all lanes contain the same value. When a vector is to be used this way, it has been replaced with an array and the load with an explicit ld1 intrinsic, which byte-swaps only within lanes. As well, many routines previously used non-standard GCC syntax for vector operations such as indexing into vectors types with [] and assembling vectors using {}. This syntax should not be mixed with ACLE, as the former does not respect endianness whereas the latter does. Such examples have been replaced with, for instance, vcombine_* and vgetq_lane* intrinsics. Helpers which only use the GCC syntax, such as the v_call helpers, do not need changing as they do not use intrinsics. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of erfcJoe Ramsay2024-04-0415-1/+4884
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of tanhJoe Ramsay2024-04-0412-1/+366
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of sinhJoe Ramsay2024-04-0414-0/+559
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of atanhJoe Ramsay2024-04-0412-0/+275
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of asinhJoe Ramsay2024-04-0412-0/+476
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of acoshJoe Ramsay2024-04-0417-0/+640
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of coshJoe Ramsay2024-04-0416-1/+635
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Add vector variants of erfJoe Ramsay2024-04-0417-1/+4518
| | | | Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64/fpu: Sync libmvec routines from 2.39 and before with AORJoe Ramsay2024-02-2618-105/+111
| | | | | | | This includes a fix for big-endian in AdvSIMD log, some cosmetic changes, and numerous small optimisations mainly around inlining and using indexed variants of MLA intrinsics. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-01121-121/+121
|
* aarch64: Add SIMD attributes to math functions with vector versionsJoe Ramsay2023-12-202-0/+113
| | | | | | | Added annotations for autovec by GCC and GFortran - this enables GCC >= 9 to autovectorise math calls at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add half-width versions of AdvSIMD f32 libmvec routinesJoe Ramsay2023-12-2018-14/+108
| | | | | | | | | | | Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version, however this is expected to be rare at Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Improve special-case handling in AdvSIMD double-precision libmvec ↵Joe Ramsay2023-11-291-1/+7
| | | | | | | routines Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.
* aarch64: Fix libmvec benchmarksJoe Ramsay2023-11-222-49/+81
| | | | | | | | These were broken by the new atan2 functions, as they were only set up for univariate functions. Arity is now detected from the input file - this revealed a mistake that the double-precision inputs were being used for both single- and double-precision routines, which is now remedied.
* aarch64: Add vector implementations of expm1 routinesJoe Ramsay2023-11-2011-0/+450
| | | | May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.
* aarch64: Add vector implementations of log1p routinesJoe Ramsay2023-11-1011-0/+488
| | | | May discard sign of zero.
* aarch64: Add vector implementations of atan2 routinesJoe Ramsay2023-11-1013-0/+523
|
* aarch64: Add vector implementations of atan routinesJoe Ramsay2023-11-1011-0/+395
|
* aarch64: Add vector implementations of acos routinesJoe Ramsay2023-11-1011-1/+428
|
* aarch64: Add vector implementations of asin routinesJoe Ramsay2023-11-1011-1/+395
|
* aarch64: Add vector implementations of exp10 routinesJoe Ramsay2023-10-2311-0/+516
| | | | | Double-precision routines either reuse the exp table (AdvSIMD) or use SVE FEXPA intruction.
* aarch64: Add vector implementations of log10 routinesJoe Ramsay2023-10-2313-1/+572
| | | | A table is also added, which is shared between AdvSIMD and SVE log10.
* aarch64: Add vector implementations of log2 routinesJoe Ramsay2023-10-2313-1/+537
| | | | A table is also added, which is shared between AdvSIMD and SVE log2.
* aarch64: Add vector implementations of exp2 routinesJoe Ramsay2023-10-2311-0/+451
| | | | Some routines reuse table from v_exp_data.c
* aarch64: Add vector implementations of tan routinesJoe Ramsay2023-10-2317-1/+1236
| | | | | This includes some utility headers for evaluating polynomials using various schemes.
* aarch64: Optimise vecmath logsJoe Ramsay2023-10-057-215/+226
| | | | | | | | | | | * Transpose table layout for improved memory access * Use half-vector special comparisons for AdvSIMD * Improve register use near special-case branches - Due to the presence of a function call, return value would get mov-d out of x0 in order to facilitate PCS. By moving the final computation after the branch this can be avoided Also change SVE routines to use overloaded intrinsics for readability.
* aarch64: Cosmetic change in SVE exp routinesJoe Ramsay2023-10-052-47/+44
| | | | | | | Use overloaded intrinsics for readability. Codegen does not change, however while we're bringing the routines up-to-date with recent improvements to other routines in AOR it is worth copying this change over as well.
* aarch64: Optimize SVE cos & cosfJoe Ramsay2023-10-052-53/+47
| | | | | | Saves a mov by ensuring return value does not need to be moved out of the way before special-case branch. Also change to use overloaded intrinsics.
* aarch64: Improve vecmath sin routinesJoe Ramsay2023-10-053-73/+87
| | | | | | | | * Update ULP comment reflecting a new observed max in [-pi/2, pi/2] * Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions * Improve register use near special-case branch Also use overloaded intrinsics for SVE.
* AArch64: Remove -0.0 check from vector sinWilco Dijkstra2023-09-262-12/+2
| | | | | | | Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf, improving performance. Passes regress. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of exp routinesJoe Ramsay2023-06-3013-1/+585
| | | | | | | | | | | | Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of log routinesJoe Ramsay2023-06-3013-1/+551
| | | | | | | | | | | | | | Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. Log lookup table added as HIDDEN symbol to allow it to be shared between AdvSIMD and SVE variants. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of sin routinesJoe Ramsay2023-06-3011-6/+418
| | | | | | | | | | | | Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of cos routinesJoe Ramsay2023-06-309-116/+607
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps. Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table. Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from vector to normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication. AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, however it is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Fix a few more typos I missed in previous round -- BZ 25337Paul Pluzhnikov2023-06-021-1/+1
|
* Enable libmvec support for AArch64Joe Ramsay2023-05-0322-0/+863
| | | | | | | | | | | | | | | | | | | | | | | This patch enables libmvec on AArch64. The proposed change is mainly implementing build infrastructure to add the new routines to ABI, tests and benchmarks. I have demonstrated how this all fits together by adding implementations for vector cos, in both single and double precision, targeting both Advanced SIMD and SVE. The implementations of the routines themselves are just loops over the scalar routine from libm for now, as we are more concerned with getting the plumbing right at this point. We plan to contribute vector routines from the Arm Optimized Routines repo that are compliant with requirements described in the libmvec wiki. Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising the minimum GCC by such a big jump, we allow users to disable libmvec if their compiler is too old. Note that at this point users have to manually call the vector math functions. This seems to be acceptable to some downstream users. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-0632-32/+32
|
* AArch64: Reset HWCAP2_AFP bits in FPCR for default fenvTejas Belagod2022-07-051-1/+1
| | | | | | | | | | | | The AFP feature (Alternate floating-point behavior) was added in armv8.7 and introduced new FPCR bits. Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is set to default environment. This is a deviation from standard behaviour. Clear these bits when setting the fenv to default. There is no libc API to modify the new FPCR bits. Restoring those bits matters if the user changed them directly.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2022-01-0132-32/+32
| | | | | | | | | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: *** 912-#endif remote: *** 913: remote: *** 914- remote: *** error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* aarch64: Add math-use-builtins-f{max,min}.hAdhemerval Zanella2021-12-136-112/+8
| | | | It allows to remove the arch-specific implementations.
* Update math: redirect roundeven functionH.J. Lu2021-06-272-1/+2
| | | | | Redirect target specific roundeven functions for aarch64, ldbl-128ibm and riscv.
* AArch64: Add support for roundeven[f]Wilco Dijkstra2021-06-082-0/+57
| | | | | | Add inline assembler for the roundeven functions. Passes GLIBC regression. Note GCC does not inline the builtin (PR100966), so this cannot be used for now.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-0234-34/+34
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* aarch64: Remove fpu MakefileAdhemerval Zanella2020-06-221-14/+0
| | | | | | | | The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) make the -ffinite-math-only superflous. Checked on aarch64-linux-gnu.
* aarch64: Use math-use-builtins for ceil{f}Adhemerval Zanella2020-06-222-58/+0
| | | | | | | The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on c9feb1be93). Checked on aarch64-linux-gnu.
* math: Decompose math-use-builtins.hAdhemerval Zanella2020-06-229-71/+32
| | | | | | | | | | | Each symbol definitions are moved on a separated file and it cover all symbol type definitions (float, double, long double, and float128). It allows to set support for architectures without the boiler place of copying default values. Checked with a build on the affected ABIs.