Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | aarch64: Fix AdvSIMD libmvec routines for big-endian | Joe Ramsay | 2024-05-14 | 1 | -5/+6 |
| | Previously many routines used * to load from vector types stored in the data table. This is emitted as ldr, which byte-swaps the entire vector register, and causes bugs for big-endian when not all lanes contain the same value. When a vector is to be used this way, it has been replaced with an array and the load with an explicit ld1 intrinsic, which byte-swaps only within lanes. As well, many routines previously used non-standard GCC syntax for vector operations such as indexing into vector types with [] and assembling vectors using {}. This syntax should not be mixed with ACLE, as the former does not respect endianness whereas the latter does. Such examples have been replaced with, for instance, vcombine_* and vgetq_lane* intrinsics. Helpers which only use the GCC syntax, such as the v_call helpers, do not need changing as they do not use intrinsics. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> | ||||
* | aarch64/fpu: Sync libmvec routines from 2.39 and before with AOR | Joe Ramsay | 2024-02-26 | 1 | -12/+13 |
| | This includes a fix for big-endian in AdvSIMD log, some cosmetic changes, and numerous small optimisations mainly around inlining and using indexed variants of MLA intrinsics. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> | ||||
* | Update copyright dates with scripts/update-copyrights | Paul Eggert | 2024-01-01 | 1 | -1/+1 |
* | aarch64: Add half-width versions of AdvSIMD f32 libmvec routines | Joe Ramsay | 2023-12-20 | 1 | -1/+3 |
| | Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version; however, this is expected to be rare at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> | ||||
* | aarch64: Add vector implementations of tan routines | Joe Ramsay | 2023-10-23 | 1 | -0/+129 |
| | This includes some utility headers for evaluating polynomials using various schemes. | ||||
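
The 2023-10-23 entry mentions utility headers for evaluating polynomials using various schemes, but the log does not say which schemes. As an illustration only, here is a minimal scalar sketch of two common ones, Horner and Estrin. The function names `horner` and `estrin_deg3` are hypothetical and are not taken from the glibc headers.

```c
#include <stddef.h>

/* Horner's scheme: p(x) = c[0] + x*(c[1] + x*(c[2] + ...)).
   Uses n-1 multiply-adds in one long dependency chain.
   Requires n >= 1.  Hypothetical helper, not the glibc API.  */
static double
horner (const double *c, size_t n, double x)
{
  double acc = c[n - 1];
  for (size_t i = n - 1; i-- > 0;)
    acc = acc * x + c[i];
  return acc;
}

/* Estrin's scheme for a degree-3 polynomial:
   p(x) = (c0 + c1*x) + x^2 * (c2 + c3*x).
   The two inner terms are independent, so they can be computed
   in parallel at the cost of one extra multiply for x^2.  */
static double
estrin_deg3 (const double c[4], double x)
{
  double x2 = x * x;
  double lo = c[0] + c[1] * x;
  double hi = c[2] + c[3] * x;
  return lo + x2 * hi;
}
```

Both functions compute the same polynomial. Horner minimises the operation count, while Estrin shortens the dependency chain, which is often the better trade-off on pipelined hardware.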
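
The 2023-12-20 entry describes half-width routines as wrappers that duplicate the low half of the vector and call the full-width version. The sketch below models that idea with plain structs rather than AdvSIMD types so that it builds on any host. All names (`f32x2`, `f32x4`, `full_width`, `half_width`, `scalar_fallback`) and the stand-in kernel, which just squares its input, are hypothetical.

```c
#include <stddef.h>

/* Plain-struct stand-ins for two-lane and four-lane float vectors.  */
typedef struct { float lane[2]; } f32x2;
typedef struct { float lane[4]; } f32x4;

/* Counts how often the (hypothetical) scalar fallback runs.  */
static int scalar_calls;

static float
scalar_fallback (float x)
{
  scalar_calls++;
  return x * x;
}

/* Hypothetical full-width kernel: lanes above an arbitrary limit
   are treated as special cases and go through the scalar fallback.  */
static f32x4
full_width (f32x4 x)
{
  f32x4 y;
  for (size_t i = 0; i < 4; i++)
    y.lane[i] = x.lane[i] > 100.0f ? scalar_fallback (x.lane[i])
                                   : x.lane[i] * x.lane[i];
  return y;
}

/* Half-width wrapper: duplicate the two input lanes into both halves,
   run the full-width kernel, and keep only the low half.  */
static f32x2
half_width (f32x2 x)
{
  f32x4 wide = { { x.lane[0], x.lane[1], x.lane[0], x.lane[1] } };
  f32x4 r = full_width (wide);
  f32x2 out = { { r.lane[0], r.lane[1] } };
  return out;
}
```

In this model, one special-case input lane causes two scalar fallback calls, because the duplicated upper half hits the same path. That is the redundant call the commit message refers to.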
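
The 2024-05-14 entry contrasts a whole-register load (ldr), which byte-swaps across the entire vector on big-endian, with a per-element load (ld1), which byte-swaps only within lanes. The portable model below illustrates why the two agree only when all lanes hold the same value. It is a simplified byte-level simulation, not glibc code; it uses no intrinsics and does not model how a compiler numbers lanes. All function names are hypothetical.

```c
#include <stdint.h>

/* Model: a 128-bit register is 16 bytes with reg[0] least significant.
   Lane l of four 32-bit lanes occupies reg[4*l] .. reg[4*l + 3].  */

/* Write four 32-bit values to memory in big-endian byte order.  */
static void
store_be32x4 (uint8_t mem[16], const uint32_t v[4])
{
  for (int l = 0; l < 4; l++)
    for (int b = 0; b < 4; b++)
      mem[4 * l + b] = (uint8_t) (v[l] >> (8 * (3 - b)));
}

/* Whole-register load: memory is read as one 128-bit big-endian
   integer, so the byte order is reversed across all 16 bytes.  */
static void
load_whole_register (uint8_t reg[16], const uint8_t mem[16])
{
  for (int i = 0; i < 16; i++)
    reg[i] = mem[15 - i];
}

/* Per-element load: bytes are reversed within each 4-byte lane only,
   so element l in memory lands in lane l.  */
static void
load_per_lane (uint8_t reg[16], const uint8_t mem[16])
{
  for (int l = 0; l < 4; l++)
    for (int b = 0; b < 4; b++)
      reg[4 * l + b] = mem[4 * l + 3 - b];
}

/* Read lane l from the modelled register.  */
static uint32_t
get_lane (const uint8_t reg[16], int l)
{
  uint32_t v = 0;
  for (int b = 0; b < 4; b++)
    v |= (uint32_t) reg[4 * l + b] << (8 * b);
  return v;
}
```

With the table values {1, 2, 3, 4}, the per-lane load yields lanes 1, 2, 3, 4, while the whole-register load yields 4, 3, 2, 1. With a uniform table such as {7, 7, 7, 7}, the two loads are indistinguishable, which matches the commit message's remark that the bug appears only when lanes differ.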