about summary refs log tree commit diff
path: root/sysdeps/aarch64/fpu/log1pf_advsimd.c
Commit message (Collapse)AuthorAgeFilesLines
* AArch64: Improve codegen in users of AdvSIMD log1pf helperJoe Ramsay13 days1-78/+43
| | | | | | | | | | | | | | | | | | | | log1pf is quite register-intensive - use fewer registers for the polynomial, and make various changes to shorten dependency chains in parent routines. There is now no spilling with GCC 14. Accuracy moves around a little - comments adjusted accordingly but does not require regen-ulps. Use the helper in log1pf as well, instead of having separate implementations. The more accurate polynomial means special-casing can be simplified, and the shorter dependency chain avoids the usual dance around v0, which is otherwise difficult. There is a small duplication of vectors containing 1.0f (or 0x3f800000) - GCC is not currently able to efficiently handle values which fit in FMOV but not MOVI, and are reinterpreted to integer. There may be potential for more optimisation if this is fixed. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* AArch64: Add vector logp1 alias for log1pJoe Ramsay2024-09-191-0/+3
| | | | | | | | This enables vectorisation of C23 logp1, which is an alias for log1p. There are no new tests or ulp entries because the new symbols are simply aliases. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-011-1/+1
|
* aarch64: Add half-width versions of AdvSIMD f32 libmvec routinesJoe Ramsay2023-12-201-0/+2
| | | | | | | | | | | Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version, however this is expected to be rare at Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* aarch64: Add vector implementations of log1p routinesJoe Ramsay2023-11-101-0/+128
May discard sign of zero.