| Commit message | Author | Age | Files | Lines |
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
| |
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
| |
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
| |
|
|
|
|
|
|
|
| |
Added annotations for autovec by GCC and GFortran - this enables GCC
>= 9 to autovectorise math calls at -Ofast.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Compilers may emit calls to 'half-width' routines (two-lane
single-precision variants). These have been added in the form of
wrappers around the full-width versions, where the low half of the
vector is simply duplicated. This will perform poorly when one lane
triggers the special-case handler, as there will be a redundant call
to the scalar version. However, this is expected to be rare at -Ofast.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
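The wrapper scheme described above can be sketched in portable C. This is only a model: the struct types stand in for the AdvSIMD vector types, and full_width_routine is a hypothetical placeholder for a real four-lane routine, not a glibc symbol.

```c
/* Portable stand-ins for four-lane and two-lane single-precision vectors
   (assumption: the real code uses AdvSIMD vector types instead).  */
typedef struct { float lane[4]; } f32x4;
typedef struct { float lane[2]; } f32x2;

/* Hypothetical full-width routine; any lane-wise function would do.
   Here each lane computes x * x + 1.  */
static f32x4
full_width_routine (f32x4 x)
{
  f32x4 y;
  for (int i = 0; i < 4; i++)
    y.lane[i] = x.lane[i] * x.lane[i] + 1.0f;
  return y;
}

/* Half-width wrapper: duplicate the low half into the high half, call the
   full-width routine, and keep only the low half of the result.  */
static f32x2
half_width_wrapper (f32x2 x)
{
  f32x4 wide = { { x.lane[0], x.lane[1], x.lane[0], x.lane[1] } };
  f32x4 r = full_width_routine (wide);
  f32x2 out = { { r.lane[0], r.lane[1] } };
  return out;
}
```

Because both halves hold the same inputs, a special-case input appears in two lanes, which is where the redundant scalar call mentioned above comes from.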
|
|
|
|
|
|
|
| |
routines
Avoids emitting many saves/restores of vector registers, and reduces the
amount of code generated around the scalar fallback.
|
|
|
|
|
|
|
|
| |
These were broken by the new atan2 functions, as they were only
set up for univariate functions. Arity is now detected from the
input file - this revealed a mistake that the double-precision
inputs were being used for both single- and double-precision
routines, which is now remedied.
|
|
|
|
| |
May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.
|
|
|
|
| |
May discard sign of zero.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Double-precision routines either reuse the exp table (AdvSIMD) or use
the SVE FEXPA instruction.
|
|
|
|
| |
A table is also added, which is shared between AdvSIMD and SVE log10.
|
|
|
|
| |
A table is also added, which is shared between AdvSIMD and SVE log2.
|
|
|
|
| |
Some routines reuse the table from v_exp_data.c.
|
|
|
|
|
| |
This includes some utility headers for evaluating polynomials using
various schemes.
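As a rough illustration of what such schemes look like, the sketch below shows scalar Horner and degree-3 Estrin evaluation. The function names are hypothetical and are not the names used in the headers; the real utilities operate on vector types.

```c
#include <stddef.h>

/* Horner scheme: c[0] + x * (c[1] + x * (c[2] + ...)).
   Fully sequential, one multiply-add per coefficient.  Requires n >= 1.  */
static double
horner (const double *c, size_t n, double x)
{
  double acc = c[n - 1];
  for (size_t i = n - 1; i-- > 0;)
    acc = acc * x + c[i];
  return acc;
}

/* Estrin scheme for a degree-3 polynomial: the two halves
   (c[0] + c[1] x) and (c[2] + c[3] x) are independent, so they can be
   computed in parallel before being combined with x^2.  */
static double
estrin_deg3 (const double c[4], double x)
{
  double x2 = x * x;
  double lo = c[0] + c[1] * x;
  double hi = c[2] + c[3] * x;
  return lo + x2 * hi;
}
```

Horner has the shortest instruction count but the longest dependency chain; Estrin trades an extra multiply for more instruction-level parallelism.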
|
|
|
|
|
|
|
|
|
|
|
| |
* Transpose table layout for improved memory access
* Use half-vector special comparisons for AdvSIMD
* Improve register use near special-case branches
- Due to the presence of a function call, return value would get
mov-d out of x0 in order to facilitate PCS. By moving the final
computation after the branch this can be avoided.
Also change SVE routines to use overloaded intrinsics for readability.
|
|
|
|
|
|
|
| |
Use overloaded intrinsics for readability. Codegen does not
change. However, while we're bringing the routines up-to-date with
recent improvements to other routines in AOR, it is worth copying
this change over as well.
|
|
|
|
|
|
| |
Saves a mov by ensuring return value does not need to be moved out of
the way before special-case branch. Also change to use overloaded
intrinsics.
|
|
|
|
|
|
|
|
| |
* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch
Also use overloaded intrinsics for SVE.
|
|
|
|
|
|
|
| |
Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf,
improving performance. Passes regress.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
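The NOINLINE special-case pattern can be modelled in portable C as below. Everything here is an assumption made for illustration: the lane struct replaces the real vector type, and scalar_fallback, the bound of 8, and the doubling "fast path" are invented stand-ins for a real routine and its scalar counterpart.

```c
/* GNU C attribute; the NOINLINE name mirrors the commit message.  */
#define NOINLINE __attribute__ ((noinline))

typedef struct { float lane[4]; } f32x4;

/* Hypothetical scalar routine used for out-of-range lanes: saturate to
   +/-16 and propagate NaN.  */
static float
scalar_fallback (float x)
{
  if (x != x)
    return x;
  return x > 0.0f ? 16.0f : -16.0f;
}

/* Kept out of line so that the cost of setting up a normal call is only
   paid when at least one lane is special.  */
static NOINLINE f32x4
special_case (f32x4 x, f32x4 y, const int special[4])
{
  for (int i = 0; i < 4; i++)
    if (special[i])
      y.lane[i] = scalar_fallback (x.lane[i]);
  return y;
}

/* Fast path: valid only for |x| < 8 in this toy model.  */
static f32x4
vector_routine (f32x4 x)
{
  f32x4 y;
  int special[4];
  int any_special = 0;
  for (int i = 0; i < 4; i++)
    {
      y.lane[i] = 2.0f * x.lane[i];
      /* Written so that NaN also counts as special.  */
      special[i] = !(x.lane[i] > -8.0f && x.lane[i] < 8.0f);
      any_special |= special[i];
    }
  if (any_special)
    return special_case (x, y, special);
  return y;
}
```

In the common case the branch is not taken and the function returns straight from the fast path.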
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines. Log lookup table
added as HIDDEN symbol to allow it to be shared between AdvSIMD and
SVE variants.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
| |
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).
Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.
Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.
Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.
AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
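A minimal sketch of the pointer-with-barrier idea, assuming GNU C inline asm. The struct, its coefficients, and the helper name are hypothetical; the sketch only shows how an empty asm statement hides the table address from the optimiser so that constants are loaded relative to one base pointer.

```c
/* Hypothetical data table for a toy polynomial.  */
struct poly_data
{
  double c0, c1, c2;
};

static const struct poly_data data = { 1.0, 0.5, 0.25 };

/* Empty asm with a read-write register operand: the compiler must assume
   the pointer value may have changed, so it cannot fold the table
   contents into separately materialised constants.  */
static inline const struct poly_data *
ptr_barrier (const struct poly_data *p)
{
  __asm__ ("" : "+r" (p));
  return p;
}

static double
eval (double x)
{
  const struct poly_data *d = ptr_barrier (&data);
  /* All three loads use offsets from the single pointer d.  */
  return d->c0 + x * (d->c1 + x * d->c2);
}
```

The barrier emits no instructions; it only constrains what the compiler may assume about the pointer.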
|
| |
|
| |
This patch enables libmvec on AArch64. The proposed change is mainly
implementing build infrastructure to add the new routines to ABI,
tests and benchmarks. I have demonstrated how this all fits together
by adding implementations for vector cos, in both single and double
precision, targeting both Advanced SIMD and SVE.
The implementations of the routines themselves are just loops over the
scalar routine from libm for now, as we are more concerned with
getting the plumbing right at this point. We plan to contribute vector
routines from the Arm Optimized Routines repo that are compliant with
requirements described in the libmvec wiki.
Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising
the minimum GCC by such a big jump, we allow users to disable libmvec
if their compiler is too old.
Note that at this point users have to manually call the vector math
functions. This seems to be acceptable to some downstream users.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AFP feature (Alternate floating-point behavior) was added in armv8.7 and
introduced new FPCR bits.
Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is
set to default environment. This is a deviation from standard behaviour.
Clear these bits when setting the fenv to default.
There is no libc API to modify the new FPCR bits. Restoring those bits matters
if the user changed them directly.
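The effect of the fix can be modelled on plain integers as follows. The bit positions are the FPCR.FIZ, FPCR.AH and FPCR.NEP bits referred to above; the function name and the preserved-mask parameter are assumptions for illustration and do not reproduce the actual fenv code.

```c
#include <stdint.h>

/* FPCR bits introduced by the AFP feature.  */
#define FPCR_FIZ (UINT64_C (1) << 0)
#define FPCR_AH  (UINT64_C (1) << 1)
#define FPCR_NEP (UINT64_C (1) << 2)
#define FPCR_AFP_BITS (FPCR_FIZ | FPCR_AH | FPCR_NEP)

/* Compute the FPCR value to install for the default environment.
   preserved_mask (hypothetical parameter) marks bits that are carried
   over from the current FPCR; the AFP bits are removed from it so that
   they are reset to their default value instead.  */
static uint64_t
fpcr_for_default_env (uint64_t current, uint64_t default_bits,
                      uint64_t preserved_mask)
{
  preserved_mask &= ~FPCR_AFP_BITS;
  return (current & preserved_mask) | (default_bits & ~preserved_mask);
}
```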
|
|
| |
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
|
|
|
|
| |
This allows the arch-specific implementations to be removed.
|
|
|
|
|
| |
Redirect target specific roundeven functions for aarch64, ldbl-128ibm
and riscv.
|
|
|
|
|
|
| |
Add inline assembler for the roundeven functions.
Passes GLIBC regression. Note GCC does not inline the builtin (PR100966),
so this cannot be used for now.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
|
|
|
|
|
|
|
|
| |
The -fno-math-errno option is already added by default, and the minimum
GCC required to build glibc (6.2) makes -ffinite-math-only
superfluous.
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
| |
The define is already set in math-use-builtins-ceil.h; this patch
just removes the implementations (missed in c9feb1be93).
Checked on aarch64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
Each symbol definition is moved to a separate file, covering
all symbol types (float, double, long double,
and float128).
This allows support to be set per architecture without the
boilerplate of copying default values.
Checked with a build on the affected ABIs.
|
|
| |
Introduce sysdep header math-use-builtins.h to replace aarch64
implementations with corresponding generic ones.
- newly introduced generic sqrt{,f}, fma{,f}
- existing floor{,f}, nearbyint{,f}, rint{,f}, round{,f}, trunc{,f}
- Note that generic copysign was already enabled (via generic
math-use-builtins.h) and is now enabled through the sysdep header
Tested with build-many-glibcs for aarch64-linux-gnu
This is a non-functional change and aarch64 libm before/after was
byte invariant as compared below:
| cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline
| for i in `find . -name libm-2.31.9000.so`; do
| echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ;
| echo $?;
| done
| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
| 0
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
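The general shape of a generic routine that defers to a compiler builtin can be sketched as below. The macro and function names are hypothetical and chosen for illustration; copysign is used because its builtin needs no library support.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-architecture switch, of the kind a sysdep header
   could set to 1 when the compiler is known to inline the builtin.  */
#ifndef USE_COPYSIGN_BUILTIN
# define USE_COPYSIGN_BUILTIN 1
#endif

static double
generic_copysign (double x, double y)
{
#if USE_COPYSIGN_BUILTIN
  /* Let the compiler emit the target's native instruction sequence.  */
  return __builtin_copysign (x, y);
#else
  /* Portable fallback: splice the sign bit of y onto the magnitude
     of x.  */
  uint64_t ux, uy;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);
  ux = (ux & ~(UINT64_C (1) << 63)) | (uy & (UINT64_C (1) << 63));
  memcpy (&x, &ux, sizeof x);
  return x;
#endif
}
```

With one generic source and a per-target switch, an architecture no longer needs its own source file for such functions.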
|
|
| |
This patch adds a new macro, libm_alias_finite, to define all _finite
symbols. It sets each _finite symbol as a compat symbol based on its first
version (obtained from the definition in the build-generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
This is done by adding a list, similar to internal symbol redefinition,
in sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes build-many-glibcs.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
| |
|
| |
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
|
|
| |
This patch makes further coding style fixes where code was breaking
lines after an operator, contrary to the GNU Coding Standards. As
with the previous patch, it is limited to files following a reasonable
approximation to GNU style already, and is not exhaustive; more such
issues remain to be fixed.
Tested for x86_64, and with build-many-glibcs.py.
* dirent/dirent.h [!_DIRENT_HAVE_D_NAMLEN
&& _DIRENT_HAVE_D_RECLEN] (_D_ALLOC_NAMLEN): Break lines before
rather than after operators.
* elf/cache.c (print_cache): Likewise.
* gshadow/fgetsgent_r.c (__fgetsgent_r): Likewise.
* htl/pt-getattr.c (__pthread_getattr_np): Likewise.
* hurd/hurdinit.c (_hurd_setproc): Likewise.
* hurd/hurdkill.c (_hurd_sig_post): Likewise.
* hurd/hurdlookup.c (__file_name_lookup_under): Likewise.
* hurd/hurdsig.c (_hurd_internal_post_signal): Likewise.
(reauth_proc): Likewise.
* hurd/lookup-at.c (__file_name_lookup_at): Likewise.
(__file_name_split_at): Likewise.
(__directory_name_split_at): Likewise.
* hurd/lookup-retry.c (__hurd_file_name_lookup_retry): Likewise.
* hurd/port2fd.c (_hurd_port2fd): Likewise.
* iconv/gconv_dl.c (do_print): Likewise.
* inet/netinet/in.h (struct sockaddr_in): Likewise.
* libio/wstrops.c (_IO_wstr_seekoff): Likewise.
* locale/setlocale.c (new_composite_name): Likewise.
* malloc/memusagestat.c (main): Likewise.
* misc/fstab.c (fstab_convert): Likewise.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt):
Likewise.
* nss/nss_compat/compat-grp.c (getgrent_next_nss): Likewise.
(getgrent_next_file): Likewise.
(internal_getgrnam_r): Likewise.
(internal_getgrgid_r): Likewise.
* nss/nss_compat/compat-initgroups.c (getgrent_next_nss):
Likewise.
(internal_getgrent_r): Likewise.
* nss/nss_compat/compat-pwd.c (getpwent_next_nss_netgr): Likewise.
(getpwent_next_nss): Likewise.
(getpwent_next_file): Likewise.
(internal_getpwnam_r): Likewise.
(internal_getpwuid_r): Likewise.
* nss/nss_compat/compat-spwd.c (getspent_next_nss_netgr):
Likewise.
(getspent_next_nss): Likewise.
(internal_getspnam_r): Likewise.
* pwd/fgetpwent_r.c (__fgetpwent_r): Likewise.
* shadow/fgetspent_r.c (__fgetspent_r): Likewise.
* string/strchr.c (STRCHR): Likewise.
* string/strchrnul.c (STRCHRNUL): Likewise.
* sysdeps/aarch64/fpu/fpu_control.h (_FPU_FPCR_IEEE): Likewise.
* sysdeps/aarch64/sfp-machine.h (_FP_CHOOSENAN): Likewise.
* sysdeps/csky/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/generic/memcopy.h (PAGE_COPY_FWD_MAYBE): Likewise.
* sysdeps/generic/symbol-hacks.h (__stack_chk_fail_local):
Likewise.
* sysdeps/gnu/netinet/ip_icmp.h (ICMP_INFOTYPE): Likewise.
* sysdeps/gnu/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise.
* sysdeps/gnu/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise.
* sysdeps/hppa/jmpbuf-unwind.h (_JMPBUF_UNWINDS): Likewise.
* sysdeps/mach/hurd/bits/stat.h (S_ISPARE): Likewise.
* sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start): Likewise.
(open_file): Likewise.
* sysdeps/mach/hurd/htl/pt-mutexattr-setprotocol.c
(pthread_mutexattr_setprotocol): Likewise.
* sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise.
* sysdeps/mach/hurd/mmap.c (__mmap): Likewise.
* sysdeps/mach/hurd/ptrace.c (ptrace): Likewise.
* sysdeps/mach/hurd/spawni.c (__spawni): Likewise.
* sysdeps/microblaze/dl-machine.h (elf_machine_type_class):
Likewise.
(elf_machine_rela): Likewise.
* sysdeps/mips/mips32/sfp-machine.h (_FP_CHOOSENAN): Likewise.
* sysdeps/mips/mips64/sfp-machine.h (_FP_CHOOSENAN): Likewise.
* sysdeps/mips/sys/asm.h (multiple #if conditionals): Likewise.
* sysdeps/posix/rename.c (rename): Likewise.
* sysdeps/powerpc/novmx-sigjmp.c (__novmx__sigjmp_save): Likewise.
* sysdeps/powerpc/sigjmp.c (__vmx__sigjmp_save): Likewise.
* sysdeps/s390/fpu/fenv_libc.h (FPC_VALID_MASK): Likewise.
* sysdeps/s390/utf8-utf16-z9.c (gconv_end): Likewise.
* sysdeps/unix/grantpt.c (grantpt): Likewise.
* sysdeps/unix/sysv/linux/a.out.h (N_TXTOFF): Likewise.
* sysdeps/unix/sysv/linux/updwtmp.c (TRANSFORM_UTMP_FILE_NAME):
Likewise.
* sysdeps/unix/sysv/linux/utmp_file.c (TRANSFORM_UTMP_FILE_NAME):
Likewise.
* sysdeps/x86/cpu-features.c (get_common_indices): Likewise.
* time/tzfile.c (__tzfile_compute): Likewise.
|
|
|
|
|
|
|
| |
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
|