about summary refs log tree commit diff
path: root/sysdeps/aarch64
Commit message (Collapse)AuthorAgeFilesLines
* Add libm_alias_finite for _finite symbolsWilco Dijkstra2020-01-033-3/+6
| | | | | | | | | | | | | | | | | | | This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-01134-134/+134
|
* aarch64: add default memcpy version for kunpeng920Xuelei Zhang2019-12-271-1/+1
| | | | Checked on aarch64-linux-gnu.
* aarch64: ifunc rename for kunpengXuelei Zhang2019-12-272-2/+2
| | | | | | | Rename ifunc for kunpeng to kunpeng920, and modify the corresponding function files including IS_KUNPENG920 judgement. Checked on aarch64-linux-gnu.
* aarch64: Modify error-shown comments for strcpyXuelei Zhang2019-12-271-1/+1
| | | | Checked on aarch64-linux-gnu.
* aarch64: Optimized memset for Kunpeng processor.Xuelei Zhang2019-12-194-2/+119
| | | | | | | | | | | | | | | Due to the branch prediction issue of Kunpeng processor, we found memset_generic has poor performance on middle sizes setting, and so we reconstructed the logic, expanded the loop by 4 times in set_long to solve the problem, even when setting below 1K sizes have benefit. Another change is that DZ_ZVA seems no work when setting zero, so we discarded it and used set_long to set zero instead. Fewer branches and predictions also make the zero case have slightly improvement. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* aarch64: Optimized strlen for strlen_asimdXuelei Zhang2019-12-192-17/+29
| | | | | | | | | | | Optimize the strlen implementation by using vector operations and loop unrolling in main loop.Compared to __strlen_generic,it reduces latency of cases in bench-strlen by 7%~18% when the length of src is greater than 128 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* aarch64: Optimized implementation of memrchrXuelei Zhang2019-12-191-0/+165
| | | | | | | | | | | Considering the excellent performance of memchr.S on glibc 2.30, the same algorithm is used to find chrin. Compared to memrchr.c, this method with memrchr.S achieves an average performance improvement of 58% based on benchtest and its extension cases. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* aarch64: Optimized implementation of strnlenXuelei Zhang2019-12-191-1/+51
| | | | | | | | | | | Optimize the strlen implementation by using vector operations and loop unrooling in main loop. Compared to aarch64/strnlen.S, it reduces latency of cases in bench-strnlen by 11%~24% when the length of src is greater than 64 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* aarch64: Optimized implementation of strcpyXuelei Zhang2019-12-191-32/+27
| | | | | | | | | | | Optimize the strcpy implementation by using vector loads and operations in main loop.Compared to aarch64/strcpy.S, it reduces latency of cases in bench-strlen by 5%~18% when the length of src is greater than 64 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* aarch64: Optimized implementation of memcmpXuelei Zhang2019-12-191-53/+79
| | | | | | | | | | | The loop body is expanded from a 16-byte comparison to a 64-byte comparison, and the usage of ldp is replaced by the Post-index mode to the Base plus offset mode. Hence, compare can faster 18% around > 128 bytes in all. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214]Florian Weimer2019-12-021-1/+2
| | | | | | | | | This commit adds missing skip_ifunc checks to aarch64, arm, i386, sparc, and x86_64. A new test case ensures that IRELATIVE IFUNC resolvers do not run in various diagnostic modes of the dynamic loader. Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
* nptl: Add default pthread-offsets.hAdhemerval Zanella2019-11-261-3/+0
| | | | | | | | | | This patch adds a default pthread-offsets.h based on default thread definitions from struct_mutex.h and struct_rwlock.h. The idea is to simplify new ports inclusion. Checked with a build on affected abis. Change-Id: I7785a9581e651feb80d1413b9e03b5ac0452668a
* nptl: Add struct_rwlock.hAdhemerval Zanella2019-11-262-15/+41
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new generic __pthread_rwlock_arch_t definition meant to be used by new ports. Its layout mimics the current usage on some 64 bits ports and it allows some ports to use the generic definition. The arch __pthread_rwlock_arch_t definition is moved from pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h). Also the static intialization macro for pthread_rwlock_t is set to use an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its implementation. The default pthread_rwlock_t layout differs from current ports with: 1. Internal layout is the same for 32 bits and 64 bits. 2. Internal flag is an unsigned short so it should not required additional padding to align for word boundary (if it is the case for the ABI). Checked with a build on affected abis. Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
* nptl: Add struct_mutex.hAdhemerval Zanella2019-11-261-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current way of defining the common mutex definition for POSIX and C11 on pthreadtypes-arch.h (added by commit 06be6368da16104be5) is not really the best options for newer ports. It requires define some misleading flags that should be always defined as 0 (__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it exposes options used solely for linuxthreads compat mode (__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and requires newer ports to explicit define them (adding more boilerplate code). This patch adds a new default __pthread_mutex_s definition meant to be used by newer ports. Its layout mimics the current usage on both 32 and 64 bits ports and it allows most ports to use the generic definition. Only ports that use some arch-specific definition (such as hardware lock-elision or linuxthreads compat) requires specific headers. For 32 bit, the generic definitions mimic the other 32-bit ports of using an union to define the fields uses on adaptive and robust mutexes (thus not allowing both usage at same time) and by using a single linked-list for robust mutexes. Both decisions seemed to follow what recent ports have done and make the resulting pthread_mutex_t/mtx_t object smaller. Also the static intialization macro for pthread_mutex_t is set to use a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine in its struct_mutex.h if it requires additional fields to be initialized. Checked with a build on affected abis. Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
* nptl: Remove rwlock elision definitionsAdhemerval Zanella2019-11-261-2/+0
| | | | | | | | | | The new rwlock implementation added by cc25c8b4c1196 (2.25) removed support for lock-elision. This patch removes remaining the arch-specific unused definitions. Checked with a build against all affected ABIs. Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
* nptl: Add tests for internal pthread_rwlock_t offsetsAdhemerval Zanella2019-11-261-0/+2
| | | | | | | | | | | | | This patch new build tests to check for internal fields offsets for internal pthread_rwlock_t definition. Althoug the '__data.__flags' field layout should be preserved due static initializators, the patch also adds tests for the futexes that may be used in a shared memory (although using different libc version in such scenario is not really supported). Checked with a build against all affected ABIs. Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
* nptl: Cleanup mutex internal offset testsAdhemerval Zanella2019-11-261-4/+0
| | | | | | | | | | | | | The offsets of pthread_mutex_t __data.__nusers, __data.__spins, __data.elision, __data.list are not required to be constant over the releases. Only the __data.__kind is used for static initializers. This patch also adds an additional size check for __data.__kind. Checked with a build against affected ABIs. Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
* aarch64: Increase small and medium cases for __memcpy_genericKrzysztof Koch2019-11-121-35/+47
| | | | | | | | | | | | | | | | | | | | | | | | Increase the upper bound on medium cases from 96 to 128 bytes. Now, up to 128 bytes are copied unrolled. Increase the upper bound on small cases from 16 to 32 bytes so that copies of 17-32 bytes are not impacted by the larger medium case. Benchmarking: The attached figures show relative timing difference with respect to 'memcpy_generic', which is the existing implementation. 'memcpy_med_128' denotes the the version of memcpy_generic with only the medium case enlarged. The 'memcpy_med_128_small_32' numbers are for the version of memcpy_generic submitted in this patch, which has both medium and small cases enlarged. The figures were generated using the script from: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html Depending on the platform, the performance improvement in the bench-memcpy-random.c benchmark ranges from 6% to 20% between the original and final version of memcpy.S Tested against GLIBC testsuite and randomized tests.
* Split up endian.h to minimize exposure of BYTE_ORDER.Alistair Francis2019-10-013-31/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With only two exceptions (sys/types.h and sys/param.h, both of which historically might have defined BYTE_ORDER) the public headers that include <endian.h> only want to be able to test __BYTE_ORDER against __*_ENDIAN. This patch creates a new bits/endian.h that can be included by any header that wants to be able to test __BYTE_ORDER and/or __FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs __LONG_LONG_PAIR. It only defines macros in the implementation namespace. The existing bits/endian.h (which could not be included independently of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER) is renamed to bits/endianness.h. I also took the opportunity to canonicalize the form of this header, which we are stuck with having one copy of per architecture. Since they are so short, this means git doesn’t understand that they were renamed from existing headers, sigh. endian.h itself is a nonstandard header and its only remaining use from a standard header is guarded by __USE_MISC, so I dropped the __USE_MISC conditionals from around all of the public-namespace things it defines. (This means, an application that requests strict library conformance but includes endian.h will still see the definition of BYTE_ORDER.) A few changes to specific bits/endian(ness).h variants deserve mention: - sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did have selectable endianness, but we have assembly code in sysdeps/ia64 that assumes it’s little-endian, so there is no reason to treat the ia64 endianness.h as linux-specific. - The C-SKY port does not fully support big-endian mode, the compile will error out if __CSKYBE__ is defined. - The PowerPC port had extra logic in its bits/endian.h to detect a broken compiler, which strikes me as unnecessary, so I removed it. - The only files that defined __FLOAT_WORD_ORDER always defined it to the same value as __BYTE_ORDER, so I removed those definitions. The SH bits/endian(ness).h had comments inconsistent with the actual setting of __FLOAT_WORD_ORDER, which I also removed. - I *removed* copyright boilerplate from the few bits/endian(ness).h headers that had it; these files record a single fact in a fashion dictated by an external spec, so I do not think they are copyrightable. As long as I was changing every copy of ieee754.h in the tree, I noticed that only the MIPS variant includes float.h, because it uses LDBL_MANT_DIG to decide among three different versions of ieee854_long_double. This patch makes it not include float.h when GCC’s intrinsic __LDBL_MANT_DIG__ is available. * string/endian.h: Unconditionally define LITTLE_ENDIAN, BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping macros only on !__ASSEMBLER__. Move the definitions of __BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER, and __LONG_LONG_PAIR to... * string/bits/endian.h: ...this new file, which includes the renamed header bits/endianness.h for the definition of __BYTE_ORDER and possibly __FLOAT_WORD_ORDER. * string/Makefile: Install bits/endianness.h. * include/bits/endian.h: New wrapper. * bits/endian.h: Rename to bits/endianness.h. Add multiple-include guard. Rewrite the comment explaining what the machine-specific variants of this file should do. * sysdeps/unix/sysv/linux/ia64/bits/endian.h: Move to sysdeps/ia64. * sysdeps/aarch64/bits/endian.h * sysdeps/alpha/bits/endian.h * sysdeps/arm/bits/endian.h * sysdeps/csky/bits/endian.h * sysdeps/hppa/bits/endian.h * sysdeps/ia64/bits/endian.h * sysdeps/m68k/bits/endian.h * sysdeps/microblaze/bits/endian.h * sysdeps/mips/bits/endian.h * sysdeps/nios2/bits/endian.h * sysdeps/powerpc/bits/endian.h * sysdeps/riscv/bits/endian.h * sysdeps/s390/bits/endian.h * sysdeps/sh/bits/endian.h * sysdeps/sparc/bits/endian.h * sysdeps/x86/bits/endian.h: Rename to endianness.h; canonicalize form of file; remove redundant definitions of __FLOAT_WORD_ORDER. * sysdeps/powerpc/bits/endianness.h: Remove logic to check for broken compilers. * ctype/ctype.h * sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h * sysdeps/arm/nptl/bits/pthreadtypes-arch.h * sysdeps/csky/nptl/bits/pthreadtypes-arch.h * sysdeps/ia64/ieee754.h * sysdeps/ieee754/ieee754.h * sysdeps/ieee754/ldbl-128/ieee754.h * sysdeps/ieee754/ldbl-128ibm/ieee754.h * sysdeps/m68k/nptl/bits/pthreadtypes-arch.h * sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h * sysdeps/mips/ieee754/ieee754.h * sysdeps/mips/nptl/bits/pthreadtypes-arch.h * sysdeps/nios2/nptl/bits/pthreadtypes-arch.h * sysdeps/nptl/pthread.h * sysdeps/riscv/nptl/bits/pthreadtypes-arch.h * sysdeps/sh/nptl/bits/pthreadtypes-arch.h * sysdeps/sparc/sparc32/ieee754.h * sysdeps/unix/sysv/linux/generic/bits/stat.h * sysdeps/unix/sysv/linux/generic/bits/statfs.h * sysdeps/unix/sysv/linux/sys/acct.h * wctype/bits/wctype-wchar.h: Include bits/endian.h, not endian.h. * sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h. * sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__ in ifdefs, instead of LDBL_MANT_DIG. Only include float.h when __LDBL_MANT_DIG__ is not predefined, in which case define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-07132-132/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* aarch64: Disable using DC ZVA in emag memsetFeng Xue2019-08-142-7/+17
| | | | | | | * sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD): Disable DC ZVA code if this macro is defined as zero. * sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Change to zero to disable using DC ZVA.
* Declare most TS 18661-1 interfaces for C2X.Joseph Myers2019-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C2X adds the interfaces from TS 18661-1, and all except a handful in Annex F are unconditionally visible in C2X rather than only visible when __STDC_WANT_IEC_60559_BFP_EXT__ is defined. This patch updates glibc headers accordingly: most uses of __GLIBC_USE (IEC_60559_BFP_EXT) are changed to a new __GLIBC_USE (IEC_60559_BFP_EXT_C2X). (Regarding totalorder and totalordermag, the type-generic macros in tgmath.h will go away when the functions are changed to take pointer arguments.) * bits/libc-header-start.h (__GLIBC_USE_IEC_60559_BFP_EXT): Update comment. (__GLIBC_USE_IEC_60559_BFP_EXT_C2X): New macro. * bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Change to [__GLIBC_USE (IEC_60559_BFP_EXT_C2X)]. * include/limits.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/bits/stdlib-ldbl.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/stdint.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/stdlib.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/csky/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/m68k/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/riscv/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise, except for totalorder, totalordermag, getpayload, setpayload and setpayloadsig. * math/tgmath.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise, except for totalorder and totalordermag.
* aarch64: simplify the DT_AARCH64_VARIANT_PCS handling codeSzabolcs Nagy2019-07-102-6/+1
| | | | | | | | | | | Remove unnecessary variant_pcs field: the dynamic tag can be checked directly. * sysdeps/aarch64/dl-machine.h (elf_machine_runtime_setup): Remove the DT_AARCH64_VARIANT_PCS check. (elf_machine_lazy_rel): Use l_info[DT_AARCH64 (VARIANT_PCS)]. * sysdeps/aarch64/linkmap.h (struct link_map_machine): Remove variant_pcs.
* aarch64: new ifunc resolver ABISzabolcs Nagy2019-07-045-1/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passing a second argument to the ifunc resolver allows accessing AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 on linux because for ilp32 to remain compatible with lp64 ABI no more than 32bit hwcap flags can be in AT_HWCAP which is already used up. Currently the relocation ordering logic does not guarantee that ifunc resolvers can call libc apis or access libc objects, so only the resolver arguments and runtime environment dependent instructions can be used to do the dispatch (this affects ifunc resolvers outside of the libc). Since ifunc resolver is target specific and only supposed to be called by the dynamic linker, the call ABI can be changed in a backward compatible way: Old call ABI passed hwcap as uint64_t, new abi sets the _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument that's a pointer to an extendible struct. A resolver has to check the _IFUNC_ARG_HWCAP flag before accessing the second argument. The new sys/ifunc.h installed header has the definitions for the new ABI, everything is in the implementation reserved namespace. An alternative approach is to try to support extern calls from ifunc resolvers such as getauxval, but that seems non-trivial https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html * sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests. * sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI. * sysdeps/aarch64/sys/ifunc.h: New file. * sysdeps/aarch64/tst-ifunc-arg-1.c: New file. * sysdeps/aarch64/tst-ifunc-arg-2.c: New file.
* aarch64: handle STO_AARCH64_VARIANT_PCSSzabolcs Nagy2019-06-133-4/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid lazy binding of symbols that may follow a variant PCS with different register usage convention from the base PCS. Currently the lazy binding entry code does not preserve all the registers required for AdvSIMD and SVE vector calls. Saving and restoring all registers unconditionally may break existing binaries, even if they never use vector calls, because of the larger stack requirement for lazy resolution, which can be significant on an SVE system. The solution is to mark all symbols in the symbol table that may follow a variant PCS so the dynamic linker can handle them specially. In this patch such symbols are always resolved at load time, not lazily. So currently LD_AUDIT for variant PCS symbols are not supported, for that the _dl_runtime_profile entry needs to be changed e.g. to unconditionally save/restore all registers (but pass down arg and retval registers to pltentry/exit callbacks according to the base PCS). This patch also removes a __builtin_expect from the modified code because the branch prediction hint did not seem useful. * sysdeps/aarch64/dl-dtprocnum.h: New file. * sysdeps/aarch64/dl-machine.h (DT_AARCH64): Define. (elf_machine_runtime_setup): Handle DT_AARCH64_VARIANT_PCS. (elf_machine_lazy_rel): Check STO_AARCH64_VARIANT_PCS and bind such symbols at load time. * sysdeps/aarch64/linkmap.h (struct link_map_machine): Add variant_pcs.
* aarch64: thunderx2 memmove performance improvementsAnton Youdkevitch2019-05-033-362/+216
| | | | | | | | | | | | | | | | | | | | | | | | The performance improvement is about 20%-30% for larger cases and about 1%-5% for smaller cases. Used SIMD load/store instead of GPR for large overlapping forward moves. Reused existing memcpy implementation for smaller or overlapping backward moves. Fixed the existing memcpy implementation to allow it to deal with the overlapping case. Simplified loop tails in the memcpy implementation - use branchless overlapping sequence of fixed length load/stores instead of branching depending on the size. A cleanup/optimization converting str's to stp's. Added __memmove_thunderx2 to the list of the available implementations.
* aarch64: thunderx2 memcpy implementation cleanup and streamliningAnton Youdkevitch2019-04-051-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is the updated patch for improving the long unaligned code path (the one using "ext" instruction). 1. Always taken conditional branch at the beginning is removed. 2. Epilogue code is placed after the end of the loop to reduce the number of branches. 3. The redundant "mov" instructions inside the loop are gone due to the changed order of the registers in the "ext" instructions inside the loop, the prologue has additional "ext" instruction. 4.Updating count in the prologue was hoisted out as it is the same update for each prologue. 5. Invariant code of the loop epilogue was hoisted out. 6. As the current size of the ext chunk is exactly 16 instructions long "nop" was added at the beginning of the code sequence so that the loop entry for all the chunks be aligned. * sysdeps/aarch64/multiarch/memcpy_thunderx2.S: Cleanup branching and remove redundant code.
* Break more lines before not after operators.Joseph Myers2019-02-252-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes further coding style fixes where code was breaking lines after an operator, contrary to the GNU Coding Standards. As with the previous patch, it is limited to files following a reasonable approximation to GNU style already, and is not exhaustive; more such issues remain to be fixed. Tested for x86_64, and with build-many-glibcs.py. * dirent/dirent.h [!_DIRENT_HAVE_D_NAMLEN && _DIRENT_HAVE_D_RECLEN] (_D_ALLOC_NAMLEN): Break lines before rather than after operators. * elf/cache.c (print_cache): Likewise. * gshadow/fgetsgent_r.c (__fgetsgent_r): Likewise. * htl/pt-getattr.c (__pthread_getattr_np): Likewise. * hurd/hurdinit.c (_hurd_setproc): Likewise. * hurd/hurdkill.c (_hurd_sig_post): Likewise. * hurd/hurdlookup.c (__file_name_lookup_under): Likewise. * hurd/hurdsig.c (_hurd_internal_post_signal): Likewise. (reauth_proc): Likewise. * hurd/lookup-at.c (__file_name_lookup_at): Likewise. (__file_name_split_at): Likewise. (__directory_name_split_at): Likewise. * hurd/lookup-retry.c (__hurd_file_name_lookup_retry): Likewise. * hurd/port2fd.c (_hurd_port2fd): Likewise. * iconv/gconv_dl.c (do_print): Likewise. * inet/netinet/in.h (struct sockaddr_in): Likewise. * libio/wstrops.c (_IO_wstr_seekoff): Likewise. * locale/setlocale.c (new_composite_name): Likewise. * malloc/memusagestat.c (main): Likewise. * misc/fstab.c (fstab_convert): Likewise. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt): Likewise. * nss/nss_compat/compat-grp.c (getgrent_next_nss): Likewise. (getgrent_next_file): Likewise. (internal_getgrnam_r): Likewise. (internal_getgrgid_r): Likewise. * nss/nss_compat/compat-initgroups.c (getgrent_next_nss): Likewise. (internal_getgrent_r): Likewise. * nss/nss_compat/compat-pwd.c (getpwent_next_nss_netgr): Likewise. (getpwent_next_nss): Likewise. (getpwent_next_file): Likewise. (internal_getpwnam_r): Likewise. (internal_getpwuid_r): Likewise. * nss/nss_compat/compat-spwd.c (getspent_next_nss_netgr): Likewise. (getspent_next_nss): Likewise. (internal_getspnam_r): Likewise. * pwd/fgetpwent_r.c (__fgetpwent_r): Likewise. * shadow/fgetspent_r.c (__fgetspent_r): Likewise. * string/strchr.c (STRCHR): Likewise. * string/strchrnul.c (STRCHRNUL): Likewise. * sysdeps/aarch64/fpu/fpu_control.h (_FPU_FPCR_IEEE): Likewise. * sysdeps/aarch64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/csky/dl-machine.h (elf_machine_rela): Likewise. * sysdeps/generic/memcopy.h (PAGE_COPY_FWD_MAYBE): Likewise. * sysdeps/generic/symbol-hacks.h (__stack_chk_fail_local): Likewise. * sysdeps/gnu/netinet/ip_icmp.h (ICMP_INFOTYPE): Likewise. * sysdeps/gnu/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/gnu/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/hppa/jmpbuf-unwind.h (_JMPBUF_UNWINDS): Likewise. * sysdeps/mach/hurd/bits/stat.h (S_ISPARE): Likewise. * sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start): Likewise. (open_file): Likewise. * sysdeps/mach/hurd/htl/pt-mutexattr-setprotocol.c (pthread_mutexattr_setprotocol): Likewise. * sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise. * sysdeps/mach/hurd/mmap.c (__mmap): Likewise. * sysdeps/mach/hurd/ptrace.c (ptrace): Likewise. * sysdeps/mach/hurd/spawni.c (__spawni): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_type_class): Likewise. (elf_machine_rela): Likewise. * sysdeps/mips/mips32/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/mips64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/sys/asm.h (multiple #if conditionals): Likewise. * sysdeps/posix/rename.c (rename): Likewise. * sysdeps/powerpc/novmx-sigjmp.c (__novmx__sigjmp_save): Likewise. * sysdeps/powerpc/sigjmp.c (__vmx__sigjmp_save): Likewise. * sysdeps/s390/fpu/fenv_libc.h (FPC_VALID_MASK): Likewise. * sysdeps/s390/utf8-utf16-z9.c (gconv_end): Likewise. * sysdeps/unix/grantpt.c (grantpt): Likewise. * sysdeps/unix/sysv/linux/a.out.h (N_TXTOFF): Likewise. * sysdeps/unix/sysv/linux/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/unix/sysv/linux/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/x86/cpu-features.c (get_common_indices): Likewise. * time/tzfile.c (__tzfile_compute): Likewise.
* aarch64: Optimized memchr specific to AmpereComputing emagFeng Xue2019-02-016-3/+308
| | | | | | | | | | | | | | | | | | This version uses general register based memory instruction to load data, because vector register based is slightly slower in emag. Character-matching is performed on 16-byte (both size and alignment) memory block in parallel each iteration. * sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR. [!MEMCHR](MEMCHR): Set to __memchr. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memchr_generic and memchr_nosimd. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memchr ifuncs. * sysdeps/aarch64/multiarch/memchr.c: New file. * sysdeps/aarch64/multiarch/memchr_generic.S: Likewise. * sysdeps/aarch64/multiarch/memchr_nosimd.S: Likewise.
* aarch64: Optimized memset specific to AmpereComputing emagFeng Xue2019-02-015-2/+217
| | | | | | | | | | | | | | | | | | | This version uses general register based memory store instead of vector register based, for the former is faster than the latter in emag. The fact that DC ZVA size in emag is 64-byte, is used by IFUNC dispatch to select this memset, so that cost of runtime-check on DC ZVA size can be saved. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_emag. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __memset_emag to memset ifunc. * sysdeps/aarch64/multiarch/memset.c (libc_ifunc): Add IS_EMAG check for ifunc dispatch. * sysdeps/aarch64/multiarch/memset_base64.S: New file. * sysdeps/aarch64/multiarch/memset_emag.S: New file.
* [AArch64] Add ifunc support for AresWilco Dijkstra2019-01-091-1/+1
| | | | | | | | | | | | | | | Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares supports 2 128-bit loads/stores, use Neon registers for memcpy by selecting __memcpy_falkor by default (we should rename this to __memcpy_simd or similar). * manual/tunables.texi (glibc.cpu.name): Add ares tunable. * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use __memcpy_falkor for ares. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES): Add new define. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add ares cpu.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-01123-123/+123
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* [AArch64] Adjust writeback in non-zero memsetWilco Dijkstra2018-11-201-3/+4
| | | | | | | | This fixes an ineffiency in the non-zero memset. Delaying the writeback until the end of the loop is slightly faster on some cores - this shows ~5% performance gain on Cortex-A53 when doing large non-zero memsets. * sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
* Remove extra space at end of line.Steve Ellcey2018-10-161-1/+1
|
* aarch64: optimized memcpy implementation for thunderx2Anton Youdkevitch2018-10-162-18/+603
| | | | | | | | | | | | | | | | Since aligned loads and stores are huge performance advantage the implementation always tries to do aligned access. Among the cases when src and dst addresses are aligned or unaligned evenly there are cases of not evenly unaligned src and dst. For such cases (if the length is big enough) ext instruction is used to merge-and-shift two memory chunks loaded from two adjacent aligned locations and then the adjusted chunk gets stored to aligned address. Performance gain against the current T2 implementation: memcpy-large: 65K-32M: +40% - +10% memcpy-walk: 128-32M: +20% - +2%
* Remove unnecessary math_private.h includes.Joseph Myers2018-09-288-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After my changes to move various macros, inlines and other content from math_private.h to more specific headers, many files including math_private.h no longer need to do so. Furthermore, since the optimized inlines of various functions have been moved to include/fenv.h or replaced by use of function names GCC inlines automatically, a missing math_private.h include where one is appropriate will reliably cause a build failure rather than possibly causing code to be less well optimized while still building successfully. Thus, this patch removes includes of math_private.h that are now unnecessary. In the case of two RISC-V files, the include is replaced by one of stdbool.h because the files in question were relying on math_private.h to get a definition of bool. Tested for x86_64 and x86, and with build-many-glibcs.py. * math/fromfp.h: Do not include <math_private.h>. * math/s_cacosh_template.c: Likewise. * math/s_casin_template.c: Likewise. * math/s_casinh_template.c: Likewise. * math/s_ccos_template.c: Likewise. * math/s_cproj_template.c: Likewise. * math/s_fdim_template.c: Likewise. * math/s_fmaxmag_template.c: Likewise. * math/s_fminmag_template.c: Likewise. * math/s_iseqsig_template.c: Likewise. * math/s_ldexp_template.c: Likewise. * math/s_nextdown_template.c: Likewise. * math/w_log1p_template.c: Likewise. * math/w_scalbln_template.c: Likewise. * sysdeps/aarch64/fpu/feholdexcpt.c: Likewise. * sysdeps/aarch64/fpu/fesetround.c: Likewise. * sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise. * sysdeps/aarch64/fpu/ftestexcept.c: Likewise. * sysdeps/aarch64/fpu/s_llrint.c: Likewise. * sysdeps/aarch64/fpu/s_llrintf.c: Likewise. * sysdeps/aarch64/fpu/s_lrint.c: Likewise. * sysdeps/aarch64/fpu/s_lrintf.c: Likewise. * sysdeps/i386/fpu/s_atanl.c: Likewise. * sysdeps/i386/fpu/s_f32xaddf64.c: Likewise. * sysdeps/i386/fpu/s_f32xsubf64.c: Likewise. * sysdeps/i386/fpu/s_fdim.c: Likewise. * sysdeps/i386/fpu/s_logbl.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/i386/fpu/s_significandl.c: Likewise. * sysdeps/ia64/fpu/s_matherrf.c: Likewise. * sysdeps/ia64/fpu/s_matherrl.c: Likewise. * sysdeps/ieee754/dbl-64/s_atan.c: Likewise. * sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise. * sysdeps/ieee754/k_standardf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/s_signgam.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise. * sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvd/s_finite.c: Likewise. * sysdeps/riscv/rvd/s_fmax.c: Likewise. * sysdeps/riscv/rvd/s_fmin.c: Likewise. * sysdeps/riscv/rvd/s_fpclassify.c: Likewise. * sysdeps/riscv/rvd/s_isinf.c: Likewise. * sysdeps/riscv/rvd/s_isnan.c: Likewise. * sysdeps/riscv/rvd/s_issignaling.c: Likewise. * sysdeps/riscv/rvf/fegetround.c: Likewise. * sysdeps/riscv/rvf/feholdexcpt.c: Likewise. * sysdeps/riscv/rvf/fesetenv.c: Likewise. * sysdeps/riscv/rvf/fesetround.c: Likewise. * sysdeps/riscv/rvf/feupdateenv.c: Likewise. * sysdeps/riscv/rvf/fgetexcptflg.c: Likewise. * sysdeps/riscv/rvf/ftestexcept.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/riscv/rvf/s_finitef.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/riscv/rvf/s_fmaxf.c: Likewise. * sysdeps/riscv/rvf/s_fminf.c: Likewise. * sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise. * sysdeps/riscv/rvf/s_isinff.c: Likewise. * sysdeps/riscv/rvf/s_isnanf.c: Likewise. * sysdeps/riscv/rvf/s_issignalingf.c: Likewise. * sysdeps/riscv/rvf/s_nearbyintf.c: Likewise. * sysdeps/riscv/rvf/s_roundevenf.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of <math_private.h>. * sysdeps/riscv/rvf/s_rintf.c: Likewise.
* Use round functions not __round functions in glibc libm.Joseph Myers2018-09-272-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __round functions to call the corresponding round names instead, with asm redirection to __round when the calls are not inlined. An additional complication arises in sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the result converted to int, gets converted by the compiler to call lroundl in the case of 32-bit long, so resulting in localplt test failures. It's logically correct to let the compiler make such an optimization; an appropriate asm redirection of lroundl to __lroundl is thus added to that file (it's not needed anywhere else). Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_roundf.c: Likewise. * sysdeps/ieee754/dbl-64/s_round.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise. * sysdeps/ieee754/float128/s_roundf128.c: Likewise. * sysdeps/ieee754/flt-32/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise. (round): Redirect to __round. (__roundl): Call round instead of __round. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round): Remove macro. [_ARCH_PWR5X] (__roundf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round functions instead of __round variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to __lroundl. (__ieee754_expl): Call roundl instead of __roundl.
* Use trunc functions not __trunc functions in glibc libm.Joseph Myers2018-09-202-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.
* Use ceil functions not __ceil functions in glibc libm.Joseph Myers2018-09-172-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Use rint functions not __rint functions in glibc libm.Joseph Myers2018-09-142-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
* Use floor functions not __floor functions in glibc libm.Joseph Myers2018-09-142-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Add new exp and exp2 implementationsSzabolcs Nagy2018-09-051-44/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Do not include fenv_private.h in math_private.h.Joseph Myers2018-09-034-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the clean-up related to the catch-all math_private.h header, this patch stops math_private.h from including fenv_private.h. Instead, fenv_private.h is included directly from those users of math_private.h that also used interfaces from fenv_private.h. No attempt is made to remove unused includes of math_private.h, but that is a natural followup. (However, since math_private.h sometimes defines optimized versions of math.h interfaces or __* variants thereof, as well as defining its own interfaces, I think it might make sense to get all those optimized versions included from include/math.h, not requiring a separate header at all, before eliminating unused math_private.h includes - that avoids a file quietly becoming less-optimized if someone adds a call to one of those interfaces without restoring a math_private.h include to that file.) There is still a pitfall that if code uses plain fe* and __fe* interfaces, but only includes fenv.h and not fenv_private.h or (before this patch) math_private.h, it will compile on platforms with exceptions and rounding modes but not get the optimized versions (and possibly not compile) on platforms without exception and rounding mode support, so making it easy to break the build for such platforms accidentally. I think it would be most natural to move the inlines / macros for fe* and __fe* in the case of no exceptions and rounding modes into include/fenv.h, so that all code including fenv.h with _ISOMAC not defined automatically gets them. Then fenv_private.h would be purely the header for the libc_fe*, SET_RESTORE_ROUND etc. internal interfaces and the risk of breaking the build on other platforms than the one you tested on because of a missing fenv_private.h include would be much reduced (and there would be some unused fenv_private.h includes to remove along with unused math_private.h includes). Tested for x86_64 and x86, and tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by this patch. * sysdeps/generic/math_private.h: Do not include <fenv_private.h>. * math/fromfp.h: Include <fenv_private.h>. * math/math-narrow.h: Likewise. * math/s_cexp_template.c: Likewise. * math/s_csin_template.c: Likewise. * math/s_csinh_template.c: Likewise. * math/s_ctan_template.c: Likewise. * math/s_ctanh_template.c: Likewise. * math/s_iseqsig_template.c: Likewise. * math/w_acos_compat.c: Likewise. * math/w_acosf_compat.c: Likewise. * math/w_acosl_compat.c: Likewise. * math/w_asin_compat.c: Likewise. * math/w_asinf_compat.c: Likewise. * math/w_asinl_compat.c: Likewise. * math/w_ilogb_template.c: Likewise. * math/w_j0_compat.c: Likewise. * math/w_j0f_compat.c: Likewise. * math/w_j0l_compat.c: Likewise. * math/w_j1_compat.c: Likewise. * math/w_j1f_compat.c: Likewise. * math/w_j1l_compat.c: Likewise. * math/w_jn_compat.c: Likewise. * math/w_jnf_compat.c: Likewise. * math/w_llogb_template.c: Likewise. * math/w_log10_compat.c: Likewise. * math/w_log10f_compat.c: Likewise. * math/w_log10l_compat.c: Likewise. * math/w_log2_compat.c: Likewise. * math/w_log2f_compat.c: Likewise. * math/w_log2l_compat.c: Likewise. * math/w_log_compat.c: Likewise. * math/w_logf_compat.c: Likewise. * math/w_logl_compat.c: Likewise. * sysdeps/aarch64/fpu/feholdexcpt.c: Likewise. * sysdeps/aarch64/fpu/fesetround.c: Likewise. * sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise. * sysdeps/aarch64/fpu/ftestexcept.c: Likewise. * sysdeps/ieee754/dbl-64/e_atan2.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp2.c: Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise. * sysdeps/ieee754/dbl-64/e_jn.c: Likewise. * sysdeps/ieee754/dbl-64/e_pow.c: Likewise. * sysdeps/ieee754/dbl-64/e_remainder.c: Likewise. * sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise. * sysdeps/ieee754/dbl-64/gamma_product.c: Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise. * sysdeps/ieee754/dbl-64/s_atan.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/dbl-64/s_llrint.c: Likewise. * sysdeps/ieee754/dbl-64/s_llround.c: Likewise. * sysdeps/ieee754/dbl-64/s_lrint.c: Likewise. * sysdeps/ieee754/dbl-64/s_lround.c: Likewise. * sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/dbl-64/s_sin.c: Likewise. * sysdeps/ieee754/dbl-64/s_sincos.c: Likewise. * sysdeps/ieee754/dbl-64/s_tan.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise. * sysdeps/ieee754/float128/float128_private.h: Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise. * sysdeps/ieee754/flt-32/e_j1f.c: Likewise. * sysdeps/ieee754/flt-32/e_jnf.c: Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise. * sysdeps/ieee754/flt-32/s_llrintf.c: Likewise. * sysdeps/ieee754/flt-32/s_llroundf.c: Likewise. * sysdeps/ieee754/flt-32/s_lrintf.c: Likewise. * sysdeps/ieee754/flt-32/s_lroundf.c: Likewise. * sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_expl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise. * sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise. * sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise. * sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise. * sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise. * sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise. * sysdeps/powerpc/fpu/e_sqrt.c: Likewise. * sysdeps/powerpc/fpu/e_sqrtf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvd/s_finite.c: Likewise. * sysdeps/riscv/rvd/s_fmax.c: Likewise. * sysdeps/riscv/rvd/s_fmin.c: Likewise. * sysdeps/riscv/rvd/s_fpclassify.c: Likewise. * sysdeps/riscv/rvd/s_isinf.c: Likewise. * sysdeps/riscv/rvd/s_isnan.c: Likewise. * sysdeps/riscv/rvd/s_issignaling.c: Likewise. * sysdeps/riscv/rvf/fegetround.c: Likewise. * sysdeps/riscv/rvf/feholdexcpt.c: Likewise. * sysdeps/riscv/rvf/fesetenv.c: Likewise. * sysdeps/riscv/rvf/fesetround.c: Likewise. * sysdeps/riscv/rvf/feupdateenv.c: Likewise. * sysdeps/riscv/rvf/fgetexcptflg.c: Likewise. * sysdeps/riscv/rvf/ftestexcept.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/riscv/rvf/s_finitef.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/riscv/rvf/s_fmaxf.c: Likewise. * sysdeps/riscv/rvf/s_fminf.c: Likewise. * sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise. * sysdeps/riscv/rvf/s_isinff.c: Likewise. * sysdeps/riscv/rvf/s_isnanf.c: Likewise. * sysdeps/riscv/rvf/s_issignalingf.c: Likewise. * sysdeps/riscv/rvf/s_nearbyintf.c: Likewise. * sysdeps/riscv/rvf/s_roundevenf.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise.
* [BZ #20271] Add newlines in __libc_fatal calls.Paul Pluzhnikov2018-08-311-1/+1
|
* Split fenv_private.h out of math_private.h more consistently.Joseph Myers2018-08-282-280/+305
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some architectures, the parts of math_private.h relating to the floating-point environment are in a separate file fenv_private.h included from math_private.h. As this is purely an architecture-specific convention used by several architectures, however, all such architectures still need their own math_private.h, even if it has nothing to do beyond #include <fenv_private.h> and peculiarity of including the i386 file directly instead of having a shared file in sysdeps/x86. This patch makes the fenv_private.h name an architecture-independent convention in glibc. The include of fenv_private.h from math_private.h becomes architecture-independent (until callers are updated to include fenv_private.h directly so the include from math_private.h is no longer needed). Some architecture math_private.h headers are removed if no longer needed, or renamed to fenv_private.h if all they define belongs in that header; architecture fenv_private.h headers now do require #include_next <fenv_private.h>. The i386 fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is actually shared with x86_64. The generic math_private.h gets a new include of <stdbool.h>, as needed for bool in some prototypes in that header (previously that was indirectly included via include/fenv.h, which now only gets included too late in math_private.h, after those prototypes). Tested for x86_64 and x86, and tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch. * sysdeps/aarch64/fpu/fenv_private.h: New file. Based on .... * sysdeps/aarch64/fpu/math_private.h: ... this file. All contents moved to fenv_private.h except for ... (TOINT_INTRINSICS): Kept in math_private.h. (roundtoint): Likewise. (converttoint): Likewise. * sysdeps/arm/fenv_private.h: Change multiple-include guard to [ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/arm/math_private.h: Remove. * sysdeps/generic/fenv_private.h: New file. Contents moved from .... * sysdeps/generic/math_private.h: ... this file. Include <stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>. Include <fenv_private.h>. Remove functions and macros moved to fenv_private.h. * sysdeps/i386/fpu/math_private.h: Remove. * sysdeps/mips/math_private.h: Move to .... * sysdeps/mips/fpu/fenv_private.h: ... here. Change multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove [__mips_hard_float] conditional. Include next <fenv_private.h>. * sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/powerpc/fpu/math_private.h: Do not include <fenv_private.h>. * sysdeps/riscv/rvf/math_private.h: Move to .... * sysdeps/riscv/rvf/fenv_private.h: ... here. Change multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/sparc/fpu/math_private.h: Remove. * sysdeps/i386/fpu/fenv_private.h: Move to .... * sysdeps/x86/fpu/fenv_private.h: ... here. Change multiple-include guard to [X86_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/x86_64/fpu/math_private.h: Do not include <sysdeps/i386/fpu/fenv_private.h>.
* Move EXCEPTION_ENABLE_SUPPORTED out of math-tests.h.Joseph Myers2018-08-241-2/+6
| | | | | | | | | | | | | | | | | | | Continuing moving macros out of math-tests.h to smaller headers following typo-proof conventions instead of using #ifndef, this patch moves the EXCEPTION_ENABLE_SUPPORTED macro out to its own math-tests-trap.h header. Tested with build-many-glibcs.py. * sysdeps/generic/math-tests-trap.h: New file. * sysdeps/generic/math-tests.h: Include <math-tests-trap.h>. (EXCEPTION_ENABLE_SUPPORTED): Do not define here. * sysdeps/aarch64/math-tests.h: Remove file. * sysdeps/arm/math-tests.h: Likewise. * sysdeps/riscv/math-tests.h: Likewise. * sysdeps/aarch64/math-tests-trap.h: New file. * sysdeps/arm/math-tests-trap.h: Likewise. * sysdeps/riscv/math-tests-trap.h: Likewise.
* [aarch64] Add an ASIMD variant of strlen for falkorSiddhesh Poyarekar2018-08-156-4/+261
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This variant of strlen uses vector loads and operations to reduce the size of the code and also eliminate the non-ascii fallback. This works very well for falkor because of its two vector units and efficient vector ops. In the best case it reduces latency of cases in bench-strlen by 48%, with gains throughout the benchmark. strlen-walk also sees uniform gains in the 5%-15% range. Overall the routine appears to work better than the stock one for falkor regardless of the benchmark, length of string or cache state. The same cannot be said of a53 and a72 though. a53 performance was greatly reduced and for a72 it was a bit of a mixed bag, slightly on the negative side but I reckon it might be fast in some situations. * sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN. [!STRLEN](STRLEN): Set to __strlen. * sysdeps/aarch64/multiarch/strlen.c: New file. * sysdeps/aarch64/multiarch/strlen_generic.S: Likewise. * sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add strlen. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add strlen_generic and strlen_asimd. Reviewed-By: szabolcs.nagy@arm.com CC: pinskia@gmail.com
* Improve performance of sinf and cosfWilco Dijkstra2018-08-141-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | The second patch improves performance of sinf and cosf using the same algorithms and polynomials. The returned values are identical to sincosf for the same input. ULP definitions for AArch64 and x64 are updated. sinf/cosf througput gains on Cortex-A72: * |x| < 0x1p-12 : 1.2x * |x| < M_PI_4 : 1.8x * |x| < 2 * M_PI: 1.7x * |x| < 120.0 : 2.3x * |x| < Inf : 3.0x * NEWS: Mention sinf, cosf, sincosf. * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of constants rather than including generic sincosf.h. * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove. * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove. (reduced_cos): Remove. (sinf_poly): New function. * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
* Clean up converttoint handling and document the semanticsSzabolcs Nagy2018-08-101-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch currently only affects aarch64. The roundtoint and converttoint internal functions are only called with small values, so 32 bit result is enough for converttoint and it is a signed int conversion so the return type is changed to int32_t. The original idea was to help the compiler keeping the result in uint64_t, then it's clear that no sign extension is needed and there is no accidental undefined or implementation defined signed int arithmetics. But it turns out gcc does a good job with inlining so changing the type has no overhead and the semantics of the conversion is less surprising this way. Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top bits were never usable and the existing code ensures that only the bottom 32 bits of the conversion result are used. On aarch64 the neon intrinsics (which round ties to even) are changed to round and lround (which round ties away from zero) this does not affect the results in a significant way, but more portable (relies on round and lround being inlined which works with -fno-math-errno). The TOINT_SHIFT and TOINT_RINT macros were removed, only keep separate code paths for TOINT_INTRINSICS and !TOINT_INTRINSICS. * sysdeps/aarch64/fpu/math_private.h (roundtoint): Use round. (converttoint): Use lround. * sysdeps/ieee754/flt-32/math_config.h (roundtoint): Declare and document the semantics when TOINT_INTRINSICS is set. (converttoint): Likewise. (TOINT_RINT): Remove. (TOINT_SHIFT): Remove. * sysdeps/ieee754/flt-32/e_expf.c (__expf): Remove the TOINT_RINT code path.