about summary refs log tree commit diff
path: root/sysdeps/ieee754
Commit message (Collapse)AuthorAgeFilesLines
* Rename c2x / gnu2x tests to c23 / gnu23Joseph Myers2024-02-0114-24/+24
| | | | | | | Complete the internal renaming from "C2X" and related names in GCC by renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23. Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
* Refer to C23 in place of C2X in glibcJoseph Myers2024-02-012-8/+12
| | | | | | | | | | | | | | | WG14 decided to use the name C23 as the informal name of the next revision of the C standard (notwithstanding the publication date in 2024). Update references to C2X in glibc to use the C23 name. This is intended to update everything *except* where it involves renaming files (the changes involving renaming tests are intended to be done separately). In the case of the _ISOC2X_SOURCE feature test macro - the only user-visible interface involved - support for that macro is kept for backwards compatibility, while adding _ISOC23_SOURCE. Tested for x86_64.
* math: remove exp10 wrappersWilco Dijkstra2024-01-122-3/+11
| | | | | | | | Remove the error handling wrapper from exp10. This is very similar to the changes done to exp and exp2, except that we also need to handle pow10 and pow10l. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-01506-506/+506
|
* math: Add new exp10 implementationJoe Ramsay2023-12-043-24/+135
| | | | | | | | | | | | | | | | | | | | | | | New implementation is based on the existing exp/exp2, with different reduction constants and polynomial. Worst-case error in round-to- nearest is 0.513 ULP. The exp/exp2 shared table is reused for exp10 - .rodata size of e_exp_data increases by 64 bytes. As for exp/exp2, targets with single-instruction rounding/conversion intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the necessary code to their math_private.h. Improvements on Neoverse V1 compared to current GLIBC master: exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] Tested on: aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Avoid maybe-uninitialized warning in __kernel_rem_pio2Andreas Schwab2023-10-161-9/+5
| | | | | | | | | | | | | | With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized warning: ../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2': ../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized] 364 | y[0] = fq[0]; y[1] = fq[1]; y[2] = fw; | ~~^~~ This is similar to the warning that is suppressed in the other branch of the switch. Help the compiler knowing that the variable is always initialized, which also makes the suppression obsolete.
* x86_64: Add log1p with FMAH.J. Lu2023-08-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Skylake, it changes log1p bench performance by: Before After Improvement max 63.349 58.347 8% min 4.448 5.651 -30% mean 12.0674 10.336 14% The minimum code path is if (hx < 0x3FDA827A) /* x < 0.41422 */ { if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */ { ... } if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */ { math_force_eval (two54 + x); /* raise inexact */ if (ax < 0x3c900000) /* |x| < 2**-54 */ { ... } else return x - x * x * 0.5; FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster. Since log1p is called by asinh and atanh, it improves asinh performance by: Before After Improvement max 75.645 63.135 16% min 10.074 10.071 0% mean 15.9483 14.9089 6% and improves atanh performance by: Before After Improvement max 91.768 75.081 18% min 15.548 13.883 10% mean 18.3713 16.8011 8%
* x86_64: Add expm1 with FMAH.J. Lu2023-08-141-0/+7
| | | | | | | | | | | | | | | | | | On Skylake, it improves expm1 bench performance by: Before After Improvement max 70.204 68.054 3% min 20.709 16.2 22% mean 22.1221 16.7367 24% NB: Add extern long double __expm1l (long double); extern long double __expm1f128 (long double); for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.
* configure: Use autoconf 2.71Siddhesh Poyarekar2023-07-171-11/+14
| | | | | | | | | | | | | | Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/ieee754/ldbl-128ibm-compat: Fix warn unused resultFrédéric Bérat2023-07-052-14/+17
| | | | | | | Return value from *scanf and *asprintf routines are now properly checked in test-scanf-ldbl-compat-template.c and test-printf-ldbl-compat.c. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* misc/bits/syslog.h: Clearly separate declaration from definitionFrédéric Bérat2023-07-051-0/+1
| | | | | | | | This allows to include bits/syslog-decl.h in include/sys/syslog.h and therefore be able to create the libc_hidden_builtin_proto (__syslog_chk) prototype. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* stdio: Ensure *_chk routines have their hidden builtin definition availableFrédéric Bérat2023-07-054-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If libc_hidden_builtin_{def,proto} isn't properly set for *_chk routines, there are unwanted PLT entries in libc.so. There is a special case with __asprintf_chk: If ldbl_* macros are used for asprintf, ABI gets broken on s390x, if it isn't, ppc64le isn't building due to multiple asm redirections. This is due to the inclusion of bits/stdio-lbdl.h for ppc64le whereas it isn't for s390x. This header creates redirections, which are not compatible with the ones generated using libc_hidden_def. Yet, we can't use libc_hidden_ldbl_proto on s390x since it will not create a simple strong alias (e.g. as done on x86_64), but a versioned alias, leading to ABI breakage. This results in errors on s390x: /usr/bin/ld: glibc/iconv/../libio/bits/stdio2.h:137: undefined reference to `__asprintf_chk' Original __asprintf_chk symbols: 00000000001395b0 T __asprintf_chk 0000000000177e90 T __nldbl___asprintf_chk __asprintf_chk symbols with ldbl_* macros: 000000000012d590 t ___asprintf_chk 000000000012d590 t __asprintf_chk@@GLIBC_2.4 000000000012d590 t __GI___asprintf_chk 000000000012d590 t __GL____asprintf_chk___asprintf_chk 0000000000172240 T __nldbl___asprintf_chk __asprintf_chk symbols with the patch: 000000000012d590 t ___asprintf_chk 000000000012d590 T __asprintf_chk 000000000012d590 t __GI___asprintf_chk 0000000000172240 T __nldbl___asprintf_chk Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps: Ensure ieee128*_chk routines to be properly namedFrédéric Bérat2023-07-0519-40/+40
| | | | | | | | | | | | The *_chk routines naming doesn't match the name that would be generated using libc_hidden_ldbl_proto. Since the macro is needed for some of these *_chk functions for _FORTIFY_SOURCE to be enabled, that needed to be fixed. While at it, all the *_chk function get renamed appropriately for consistency, even if not strictly necessary. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* Exclude routines from fortificationFrédéric Bérat2023-07-052-16/+94
| | | | | | | | | | | | | | | | | Since the _FORTIFY_SOURCE feature uses some routines of Glibc, they need to be excluded from the fortification. On top of that: - some tests explicitly verify that some level of fortification works appropriately, we therefore shouldn't modify the level set for them. - some objects need to be build with optimization disabled, which prevents _FORTIFY_SOURCE to be used for them. Assembler files that implement architecture specific versions of the fortified routines were not excluded from _FORTIFY_SOURCE as there is no C header included that would impact their behavior. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* aarch64: Add vector implementations of cos routinesJoe Ramsay2023-06-302-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps. Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table. Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from vector to normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication. AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, however it is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Fix misspellings in sysdeps/ -- BZ 25337Paul Pluzhnikov2023-05-304-4/+4
|
* Added Redirects to longdouble error functions [BZ #29033]Sachin Monga2023-05-102-0/+6
| | | | | | | | This patch redirects the error functions to the appropriate longdouble variants which enables the compiler to optimize for the abi ieeelongdouble. Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
* math: Improve fmod(f) performanceWilco Dijkstra2023-04-172-77/+101
| | | | | | | | | | | | | Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of special cases to reduce the number of instructions executed before the fast paths. Performance improvements for fmod: Skylake Zen2 Neoverse V1 subnormals 11.8% 4.2% 11.5% normal 3.9% 0.01% -0.5% close-exponents 6.3% 5.6% 19.4% Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* math: Remove the error handling wrapper from fmod and fmodfAdhemerval Zanella Netto2023-04-038-7/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). The ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. For both i686 and m68k, which provive arch specific implementation, wrappers are added so no new symbol are added (which would require to change the implementations). It shows an small improvement, the results for fmod: Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992 x86_64 (Ryzen 9) | normal | 296.939 | 296.738 x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119 aarch64 (N1) | subnormal | 6.81778 | 4.33313 aarch64 (N1) | normal | 155.620 | 152.915 aarch64 (N1) | close-exponents | 8.21306 | 5.76138 armhf (N1) | subnormal | 15.1083 | 14.5746 armhf (N1) | normal | 244.833 | 241.738 armhf (N1) | close-exponents | 21.8182 | 22.457 Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* math: Improve fmodfAdhemerval Zanella Netto2023-04-032-93/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This uses a new algorithm similar to already proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation is: mx * 2^ex == 2 * mx * 2^(ex - 1) while (ex > ey) { mx *= 2; --ex; mx %= my; } With mx/my being mantissa of double floating pointer, on each step the argument reduction can be improved 8 (which is sizeof of uint32_t minus MANTISSA_WIDTH plus the signal bit): while (ex > ey) { mx << 8; ex -= 8; mx %= my; } */ The implementation uses builtin clz and ctz, along with shifts to convert hx/hy back to doubles. Different than the original patch, this path assume modulo/divide operation is slow, so use multiplication with invert values. I see the following performance improvements using fmod benchtests (result only show the 'mean' result): Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318 x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641 x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224 aarch64 (N1) | subnormal | 10.2182 | 6.81778 aarch64 (N1) | normal | 60.0616 | 20.3667 aarch64 (N1) | close-exponents | 11.5256 | 8.39685 I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it a lot of soft-fp implementation (for modulo, and multiplication): Architecture | Input | master | patch -----------------|-----------------|----------|-------- armhf (N1) | subnormal | 11.6662 | 10.8955 armhf (N1) | normal | 69.2759 | 34.1524 armhf (N1) | close-exponents | 13.6472 | 18.2131 Instead of using the math_private.h definitions, I used the math_config.h instead which is used on newer math implementations. Co-authored-by: kirill <kirill.okhotnikov@gmail.com> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* math: Improve fmodAdhemerval Zanella Netto2023-04-032-96/+208
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This uses a new algorithm similar to already proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation is: mx * 2^ex == 2 * mx * 2^(ex - 1) while (ex > ey) { mx *= 2; --ex; mx %= my; } With mx/my being mantissa of double floating pointer, on each step the argument reduction can be improved 11 (which is sizeo of uint64_t minus MANTISSA_WIDTH plus the signal bit): while (ex > ey) { mx << 11; ex -= 11; mx %= my; } */ The implementation uses builtin clz and ctz, along with shifts to convert hx/hy back to doubles. Different than the original patch, this path assume modulo/divide operation is slow, so use multiplication with invert values. I see the following performance improvements using fmod benchtests (result only show the 'mean' result): Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049 x86_64 (Ryzen 9) | normal | 1016.51 | 296.939 x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244 aarch64 (N1) | subnormal | 11.153 | 6.81778 aarch64 (N1) | normal | 528.649 | 155.62 aarch64 (N1) | close-exponents | 11.4517 | 8.21306 I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it a lot of soft-fp implementation (for modulo, clz, ctz, and multiplication): Architecture | Input | master | patch -----------------|-----------------|----------|-------- armhf (N1) | subnormal | 15.908 | 15.1083 armhf (N1) | normal | 837.525 | 244.833 armhf (N1) | close-exponents | 16.2111 | 21.8182 Instead of using the math_private.h definitions, I used the math_config.h instead which is used on newer math implementations. Co-authored-by: kirill <kirill.okhotnikov@gmail.com> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* Move libc_freeres_ptrs and libc_subfreeres to hidden/weak functionsAdhemerval Zanella Netto2023-03-272-0/+23
| | | | | | | | | | | | | | | | | | | | They are both used by __libc_freeres to free all library malloc allocated resources to help tooling like mtrace or valgrind with memory leak tracking. The current scheme uses assembly markers and linker script entries to consolidate the free routine function pointers in the RELRO segment and to be freed buffers in BSS. This patch changes it to use specific free functions for libc_freeres_ptrs buffers and call the function pointer array directly with call_function_static_weak. It allows the removal of both the internal macros and the linker script sections. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* C2x scanf binary constant handlingJoseph Myers2023-03-0254-2/+931
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | C2x adds binary integer constants starting with 0b or 0B, and supports those constants for the %i scanf format (in addition to the %b format, which isn't yet implemented for scanf in glibc). Implement that scanf support for glibc. As with the strtol support, this is incompatible with previous C standard versions, in that such an input string starting with 0b or 0B was previously required to be parsed as 0 (with the rest of the input potentially matching subsequent parts of the scanf format string). Thus this patch adds 12 new __isoc23_* functions per long double format (12, 24 or 36 depending on how many long double formats the glibc configuration supports), with appropriate header redirection support (generally very closely following that for the __isoc99_* scanf functions - note that __GLIBC_USE (DEPRECATED_SCANF) takes precedence over __GLIBC_USE (C2X_STRTOL), so the case of GNU extensions to C89 continues to get old-style GNU %a and does not get this new feature). The function names would remain as __isoc23_* even if C2x ends up published in 2024 rather than 2023. When scanf %b support is added, I think it will be appropriate for all versions of scanf to follow C2x rules for inputs to the %b format (given that there are no compatibility concerns for a new format). Tested for x86_64 (full glibc testsuite). The first version was also tested for powerpc (32-bit) and powerpc64le (stdio-common/ and wcsmbs/ tests), and with build-many-glibcs.py.
* math: Suppress -O0 warnings for soft-fp fsqrt [BZ #19444]Adhemerval Zanella2023-01-111-0/+11
| | | | | | The patch suppress the same warnings from 87c266d758d29e52bfb717f90, that shows issues for microblaze, mips soft-fp, nios2, and or1k. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-06491-491/+491
|
* Fix ldbl-128 built-in function useJoseph Myers2023-01-053-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following issues with built-in function use in sysdeps/ieee754/ldbl-128 and sysdeps/ieee754/float128: * fabsl used __builtin_fabsf128 unconditionally, breaking the build with GCC 6 for several architectures; it should use __builtin_fabsl with an appropriate redirection in float128_private.h. (I'm not particularly concerned with building glibc with GCC 6; rather, I want to be able to run the tgmath.h tests with GCC 6, which is a significantly different case for tgmath.h compared to GCC 7 and later because of the lack of _FloatN / _FloatNx support in the compiler, and at present running the tests with a compiler means building glibc with that compiler.) * Some (conditional) uses of built-in functions had been added to ldbl-128 without appropriate float128_private.h remapping (there was remapping for the macros controlling whether the built-in functions are used, just not for the functions themselves). * s_llrintl.c called __builtin_round not __builtin_llrintl, which is obviously wrong. Tested with build-many-glibcs.py for aarch64-linux-gnu, GCC 6 (where it fixes the glibc build) and GCC 12, and with the glibc testsuite for x86_64.
* stdio-common: Convert vfprintf and related functions to buffersFlorian Weimer2022-12-193-76/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vfprintf is entangled with vfwprintf (of course), __printf_fp, __printf_fphex, __vstrfmon_l_internal, and the strfrom family of functions. The latter use the internal snprintf functionality, so vsnprintf is converted as well. The simples conversion is __printf_fphex, followed by __vstrfmon_l_internal and __printf_fp, and finally __vfprintf_internal and __vfwprintf_internal. __vsnprintf_internal and strfrom* are mostly consuming the new interfaces, so they are comparatively simple. __printf_fp is a public symbol, so the FILE *-based interface had to preserved. The __printf_fp rewrite does not change the actual binary-to-decimal conversion algorithm, and digits are still not emitted directly to the target buffer. However, the staging buffer now uses bytes instead of wide characters, and one buffer copy is eliminated. The changes are at least performance-neutral in my testing. Floating point printing and snprintf improved measurably, so that this Lua script for i=1,5000000 do print(i, i * math.pi) end runs about 5% faster for me. To preserve fprintf performance for a simple "%d" format, this commit has some logic changes under LABEL (unsigned_number) to avoid additional function calls. There are certainly some very easy performance improvements here: binary, octal and hexadecimal formatting can easily avoid the temporary work buffer (the number of digits can be computed ahead-of-time using one of the __builtin_clz* built-ins). Decimal formatting can use a specialized version of _itoa_word for base 10. The existing (inconsistent) width handling between strfmon and printf is preserved here. __print_fp_buffer_1 would have to use __translated_number_width to achieve ISO conformance for printf. Test expectations in libio/tst-vtables-common.c are adjusted because the internal staging buffer merges all virtual function calls into one. In general, stack buffer usage is greatly reduced, particularly for unbuffered input streams. __printf_fp can still use a large buffer in binary128 mode for %g, though. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Use GCC builtins for logb functions if desired.Xiaolin Tang2022-11-294-0/+18
| | | | | | | | This patch is using the corresponding GCC builtin for logbf, logb, logbl and logbf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
* Use GCC builtins for llrint functions if desired.Xiaolin Tang2022-11-294-17/+38
| | | | | | | | This patch is using the corresponding GCC builtin for llrintf, llrint, llrintl and llrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
* Use GCC builtins for lrint functions if desired.Xiaolin Tang2022-11-294-17/+38
| | | | | | | | This patch is using the corresponding GCC builtin for lrintf, lrint, lrintl and lrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
* Fix build with GCC 13 _FloatN, _FloatNx built-in functionsJoseph Myers2022-10-313-0/+380
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 13 has added more _FloatN and _FloatNx versions of existing <math.h> and <complex.h> built-in functions, for use in libstdc++-v3. This breaks the glibc build because of how those functions are defined as aliases to functions with the same ABI but different types. Add appropriate -fno-builtin-* options for compiling relevant files, as already done for the case of long double functions aliasing double ones and based on the list of files used there. I fixed some mistakes in that list of double files that I noticed while implementing this fix, but there may well be more such (harmless) cases, in this list or the new one (files that don't actually exist or don't define the named functions as aliases so don't need the options). I did try to exclude cases where glibc doesn't define certain functions for _FloatN or _FloatNx types at all from the new uses of -fno-builtin-* options. As with the options for double files (see the commit message for commit 49348beafe9ba150c9bd48595b3f372299bddbb0, "Fix build with GCC 10 when long double = double."), it's deliberate that the options are used even if GCC currently doesn't have a built-in version of a given functions, so providing some level of future-proofing against more such built-in functions being added in future. Tested with build-many-glibcs.py for aarch64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers and glibcs builds) with GCC mainline.
* Avoid undefined behaviour in ibm128 implementation of llroundl (BZ #29488)Aurelien Jarno2022-10-241-12/+9
| | | | | | | | Detecting an overflow edge case depended on signed overflow of a long long. Replace the additions and the overflow checks by __builtin_add_overflow(). Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Fix BZ #29463 in the ibm128 implementation of y1l tooMichael Hudson-Doyle2022-10-241-0/+3
| | | | | | | | | Avoid moving code across SET_RESTORE_ROUNDL in order to fix [BZ #29463]. Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* math: Fix asin and acos invalid exception with old gccSzabolcs Nagy2022-10-171-16/+2
| | | | | | | | | | | | | | This works around a gcc issue where it const folded inf/inf into nan, preventing the invalid exception to be signalled. (x-x)/(x-x) is more robust against optimizations and works for all out of bounds values including x==nan. The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 should be fixed on release branches starting from gcc-10, but it is better to change the code in case glibc is built with older gcc. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* Update _FloatN header support for C++ in GCC 13Joseph Myers2022-09-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking the installed glibc headers that assume such support is not present. GCC mostly works around this with fixincludes, but that doesn't help for building glibc and its tests (glibc doesn't itself contain C++ code, but there's C++ code built for tests). Update glibc's bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13 support directly. In general the changes match those made by fixincludes, though I think the ones in sysdeps/powerpc/bits/floatn.h, where the header tests __LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing fixincludes patterns. Some places involving special C++ handling in relation to _FloatN support are not changed. There's no need to change the __HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be matched by the fixincludes fixes) because it's only used in relation to macro definitions using features not supported for C++ (__builtin_types_compatible_p and _Generic). And there's no need to change the inline function overloads for issignaling, iszero and iscanonical in C++ because cases where types have the same format but are no longer compatible types are handled automatically by the C++ overload resolution rules. This patch also does not change the overload handling for iseqsig, and there I think changes *are* needed, beyond those in this patch or made by fixincludes. The way that overload is defined, via a template parameter to a structure type, requires overloads whenever the types are incompatible, even if they have the same format. So I think we need to add overloads with GCC 13 for every supported _FloatN and _FloatNx type, rather than just having one for _Float128 when it has a different ABI to long double as at present (but for older GCC, such overloads must not be defined for types that end up defined as typedefs for another type). Tested with build-many-glibcs.py: compilers build for aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu.
* Ensure calculations happen with desired rounding mode in y1lf128Michael Hudson-Doyle2022-08-181-0/+3
| | | | | | | | | | | | math/test-float128-y1 fails on x86_64 and ppc64el with gcc 12 and -O3, because code inside a block guarded by SET_RESTORE_ROUNDL is being moved after the rounding mode has been restored. Use math_force_eval to prevent this (and insert some math_opt_barrier calls to prevent code from being moved before the rounding mode is set). Fixes #29463 Reviewed-By: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* i686: Use generic sincosf implementation for SSE2 versionAdhemerval Zanella2022-06-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation shows slight better performance (gcc 11.2.1 on a Ryzen 9 5900X): * s_sincosf-sse2.S: "sincosf": { "workload-random": { "duration": 3.89961e+09, "iterations": 9.5472e+07, "reciprocal-throughput": 40.8429, "latency": 40.8483, "max-throughput": 2.4484e+07, "min-throughput": 2.44808e+07 } } * generic s_cossinf.c: "sincosf": { "workload-random": { "duration": 3.71953e+09, "iterations": 1.48512e+08, "reciprocal-throughput": 25.0515, "latency": 25.0391, "max-throughput": 3.99177e+07, "min-throughput": 3.99375e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic sinf implementation for SSE2 versionAdhemerval Zanella2022-06-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X), the generic algorithm shows slight better performance for the 'workload-huge.wrf' input set. * s_sinf-sse2.S: "sinf": { "": { "duration": 3.72405e+09, "iterations": 2.38374e+08, "max": 63.973, "min": 11.211, "mean": 15.6227 }, "workload-random.wrf": { "duration": 3.76923e+09, "iterations": 8.4e+07, "reciprocal-throughput": 17.6355, "latency": 72.108, "max-throughput": 5.67037e+07, "min-throughput": 1.38681e+07 }, "workload-huge.wrf": { "duration": 3.76943e+09, "iterations": 6e+07, "reciprocal-throughput": 29.3493, "latency": 96.2985, "max-throughput": 3.40724e+07, "min-throughput": 1.03844e+07 } } * generic s_sinf.c: "sinf": { "": { "duration": 3.70989e+09, "iterations": 2.18025e+08, "max": 69.782, "min": 11.1, "mean": 17.0159 }, "workload-random.wrf": { "duration": 3.77213e+09, "iterations": 9.6e+07, "reciprocal-throughput": 17.5402, "latency": 61.0459, "max-throughput": 5.70119e+07, "min-throughput": 1.63811e+07 }, "workload-huge.wrf": { "duration": 3.81576e+09, "iterations": 5.6e+07, "reciprocal-throughput": 38.2111, "latency": 98.0659, "max-throughput": 2.61704e+07, "min-throughput": 1.01972e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic cosf implementation for SSE2 versionAdhemerval Zanella2022-06-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X): * s_cosf-sse2.S: "cosf": { "workload-random": { "duration": 3.74987e+09, "iterations": 9.616e+07, "reciprocal-throughput": 15.8141, "latency": 62.1782, "max-throughput": 6.32346e+07, "min-throughput": 1.60828e+07 } } * generic s_cosf.c: "cosf": { "workload-random": { "duration": 3.87298e+09, "iterations": 1.00968e+08, "reciprocal-throughput": 18.3448, "latency": 58.3722, "max-throughput": 5.45113e+07, "min-throughput": 1.71314e+07 } } Checked on i686-linux-gnu.
* x86_64: Optimize sincos where sin/cos is optimized (bug 29193)Andreas Schwab2022-06-011-0/+7
| | | | | | | The compiler may substitute calls to sin or cos with calls to sincos, thus we should have the same optimized implementations for sincos. The optimized implementations may produce results that differ, that also makes sure that the sincos call aggrees with the sin and cos calls.
* math: Add math-use-builtins-fabs (BZ#29027)Adhemerval Zanella2022-05-233-5/+36
| | | | | | | | | | | | | | | | | | Both float, double, and _Float128 are assumed to be supported (float and double already only uses builtins). Only long double is parametrized due GCC bug 29253 which prevents its usage on powerpc. It allows to remove i686, ia64, x86_64, powerpc, and sparc arch specific implementation. On ia64 it also fixes the sNAN handling: math/test-float64x-fabs math/test-ldouble-fabs Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
* math: Use builtin for ldbl-96 copysignAdhemerval Zanella2022-04-071-7/+3
| | | | | | | All architectures that uses it (x86, ia64, m68k) implement the builtin. Checked on x86_64-linux-gnu and ia64-linux-gnu.
* math: Fix float conversion regressions with gcc-12 [BZ #28713]Szabolcs Nagy2022-01-106-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Converting double precision constants to float is now affected by the runtime dynamic rounding mode instead of being evaluated at compile time with default rounding mode (except static object initializers). This can change the computed result and cause performance regression. The known correctness issues (increased ulp errors) are already fixed, this patch fixes remaining cases of unnecessary runtime conversions. Add float M_* macros to math.h as new GNU extension API. To avoid conversions the new M_* macros are used and instead of casting double literals to float, use float literals (only required if the conversion is inexact). The patch was tested on aarch64 where the following symbols had new spurious conversion instructions that got fixed: __clog10f __gammaf_r_finite@GLIBC_2.17 __j0f_finite@GLIBC_2.17 __j1f_finite@GLIBC_2.17 __jnf_finite@GLIBC_2.17 __kernel_casinhf __lgamma_negf __log1pf __y0f_finite@GLIBC_2.17 __y1f_finite@GLIBC_2.17 cacosf cacoshf casinhf catanf catanhf clogf gammaf_positive Fixes bug 28713. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2022-01-01490-490/+490
| | | | | | | | | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: *** 912-#endif remote: *** 913: remote: *** 914- remote: *** error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* s_sincosf.h: Change pio4 type to float [BZ #28713]H.J. Lu2021-12-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* sysdeps: Simplify sin Taylor Series calculationAkila Welihinda2021-12-131-5/+5
| | | | | | | | | | | | The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5*da*a^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* math: Remove the error handling wrapper from hypot and hypotfAdhemerval Zanella2021-12-134-9/+39
| | | | | | | | | | | | The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
* math: Use fmin/fmax on hypotWilco Dijkstra2021-12-131-2/+3
| | | | | | It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.
* math: Use an improved algorithm for hypotl (ldbl-128)Adhemerval Zanella2021-12-131-130/+96
| | | | | | | | | | | | | | | | | | | | | This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf
* math: Use an improved algorithm for hypotl (ldbl-96)Adhemerval Zanella2021-12-131-133/+98
| | | | | | | | | | | | | | | | | | | This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf