about summary refs log tree commit diff
path: root/sysdeps/powerpc/powerpc64
Commit message (Collapse)AuthorAgeFilesLines
* powerpc64le: Adhere to ABI stack alignment requirementSachin Monga2024-10-281-1/+1
| | | | | | The ABI requires all stack frames be 16-byte aligned. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* Fix whitespace related license issues.Carlos O'Donell2024-10-077-13/+13
| | | | | | | | | | | | Several copies of the licenses in files contained whitespace related problems. Two cases are addressed here, the first is two spaces after a period which appears between "PURPOSE." and "See". The other is a space after the last forward slash in the URL. Both issues are corrected and the licenses now match the official textual description of the license (and the other license in the sources). Since these whitespaces changes do not alter the paragraph structure of the license, nor create new sentences, they do not change the license.
* powerpc64le: Build new strtod tests with long double ABI flags (bug 32145)Florian Weimer2024-09-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes several test failures: =====FAIL: stdlib/tst-strtod1i.out===== Locale tests all OK Locale tests all OK Locale tests strtold("1,5") returns -6,38643e+367 and not 1,5 strtold("1.5") returns 1,5 and not 1 strtold("1.500") returns 1 and not 1500 strtold("36.893.488.147.419.103.232") returns 1500 and not 3,68935e+19 Locale tests all OK =====FAIL: stdlib/tst-strtod3.out===== 0: got wrong results -2.5937e+4826, expected 0 =====FAIL: stdlib/tst-strtod4.out===== 0: got wrong results -6,38643e+367, expected 0 1: got wrong results 0, expected 1e+06 2: got wrong results 1e+06, expected 10 =====FAIL: stdlib/tst-strtod5i.out===== 0: got wrong results -6,38643e+367, expected 0 2: got wrong results 0, expected -0 4: got wrong results -0, expected 0 5: got wrong results 0, expected -0 6: got wrong results -0, expected 0 7: got wrong results 0, expected -0 8: got wrong results -0, expected 0 9: got wrong results 0, expected -0 10: got wrong results -0, expected 0 11: got wrong results 0, expected -0 12: got wrong results -0, expected 0 13: got wrong results 0, expected -0 14: got wrong results -0, expected 0 15: got wrong results 0, expected -0 16: got wrong results -0, expected 0 17: got wrong results 0, expected -0 18: got wrong results -0, expected 0 20: got wrong results 0, expected -0 22: got wrong results -0, expected 0 23: got wrong results 0, expected -0 24: got wrong results -0, expected 0 25: got wrong results 0, expected -0 26: got wrong results -0, expected 0 27: got wrong results 0, expected -0 Fixes commit 3fc063dee01da4f80920a14b7db637c8501d6fd4 ("Make __strtod_internal tests type-generic"). Suggested-by: Joseph Myers <josmyers@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* powerpc64: Fix syscall_cancel build for powerpc64le-linux-gnu [BZ #32125]Jeevitha Palanisamy2024-08-301-1/+1
| | | | | | | | | In __syscall_cancel_arch, there's a tail call to __syscall_do_cancel. On P10, since the caller uses the TOC and the callee is using PC-relative addressing, there's only a branch instruction with no NOPs to restore the TOC, which causes the build error. The fix involves adding the NOTOC directive to the branch instruction, informing the linker not to generate a TOC stub, thus resolving the issue.
* powerpc64: Optimize strcpy and stpcpy for Power9/10Mahesh Bodapati2024-08-231-53/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies the current Power9 implementation of strcpy and stpcpy to optimize it for Power9 and Power10. No new Power10 instructions are used, so the original Power9 strcpy is modified instead of creating a new implementation for Power10. The changes also affect stpcpy, which uses the same implementation with some additional code before returning. Improvements compared to the old Power9 version: Use simple comparisons for the first ~512 bytes: The main loop is good for long strings, but comparing 16B each time is better for shorter strings. After aligning the address to 16 bytes, we unroll the loop four times, checking 128 bytes each time. There may be some overlap with the main loop for unaligned strings, but it is better for shorter strings. Loop with 64 bytes for longer bytes: Use 4 consecutive lxv/stxv instructions. Showed an average improvement of 13%. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* nptl: Fix Race conditions in pthread cancellation [BZ#12683]Adhemerval Zanella2024-08-231-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current racy approach is to enable asynchronous cancellation before making the syscall and restore the previous cancellation type once the syscall returns, and check if cancellation has happen during the cancellation entrypoint. As described in BZ#12683, this approach shows 2 problems: 1. Cancellation can act after the syscall has returned from the kernel, but before userspace saves the return value. It might result in a resource leak if the syscall allocated a resource or a side effect (partial read/write), and there is no way to program handle it with cancellation handlers. 2. If a signal is handled while the thread is blocked at a cancellable syscall, the entire signal handler runs with asynchronous cancellation enabled. This can lead to issues if the signal handler call functions which are async-signal-safe but not async-cancel-safe. For the cancellation to work correctly, there are 5 points at which the cancellation signal could arrive: [ ... )[ ... )[ syscall ]( ... 1 2 3 4 5 1. Before initial testcancel, e.g. [*... testcancel) 2. Between testcancel and syscall start, e.g. [testcancel...syscall start) 3. While syscall is blocked and no side effects have yet taken place, e.g. [ syscall ] 4. Same as 3 but with side-effects having occurred (e.g. a partial read or write). 5. After syscall end e.g. (syscall end...*] And libc wants to act on cancellation in cases 1, 2, and 3 but not in cases 4 or 5. For the 4 and 5 cases, the cancellation will eventually happen in the next cancellable entrypoint without any further external event. The proposed solution for each case is: 1. Do a conditional branch based on whether the thread has received a cancellation request; 2. It can be caught by the signal handler determining that the saved program counter (from the ucontext_t) is in some address range beginning just before the "testcancel" and ending with the syscall instruction. 3. SIGCANCEL can be caught by the signal handler and determine that the saved program counter (from the ucontext_t) is in the address range beginning just before "testcancel" and ending with the first uninterruptable (via a signal) syscall instruction that enters the kernel. 4. In this case, except for certain syscalls that ALWAYS fail with EINTR even for non-interrupting signals, the kernel will reset the program counter to point at the syscall instruction during signal handling, so that the syscall is restarted when the signal handler returns. So, from the signal handler's standpoint, this looks the same as case 2, and thus it's taken care of. 5. For syscalls with side-effects, the kernel cannot restart the syscall; when it's interrupted by a signal, the kernel must cause the syscall to return with whatever partial result is obtained (e.g. partial read or write). 6. The saved program counter points just after the syscall instruction, so the signal handler won't act on cancellation. This is similar to 4. since the program counter is past the syscall instruction. So The proposed fixes are: 1. Remove the enable_asynccancel/disable_asynccancel function usage in cancellable syscall definition and instead make them call a common symbol that will check if cancellation is enabled (__syscall_cancel at nptl/cancellation.c), call the arch-specific cancellable entry-point (__syscall_cancel_arch), and cancel the thread when required. 2. Provide an arch-specific generic system call wrapper function that contains global markers. These markers will be used in SIGCANCEL signal handler to check if the interruption has been called in a valid syscall and if the syscalls has side-effects. A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is provided. However, the markers may not be set on correct expected places depending on how INTERNAL_SYSCALL_NCS is implemented by the architecture. It is expected that all architectures add an arch-specific implementation. 3. Rewrite SIGCANCEL asynchronous handler to check for both canceling type and if current IP from signal handler falls between the global markers and act accordingly. 4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to use the appropriate cancelable syscalls. 5. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide cancelable futex calls. Some architectures require specific support on syscall handling: * On i386 the syscall cancel bridge needs to use the old int80 instruction because the optimized vDSO symbol the resulting PC value for an interrupted syscall points to an address outside the expected markers in __syscall_cancel_arch. It has been discussed in LKML [1] on how kernel could help userland to accomplish it, but afaik discussion has stalled. Also, sysenter should not be used directly by libc since its calling convention is set by the kernel depending of the underlying x86 chip (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60). * mips o32 is the only kABI that requires 7 argument syscall, and to avoid add a requirement on all architectures to support it, mips support is added with extra internal defines. Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu, powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu. [1] https://lkml.org/lkml/2016/3/8/1105 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Convert to autoconf 2.72 (vanilla release, no distribution patches)Andreas K. Hüttel2024-06-173-34/+47
| | | | | | | As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
* Implement C23 exp2m1, exp10m1Joseph Myers2024-06-173-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C23 adds various <math.h> function families originally defined in TS 18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and exp10(x)-1, like expm1). As with other such functions, these use type-generic templates that could be replaced with faster and more accurate type-specific implementations in future. Test inputs are copied from those for expm1, plus some additions close to the overflow threshold (copied from exp2 and exp10) and also some near the underflow threshold. exp2m1 has the unusual property of having an input (M_MAX_EXP) where whether the function overflows (under IEEE semantics) depends on the rounding mode. Although these could reasonably be XFAILed in the testsuite (as we do in some cases for arguments very close to a function's overflow threshold when an error of a few ulps in the implementation can result in the implementation not agreeing with an ideal one on whether overflow takes place - the testsuite isn't smart enough to handle this automatically), since these functions aren't required to be correctly rounding, I made the implementation check for and handle this case specially. The Makefile ordering expected by lint-makefiles for the new functions is a bit peculiar, but I implemented it in this patch so that the test passes; I don't know why log2 also needed moving in one Makefile variable setting when it didn't in my previous patches, but the failure showed a different place was expected for that function as well. The powerpc64le IFUNC setup seems not to be as self-contained as one might hope; it shouldn't be necessary to add IFUNCs for new functions such as these simply to get them building, but without setting up IFUNCs for the new functions, there were undefined references to __GI___expm1f128 (that IFUNC machinery results in no such function being defined, but doesn't stop include/math.h from doing the redirection resulting in the exp2m1f128 and exp10m1f128 implementations expecting to call it). Tested for x86_64 and x86, and with build-many-glibcs.py.
* Implement C23 logp1Joseph Myers2024-06-174-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | C23 adds various <math.h> function families originally defined in TS 18661-4. Add the logp1 functions (aliases for log1p functions - the name is intended to be more consistent with the new log2p1 and log10p1, where clearly it would have been very confusing to name those functions log21p and log101p). As aliases rather than new functions, the content of this patch is somewhat different from those actually adding new functions. Tests are shared with log1p, so this patch *does* mechanically update all affected libm-test-ulps files to expect the same errors for both functions. The vector versions of log1p on aarch64 and x86_64 are *not* updated to have logp1 aliases (and thus there are no corresponding header, tests, abilist or ulps changes for vector functions either). It would be reasonable for such vector aliases and corresponding changes to other files to be made separately. For now, the log1p tests instead avoid testing logp1 in the vector case (a Makefile change is needed to avoid problems with grep, used in generating the .c files for vector function tests, matching more than one ALL_RM_TEST line in a file testing multiple functions with the same inputs, when it assumes that the .inc file only has a single such line). Tested for x86_64 and x86, and with build-many-glibcs.py.
* powerpc: Remove duplicate strchrnul and strncasecmp_l libc.a (BZ 31786)Adhemerval Zanella2024-05-233-1/+19
| | | | | | | | | | | | For powerpc64 the generic version provides a weak definition of strchrnul, which are already provided by the ifunc resolver. The powerpc32 version is slight different, where for static case there is no iFUNC support. The strncasecmp_l is provided ifunc resolver. Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* powerpc64: Fix by using the configure value $libc_cv_cc_submachine [BZ ↵Manjunath Matti2024-05-162-4/+4
| | | | | | | | | | #31629] This patch ensures that $libc_cv_cc_submachine, which is set from "--with-cpu", overrides $CFLAGS for configure time tests. Suggested-by: Peter Bergner <bergner@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* powerpc: Optimized strncmp for power10Amrita H S2024-05-065-1/+304
| | | | | | | | | | | | | | | | | | This patch is based on __strcmp_power10. Improvements from __strncmp_power9: 1. Uses new POWER10 instructions - This code uses lxvp to decrease contention on load by loading 32 bytes per instruction. 2. Performance implication - This version has around 38% better performance on average. - Minor performance regression is seen for few small sizes and specific combination of alignments. Signed-off-by: Amrita H S <amritahs@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)Florian Weimer2024-04-191-2/+1
| | | | | | | | | These structs describe file formats under /var/log, and should not depend on the definition of _TIME_BITS. This is achieved by defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that support 32-bit time_t values (where __time_t is 32 bits). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Fix ld.so address determination for PCREL mode (bug 31640)Florian Weimer2024-04-141-0/+19
| | | | | | | | This seems to have stopped working with some GCC 14 versions, which clobber r2. With other compilers, the kernel-provided r2 value is still available at this point. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* powerpc: Placeholder and infrastructure/build support to add Power11 ↵Amrita H S2024-03-199-2/+14
| | | | | | | | | | | | related changes. The following three changes have been added to provide initial Power11 support. 1. Add the directories to hold Power11 files. 2. Add support to select Power11 libraries based on AT_PLATFORM. 3. Let submachine=power11 be set automatically. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* powerpc: Remove power8 strcasestr optimizationAdhemerval Zanella2024-03-128-675/+1
| | | | | | | | | | | | | | | | | | Similar to strstr (1e9a550ba4), power8 strcasestr does not show much improvement compared to the generic implementation. The geomean on bench-strcasestr shows: __strcasestr_power8 __strcasestr_ppc power10 1159 1120 power9 1640 1469 power8 1787 1904 The strcasestr uses the same 'trick' as power7 strstr to detect potential quadradic behavior, which only adds overheads for input that trigger quadradic behavior and it is really a hack. Checked on powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
* powerpc: Remove power7 strstr optimizationAdhemerval Zanella2024-02-238-671/+1
| | | | | | | | | | | | | | | The optimization is not faster than the generic algorithm, using the bench-strstr the geometric mean running on a POWER10 machine using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97 (which uses the generic implementation). Also, there is no need to redirect the internal str*/mem* call to optimized version, internal ifunc is supported and enabled for internal calls (meaning that the generic implementation will use any asm optimization if available). Checked on powerpc64le-linux-gnu. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
* Rename c2x / gnu2x tests to c23 / gnu23Joseph Myers2024-02-011-2/+2
| | | | | | | Complete the internal renaming from "C2X" and related names in GCC by renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23. Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
* string: Use builtins for ffs and ffsllAdhemerval Zanella Netto2024-02-011-36/+0
| | | | | | | It allows to remove a lot of arch-specific implementations. Checked on x86_64, aarch64, powerpc64. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-01270-270/+270
|
* powerpc: Fix performance issues of strcmp power10Amrita H S2023-12-151-66/+95
| | | | | | | | | | | | | | | | | Current implementation of strcmp for power10 has performance regression for multiple small sizes and alignment combination. Most of these performance issues are fixed by this patch. The compare loop is unrolled and page crosses of unrolled loop is handled. Thanks to Paul E. Murphy for helping in fixing the performance issues. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com> Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com> Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* powerpc : Add optimized memchr for POWER10MAHESH BODAPATI2023-12-145-10/+367
| | | | | | Optimized memchr for POWER10 based on existing rawmemchr and strlen. Reordering instructions and loop unrolling helped in getting better performance. Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* powerpc: Optimized strcmp for power10Amrita H S2023-12-075-1/+240
| | | | | | | | | | | | | | | | | | | This patch is based on __strcmp_power9 and __strlen_power10. Improvements from __strcmp_power9: 1. Uses new POWER10 instructions - This code uses lxvp to decrease contention on load by loading 32 bytes per instruction. 2. Performance implication - This version has around 30% better performance on average. - Performance regression is seen for a specific combination of sizes and alignments. Some of them is observed without changes also, while rest may be induced by the patch. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com> Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* elf: Remove LD_PROFILE for static binariesAdhemerval Zanella2023-11-212-8/+14
| | | | | | | | | | | The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows to trim down libc.a of profile support. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLESMahesh Bodapati2023-08-012-5/+4
| | | | | | | | | | | This patch enables the option to influence hwcaps used by PowerPC. The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz...., can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx and zzz, where the feature name is case-sensitive and has to match the ones mentioned in the file{sysdeps/powerpc/dl-procinfo.c}. Note that the hwcap tunables only used in the IFUNC selection. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Fix powerpc64 strchrnul build with old gccAdhemerval Zanella Netto2023-07-261-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The compiler might not see that internal definition is an alias due the libc_ifunc macro, which redefines __strchrnul. With gcc 6 it fails with: In file included from <command-line>:0:0: ./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to undefined symbol ‘__GI___strchrnul’ extern thread __typeof (name) __EI_##name \ ^ ./../include/libc-symbols.h:468:3: note: in expansion of macro ‘__hidden_ver2’ __hidden_ver2 (, local, internal, name) ^~~~~~~~~~~~~ ./../include/libc-symbols.h:476:29: note: in expansion of macro ‘__hidden_ver1’ # define hidden_def(name) __hidden_ver1(__GI_##name, name, name); ^~~~~~~~~~~~~ ./../include/libc-symbols.h:557:32: note: in expansion of macro ‘hidden_def’ # define libc_hidden_def(name) hidden_def (name) ^~~~~~~~~~ ../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in expansion of macro ‘libc_hidden_def’ libc_hidden_def (__strchrnul) ^~~~~~~~~~~~~~~ Use libc_ifunc_hidden as stpcpy. Checked on powerpc64 with gcc 6 and gcc 13. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* configure: Use autoconf 2.71Siddhesh Poyarekar2023-07-173-93/+119
| | | | | | | | | | | | | | Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Regenerate configure fragment -- BZ 25337.Paul Pluzhnikov2023-05-231-1/+1
| | | | | | | In commit 0b25c28e028b63c95108c442d8112811107e4c13 I updated congure.ac but neglected to regenerate updated configure. Fix this here.
* Fix misspellings in sysdeps/powerpc -- BZ 25337Paul Pluzhnikov2023-05-2316-18/+18
| | | | | | | All fixes are in comments, so the binaries should be identical before/after this commit, but I can't verify this. Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* powerpc:GCC(<10) doesn't allow -mlong-double-64 after -mabi=ieeelongdoubleMahesh Bodapati2023-05-191-0/+17
| | | | | Removed -mabi=ieeelongdouble on failing tests. It resolves the error. error: ‘-mabi=ieeelongdouble’ requires ‘-mlong-double-128’
* Added Redirects to longdouble error functions [BZ #29033]Sachin Monga2023-05-101-0/+1
| | | | | | | | This patch redirects the error functions to the appropriate longdouble variants which enables the compiler to optimize for the abi ieeelongdouble. Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
* powerpc: Disable stack protector in early static initializationAdhemerval Zanella2023-04-031-0/+3
| | | | | | | | Similar to fb95c316382679c0826cc8399760977cd95f15c9, also disable for string-ppc64.c (pulled on rltd as the default string implementation). Checked on powerpc64-linux-gnu.
* powerpc: Remove powerpc64 strncmp variantsAdhemerval Zanella Netto2023-03-028-494/+9
| | | | | | | | | | | | | The default, and power7 implementation just adds word aligned access when inputs have the same aligment. The unaligned case is still done by byte operations. This is already covered by the generic implementation, which also add the unaligned input optimization. Checked on powerpc64-linux-gnu built without multi-arch for powerpc64, power7, power8, and power9 (build for le). Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* string: Add libc_hidden_proto for memrchrAdhemerval Zanella2023-02-083-9/+11
| | | | | | | Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Add libc_hidden_proto for strchrnulAdhemerval Zanella2023-02-081-0/+1
| | | | | | | Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Improve generic memchrAdhemerval Zanella2023-02-061-8/+1
| | | | | | | | | | | | | | | | | New algorithm read the first aligned address and mask off the unwanted bytes (this strategy is similar to arch-specific implementations used on powerpc, sparc, and sh). The loop now read word-aligned address and check using the has_eq macro. Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-06270-270/+270
|
* powerpc64: Remove old strncmp optimizationRajalakshmi Srinivasaraghavan2022-12-025-256/+2
| | | | | | | | | | This patch cleans up the power4 strncmp optimization for powerpc64 which is unlikely to be used anywhere. Tested on ppc64le with and without --disable-multi-arch flag. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Disable use of -fsignaling-nans if compiler does not support itAdhemerval Zanella2022-11-012-4/+4
| | | | Reviewed-by: Fangrui Song <maskray@google.com>
* Fix build with GCC 13 _FloatN, _FloatNx built-in functionsJoseph Myers2022-10-312-2/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 13 has added more _FloatN and _FloatNx versions of existing <math.h> and <complex.h> built-in functions, for use in libstdc++-v3. This breaks the glibc build because of how those functions are defined as aliases to functions with the same ABI but different types. Add appropriate -fno-builtin-* options for compiling relevant files, as already done for the case of long double functions aliasing double ones and based on the list of files used there. I fixed some mistakes in that list of double files that I noticed while implementing this fix, but there may well be more such (harmless) cases, in this list or the new one (files that don't actually exist or don't define the named functions as aliases so don't need the options). I did try to exclude cases where glibc doesn't define certain functions for _FloatN or _FloatNx types at all from the new uses of -fno-builtin-* options. As with the options for double files (see the commit message for commit 49348beafe9ba150c9bd48595b3f372299bddbb0, "Fix build with GCC 10 when long double = double."), it's deliberate that the options are used even if GCC currently doesn't have a built-in version of a given functions, so providing some level of future-proofing against more such built-in functions being added in future. Tested with build-many-glibcs.py for aarch64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers and glibcs builds) with GCC mainline.
* Introduce <pointer_guard.h>, extracted from <sysdep.h>Florian Weimer2022-10-182-0/+2
| | | | | | | | | | | | | | This allows us to define a generic no-op version of PTR_MANGLE and PTR_DEMANGLE. In the future, we can use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources, avoiding an unintended loss of hardening due to missing include files or unlucky header inclusion ordering. In i386 and x86_64, we can avoid a <tls.h> dependency in the C code by using the computed constant from <tcb-offsets.h>. <sysdep.h> no longer includes these definitions, so there is no cyclic dependency anymore when computing the <tcb-offsets.h> constants. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Remove -fno-tree-loop-distribute-patterns usage on dl-supportAdhemerval Zanella2022-10-101-0/+24
| | | | | | | | | | | | | | Besides the option being gcc specific, this approach is still fragile and not future proof since we do not know if this will be the only optimization option gcc will add that transforms loops to memset (or any libcall). This patch adds a new header, dl-symbol-redir-ifunc.h, that can b used to redirect the compiler generated libcalls to port the generic memset implementation if required. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* arc4random: simplify design for better safetyJason A. Donenfeld2022-07-276-345/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than buffering 16 MiB of entropy in userspace (by way of chacha20), simply call getrandom() every time. This approach is doubtlessly slower, for now, but trying to prematurely optimize arc4random appears to be leading toward all sorts of nasty properties and gotchas. Instead, this patch takes a much more conservative approach. The interface is added as a basic loop wrapper around getrandom(), and then later, the kernel and libc together can work together on optimizing that. This prevents numerous issues in which userspace is unaware of when it really must throw away its buffer, since we avoid buffering all together. Future improvements may include userspace learning more from the kernel about when to do that, which might make these sorts of chacha20-based optimizations more possible. The current heuristic of 16 MiB is meaningless garbage that doesn't correspond to anything the kernel might know about. So for now, let's just do something conservative that we know is correct and won't lead to cryptographic issues for users of this function. This patch might be considered along the lines of, "optimization is the root of all evil," in that the much more complex implementation it replaces moves too fast without considering security implications, whereas the incremental approach done here is a much safer way of going about things. Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace. getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fallback to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. Secondly, as part of the fallback, in order to get more or less the same properties of getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, and it may not port back to all ancient kernels, though it does to all kernels supported by glibc (≥3.2), so generally it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Mark Harris <mark.hsj@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64: Add optimized chacha20Adhemerval Zanella Netto2022-07-226-0/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It adds vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-ppc.c. It targets POWER8 and it is used on default for LE. On a POWER8 it shows the following improvements (using formatted bench-arc4random data): POWER8 GENERIC MB/s ----------------------------------------------- arc4random [single-thread] 138.77 arc4random_buf(16) [single-thread] 174.36 arc4random_buf(32) [single-thread] 228.11 arc4random_buf(48) [single-thread] 252.31 arc4random_buf(64) [single-thread] 270.11 arc4random_buf(80) [single-thread] 278.97 arc4random_buf(96) [single-thread] 287.78 arc4random_buf(112) [single-thread] 291.92 arc4random_buf(128) [single-thread] 295.25 POWER8 MB/s ----------------------------------------------- arc4random [single-thread] 198.06 arc4random_buf(16) [single-thread] 278.79 arc4random_buf(32) [single-thread] 448.89 arc4random_buf(48) [single-thread] 551.09 arc4random_buf(64) [single-thread] 646.12 arc4random_buf(80) [single-thread] 698.04 arc4random_buf(96) [single-thread] 756.06 arc4random_buf(112) [single-thread] 784.12 arc4random_buf(128) [single-thread] 808.04 ----------------------------------------------- Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* Add bounds check to __libc_ifunc_impl_listWilco Dijkstra2022-06-101-7/+2
| | | | | | | | | | | | Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC redundant and fixes several targets that will write outside the array. To avoid unnecessary large diffs, pass the maximum in the argument 'i' to IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing ones can be updated if desired. Passes buildmanyglibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]Matheus Castanho2022-06-071-2/+2
| | | | | | | | | | | | | | | __strncpy_power9 initializes VR 18 with zeroes to be used throughout the code, including when zero-padding the destination string. However, the v18 reference was mistakenly being used for stxv and stxvl, which take a VSX vector as operand. The code ended up using the uninitialized VSR 18 register by mistake. Both occurrences have been changed to use the proper VSX number for VR 18 (i.e. VSR 50). Tested on powerpc, powerpc64 and powerpc64le. Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
* math: Add math-use-builtins-fabs (BZ#29027)Adhemerval Zanella2022-05-231-34/+0
| | | | | | | | | | | | | | | | | | Both float, double, and _Float128 are assumed to be supported (float and double already only uses builtins). Only long double is parametrized due GCC bug 29253 which prevents its usage on powerpc. It allows to remove i686, ia64, x86_64, powerpc, and sparc arch specific implementation. On ia64 it also fixes the sNAN handling: math/test-float64x-fabs math/test-ldouble-fabs Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
* elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOCFangrui Song2022-04-262-0/+5
| | | | | | | | | | | | | | | | | | PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement. Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. Current ports defining PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure default. No functional change. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64: Set up thread register for _dl_relocate_static_pieAlan Modra2022-04-101-0/+21
| | | | | | | | | | | | | | | | | libgcc ifunc resolvers that access hwcap via a field in the tcb can't be called until the thread pointer is set up. Other ifunc resolvers might need access to at_platform. This patch sets up a fake thread pointer early to a copy of tcbhead_t. hwcapinfo.c already had local variables for hwcap and at_platform, replace them with an entire tcbhead_t. It's not that large and this way we easily ensure hwcap and at_platform are at the same relative offsets as they are in the real thread block. The patch also conditionally disables part of tst-tlsifunc-static, "bar address read from IFUNC resolver is incorrect". We can't get a proper address for a thread variable before glibc initialises tls. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Use medium model toc accesses throughoutAlan Modra2022-04-106-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PowerPC64 linker edits medium model toc-indirect code to toc-pointer relative: addis r9,r2,tc_entry_for_var@toc@ha ld r9,tc_entry_for_var@toc@l(r9) becomes addis r9,r2,(var-.TOC.)@ha addi r9,r9,(var-.TOC.)@l when "var" is known to be local to the binary. This isn't done for small-model toc-indirect code, because "var" is almost guaranteed to be too far away from .TOC. for a 16-bit signed offset. And, because the analysis of which .toc entry can be removed becomes much more complicated in objects that mix code models, they aren't removed if any small-model toc sequence appears in an object file. Unfortunately, glibc's build of ld.so smashes the needed objects together in a ld -r linking stage. This means the GOT/TOC is left with a whole lot of relative relocations which is untidy, but in itself is not a serious problem. However, static-pie on powerpc64 bombs due to a segfault caused by one of the small-model accesses before _dl_relocate_static_pie. (The very first one in rcrt1.o passing start_addresses in r8 to __libc_start_main.) So this patch makes all the toc/got accesses in assembly medium code model, and a couple of functions hidden. By itself this is not enough to give us working static-pie, but it is useful in isolation to enable better linker optimisation. There's a serious problem in libgcc too. libgcc ifuncs access the AT_HWCAP words stored in the tcb with an offset from the thread pointer (r13), but r13 isn't set at the time _dl_relocate_static_pie. A followup patch will fix that. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>