path: root/sysdeps/powerpc/powerpc64
Commit message | Author | Age | Files | Lines
* powerpc: Disable stack protector in early static initialization | Adhemerval Zanella | 2023-04-03 | 1 | -0/+3
    Similar to fb95c316382679c0826cc8399760977cd95f15c9, also disable it for string-ppc64.c (pulled in by rtld as the default string implementation).
    Checked on powerpc64-linux-gnu.
* powerpc: Remove powerpc64 strncmp variants | Adhemerval Zanella Netto | 2023-03-02 | 8 | -494/+9
    The default and power7 implementations just add word-aligned access when the inputs have the same alignment. The unaligned case is still done with byte operations. This is already covered by the generic implementation, which also adds the unaligned input optimization.
    Checked on powerpc64-linux-gnu built without multi-arch for powerpc64, power7, power8, and power9 (build for le).
    Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* string: Add libc_hidden_proto for memrchr | Adhemerval Zanella | 2023-02-08 | 3 | -9/+11
    Although the static linker can optimize it to a local call, this follows the internal scheme of providing a hidden prototype and definitions.
    Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Add libc_hidden_proto for strchrnul | Adhemerval Zanella | 2023-02-08 | 1 | -0/+1
    Although the static linker can optimize it to a local call, this follows the internal scheme of providing a hidden prototype and definitions.
    Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Improve generic memchr | Adhemerval Zanella | 2023-02-06 | 1 | -8/+1
    The new algorithm reads the first aligned address and masks off the unwanted bytes (this strategy is similar to the arch-specific implementations used on powerpc, sparc, and sh). The loop now reads word-aligned addresses and checks them using the has_eq macro.
    Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits).
    Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
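The word-at-a-time check described in the memchr commit can be sketched in portable C. This is a simplified illustration, not the glibc code: `word_has_byte` and `sketch_memchr` are hypothetical names, the word size is fixed at 64 bits, and the unaligned head is handled bytewise instead of reading the first aligned word and masking, so that the sketch stays within ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of W equals C.  XOR turns matching bytes into zero,
   then the classic "has zero byte" bit trick detects them.  */
static int word_has_byte(uint64_t w, unsigned char c)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    const uint64_t highs = UINT64_C(0x8080808080808080);
    uint64_t x = w ^ (ones * c);
    return ((x - ones) & ~x & highs) != 0;
}

/* Hypothetical memchr-like function: bytewise head until the pointer is
   8-byte aligned, then one word per iteration, then a bytewise tail that
   also pinpoints the match inside the word that triggered the break.  */
void *sketch_memchr(const void *s, int ch, size_t n)
{
    const unsigned char *p = s;
    unsigned char c = (unsigned char) ch;

    while (n > 0 && ((uintptr_t) p % sizeof(uint64_t)) != 0) {
        if (*p == c)
            return (void *) p;
        p++;
        n--;
    }
    while (n >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof w);   /* aligned load, written portably */
        if (word_has_byte(w, c))
            break;
        p += sizeof w;
        n -= sizeof w;
    }
    while (n > 0) {
        if (*p == c)
            return (void *) p;
        p++;
        n--;
    }
    return NULL;
}
```

Calling `sketch_memchr(buf, 'w', len)` behaves like `memchr` for the same arguments; the byte-order-dependent step of locating the match inside a word is deliberately left to the bytewise tail.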
* Update copyright dates with scripts/update-copyrights | Joseph Myers | 2023-01-06 | 270 | -270/+270
* powerpc64: Remove old strncmp optimization | Rajalakshmi Srinivasaraghavan | 2022-12-02 | 5 | -256/+2
    This patch cleans up the power4 strncmp optimization for powerpc64, which is unlikely to be used anywhere.
    Tested on ppc64le with and without the --disable-multi-arch flag.
    Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Disable use of -fsignaling-nans if compiler does not support it | Adhemerval Zanella | 2022-11-01 | 2 | -4/+4
    Reviewed-by: Fangrui Song <maskray@google.com>
* Fix build with GCC 13 _FloatN, _FloatNx built-in functions | Joseph Myers | 2022-10-31 | 2 | -2/+129
    GCC 13 has added more _FloatN and _FloatNx versions of existing <math.h> and <complex.h> built-in functions, for use in libstdc++-v3. This breaks the glibc build because of how those functions are defined as aliases to functions with the same ABI but different types.
    Add appropriate -fno-builtin-* options for compiling relevant files, as already done for the case of long double functions aliasing double ones and based on the list of files used there. I fixed some mistakes in that list of double files that I noticed while implementing this fix, but there may well be more such (harmless) cases, in this list or the new one (files that don't actually exist or don't define the named functions as aliases so don't need the options). I did try to exclude cases where glibc doesn't define certain functions for _FloatN or _FloatNx types at all from the new uses of -fno-builtin-* options.
    As with the options for double files (see the commit message for commit 49348beafe9ba150c9bd48595b3f372299bddbb0, "Fix build with GCC 10 when long double = double."), it's deliberate that the options are used even if GCC currently doesn't have a built-in version of a given function, so providing some level of future-proofing against more such built-in functions being added in future.
    Tested with build-many-glibcs.py for aarch64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers and glibcs builds) with GCC mainline.
* Introduce <pointer_guard.h>, extracted from <sysdep.h> | Florian Weimer | 2022-10-18 | 2 | -0/+2
| | | | | | | | | | | | | | This allows us to define a generic no-op version of PTR_MANGLE and PTR_DEMANGLE. In the future, we can use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources, avoiding an unintended loss of hardening due to missing include files or unlucky header inclusion ordering. In i386 and x86_64, we can avoid a <tls.h> dependency in the C code by using the computed constant from <tcb-offsets.h>. <sysdep.h> no longer includes these definitions, so there is no cyclic dependency anymore when computing the <tcb-offsets.h> constants. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
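The idea of a generic no-op fallback for the mangling macros can be sketched as follows. Everything here is an assumption made for illustration: the `SKETCH_` macro names, the `sketch_guard` variable and its constant value, and the plain XOR transform are hypothetical and do not reproduce any port's real definition.

```c
#include <stdint.h>

/* Hypothetical per-process secret.  A real implementation would fill it
   from random data at startup; a constant is used only to keep the
   sketch deterministic.  */
static uintptr_t sketch_guard = (uintptr_t) 0x5bd1e995u;

/* Set to 0 to model a configuration without pointer-guard support.  */
#define SKETCH_HAVE_POINTER_GUARD 1

#if SKETCH_HAVE_POINTER_GUARD
/* Obfuscate and recover a stored pointer value in place.  */
# define SKETCH_PTR_MANGLE(var)   ((var) ^= sketch_guard)
# define SKETCH_PTR_DEMANGLE(var) ((var) ^= sketch_guard)
#else
/* Generic no-op version: callers may use the macros unconditionally.  */
# define SKETCH_PTR_MANGLE(var)   ((void) (var))
# define SKETCH_PTR_DEMANGLE(var) ((void) (var))
#endif
```

Because both branches define the same macro names, C sources can call them without checking which headers happened to be included first, which is the hardening concern the commit message raises.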
* elf: Remove -fno-tree-loop-distribute-patterns usage on dl-support | Adhemerval Zanella | 2022-10-10 | 1 | -0/+24
    Besides the option being gcc specific, this approach is still fragile and not future proof, since we do not know if this will be the only optimization option gcc will add that transforms loops to memset (or any libcall).
    This patch adds a new header, dl-symbol-redir-ifunc.h, that can be used to redirect the compiler-generated libcalls to the generic memset implementation if required.
    Checked on x86_64-linux-gnu and aarch64-linux-gnu.
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* arc4random: simplify design for better safety | Jason A. Donenfeld | 2022-07-27 | 6 | -345/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than buffering 16 MiB of entropy in userspace (by way of chacha20), simply call getrandom() every time. This approach is doubtlessly slower, for now, but trying to prematurely optimize arc4random appears to be leading toward all sorts of nasty properties and gotchas. Instead, this patch takes a much more conservative approach. The interface is added as a basic loop wrapper around getrandom(), and then later, the kernel and libc together can work together on optimizing that. This prevents numerous issues in which userspace is unaware of when it really must throw away its buffer, since we avoid buffering all together. Future improvements may include userspace learning more from the kernel about when to do that, which might make these sorts of chacha20-based optimizations more possible. The current heuristic of 16 MiB is meaningless garbage that doesn't correspond to anything the kernel might know about. So for now, let's just do something conservative that we know is correct and won't lead to cryptographic issues for users of this function. This patch might be considered along the lines of, "optimization is the root of all evil," in that the much more complex implementation it replaces moves too fast without considering security implications, whereas the incremental approach done here is a much safer way of going about things. Once this lands, we can take our time in optimizing this properly using new interplay between the kernel and userspace. getrandom(0) is used, since that's the one that ensures the bytes returned are cryptographically secure. But on systems without it, we fallback to using /dev/urandom. This is unfortunate because it means opening a file descriptor, but there's not much of a choice. 
Secondly, as part of the fallback, in order to get more or less the same properties of getrandom(0), we poll on /dev/random, and if the poll succeeds at least once, then we assume the RNG is initialized. This is a rough approximation, as the ancient "non-blocking pool" initialized after the "blocking pool", not before, and it may not port back to all ancient kernels, though it does to all kernels supported by glibc (≥3.2), so generally it's the best approximation we can do. The motivation for including arc4random, in the first place, is to have source-level compatibility with existing code. That means this patch doesn't attempt to litigate the interface itself. It does, however, choose a conservative approach for implementing it. Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Cristian Rodríguez <crrodriguez@opensuse.org> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Mark Harris <mark.hsj@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
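The "basic loop wrapper around getrandom()" described in the arc4random commit can be sketched like this. It is a minimal illustration assuming Linux with `<sys/random.h>` available; `sketch_random_buf` is a hypothetical name, the /dev/urandom fallback from the commit message is omitted, and returning -1 on failure is this sketch's choice rather than a statement about glibc's behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill BUF with LEN random bytes by calling getrandom() until done.
   Returns 0 on success and -1 on an unrecoverable error.  */
int sketch_random_buf(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* Flags of 0: block until the kernel RNG is initialized.  */
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                /* getrandom may return fewer bytes */
        len -= (size_t) n;
    }
    return 0;
}
```

No state is buffered in userspace, so there is nothing to discard after events the process cannot observe; that is the safety argument the commit message makes, at the cost of one system call per request.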
* powerpc64: Add optimized chacha20 | Adhemerval Zanella Netto | 2022-07-22 | 6 | -0/+345
    It adds a vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-ppc.c. It targets POWER8 and is used by default for LE.
    On a POWER8 it shows the following improvements (using formatted bench-arc4random data):

    POWER8 GENERIC                            MB/s
    -----------------------------------------------
    arc4random [single-thread]              138.77
    arc4random_buf(16) [single-thread]      174.36
    arc4random_buf(32) [single-thread]      228.11
    arc4random_buf(48) [single-thread]      252.31
    arc4random_buf(64) [single-thread]      270.11
    arc4random_buf(80) [single-thread]      278.97
    arc4random_buf(96) [single-thread]      287.78
    arc4random_buf(112) [single-thread]     291.92
    arc4random_buf(128) [single-thread]     295.25

    POWER8                                    MB/s
    -----------------------------------------------
    arc4random [single-thread]              198.06
    arc4random_buf(16) [single-thread]      278.79
    arc4random_buf(32) [single-thread]      448.89
    arc4random_buf(48) [single-thread]      551.09
    arc4random_buf(64) [single-thread]      646.12
    arc4random_buf(80) [single-thread]      698.04
    arc4random_buf(96) [single-thread]      756.06
    arc4random_buf(112) [single-thread]     784.12
    arc4random_buf(128) [single-thread]     808.04
    -----------------------------------------------

    Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
    Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
* Add bounds check to __libc_ifunc_impl_list | Wilco Dijkstra | 2022-06-10 | 1 | -7/+2
    Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC redundant and fixes several targets that would write outside the array. To avoid unnecessarily large diffs, pass the maximum in the argument 'i' to IFUNC_IMPL_ADD; 'max' can be used in new ifunc definitions and existing ones can be updated if desired.
    Passes buildmanyglibc.
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
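A bounds-checked append of the kind this commit describes might look like the sketch below. The struct layout, the function name `sketch_impl_add`, and the policy of silently dropping entries that do not fit are all assumptions for illustration; the real macros in glibc differ.

```c
#include <stddef.h>

/* Hypothetical record for one ifunc implementation.  */
struct sketch_impl {
    const char *name;
    int usable;
};

/* Append one entry at index I if it fits below MAX and return the new
   fill level.  When the array is full the entry is dropped, so the
   function never writes outside ARRAY.  */
static size_t sketch_impl_add(struct sketch_impl *array, size_t i,
                              size_t max, const char *name, int usable)
{
    if (i < max) {
        array[i].name = name;
        array[i].usable = usable;
        i++;
    }
    return i;
}
```

Passing the capacity alongside the index is what makes a separate global limit such as MAX_IFUNC unnecessary: the callee checks against the caller's actual array size.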
* powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197] | Matheus Castanho | 2022-06-07 | 1 | -2/+2
    __strncpy_power9 initializes VR 18 with zeroes to be used throughout the code, including when zero-padding the destination string. However, the v18 reference was mistakenly being used for stxv and stxvl, which take a VSX vector as operand. The code ended up using the uninitialized VSR 18 register by mistake.
    Both occurrences have been changed to use the proper VSX number for VR 18 (i.e. VSR 50).
    Tested on powerpc, powerpc64 and powerpc64le.
    Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
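The numbering used in this fix follows from the Power ISA register layout, where vector registers VR0 to VR31 overlay VSX registers VSR32 to VSR63. A one-line helper (the name `sketch_vr_to_vsr` is hypothetical) makes the arithmetic explicit:

```c
/* Map a vector register number (0..31) to the VSX register number that
   names the same physical register (32..63).  Returns -1 if VR is out
   of range.  */
static int sketch_vr_to_vsr(int vr)
{
    if (vr < 0 || vr > 31)
        return -1;
    return vr + 32;
}
```

For VR 18 this gives VSR 50, matching the register number the commit switches to for stxv and stxvl.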
* math: Add math-use-builtins-fabs (BZ#29027) | Adhemerval Zanella | 2022-05-23 | 1 | -34/+0
    float, double, and _Float128 are all assumed to be supported (float and double already use only builtins). Only long double is parametrized, due to GCC bug 29253, which prevents its usage on powerpc.
    This allows removing the i686, ia64, x86_64, powerpc, and sparc arch-specific implementations.
    On ia64 it also fixes the sNaN handling in:
      math/test-float64x-fabs
      math/test-ldouble-fabs
    Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
* elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC | Fangrui Song | 2022-04-26 | 2 | -0/+5
    PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement.
    Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. More current ports define PI_STATIC_AND_HIDDEN than do not, so change the configure default.
    No functional change.
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64: Set up thread register for _dl_relocate_static_pie | Alan Modra | 2022-04-10 | 1 | -0/+21
    libgcc ifunc resolvers that access hwcap via a field in the tcb can't be called until the thread pointer is set up. Other ifunc resolvers might need access to at_platform. This patch sets up a fake thread pointer early to a copy of tcbhead_t. hwcapinfo.c already had local variables for hwcap and at_platform; replace them with an entire tcbhead_t. It's not that large and this way we easily ensure hwcap and at_platform are at the same relative offsets as they are in the real thread block.
    The patch also conditionally disables part of tst-tlsifunc-static, "bar address read from IFUNC resolver is incorrect". We can't get a proper address for a thread variable before glibc initialises tls.
    Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Use medium model toc accesses throughout | Alan Modra | 2022-04-10 | 6 | -15/+30
    The PowerPC64 linker edits medium model toc-indirect code to toc-pointer relative:
      addis r9,r2,tc_entry_for_var@toc@ha
      ld r9,tc_entry_for_var@toc@l(r9)
    becomes
      addis r9,r2,(var-.TOC.)@ha
      addi r9,r9,(var-.TOC.)@l
    when "var" is known to be local to the binary. This isn't done for small-model toc-indirect code, because "var" is almost guaranteed to be too far away from .TOC. for a 16-bit signed offset. And, because the analysis of which .toc entry can be removed becomes much more complicated in objects that mix code models, they aren't removed if any small-model toc sequence appears in an object file.
    Unfortunately, glibc's build of ld.so smashes the needed objects together in a ld -r linking stage. This means the GOT/TOC is left with a whole lot of relative relocations which is untidy, but in itself is not a serious problem. However, static-pie on powerpc64 bombs due to a segfault caused by one of the small-model accesses before _dl_relocate_static_pie. (The very first one in rcrt1.o passing start_addresses in r8 to __libc_start_main.)
    So this patch makes all the toc/got accesses in assembly medium code model, and a couple of functions hidden. By itself this is not enough to give us working static-pie, but it is useful in isolation to enable better linker optimisation.
    There's a serious problem in libgcc too. libgcc ifuncs access the AT_HWCAP words stored in the tcb with an offset from the thread pointer (r13), but r13 isn't set at the time of _dl_relocate_static_pie. A followup patch will fix that.
    Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
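The @ha/@l split used by the medium-model sequences above can be worked through in C. This is a sketch of the arithmetic only, with hypothetical helper names: @l is the low 16 bits of the offset treated as a signed value, and @ha is the high part adjusted so that `(ha << 16) + l` reproduces the offset.

```c
#include <stdint.h>

/* Low 16 bits of OFF, sign-extended (the @l part).  */
static int64_t sketch_lo16(int64_t off)
{
    return ((off & 0xffff) ^ 0x8000) - 0x8000;
}

/* High-adjusted part (the @ha part): whatever remains after removing
   the signed low half, expressed in units of 64 KiB.  */
static int64_t sketch_ha16(int64_t off)
{
    return (off - sketch_lo16(off)) / 65536;
}

/* A small-model access has only one 16-bit signed displacement, so the
   target must lie within -32768..32767 bytes of the TOC pointer.  */
static int sketch_fits_small_model(int64_t off)
{
    return off >= -0x8000 && off <= 0x7fff;
}
```

For an offset of 0x12348000 the low half is -0x8000 and the high-adjusted half is 0x1235, so the addis/addi pair reaches it, while a single 16-bit displacement cannot. That is why the commit message says "var" is almost always out of range for small-model code.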
* powerpc: Remove fcopysign{f} implementation | Adhemerval Zanella | 2022-04-07 | 1 | -48/+0
    The builtin and the generic implementation from the generic files suffice.
    Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
* configure.ac: fix bashisms in configure.ac | Sam James | 2022-03-22 | 4 | -4/+4
    configure scripts need to be runnable with a POSIX-compliant /bin/sh.
    On many (but not all!) systems, /bin/sh is provided by Bash, so errors like this aren't spotted. Notably Debian defaults to /bin/sh provided by dash, which doesn't tolerate such bashisms as '=='.
    This retains compatibility with bash.
    Fixes configure warnings/errors like:
    ```
    checking if compiler warns about alias for function with incompatible types... yes
    /var/tmp/portage/sys-libs/glibc-2.34-r10/work/glibc-2.34/configure: 4209: test: xyes: unexpected operator
    ```
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    Signed-off-by: Sam James <sam@gentoo.org>
* powerpc: Remove powerpc64 bzero optimizations | Adhemerval Zanella | 2022-02-23 | 15 | -197/+1
    The symbol is not present in the current POSIX specification and the compiler already generates a memset call.
* powerpc: Remove bcopy optimizations | Adhemerval Zanella | 2022-02-23 | 9 | -113/+1
    The symbol is not present in the current POSIX specification and the compiler already generates a memmove call.
* elf: Remove prelink support | Adhemerval Zanella | 2022-02-10 | 1 | -37/+0
    Prelinked binaries and libraries still work; the dynamic tags DT_GNU_PRELINKED, DT_GNU_LIBLIST, and DT_GNU_CONFLICT are just ignored (meaning the process is relocated as by default).
    The loader environment variable TRACE_PRELINKING is also removed, since it is used solely by prelink.
    Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
    Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* powerpc64le: Use <gcc-macros.h> in early HWCAP check | Florian Weimer | 2022-01-14 | 1 | -4/+5
    This is required so that the checks still work if $(early-cflags) selects a different ISA level.
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    Tested-by: Carlos O'Donell <carlos@redhat.com>
* debug: Remove catchsegv and libSegfault (BZ #14913) | Adhemerval Zanella | 2022-01-06 | 1 | -124/+0
    Trapping SIGSEGV within the process is error-prone and adds security issues, and modern analysis tends to happen outside the process (either by attaching a debugger or by post-mortem analysis).
    libSegfault also has some design problems: it uses a non-async-signal-safe function (backtrace) in the signal handler.
    There are multiple alternatives if users do want similar functionality, such as the sigsegv gnulib module or libsegfault.
* Update copyright dates with scripts/update-copyrights | Paul Eggert | 2022-01-01 | 278 | -278/+278
    I used these shell commands:
      ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
      (cd ../glibc && git commit -am"[this commit message]")
    and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO.
    I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not.
      remote: *** 912-#endif
      remote: *** 913:
      remote: *** 914-
      remote: *** error: lines with trailing whitespace found
      ...
      remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* elf: Add _dl_audit_pltexit | Adhemerval Zanella | 2021-12-28 | 1 | -2/+2
    It consolidates the code required to call the la_pltexit audit callback.
    Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
    Reviewed-by: Florian Weimer <fweimer@redhat.com>
* csu: Always use __executable_start in gmon-start.c | Florian Weimer | 2021-12-05 | 1 | -37/+0
    Current binutils defines __executable_start as the lowest text address, so using the entry point address as a fallback is no longer necessary. As a result, overriding <entry.h> is only necessary if the entry point is not called _start.
    The previous approach to define __ASSEMBLY__ to suppress the declaration breaks if headers included by <entry.h> are not compatible with __ASSEMBLY__. This happens with rseq integration because it is necessary to include kernel headers in more places.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532] | Matheus Castanho | 2021-11-30 | 1 | -4/+6
    Syscalls based on the assembly templates are missing CFI for r31, which gets clobbered when scv is used, and info for LR is inaccurate, placed in the wrong LOC and not using the proper offset. LR was also being saved to the callee's frame, while the ABI mandates it to be saved to the caller's frame. These are fixed by this commit.
    After this change:

    $ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
    00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
       LOC           CFA      r31   ra
    000000000004b9d4 r1+0     u     u
    000000000004b9e4 r1+48    u     u
    000000000004b9e8 r1+48    c-16  u
    000000000004b9fc r1+48    c-16  c+16
    000000000004ba08 r1+48    c-16
    000000000004ba18 r1+48    u
    000000000004ba1c r1+0     u

    libc.so.6:     file format elf64-powerpcle

    Disassembly of section .text:

    000000000004b9d4 <kill>:
       4b9d4: 1f 00 4c 3c   addis   r2,r12,31
       4b9d8: 2c c3 42 38   addi    r2,r2,-15572
       4b9dc: 25 00 00 38   li      r0,37
       4b9e0: d1 ff 21 f8   stdu    r1,-48(r1)
       4b9e4: 20 00 e1 fb   std     r31,32(r1)
       4b9e8: 98 8f ed eb   ld      r31,-28776(r13)
       4b9ec: 10 00 ff 77   andis.  r31,r31,16
       4b9f0: 1c 00 82 41   beq     4ba0c <kill+0x38>
       4b9f4: a6 02 28 7d   mflr    r9
       4b9f8: 40 00 21 f9   std     r9,64(r1)
       4b9fc: 01 00 00 44   scv     0
       4ba00: 40 00 21 e9   ld      r9,64(r1)
       4ba04: a6 03 28 7d   mtlr    r9
       4ba08: 08 00 00 48   b       4ba10 <kill+0x3c>
       4ba0c: 02 00 00 44   sc
       4ba10: 00 00 bf 2e   cmpdi   cr5,r31,0
       4ba14: 20 00 e1 eb   ld      r31,32(r1)
       4ba18: 30 00 21 38   addi    r1,r1,48
       4ba1c: 18 00 96 41   beq     cr5,4ba34 <kill+0x60>
       4ba20: 01 f0 20 39   li      r9,-4095
       4ba24: 40 48 23 7c   cmpld   r3,r9
       4ba28: 20 00 e0 4d   bltlr+
       4ba2c: d0 00 63 7c   neg     r3,r3
       4ba30: 08 00 00 48   b       4ba38 <kill+0x64>
       4ba34: 20 00 e3 4c   bnslr+
       4ba38: c8 32 fe 4b   b       2ed00 <__syscall_error>
            ...
       4ba44: 40 20 0c 00   .long 0xc2040
       4ba48: 68 00 00 00   .long 0x68
       4ba4c: 06 00 5f 5f   rlwnm   r31,r26,r0,0,3
       4ba50: 6b 69 6c 6c   xoris   r12,r3,26987
* powerpc: Define USE_PPC64_NOTOC iff compiler supports it | Adhemerval Zanella | 2021-11-22 | 2 | -24/+43
    The @notoc usage only yields an advantage on ISA 3.1+ machines (power10) and, for ld.bfd, only when it also sees pcrel relocations used in the code (generated if the compiler targets ISA 3.1+).
    In the bfd case, ISA 3.1+ instructions are used in stubs iff the linker also sees the new pc-relative relocations (for instance R_PPC64_D34); otherwise it generates default stubs (ppc64_elf_check_relocs:4700).
    This patch also helps on linkers that do not implement this optimization, since building for an older ISA (such as 3.0 / power9) would otherwise also trigger power10 stub generation if the assembly code uses the NOTOC macro.
    Checked on powerpc64le-linux-gnu.
    Reviewed-by: Fangrui Song <maskray@google.com>
    Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* String: Add hidden defs for __memcmpeq() to enable internal usage | Noah Goldstein | 2021-10-26 | 8 | -0/+12
    No bug.
    This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.
* String: Add support for __memcmpeq() ABI on all targets | Noah Goldstein | 2021-10-26 | 9 | -0/+16
    No bug.
    This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding target's memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works were added in string/tester.c.
    As discussed in the proposal "Add new ABI '__memcmpeq()' to libc", __memcmpeq() is essentially a reserved namespace for bcmp(). This means it shares the same specification as memcmp() except that the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed.
    __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value.
    All tests in string/tester.c passed. The build also succeeds on the x86_64-linux-gnu target.
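The relaxed contract described for __memcmpeq() can be illustrated with a small stand-in. `sketch_memcmpeq` is a hypothetical name chosen to avoid the reserved identifier, and the byte-accumulating loop is just one possible implementation; the commit itself only aliases the existing memcmp().

```c
#include <stddef.h>

/* Returns 0 if the first N bytes of A and B are equal and some non-zero
   value otherwise.  Unlike memcmp(), the sign and magnitude of a
   non-zero result carry no ordering information.  */
int sketch_memcmpeq(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;

    /* No need to find the first differing byte or compare its value,
       which is the freedom the weaker specification grants.  */
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char) (pa[i] ^ pb[i]);
    return diff;
}
```

A caller may only test the result against zero, as in `if (sketch_memcmpeq(x, y, n) == 0)`; code that needs ordering must keep using memcmp().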
* powerpc: Remove backtrace implementation | Adhemerval Zanella | 2021-10-20 | 1 | -117/+0
    The powerpc optimization to provide a fast stacktrace requires some ad-hoc code to handle Linux signal frames, and it is fragile if the kernel decides to slightly change its execution sequence [1].
    The generic implementation works as-is and should be future proof, since the kernel provides the expected CFI directives in the vDSO shared page.
    Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and powerpc64-linux-gnu.
    [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
* elf: Fix dynamic-link.h usage on rtld.c | Adhemerval Zanella | 2021-10-14 | 3 | -17/+38
    The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on rtld.c due to two issues:
    1. RTLD_BOOTSTRAP is also used in dl-machine.h on various architectures and it changes the semantics of various machine relocation functions.
    2. The elf_get_dynamic_info() change was done sideways: prior to 490e6c62aa, get-dynamic-info.h was included by the first dynamic-link.h include *without* RTLD_BOOTSTRAP being defined. It means that the code within elf_get_dynamic_info() that uses RTLD_BOOTSTRAP is in fact unused.
    To fix 1., this patch now includes dynamic-link.h only once, with RTLD_BOOTSTRAP defined. The ELF_DYNAMIC_RELOCATE call will now have the relocation functions with the expected semantics for the loader.
    To fix 2., part of 4af6982e4c is reverted (the check argument of elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP pieces are removed.
    To reorganize the includes, the static TLS definition is moved to its own header to avoid a circular dependency (it is defined in dynamic-link.h and dl-machine.h requires it, while other dynamic-link.h definitions require dl-machine.h definitions).
    Also, ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL are moved to their own header. Only ancient ABIs need special values (arm, i386, and mips), so a generic one is used as default.
    The powerpc Elf64_FuncDesc is also moved to its own header, since csu code requires its definition (which would otherwise require either adding the elf/ folder to the include path or using a full path with elf/).
    Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32, and powerpc64le.
    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* elf: Avoid nested functions in the loader [BZ #27220] | Fangrui Song | 2021-10-07 | 1 | -9/+10
    dynamic-link.h is included more than once in some elf/ files (rtld.c, dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested functions. This harms readability, and the nested functions usage is the biggest obstacle preventing a Clang build (Clang doesn't support GCC nested functions).
    The key idea for unnesting is to add extra parameters (struct link_map * and struct r_scope_elem *[]) to RESOLVE_MAP, ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a], elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired by Stan Shebs' ppc64/x86-64 implementation in the google/grte/v5-2.27/master which uses mixed extra parameters and static variables.)
    Future simplification:
    * If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM, elf_machine_runtime_setup can drop the `scope` parameter.
    * If TLSDESC no longer needs to be in elf_machine_lazy_rel, elf_machine_lazy_rel can drop the `scope` parameter.
    Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32, sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64. In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu and riscv64-linux-gnu-rv64imafdc-lp64d.
    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
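The unnesting idea, replacing a nested function's implicit access to its enclosing frame with an explicit parameter, can be shown with a toy example. The types and names below (`sketch_map`, `sketch_resolve`, `sketch_relocate`) are hypothetical stand-ins and far simpler than the loader's real link map and relocation routines.

```c
#include <stddef.h>

/* Hypothetical stand-in for the state a nested function would have
   captured from its enclosing function.  */
struct sketch_map {
    long load_bias;
};

/* Formerly-nested helper: the captured state now arrives as MAP.  */
static long sketch_resolve(const struct sketch_map *map, long sym_value)
{
    return map->load_bias + sym_value;
}

/* Caller threads the same MAP through to every helper call, so no GCC
   nested-function extension is needed.  */
static void sketch_relocate(const struct sketch_map *map,
                            const long *sym_values, long *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sketch_resolve(map, sym_values[i]);
}
```

The cost is a longer parameter list on each helper, which is why the commit message lists cases where a parameter such as `scope` could later be dropped.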
* powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOC | Fangrui Song | 2021-09-27 | 1 | -2/+0
    Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
* Add narrowing fma functions | Joseph Myers | 2021-09-22 | 2 | -0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the narrowing fused multiply-add functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal, f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x, f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128, f64xfmaf128 for configurations with _Float64x and _Float128; __f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case (for calls to ffmal and dfmal when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, especially that for sqrt, so the description of those generally applies to this patch as well. As with sqrt, I reused the same test inputs in auto-libm-test-in as for non-narrowing fma rather than adding extra or separate inputs for narrowing fma. The tests in libm-test-narrow-fma.inc also follow those for non-narrowing fma. The non-narrowing fma has a known bug (bug 6801) that it does not set errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather than fixing this or having narrowing fma check for errors when non-narrowing does not (complicating the cases when narrowing fma can otherwise be an alias for a non-narrowing function), this patch does not attempt to check for errors from narrowing fma and set errno; the CHECK_NARROW_FMA macro is still present, but as a placeholder that does nothing, and this missing errno setting is considered to be covered by the existing bug rather than needing a separate open bug. missing-errno annotations are duly added to many of the auto-libm-test-in test inputs for fma. 
This completes adding all the new functions from TS 18661-1 to glibc, so will be followed by corresponding stdc-predef.h changes to define __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support for TS 18661-1 will be at a similar level to that for C standard floating-point facilities up to C11 (pragmas not implemented, but library functions done). (There are still further changes to be done to implement changes to the types of fromfp functions from N2548.) Tested as followed: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).
* powerpc: Fix unrecognized instruction errors with recent GCC | Paul A. Clarke | 2021-09-20 | 1 | -0/+1
    Recent binutils commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a changes the behavior of `.machine` directives to override, rather than augment, the base CPU. This can result in _reduced_ functionality when, for example, compiling for default machine "power8", but explicitly asking for ".machine power5", which loses Altivec instructions.
    In tst-ucontext-ppc64-vscr.c, while the instructions provoking the new error messages are bracketed by ".machine power5", which is ostensibly Power ISA 2.03 (POWER5), the POWER5 processor did not support the Altivec subset, so these instructions are not recognized as "power5".
      Error: unrecognized opcode: `vspltisb'
      Error: unrecognized opcode: `vpkuwus'
      Error: unrecognized opcode: `mfvscr'
      Error: unrecognized opcode: `stvx'
    Manually adding the Altivec subset via ".machine altivec" is sufficient.
    Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Add narrowing square root functionsJoseph Myers2021-09-102-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the narrowing square root functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: fsqrt, fsqrtl, dsqrtl, f32sqrtf64, f32sqrtf32x, f32xsqrtf64 for all configurations; f32sqrtf64x, f32sqrtf128, f64sqrtf64x, f64sqrtf128, f32xsqrtf64x, f32xsqrtf128, f64xsqrtf128 for configurations with _Float64x and _Float128; __f32sqrtieee128 and __f64sqrtieee128 aliases in the powerpc64le case (for calls to fsqrtl and dsqrtl when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, so the description of those generally applies to this patch as well. However, the not-actually-narrowing cases (where the two types involved in the function have the same floating-point format) are aliased to sqrt, sqrtl or sqrtf128 rather than needing a separately built not-actually-narrowing function such as was needed for add / sub / mul / div. Thus, there is no __nldbl_dsqrtl name for ldbl-opt because no such name was needed (whereas the other functions needed such a name since the only other name for that entry point was e.g. f32xaddf64, not reserved by TS 18661-1); the headers are made to arrange for sqrt to be called in that case instead. The DIAG_* calls in sysdeps/ieee754/soft-fp/s_dsqrtl.c are because they were observed to be needed in GCC 7 testing of riscv32-linux-gnu-rv32imac-ilp32. The other sysdeps/ieee754/soft-fp/ files added didn't need such DIAG_* in any configuration I tested with build-many-glibcs.py, but if they do turn out to be needed in more files with some other configuration / GCC version, they can always be added there. I reused the same test inputs in auto-libm-test-in as for non-narrowing sqrt rather than adding extra or separate inputs for narrowing sqrt. The tests in libm-test-narrow-sqrt.inc also follow those for non-narrowing sqrt. 
Tested as follows: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).
* Remove "Contributed by" linesSiddhesh Poyarekar2021-09-0315-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | We stopped adding "Contributed by" or similar lines in sources in 2012 in favour of git logs and keeping the Contributors section of the glibc manual up to date. Removing these lines makes the license header a bit more consistent across files and also removes the possibility of error in attribution when license blocks or files are copied across, since the contributed-by lines don't actually reflect reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by, etc.) into a new file CONTRIBUTED-BY to retain record of these contributions. These contributors are also mentioned in manual/contrib.texi, so we just maintain this additional record as a courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in place and to clean up the CONTRIBUTED-BY file respectively. These were not added to the glibc sources because they're not expected to be of any use in future, given that this is a one-time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Remove sysdeps/*/tls-macros.hFangrui Song2021-08-181-42/+0
| | | | | | | | They provide TLS_GD/TLS_LD/TLS_IE/TLS_LE macros for TLS testing. Now that we have migrated to __thread and tls_model attributes, these macros are unused and the tls-macros.h files can retire. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
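As an illustration of the replacement mentioned in this commit message, the following is a minimal sketch of `__thread` variables annotated with the `tls_model` attribute, one per TLS access model. It is not taken from the glibc tests: the variable and function names (`counter_gd`, `bump_gd`, `demo_tls_models`, and so on) are hypothetical, and the attribute is a GCC/Clang extension. The `local-exec` model is assumed to be linked into an executable, not a shared object.

```c
#include <assert.h>

/* One thread-local counter per TLS access model.  All names here are
   hypothetical; the tls_model attribute asks the compiler to use the
   named model for accesses to the variable.  */
static __thread int counter_gd __attribute__ ((tls_model ("global-dynamic")));
static __thread int counter_ld __attribute__ ((tls_model ("local-dynamic")));
static __thread int counter_ie __attribute__ ((tls_model ("initial-exec")));
static __thread int counter_le __attribute__ ((tls_model ("local-exec")));

/* Each accessor increments its own counter and returns the new value.  */
int bump_gd (void) { return ++counter_gd; }
int bump_ld (void) { return ++counter_ld; }
int bump_ie (void) { return ++counter_ie; }
int bump_le (void) { return ++counter_le; }

/* Usage example: every counter starts at zero in the calling thread
   and is independent of the others.  */
void
demo_tls_models (void)
{
  assert (bump_gd () == 1);
  assert (bump_gd () == 2);
  assert (bump_ld () == 1);
  assert (bump_ie () == 1);
  assert (bump_le () == 1);
}
```

Because the model is attached to the declaration, a test can pick the access sequence per variable without any architecture-specific inline assembly.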
* powerpc64: Add checks for Altivec and VSX in ifunc selectionAnton Blanchard2021-08-0626-68/+139
| | | | | | | We'd like to support processors without Altivec or VSX, so check the relevant hwcap bits before selecting them. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
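The idea of checking hwcap bits before selecting a vector implementation can be sketched as follows. This is not glibc's ifunc code: `enum impl`, `select_impl` and `demo_select_impl` are hypothetical names, and the hwcap word is passed in as a plain argument so the sketch builds on any host. The bit values are assumed to match the powerpc definitions in the Linux kernel's `asm/cputable.h`.

```c
#include <assert.h>

/* Assumed to mirror the powerpc AT_HWCAP bit definitions; repeated
   here so the sketch does not depend on powerpc headers.  */
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#define PPC_FEATURE_ARCH_2_06   0x00000100UL
#define PPC_FEATURE_HAS_VSX     0x00000080UL

/* Hypothetical implementation tiers.  */
enum impl { IMPL_GENERIC, IMPL_ALTIVEC, IMPL_VSX };

/* Pick the most capable tier whose instructions the CPU reports.
   The ISA-level bit alone is not enough: the vector facility bit must
   be set too, since a processor may lack Altivec or VSX.  */
enum impl
select_impl (unsigned long hwcap)
{
  if ((hwcap & PPC_FEATURE_ARCH_2_06) && (hwcap & PPC_FEATURE_HAS_VSX))
    return IMPL_VSX;
  if (hwcap & PPC_FEATURE_HAS_ALTIVEC)
    return IMPL_ALTIVEC;
  return IMPL_GENERIC;
}

/* Usage example with synthetic hwcap words.  */
void
demo_select_impl (void)
{
  assert (select_impl (0) == IMPL_GENERIC);
  /* ISA 2.06 without any vector facility falls back to generic.  */
  assert (select_impl (PPC_FEATURE_ARCH_2_06) == IMPL_GENERIC);
  assert (select_impl (PPC_FEATURE_HAS_ALTIVEC) == IMPL_ALTIVEC);
  assert (select_impl (PPC_FEATURE_ARCH_2_06 | PPC_FEATURE_HAS_VSX)
          == IMPL_VSX);
}
```

On a real Linux system the hwcap word would typically come from the auxiliary vector (for example via `getauxval (AT_HWCAP)`); that call is left out here to keep the sketch portable.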
* powerpc64: Check cacheline size before using optimised memset routinesAnton Blanchard2021-08-062-10/+23
| | | | | | | A number of optimised memset routines assume the cacheline size is 128B, so we better check before using them. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
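A minimal sketch of the guard described in this commit message, assuming the selector is handed the data cache line size as a plain integer. The function names (`memset_generic`, `memset_line128`, `select_memset`, `demo_select_memset`) are hypothetical stand-ins, not glibc symbols, and the "optimised" routine here only defers to `memset` instead of using any cache-line instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Line size the optimised routine is assumed to be written for.  */
#define EXPECTED_CACHE_LINE_SIZE 128u

typedef void *(*memset_fn) (void *, int, size_t);

/* Hypothetical portable routine with no cache line assumptions.  */
void *
memset_generic (void *s, int c, size_t n)
{
  unsigned char *p = s;
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return s;
}

/* Hypothetical stand-in for a routine tuned for 128-byte lines.  */
void *
memset_line128 (void *s, int c, size_t n)
{
  return memset (s, c, n);
}

/* Select the tuned routine only when the reported line size matches
   what it was written for; anything else, including an unknown size
   reported as 0, gets the generic routine.  */
memset_fn
select_memset (unsigned int cache_line_size)
{
  if (cache_line_size == EXPECTED_CACHE_LINE_SIZE)
    return memset_line128;
  return memset_generic;
}

/* Usage example.  */
void
demo_select_memset (void)
{
  char buf[8];
  assert (select_memset (128) == memset_line128);
  assert (select_memset (64) == memset_generic);
  assert (select_memset (0) == memset_generic);
  select_memset (64) (buf, 'x', sizeof buf);
  assert (buf[0] == 'x' && buf[7] == 'x');
}
```

On Linux the line size could plausibly be read from the auxiliary vector entry `AT_DCACHEBSIZE`; the sketch takes it as a parameter so that the selection logic can be exercised on any host.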
* powerpc64: Replace some PPC_FEATURE_HAS_VSX with PPC_FEATURE_ARCH_2_06Anton Blanchard2021-08-0620-38/+38
| | | | | | | | We use PPC_FEATURE_HAS_VSX to select a number of POWER7 optimised functions. These functions don't use any VSX instructions, so PPC_FEATURE_ARCH_2_06 seems like a better fit. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: Fix typo in configureAnton Blanchard2021-07-082-2/+2
| | | | | | The configure script checks for -mlong-double-128 but mentions -mlongdouble when it fails. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64: Remove strcspn ifunc from the loaderTulio Magno Quites Machado Filho2021-07-081-0/+18
| | | | | | | | | | 5 years ago, commit 8f1b841e452dbb083112fd036033b7f4af506ba0 unintentionally added an ifunc to the loader. That modification has not caused any harm so far, but it doesn't add any value either, because the hwcap information is available later during libc initialization. Suggested-by: Anton Blanchard <anton@ozlabs.org>
* powerpc: optimize strcpy/stpcpy for POWER9/10Pedro Franco de Carvalho2021-07-011-71/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies the current POWER9 implementation of strcpy and stpcpy to optimize it for POWER9/10. Since no new POWER10 instructions are used, the original POWER9 strcpy is modified instead of creating a new implementation for POWER10. This implementation is based on both the original POWER9 implementation of strcpy and the preamble of the new POWER10 implementation of strlen. The changes also affect stpcpy, which uses the same implementation with some additional code before returning.
On POWER9, averaging improvements across the benchmark inputs (length/source alignment/destination alignment), for an experiment that ran the benchmark five times, bench-strcpy showed an improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%. On POWER10, bench-strcpy showed 13.16%, and bench-stpcpy showed 13.59%.
The changes are:
1. Removed the null string optimization. Although this results in a few extra cycles for the null string, in combination with the second change, this resulted in improvements for other cases.
2. Adapted the preamble from strlen for POWER10. This is the part of the function that handles up to the first 16 bytes of the string.
3. Increased number of unrolled iterations in the main loop to 6.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Tested-by: Matheus Castanho <msc@linux.ibm.com>
* Add build option to disable usage of scv on powerpcMatheus Castanho2021-06-101-8/+8
| | | | | | | | | | | | | | | | | Commit 68ab82f56690ada86ac1e0c46bad06ba189a10ef added support for the scv syscall ABI on powerpc. Since then, systems that have kernel and processor support started using scv. However, adding the proper support for a new syscall ABI requires changes to several other projects (e.g. qemu, valgrind, strace, kernel), which are gradually receiving support. Meanwhile, having a way to disable scv on glibc at build time can be useful for distros that may encounter conflicts with projects that still do not support the scv ABI, buying time until proper support is added. This commit adds a --disable-scv option that disables scv support and uses sc for all syscalls, like before commit 68ab82f56690ada86ac1e0c46bad06ba189a10ef. Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
* Remove stale references to libdl.aFlorian Weimer2021-06-091-1/+0
| | | | | | | | Since commit 0c1c3a771eceec46e66ce1183cf988e2303bd373 ("dlfcn: Move dlopen into libc") libdl.a is empty, so linking against it is no longer necessary. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>