| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Checked on ppc64le built without --with-cpu, with --with-cpu=power9
and with --disable-multi-arch.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
|
|
|
|
| |
The symbol was moved using scripts/move-symbol-to-libc.py.
|
|
|
|
| |
The symbol was moved using scripts/move-symbol-to-libc.py.
|
|
|
|
|
|
|
| |
For some architectures, the two functions are aliased, so these
symbols need to be moved at the same time.
The symbols were moved using scripts/move-symbol-to-libc.py.
|
| |
Improvements compared to POWER9 version:
1. Take into account first 16B comparison for aligned strings
The previous version compares the first 16B and increments r4 by the number
of bytes until the address is 16B-aligned, then starts doing aligned loads at
that address. For aligned strings, this causes the first 16B to be compared
twice, because the increment is 0. Here we calculate the next 16B-aligned
address differently, which avoids that issue (see the sketch after this entry).
2. Use simple comparisons for the first ~192 bytes
The main loop is good for big strings, but comparing 16B at a time is better
for smaller strings. So after aligning the address to 16 bytes, we check
another 176B in 16B chunks. There may be some overlap with the main loop for
unaligned strings, but we avoid using the more aggressive strategy too soon,
and we also allow the loop to start at a 64B-aligned address. This greatly
benefits smaller strings and avoids overlapping checks if the string is
already aligned at a 64B boundary.
3. Reduce dependencies between load blocks caused by address calculation in the loop
Doing a precise time tracing on the code showed many loads in the loop were
stalled waiting for updates to r4 from previous code blocks. This
implementation avoids that as much as possible by using 2 registers (r4 and
r5) to hold addresses to be used by different parts of the code.
Also, the previous code aligned the address to 16B, then to 64B by doing a
few 48B loops (if needed) until the address was aligned. The main loop could
not start until that 48B loop had finished and r4 was updated with the
current address. Here we calculate the address used by the loop very early,
so it can start sooner.
The main loop now uses 2 pointers 128B apart to make pointer updates less
frequent, and also unrolls 1 iteration to guarantee there is enough time
between iterations to update the pointers, reducing stalled cycles.
4. Use new P10 instructions
lxvp is used to load 32B with a single instruction, reducing contention in
the load queue.
vextractbm allows simplifying the tail code for the loop, replacing
vbpermq and avoiding having to generate a permute control vector.
Reviewed-by: Paul E Murphy <murphyp@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
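A minimal C sketch of the address calculation from item 1 (the helper name is
hypothetical; the real code does this arithmetic on r4 in assembly):

#include <stdint.h>

/* Return the first 16B-aligned address strictly past s, so a string that is
   already 16B-aligned does not get its first block compared a second time.
   The previous approach advanced by (16 - (addr & 15)) & 15, which is 0 for
   aligned input.  */
static inline const char *
next_16b_block (const char *s)
{
  uintptr_t addr = (uintptr_t) s;
  return (const char *) ((addr | 15) + 1);
}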
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's necessary to stub out __libc_disable_asynccancel and
__libc_enable_asynccancel via rtld-stubbed-symbols because the new
direct references to the unwinder result in symbol conflicts when the
rtld exception handling from libc is linked in during the construction
of librtld.map.
unwind-forcedunwind.c is merged into unwind-resume.c. libc now needs
the functions that were previously only used in libpthread.
The GLIBC_PRIVATE exports of __libc_longjmp and __libc_siglongjmp are
no longer needed, so switch them to hidden symbols.
The symbol __pthread_unwind_next has been moved using
scripts/move-symbol-to-libc.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
| |
Update after commit 43576de04afc6a0896a3ecc094e1581069a0652a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/);
a generic sketch of this scheme follows this entry
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
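A generic sketch of the per-zero polynomial scheme from the first bullet; the
zero x0 and the coefficient table are placeholders, not the Sollya-generated
values:

/* Near a zero x0 of j0/j1/y0/y1, evaluate a degree-3 polynomial in (x - x0)
   with Horner's rule.  c[] stands in for the Sollya-computed coefficients.  */
static double
bessel_near_zero (double x, double x0, const double c[4])
{
  double d = x - x0;
  return ((c[3] * d + c[2]) * d + c[1]) * d + c[0];
}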
|
|
|
|
|
|
| |
This fixes missing definitions of math functions in libc in a static link
that are no longer built for libm after commit 4898d9712b ("Avoid adding
duplicated symbols into static libraries").
|
|
|
|
|
| |
The POWER9 builtins used to improve the ilogb* functions can be
used in the llogb* functions as well.
|
|
|
|
|
|
| |
The instructions xsxexpdp and xsxexpqp, introduced on POWER9, extract
the exponent from a double-precision and a quad-precision floating-point
value respectively, so they can be used to improve ilogb, ilogbf and ilogbf128.
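For illustration only, a portable C version of the quantity those instructions
return in a single operation (normal finite inputs; on POWER9 the builtins
replace this load/shift/mask sequence):

#include <stdint.h>
#include <string.h>

/* ilogb of a normal, finite double is its unbiased binary exponent.  xsxexpdp
   yields the biased exponent (bits 62..52) directly; this sketch extracts the
   same field by hand.  Zero, subnormals, infinities and NaNs need the extra
   handling the real implementations have.  */
static int
ilogb_normal (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) ((bits >> 52) & 0x7ff) - 1023;
}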
|
|
|
|
|
|
| |
Generated with 'make regen-ulps' on POWER8.
Tested on powerpc, powerpc64, and powerpc64le
|
|
|
|
| |
This time on a POWER8 machine.
|
|
|
|
|
|
| |
Generated with 'make regen-ulps'
Tested on powerpc, powerpc64, and powerpc64le
|
|
|
|
|
|
|
|
|
|
| |
This will be used to consolidate the libgcc_s access for backtrace
and pthread_cancel.
Unlike the existing backtrace implementations, it provides some
hardening based on pointer mangling.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out the startup code in csu/elf-init.c has a perfect pair of
ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A
New Method to Bypass 64-bit Linux ASLR"). These functions are not
needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY
are already processed by the dynamic linker. However, the dynamic
linker skipped the main program for some reason. For maximum
backwards compatibility, this is not changed, and instead, the main
map is consulted from __libc_start_main if the init function argument
is a NULL pointer.
For statically linked binaries, the old approach based on linker
symbols is still used because there is nothing else available.
A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because
new binaries running on an old libc would not run their ELF
constructors, leading to difficult-to-debug issues.
|
| |
A not-so-recent kernel change [1] changed how the trampoline
`__kernel_sigtramp_rt64` is used to call signal handlers.
This was exposed by the test misc/tst-sigcontext-get_pc.
Before kernel 5.9, the kernel set LR to the trampoline address and
jumped directly to the signal handler, and at the end the signal
handler, like any other function, would `blr` to that address. In
other words, the trampoline was executed only at the end of the signal
handler, and the only thing it did was call sigreturn. But since
kernel 5.9 the kernel sets CTR to the signal handler and calls into the
trampoline code; the trampoline then does a `bctrl` to the address in CTR,
setting LR to the next instruction in the middle of the trampoline. When
the signal handler returns, the rest of the trampoline code executes the
same code as before.
Here is the full trampoline code as of kernel 5.11.0-rc5 for
reference:
V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
.Lsigrt_start:
bctrl /* call the handler */
addi r1, r1, __SIGNAL_FRAMESIZE
li r0,__NR_rt_sigreturn
sc
.Lsigrt_end:
V_FUNCTION_END(__kernel_sigtramp_rt64)
This new behavior breaks the way `backtrace()` detects the
trampoline frame in order to correctly reconstruct the stack when it is
called from inside a signal handler.
This workaround relies on the fact that the trampoline code is at the very
least two (maybe three) instructions in size (as in the 32-bit
version, which is only an `li` and an `sc`), so it is safe to check whether
the return address lies in the range __kernel_sigtramp_rt64 .. + 4.
[1] subject: powerpc/64/signal: Balance return predictor stack in signal trampoline
commit: 0138ba5783ae0dcc799ad401a1e8ac8333790df9
url: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0138ba5783ae0dcc799ad401a1e8ac8333790df9
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
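A small C sketch of the range check described above; looking up the address of
__kernel_sigtramp_rt64 in the vDSO is elided:

#include <stdint.h>

/* Treat a frame as the signal trampoline if its saved return address is
   either the trampoline entry (pre-5.9 kernels) or the instruction right
   after the initial bctrl (kernel 5.9 and later).  */
static int
is_sigtramp_address (uintptr_t ra, uintptr_t sigtramp)
{
  return ra >= sigtramp && ra <= sigtramp + 4;
}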
|
|
|
|
|
|
|
|
|
| |
It is not available with the baseline ISA.
Fixes commit 68ab82f56690ada86ac1e0c46bad06ba189a10ef
("powerpc: Runtime selection between sc and scv for syscalls").
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
|
| |
Linux kernel v5.9 added support for system calls using the scv
instruction for POWER9 and later. The new codepath provides better
performance (see below) if compared to using sc. For the
foreseeable future, both sc and scv mechanisms will co-exist, so this
patch enables glibc to do a runtime check and use scv when it is
available.
Before issuing the system call to the kernel, we check hwcap2 in the TCB
for PPC_FEATURE2_SCV to see if scv is supported by the kernel. If not,
we fall back to sc and keep the old behavior.
The kernel implements a different error return convention for scv, so
when returning from a system call we need to handle the return value
differently depending on the instruction we used to enter the kernel.
For syscalls implemented in ASM, entry and exit are implemented by
different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in
sequence (e.g. for templated syscalls) or with other instructions in
between (e.g. clone). To avoid accessing the TCB a second time on
PSEUDO_RET to check which instruction we used, the value read from
hwcap2 is cached on a non-volatile register.
This is not needed when using INTERNAL_SYSCALL macro, since entry and
exit are bundled into the same inline asm directive.
The dynamic loader may issue syscalls before the TCB has been set up,
so it always uses sc with no extra checks. For the static case, there
is no compile-time way to determine if we are inside startup code,
so we also check the value of the thread pointer before effectively
accessing the TCB. For such situations in which the availability of
scv cannot be determined, sc is always used.
Support for scv in syscalls implemented in their own ASM file (clone and
vfork) will be added later. For now simply use sc as before.
Average performance over 1M calls for each syscall "type":
- stat: C wrapper calling INTERNAL_SYSCALL
- getpid: templated ASM syscall
- syscall: call to gettid using syscall function
Standard:
stat : 1.573445 us / ~3619 cycles
getpid : 0.164986 us / ~379 cycles
syscall : 0.162743 us / ~374 cycles
With scv:
stat : 1.537049 us / ~3535 cycles <~ -84 cycles / -2.32%
getpid : 0.109923 us / ~253 cycles <~ -126 cycles / -33.25%
syscall : 0.116410 us / ~268 cycles <~ -106 cycles / -28.34%
Tested on powerpc, powerpc64, powerpc64le (with and without scv)
Tested-by: Lucas A. M. Magalhães <lamm@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
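A user-space sketch of the capability test, assuming the PPC_FEATURE2_SCV bit
from the Linux uapi headers; glibc itself reads the hwcap2 word cached in the
TCB rather than calling getauxval on every syscall:

#include <sys/auxv.h>

#ifndef PPC_FEATURE2_SCV
# define PPC_FEATURE2_SCV 0x00100000  /* from Linux <asm/cputable.h> */
#endif

/* Nonzero when the running kernel advertises scv support, in which case the
   scv entry path may be used instead of sc.  */
static int
kernel_supports_scv (void)
{
  return (getauxval (AT_HWCAP2) & PPC_FEATURE2_SCV) != 0;
}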
|
|
|
|
|
| |
For new inputs added in commit cad5ad81d2f7f58a7ad0d8afa8c1b710,
as seen on a POWER8 system.
|
|
|
|
|
| |
The "power10" and "power9" subdirectories are selected in a way
that matches the -mcpu=power10 and -mcpu=power9 options of GCC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Programmatically generate simple wrappers for interesting libm *f128
objects. Selected functions are transcendental functions or
those with trivial compiler builtins. This can result in a 2-3x
speedup (e.g. logf128 and expf128).
A second set of implementation files is generated which include
the first implementation encountered along the search path. This
usually works, except when a wrapper is overridden and the makefile
search order slightly diverges from the include order. Likewise,
wrapper object files are created for each generated file. These
hold the ifunc selection routines which export the ABI.
Next, several shared headers are intercepted to control the renaming:
asm function redirects are used first, and sometimes macro renames
if the former is impractical.
Notably, if the requested machine supports hardware IEEE128 (i.e. POWER9
and newer), this ifunc machinery is disabled. Likewise, existing
ifunc support for float128 is consolidated into this (e.g. sqrtf128
and fmaf128).
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
PT_THREAD_POINTER is currently defined inside an #ifndef __ASSEMBLER__ block, but
its usage should not be limited to C code, as it can be useful when accessing
the TLS from assembly code as well.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT,
formerly __wait_lookup_done) can be implemented directly in ld.so,
eliminating the unprotected GL (dl_wait_lookup_done) function
pointer.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
| |
The macro is never defined.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
| |
Add stpncpy support into the POWER9 strncpy.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
| |
Similar to the strcpy P9 optimization, this version uses VSX to improve
performance.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are several compiler implementations that allow large stack
allocations to jump over the guard page at the end of the stack and
corrupt memory beyond that. See CVE-2017-1000364.
Compilers can emit code to probe the stack such that the guard page
cannot be skipped, but on aarch64 the probe interval is 64K by default
instead of the minimum supported page size (4K).
This patch enforces at least 64K guard on aarch64 unless the guard
is disabled by setting its size to 0. For backward compatibility
reasons the increased guard is not reported, so it is only observable
by exhausting the address space or parsing /proc/self/maps on Linux.
On other targets the patch has no effect. If the stack probe interval
is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can
be defined to get large enough stack guard on libc allocated stacks.
The patch does not affect threads with user allocated stacks.
Fixes bug 26691.
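A sketch of the sizing rule described above; ARCH_MIN_GUARD_SIZE is the
per-architecture knob the patch introduces, and the fallback definition here is
illustrative:

#include <stddef.h>

#ifndef ARCH_MIN_GUARD_SIZE
# define ARCH_MIN_GUARD_SIZE 0  /* targets whose probe interval fits in a page */
#endif

/* A guard size of 0 disables the guard entirely; otherwise round the request
   up to at least one page and, where defined, to the minimum guard the
   stack-probing code requires (64K on aarch64).  */
static size_t
effective_guard_size (size_t requested, size_t pagesize)
{
  if (requested == 0)
    return 0;
  size_t guard = requested < pagesize ? pagesize : requested;
  return guard < ARCH_MIN_GUARD_SIZE ? ARCH_MIN_GUARD_SIZE : guard;
}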
|
|
|
|
|
|
|
| |
dl_powerpc_cpu_features also needs to be protected by __GLRO to check
for the _rtld_global_ro relocation before accessing it.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
| |
__strlen_power9 and __stpcpy_power9 were added to their ifunc lists
using the wrong function names.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, the following tests were failing:
ppc and ppc64:
FAIL: math/test-ldouble-j0
ppc64le:
FAIL: math/test-float128-j0
FAIL: math/test-float64x-j0
FAIL: math/test-ibm128-j0
FAIL: math/test-ldouble-j0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__GLRO loaded the word after the requested variable on big-endian
PowerPC, where LOWORD is 4. This can cause the memset implementation to
go wrong because the masking with the cache line size produces
wrong results, particularly if the loaded value happens to be 1.
The __GLRO macro is not used in any place where loading the lower
32-bit word of a 64-bit value is desired, so the +4 offset is always
wrong.
Fixes commit 18363b4f010da9ba459b13310b113ac0647c2fcc
("powerpc: Move cache line size to rtld_global_ro") and bug 26332.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
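An illustration (not the glibc macro) of why a +4 offset misreads a 32-bit
variable: the load lands on whatever happens to follow it in memory. The field
names below are made up for the example:

#include <stdint.h>
#include <stdio.h>

struct ro_layout
{
  uint32_t cache_line_size;  /* the variable __GLRO was asked for */
  uint32_t next_field;       /* what a +4 offset actually reads */
};

int
main (void)
{
  struct ro_layout g = { 128, 1 };
  uint32_t v = *(uint32_t *) ((char *) &g.cache_line_size + 4);
  printf ("wanted %u, buggy load sees %u\n", g.cache_line_size, v);
  return 0;
}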
|
|
|
|
|
|
|
|
|
|
| |
Add a line that was missing from a previous commit.
Without increasing str, the null-byte is not validated, and
_dl_string_platform returns -1.
Fixes: d2ba3677da7a ("powerpc: Add support for POWER10")
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Upstream GCC 11 development is now building the ibm128 runtime
support (in libgcc) without a .gnu.attributes section on ppc64le.
Ensure we have one to replace by building one ibm128 file in
libc and libm with attributes.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
Teach the linker that __mcount_internal, __sigjmp_save_symbol,
__syscall_error and __GI_exit do not use r2, so that it does not need to
recover r2 after the call.
Test at configure time if the assembler supports @notoc and define
USE_PPC64_NOTOC.
|
|
|
|
|
|
|
|
| |
1. Add the directories to hold POWER10 files.
2. Add support to select POWER10 libraries based on AT_PLATFORM.
3. Let submachine=power10 be set automatically.
|
|
|
|
|
|
|
|
|
| |
Linux commit ID ee988c11acf6f9464b7b44e9a091bf6afb3b3a49 reserved 2 new
bits in AT_HWCAP2:
- PPC_FEATURE2_ARCH_3_1 indicates the availability of the POWER ISA
3.1;
- PPC_FEATURE2_MMA indicates the availability of the Matrix-Multiply
Assist facility.
|
|
|
|
|
|
|
|
|
|
| |
The powerpc sqrt implementation is also simplified:
- the static constants are open coded within the implementation.
- for !USE_SQRT_BUILTIN the function is implemented directly in
__ieee754_sqrt (this avoids a superfluous extra jump).
Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
Each symbol definition is moved to a separate file, covering
all symbol type definitions (float, double, long double,
and float128).
This allows architectures to set support without the boilerplate
of copying default values.
Checked with a build on the affected ABIs.
|
|
|
|
|
| |
Combine both implementations into a single file to allow
building twice with appropriate multiarch support when possible.
|
|
|
|
|
|
|
|
|
| |
Added a check to detect the CPU value in preconfigure, so that glibc is
built with the correct --with-cpu value, and moved the existing checks into
preconfigure.ac.
Co-Authored-By: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Co-Authored-By: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This started as a trivial change to Anton's rawmemchr. I got
carried away. This is a hybrid between P8's asymptotically
faster 64B checks and extremely efficient small-string checks,
e.g. <64B (and sometimes a little bit more depending on alignment).
The second trick is to align to 64B by running a 48B checking loop
16B at a time until we naturally align to 64B (i.e. checking 48/96/144
bytes/iteration based on the alignment after the first 5 comparisons).
This alleviates the need to check page boundaries.
Finally, explicitly use the P7 strlen with the runtime loader when building
for P9. We need to be cautious about vector/VSX extensions here in P9-only
builds.
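A scalar C sketch of the align-to-64B step, with a hypothetical has_zero_16
predicate standing in for the 16-byte vector check:

#include <stdbool.h>
#include <stdint.h>

/* After the pointer is 16B-aligned, examine at most three more 16-byte blocks
   until it is naturally 64B-aligned, so the 64B main loop never straddles a
   page boundary mid-check.  */
static const char *
align_to_64 (const char *p, bool (*has_zero_16) (const char *))
{
  while (((uintptr_t) p & 63) != 0)
    {
      if (has_zero_16 (p))
        return p;   /* terminator in this block: handled by the tail code */
      p += 16;
    }
  return p;         /* 64B-aligned: enter the main loop */
}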
|
|
|
|
|
|
|
|
|
| |
This defines the macro such that it should behave best on all
supported powerpc targets. Likewise, this allows us to remove the
ppc64le specific s_fmaf128.c.
I have verified powerpc64le multiarch and powerpc64le power9
no-multiarch builds continue to generate optimized fmaf128.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The fmaf128 build evaluates an undefined macro.
For now, set USE_FMAL_BUILTIN and USE_FMAF128_BUILTIN to 0.
Checked with a build for:
powerpc64le-linux-gnu-power9-disable-multi-arch
powerpc64le-linux-gnu-power9
powerpc64le-linux-gnu
powerpc64-linux-gnu-power8
powerpc64-linux-gnu
powerpc-linux-gnu-power4
powerpc-linux-gnu
|
| |
Tested with build-many-glibcs for powerpc-linux-gnu.
This is a non-functional change; powerpc libm before/after is
byte-for-byte identical, as compared below:
| cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline
| for i in `find . -name libm-2.31.9000.so`; do
| echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ;
| echo $?;
| done
| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
| 0
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This version uses vector instructions and is up to 60% faster on medium
matches and up to 90% faster on long matches, compared to the POWER7
version. A few examples:
__rawmemchr_power9 __rawmemchr_power7
Length 32, alignment 0: 2.27566 3.77765
Length 64, alignment 2: 2.46231 3.51064
Length 1024, alignment 0: 17.3059 32.6678
|
|
|
|
|
|
|
|
|
|
| |
Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on
small strings and up to 90% faster on long relatively unaligned strings,
compared to the POWER8 version. A few examples:
__stpcpy_power9 __stpcpy_power8
Length 20, alignments in bytes 4/ 4: 2.58246 4.8788
Length 1024, alignments in bytes 1/ 6: 24.8186 47.8528
|