path: root/sysdeps/powerpc
Commit message | Author | Age | Files | Lines
* PowerPC: multiarch wcschr for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 8 | -6/+142
|
* PowerPC: multiarch strchr for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 6 | -1/+143
|
* PowerPC: multiarch strchrnul for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+113
|
* PowerPC: multiarch strncasecmp for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 6 | -1/+155
|
* PowerPC: multiarch strcasecmp for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 6 | -1/+179
|
* PowerPC: multiarch strncmp for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+121
|
* PowerPC: multiarch strnlen for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 6 | -1/+123
|
* PowerPC: multiarch strlen for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+115
|
* PowerPC: multiarch rawmemchr for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+119
|
* PowerPC: multiarch memrchr for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+111
|
* PowerPC: multiarch memchr for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+122
|
* PowerPC: multiarch mempcpy for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 5 | -1/+114
|
* PowerPC: multiarch memset/bzero for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 11 | -1/+307
|
* PowerPC: multiarch memcmp for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 6 | -1/+146
|
* PowerPC: multiarch memcpy for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 8 | -1/+254
|
* PowerPC: initial support for multilib for PowerPC32 | Adhemerval Zanella | 2013-12-06 | 3 | -0/+105
|   This patch adds an empty Makefile, the C IFUNC helper macros, and an empty enumeration of the available IFUNC implementations.
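As a rough illustration of what a multiarch selector does, the sketch below chooses one of two implementations from a capability bitmask. It is portable C with a plain function pointer and does not use glibc's actual IFUNC helper macros; `HYP_HAS_POWER7`, `select_strlen`, and both implementations are hypothetical names invented for this example.

```c
#include <stddef.h>

/* Hypothetical capability bit.  Real code would consult the hardware
   capability bits reported for the running CPU.  */
#define HYP_HAS_POWER7 0x1u

typedef size_t (*strlen_fn) (const char *);

/* Baseline implementation: byte-at-a-time scan with an index.  */
static size_t
strlen_generic (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* Stand-in for a CPU-specific variant.  It must return the same result
   as the baseline for every input.  */
static size_t
strlen_power7_standin (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    p++;
  return (size_t) (p - s);
}

/* Resolver: pick an implementation once, based on the capability mask.  */
static strlen_fn
select_strlen (unsigned int hwcap)
{
  return (hwcap & HYP_HAS_POWER7) ? strlen_power7_standin : strlen_generic;
}
```

A caller would store the result of `select_strlen` once and call through the pointer afterwards, which is roughly the effect an IFUNC resolver achieves at symbol-binding time.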
* Update powerpc-fpu ULPs. | Adhemerval Zanella | 2013-12-05 | 1 | -8/+2217
|
* PowerPC: Add systemtap static probe points in setjmp/longjmp | Adhemerval Zanella | 2013-12-05 | 11 | -59/+96
|   This patch adds static probes for setjmp/longjmp in the way GDB expects, fixing the gdb.base/longjmp.exp GDB testcases. It changes the symbol_name and uses macros to avoid changing the probe names, which would otherwise mean adding more logic to GDB (with the expected names, GDB works seamlessly).
* PowerPC64 ELFv2 ABI 5/6: LD_AUDIT interface changes | Ulrich Weigand | 2013-12-04 | 5 | -45/+163
|   The ELFv2 ABI changes the calling convention by passing and returning structures in registers in more cases than the old ABI:
|   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01145.html
|   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01147.html
|
|   For the most part, this does not affect glibc, since glibc assembler files do not use structure parameters / return values. However, one place is affected: the LD_AUDIT interface provides a structure to the audit routine that contains all registers holding function argument and return values for the intercepted PLT call.
|
|   Since the new ABI now sometimes uses registers to return values that were never used for this purpose in the old ABI, this structure has to be extended. To force audit routines to be modified for the new ABI if necessary, the patch defines v2 variants of the la_ppc64 types and routines.
|
|   In addition, the patch contains two unrelated changes to the PLT trampoline routines: it fixes a bug where FPR return values were stored in the wrong place, and it removes the unnecessary save/restore of CR.
* PowerPC64 ELFv2 ABI 4/6: Stack frame layout changes | Ulrich Weigand | 2013-12-04 | 7 | -65/+109
|   This updates glibc for the changes in the ELFv2 ABI relating to the stack frame layout. These are described in more detail here:
|   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.html
|   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
|
|   Specifically, the "compiler and linker doublewords" were removed, which has the effect that the save slot for the TOC register is now at offset 24 rather than 40 from the stack pointer. In addition, a function may no longer assume that its caller has set up a 64-byte register save area for its use.
|
|   To address the first change, the patch goes through all assembler files and replaces immediate offsets in instructions accessing the ABI-defined stack slots by symbolic offsets. Those already were defined in ucontext_i.sym and used in some of the context routines, but that doesn't really seem like the right place for those defines. The patch instead defines those symbolic offsets in sysdep.h, in two variants for the old and new ABI, and uses them systematically in all assembler files, not just the context routines.
|
|   The second change only affected a few assembler files that used the save area to temporarily store some registers. In those cases where this happens within a leaf function, this patch changes the code to store those registers to the "red zone" below the stack pointer. Otherwise, the functions already allocate a stack frame, and the patch changes them to add extra space in these frames as temporary space for the ELFv2 ABI.
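The offset change can be pictured with two C structures standing in for the fixed part of the stack frame. This is only a model: the struct and macro names are hypothetical, and the first three slots (back chain, CR save, LR save) are an assumption based on the usual PowerPC64 frame header, not something stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the frame header under the old ABI.
   All names here are hypothetical.  */
struct frame_header_elfv1
{
  uint64_t back_chain;
  uint64_t cr_save;
  uint64_t lr_save;
  uint64_t compiler_dword;	/* removed in ELFv2 */
  uint64_t linker_dword;	/* removed in ELFv2 */
  uint64_t toc_save;
};

/* The same header once the two reserved doublewords are gone.  */
struct frame_header_elfv2
{
  uint64_t back_chain;
  uint64_t cr_save;
  uint64_t lr_save;
  uint64_t toc_save;
};

/* Symbolic offsets, so that code never hard-codes 24 or 40.  */
#define HYP_TOC_SAVE_ELFV1 offsetof (struct frame_header_elfv1, toc_save)
#define HYP_TOC_SAVE_ELFV2 offsetof (struct frame_header_elfv2, toc_save)
```

Code written against the symbolic names keeps working when the ABI variant changes, which is the point of replacing the immediate offsets.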
* PowerPC64 ELFv2 ABI 3/6: PLT local entry point optimization | Ulrich Weigand | 2013-12-04 | 2 | -2/+50
|   This is a follow-on to the previous patch to support the ELFv2 ABI in the dynamic loader, split off into its own patch since it is just an optional optimization.
|
|   In the ELFv2 ABI, most functions define both a global and a local entry point; the local entry requires r2 to be already set up by the caller to point to the callee's TOC; while the global entry does not require the caller to know about the callee's TOC, but it needs to set up r12 to the callee's entry point address.
|
|   Now, when setting up a PLT slot, the dynamic linker will usually need to enter the target function's global entry point. However, if the linker can prove that the target function is in the same DSO as the PLT slot itself, and the whole DSO only uses a single TOC (which the linker will let ld.so know via a DT_PPC64_OPT entry), then it is possible to actually enter the local entry point address into the PLT slot, for a slight improvement in performance.
|
|   Note that this uncovered a problem on the first call via _dl_runtime_resolve, because that routine neglected to restore the caller's TOC before calling the target function for the first time, since it assumed that function would always reload its own TOC anyway ...
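The decision described in this message reduces to a small predicate. The sketch below is a simplified model, not the dynamic loader's code: `struct hyp_plt_target`, its fields, and `hyp_plt_slot_value` are hypothetical, and it assumes the local entry point lies at a known byte offset after the global one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a resolved PLT target.  */
struct hyp_plt_target
{
  uint64_t global_entry;	/* address of the global entry point */
  uint64_t local_entry_offset;	/* assumed distance from global to local entry */
  bool same_dso;		/* target is in the same DSO as the PLT slot */
};

/* Value to store in the PLT slot.  The local entry is used only when the
   caller's r2 is guaranteed to be the callee's TOC already, i.e. same DSO
   and a single TOC for the whole DSO.  */
static uint64_t
hyp_plt_slot_value (const struct hyp_plt_target *t, bool dso_has_single_toc)
{
  if (t->same_dso && dso_has_single_toc)
    return t->global_entry + t->local_entry_offset;
  return t->global_entry;
}
```

In every other case the global entry point is the safe choice, because it sets up the TOC itself from r12.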
* PowerPC64 ELFv2 ABI 2/6: Remove function descriptors | Ulrich Weigand | 2013-12-04 | 5 | -29/+138
|   This patch adds support for the ELFv2 ABI feature to remove function descriptors. See this GCC patch for in-depth discussion:
|   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html
|
|   This mostly involves two types of changes: updating assembler source files to the new logic, and updating the dynamic loader.
|
|   After the refactoring in the previous patch, most of the assembler source changes can be handled simply by providing ELFv2 versions of the macros in sysdep.h. One somewhat non-obvious change is in __GI__setjmp: this used to "fall through" to the immediately following __setjmp ENTRY point. This is no longer safe in the ELFv2 ABI since ENTRY defines both a global and a local entry point, and you cannot simply fall through to a global entry point as it requires r12 to be set up. Also, makecontext needs to be updated to set up registers according to the new ABI for calling into the context's start routine.
|
|   The dynamic linker changes mostly consist of removing special code to handle function descriptors. We also need to support the new PLT and glink format used by the ELFv2 linker, see:
|   https://sourceware.org/ml/binutils/2013-10/msg00376.html
|
|   In addition, the dynamic linker now verifies that the dynamic libraries it loads match its own ABI.
|
|   The hack in VDSO_IFUNC_RET to "synthesize" a function descriptor for vDSO routines is also no longer necessary for ELFv2.
* PowerPC64 ELFv2 ABI 1/6: Code refactoring | Ulrich Weigand | 2013-12-04 | 4 | -59/+38
|   This is the first patch to support the new ELFv2 ABI in glibc.
|
|   As preparation, this patch simply refactors some of the powerpc64 assembler code to move all code related to creating function descriptors (.opd section) or using function descriptors (function pointer call) into a central place in sysdep.h.
|
|   Note that most locations creating .opd entries were already using macros in sysdep.h, this patch simply extends this to the remaining places.
|
|   No relevant change in generated code expected.
* PowerPC64: Report overflow on @h and @ha relocations | Alan Modra | 2013-12-04 | 1 | -2/+22
|   This patch updates glibc in accordance with the binutils patch checked in here:
|   https://sourceware.org/ml/binutils/2013-10/msg00372.html
|
|   This changes the various R_PPC64_..._HI and _HA relocations to report 32-bit overflows. The motivation is that existing uses of @h / @ha are to build up 32-bit offsets (for the "medium model" TOC access that GCC now defaults to), and we'd really like to see failures at link / load time rather than silent truncations.
|
|   For those rare cases where a modifier is needed to build up a 64-bit constant, new relocations _HIGH / _HIGHA are supported.
|
|   The patch also fixes a bug in overflow checking for the R_PPC64_ADDR30 and R_PPC64_ADDR32 relocations.
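To make the overflow condition concrete, here is a small sketch of the 16-bit pieces and of a signed 32-bit range check. It illustrates the arithmetic only and is not the loader's code; the function names are hypothetical, and the checks assume that "32-bit overflow" means the value (after the @ha rounding adjustment, where applicable) no longer fits in a signed 32-bit quantity.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low 16 bits of a value (@l).  */
static uint16_t
part_lo (uint64_t v)
{
  return (uint16_t) v;
}

/* Bits 16..31 of a value (@h).  */
static uint16_t
part_hi (uint64_t v)
{
  return (uint16_t) (v >> 16);
}

/* Bits 16..31 adjusted for a sign-extended low part (@ha).  */
static uint16_t
part_ha (uint64_t v)
{
  return (uint16_t) ((v + 0x8000) >> 16);
}

/* True when v, viewed as a signed value, fits in 32 bits.  */
static bool
fits_hi (uint64_t v)
{
  return v + 0x80000000ULL < 0x100000000ULL;
}

/* True when v still fits in 32 bits after the @ha adjustment.  */
static bool
fits_ha (uint64_t v)
{
  return v + 0x80008000ULL < 0x100000000ULL;
}
```

A value such as 0x7fff8000 passes the plain check but fails the @ha one, because rounding the high part up pushes it past the signed 32-bit limit.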
* Update powerpc-fpu ULPs. | Adhemerval Zanella | 2013-12-04 | 1 | -2/+260
|
* Update powerpc-fpu ULPs. | Adhemerval Zanella | 2013-12-02 | 1 | -34/+1199
|
* Add powerpc-nofpu/e500 support functions for atomic compound assignment and FLT_ROUNDS. | Joseph Myers | 2013-11-28 | 12 | -4/+340
|
* Fix dbl-64 e_sqrt.c for non-default rounding modes (bug 16271). | Joseph Myers | 2013-11-28 | 3 | -0/+3
|
* PowerPC: Fix __fe_nomask_env missing symbol | Adhemerval Zanella | 2013-11-26 | 6 | -10/+9
|   This patch fixes the missing symbol __fe_nomask_env from commit 41e8926aa4b7f17bc95984737ee82a254ad0911c for GLIBC_2.1.
* Fix powerpc-nofpu build. | Joseph Myers | 2013-11-25 | 2 | -0/+6
|
* PowerPC: Set/restore rounding mode only when needed | Adhemerval Zanella | 2013-11-25 | 3 | -4/+277
|   This patch improves the performance of some math functions by adding the libc_fexxx variants of the inline functions that handle both FPU rounding-mode and exception set/restore, and by using them in the libc_fexxx_ctx functions. It is based on the already coded fexxx family of functions for PPC with FPU.
|
|   Here is a summary of the performance improvements due to this patch (measured on a POWER7 machine):
|
|   Before:
|   cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
|   exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
|   pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
|   sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
|   tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
|
|   After:
|   cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
|   exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
|   pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
|   sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
|   tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy
* Make powerpc-nofpu floating-point state thread-local (bug 15483). | Joseph Myers | 2013-11-19 | 18 | -67/+104
|
* PowerPC: Fix __fe_mask_env export | Adhemerval Zanella | 2013-11-13 | 2 | -7/+3
|   This patch no longer exports __fe_mask_env; it only provides a compatibility symbol. It fixes BZ#14143.
* rename configure.in to configure.ac | Mike Frysinger | 2013-10-30 | 6 | -3/+3
|   Autoconf has been deprecating configure.in for quite a long time. Rename all our configure.in and preconfigure.in files to .ac.
|
|   Signed-off-by: Mike Frysinger <vapier@gentoo.org>
* PowerPC: strcpy/stpcpy optimization for PPC64/POWER7 | Adhemerval Zanella | 2013-10-25 | 4 | -134/+407
|   This patch intends to unify the strcpy and stpcpy implementations for PPC64 and PPC64/POWER7. The idea for the default powerpc64 implementation is to provide both doubleword- and word-aligned memory access. The PPC64/POWER7 implementation also provides doubleword and word memory access, removes the branch hints, uses the cmpb instruction to compare doublewords/words, and adds an optimization for inputs with the same alignment.
* Add e500 port. | Joseph Myers | 2013-10-18 | 29 | -10/+1165
|
* Extend powerpc-nofpu -fno-builtin-fabsl workaround to more files. | Joseph Myers | 2013-10-10 | 1 | -0/+6
|
* Move powerpc ports pieces to libc. | Joseph Myers | 2013-10-04 | 37 | -0/+9134
|
* e500 port: fix fpu_control.h constant values. | Joseph Myers | 2013-10-04 | 1 | -10/+8
|
* Use stdint.h types in union unaligned. | Alan Modra | 2013-10-04 | 2 | -6/+6
|   * sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela): Use stdint types rather than __attribute__((mode())).
|   * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
* Correct little-endian relocation of UADDR64,32,16. | Alan Modra | 2013-10-04 | 2 | -24/+18
|   * sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela): Correct handling of unaligned relocs for little-endian.
|   * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
* PowerPC LE memchr and memrchr | Alan Modra | 2013-10-04 | 6 | -382/+404
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
|
|   Like strnlen, memchr and memrchr had a number of defects fixed by this patch as well as adding little-endian support. The first one I noticed was that the entry to the main loop needlessly checked for "are we done yet?" when we know the size is large enough that we can't be done. The second defect I noticed was that the main loop count was wrong, which in turn meant that the small loop needed to handle an extra word. Thirdly, there is nothing to say that the string can't wrap around zero, except of course that we'd normally hit a segfault on trying to read from address zero. Fixing that simplified a number of places:
|
|   -   /* Are we done already? */
|   -   addi r9,r8,8
|   -   cmpld r9,r7
|   -   bge L(null)
|
|   becomes
|
|   +   cmpld r8,r7
|   +   beqlr
|
|   However, the exit gets an extra test because I test for being on the last word then if so whether the byte offset is less than the end. Overall, the change is a win. Lastly, memrchr used the wrong cache hint.
|
|   * sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with insrdi. Make better use of reg selection to speed exit slightly. Schedule entry path a little better. Remove useless "are we done" checks on entry to main loop. Handle wrapping around zero address. Correct main loop count. Handle single left-over word from main loop inline rather than by using loop_small. Remove extra word case in loop_small caused by wrong loop count. Add little-endian support.
|   * sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper cache hint.
|   * sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian support. Avoid rlwimi.
|   * sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
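The move from an ordered comparison (bge) to an equality comparison (beqlr) matters once the end address can wrap past zero. The toy model below uses an 8-bit "address space" so the wrap is easy to trigger; it is a hypothetical illustration of that one idea, not a transcription of the assembly, and all names are invented.

```c
#include <stdint.h>

/* Toy 8-bit address type: arithmetic wraps at 256.  */
typedef uint8_t toy_addr;

/* Ordered termination test.  If start + size wraps, end is numerically
   below start and the loop body never runs.  */
static unsigned
steps_ordered (toy_addr start, uint8_t size)
{
  toy_addr end = (toy_addr) (start + size);
  unsigned steps = 0;
  for (toy_addr p = start; p < end; p++)
    steps++;
  return steps;
}

/* Equality termination test.  The pointer wraps together with end, so
   every byte in the range is still visited.  */
static unsigned
steps_equality (toy_addr start, uint8_t size)
{
  toy_addr end = (toy_addr) (start + size);
  unsigned steps = 0;
  for (toy_addr p = start; p != end; p++)
    steps++;
  return steps;
}
```

With a start of 0xF8 and a size of 16, the ordered loop visits nothing while the equality loop visits all 16 bytes; for ranges that do not wrap, both behave the same.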
* PowerPC LE memset | Alan Modra | 2013-10-04 | 7 | -34/+34
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
|
|   One of the things I noticed when looking at power7 timing is that rlwimi is cracked and the two resulting insns have a register dependency. That makes it a little slower than the equivalent rldimi.
|
|   * sysdeps/powerpc/powerpc64/memset.S: Replace rlwimi with insrdi. Formatting.
|   * sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power4/memset.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power7/memset.S: Likewise.
* PowerPC LE memcpy | Alan Modra | 2013-10-04 | 9 | -410/+928
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
|
|   Little-endian support for memcpy. I spent some time cleaning up the 64-bit power7 memcpy, in order to avoid the extra alignment traps power7 takes for little-endian. It probably would have been better to copy the Linux kernel version of memcpy.
|
|   * sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
|   * sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better use of regs. Use power7 mtocrf. Tidy function tails.
* PowerPC LE memcmp | Alan Modra | 2013-10-04 | 4 | -1893/+3451
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
|
|   This is a rather large patch due to formatting and renaming. The formatting changes were to make it possible to compare power7 and power4 versions of memcmp. Using different register defines came about while I was wrestling with the code, trying to find spare registers at one stage. I found it much simpler if we refer to a reg by the same name throughout a function, so it's better if short-term multiple use regs like rTMP are referred to using their register number. I made the cr field usage changes when attempting to reload rWORDn regs in the exit path to byte swap before comparing when little-endian. That proved a bad idea due to the pipelining involved in the main loop; offsets to reload the regs were different first time around the loop. Anyway, I left the cr field usage changes in place for consistency.
|
|   Aside from these more-or-less cosmetic changes, I fixed a number of places where an early exit path restores regs unnecessarily, removed some dead code, and optimised one or two exits.
|
|   * sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support. Formatting. Consistently use rXXX register defines or rN defines. Use early exit labels that avoid restoring unused non-volatile regs. Make cr field use more consistent with rWORDn compares. Rename regs used as shift registers for unaligned loop, using rN defines for short lifetime/multiple use regs.
|   * sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with addi 1,1,64 to pop stack frame. Simplify return value code.
|   * sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
* PowerPC LE strchr | Alan Modra | 2013-10-04 | 6 | -74/+212
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
|
|   Adds little-endian support to optimised strchr assembly. I've also tweaked the big-endian code a little. In power7/strchr.S there's a check in the tail of the function that we didn't match 0 before finding a c match, done by comparing leading zero counts. It's just as valid, and quicker, to compare the raw output from cmpb.
|
|   Another little tweak is to use rldimi/insrdi in place of rlwimi for the power7 strchr functions. Since rlwimi is cracked, it is a few cycles slower. rldimi can be used on the 32-bit power7 functions too.
|
|   * sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian support. Correct typos, formatting. Optimize tail. Use insrdi rather than rlwimi.
|   * sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add little-endian support. Correct typos.
|   * sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi rather than rlwimi.
|   * sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use in loop and entry code to keep "and." results. (strchr): Add little-endian support. Comment. Move cntlzd earlier in tail.
|   * sysdeps/powerpc/powerpc32/strchr.S: Likewise.
* PowerPC LE strcpy | Alan Modra | 2013-10-04 | 4 | -3/+78
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00100.html
|
|   The strcpy changes for little-endian are quite straightforward, just a matter of rotating the last word differently.
|
|   I'll note that the powerpc64 version of stpcpy is just begging to be converted to use 64-bit loads and stores.
|
|   * sysdeps/powerpc/powerpc64/strcpy.S: Add little-endian support.
|   * sysdeps/powerpc/powerpc32/strcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc64/stpcpy.S: Likewise.
|   * sysdeps/powerpc/powerpc32/stpcpy.S: Likewise.
* PowerPC LE strcmp and strncmp | Alan Modra | 2013-10-04 | 8 | -81/+380
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
|
|   More little-endian support. I leave the main strcmp loops unchanged (well, except for renumbering rTMP to something other than r0 since it's needed in an addi insn) and modify the tail for little-endian.
|
|   I noticed some of the big-endian tail code was a little untidy so have cleaned that up too.
|
|   * sysdeps/powerpc/powerpc64/strcmp.S (rTMP2): Define as r0. (rTMP): Define as r11. (strcmp): Add little-endian support. Optimise tail.
|   * sysdeps/powerpc/powerpc32/strcmp.S: Similarly.
|   * sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
|   * sysdeps/powerpc/powerpc32/strncmp.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power4/strncmp.S: Likewise.
|   * sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
|   * sysdeps/powerpc/powerpc32/power7/strncmp.S: Likewise.
* PowerPC LE strnlen | Alan Modra | 2013-10-04 | 2 | -102/+115
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
|
|   The existing strnlen code has a number of defects, so this patch is more than just adding little-endian support. The changes here are similar to those for memchr.
|
|   * sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add little-endian support. Remove unnecessary "are we done" tests. Handle "s" wrapping around zero and extremely large "size". Correct main loop count. Handle single left-over word from main loop inline rather than by using small_loop. Correct comments. Delete "zero" tail, use "end_max" instead.
|   * sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
* PowerPC LE strlen | Alan Modra | 2013-10-04 | 4 | -47/+131
|   http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
|
|   This is the first of nine patches adding little-endian support to the existing optimised string and memory functions. I did spend some time with a power7 simulator looking at cycle by cycle behaviour for memchr, but most of these patches have not been run on cpu simulators to check that we are going as fast as possible. I'm sure PowerPC can do better. However, the little-endian support mostly leaves main loops unchanged, so I'm banking on previous authors having done a good job on big-endian. As with most code you stare at long enough, I found some improvements for big-endian too.
|
|   Little-endian support for strlen. Like most of the string functions, I leave the main word or multiple-word loops substantially unchanged, just needing to modify the tail.
|
|   Removing the branch in the power7 functions is just a tidy. .align produces a branch anyway. Modifying regs in the non-power7 functions is to suit the new little-endian tail.
|
|   * sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian support. Don't branch over align.
|   * sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
|   * sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support. Rearrange tmp reg use to suit. Comment.
|   * sysdeps/powerpc/powerpc32/strlen.S: Likewise.
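Why only the tail changes can be seen in a word-at-a-time sketch: the test for "does this word contain a zero byte" is the same for both byte orders, and only the step that turns the resulting mask into a byte index depends on endianness. The code below is a portable C illustration with hypothetical function names, not a rendering of the assembly; it scans the mask with a loop where real code would typically use a count-leading-zeros or equivalent instruction.

```c
#include <stdint.h>

/* Bit 7 of each byte of the result is set iff that byte of w is zero.
   This part is independent of byte order.  */
static uint32_t
zero_byte_mask (uint32_t w)
{
  const uint32_t low7 = 0x7f7f7f7fu;
  return ~(((w & low7) + low7) | w | low7);
}

/* Build a word from four bytes as a big-endian load would.  */
static uint32_t
load_be (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
	 | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Build a word from four bytes as a little-endian load would.  */
static uint32_t
load_le (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Big-endian tail: the first byte in memory is the most significant,
   so search the mask from the top.  Returns 4 if no byte is zero.  */
static int
first_zero_be (uint32_t w)
{
  uint32_t m = zero_byte_mask (w);
  for (int i = 0; i < 4; i++)
    if (m & (0x80000000u >> (8 * i)))
      return i;
  return 4;
}

/* Little-endian tail: the first byte in memory is the least significant,
   so search the mask from the bottom.  Returns 4 if no byte is zero.  */
static int
first_zero_le (uint32_t w)
{
  uint32_t m = zero_byte_mask (w);
  for (int i = 0; i < 4; i++)
    if (m & (0x80u << (8 * i)))
      return i;
  return 4;
}
```

Both tails report the same memory index for the same bytes; they differ only in the direction in which the mask is searched.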