mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]	H.J. Lu	2017-03-21	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve the first 8 vector registers. The code layout is if only %xmm0 - %xmm7 registers are used preserve %xmm0 - %xmm7 registers if only %ymm0 - %ymm7 registers are used preserve %ymm0 - %ymm7 registers preserve %zmm0 - %zmm7 registers Branch predication always executes the fallthrough code path to preserve %zmm0 - %zmm7 registers speculatively, even though only %xmm0 - %xmm7 registers are used. This leads to lower CPU frequency on Skylake server. This patch changes the fallthrough code path to preserve %xmm0 - %xmm7 registers instead: if whole %zmm0 - %zmm7 registers are used preserve %zmm0 - %zmm7 registers if only %ymm0 - %ymm7 registers are used preserve %ymm0 - %ymm7 registers preserve %xmm0 - %xmm7 registers Tested on Skylake server. [BZ #21258] * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt): Define only if _dl_runtime_resolve is defined to _dl_runtime_resolve_sse_vex. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt): Fallthrough to _dl_runtime_resolve_sse_vex.
*	Update copyright dates with scripts/update-copyrights.	Joseph Myers	2017-01-01	1	-1/+1
\|
*	X86-64: Add _dl_runtime_resolve_avx[512]_{opt\|slow} [BZ #20508]	H.J. Lu	2016-09-06	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors. To avoid SSE transition penalty, if only the lower 128 bits of the first 8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers with the zero upper bits. For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions which will zero-extend upper bits of vector registers. This patch adds _dl_runtime_resolve_sse_vex which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero. _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used. _dl_runtime_resolve_avx_slow is added and used for AVX processors which don't support XGETBV with ECX == 1. Since there is no SSE transition penalty on AVX512 processors which don't support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided. [BZ #20495] [BZ #20508] * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel processors, set Use_dl_runtime_resolve_slow and set Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_Use_dl_runtime_resolve_opt): Likewise. (index_arch_Use_dl_runtime_resolve_slow): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set. * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>. (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512. (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New. (_dl_runtime_resolve_opt): Likewise. (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.
*	Require binutils 2.24 to build x86-64 glibc [BZ #20139]	H.J. Lu	2016-07-01	1	-22/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used to save the first 8 vector registers, which only saves the lower 256 bits of vector register, for lazy binding. When it is called on AVX512 platform, the upper 256 bits of ZMM registers are clobbered. Parameters passed in ZMM registers will be wrong when the function is called the first time. This patch requires binutils 2.24, whose assembler can store and load ZMM registers, to build x86-64 glibc. Since mathvec library needs assembler support for AVX512DQ, we disable mathvec if assembler doesn't support AVX512DQ. [BZ #20139] * config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ... (HAVE_AVX512DQ_ASM_SUPPORT): This. * sysdeps/x86_64/configure.ac: Require assembler from binutils 2.24 or above. (HAVE_AVX512_ASM_SUPPORT): Removed. (HAVE_AVX512DQ_ASM_SUPPORT): New. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT check unconditional. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise.
*	[x86_64] Set DL_RUNTIME_UNALIGNED_VEC_SIZE to 8	H.J. Lu	2016-02-19	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 __tls_get_addr may be called with 8-byte stack alignment. Although this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume that stack will be always aligned at 16 bytes. Since SSE optimized memory/string functions with aligned SSE register load and store are used in the dynamic linker, we must set DL_RUNTIME_UNALIGNED_VEC_SIZE to 8 so that _dl_runtime_resolve_sse will align the stack before calling _dl_fixup: Dump of assembler code for function _dl_runtime_resolve_sse: 0x00007ffff7deea90 <+0>: push %rbx 0x00007ffff7deea91 <+1>: mov %rsp,%rbx 0x00007ffff7deea94 <+4>: and $0xfffffffffffffff0,%rsp ^^^^^^^^^^^ Align stack to 16 bytes 0x00007ffff7deea98 <+8>: sub $0x100,%rsp 0x00007ffff7deea9f <+15>: mov %rax,0xc0(%rsp) 0x00007ffff7deeaa7 <+23>: mov %rcx,0xc8(%rsp) 0x00007ffff7deeaaf <+31>: mov %rdx,0xd0(%rsp) 0x00007ffff7deeab7 <+39>: mov %rsi,0xd8(%rsp) 0x00007ffff7deeabf <+47>: mov %rdi,0xe0(%rsp) 0x00007ffff7deeac7 <+55>: mov %r8,0xe8(%rsp) 0x00007ffff7deeacf <+63>: mov %r9,0xf0(%rsp) 0x00007ffff7deead7 <+71>: movaps %xmm0,(%rsp) 0x00007ffff7deeadb <+75>: movaps %xmm1,0x10(%rsp) 0x00007ffff7deeae0 <+80>: movaps %xmm2,0x20(%rsp) 0x00007ffff7deeae5 <+85>: movaps %xmm3,0x30(%rsp) 0x00007ffff7deeaea <+90>: movaps %xmm4,0x40(%rsp) 0x00007ffff7deeaef <+95>: movaps %xmm5,0x50(%rsp) 0x00007ffff7deeaf4 <+100>: movaps %xmm6,0x60(%rsp) 0x00007ffff7deeaf9 <+105>: movaps %xmm7,0x70(%rsp) [BZ #19679] * sysdeps/x86_64/dl-trampoline.S (DL_RUNIME_UNALIGNED_VEC_SIZE): Renamed to ... (DL_RUNTIME_UNALIGNED_VEC_SIZE): This. Set to 8. (DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ... (DL_RUNTIME_RESOLVE_REALIGN_STACK): This. Updated. (DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ... (DL_RUNTIME_RESOLVE_REALIGN_STACK): This. * sysdeps/x86_64/dl-trampoline.h (DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ... (DL_RUNTIME_RESOLVE_REALIGN_STACK): This.
*	Update copyright dates with scripts/update-copyrights.	Joseph Myers	2016-01-04	1	-1/+1
\|
*	Support x86-64 assmebler without AVX512	H.J. Lu	2015-10-13	1	-16/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When x86-64 assmebler doesn't support AVX512, we should make _dl_runtime_resolve_avx512/_dl_runtime_profile_avx512 as aliases of _dl_runtime_resolve_avx/_dl_runtime_profile_avx. Tested on x86-64 using GCC 5.2 with binutils 20151008 and GCC 4.8 with binutils 20130219. There are no differences in ld.so with binutils 20151008. There are no unexpected failures with binutils 20130219 and 20151008. [BZ #19124] * sysdeps/x86_64/dl-trampoline.S [!HAVE_AVX512_ASM_SUPPORT] (_dl_runtime_resolve_avx512): Make it a hidden alias of _dl_runtime_resolve_avx. (_dl_runtime_profile_avx512): Make it a hidden alias of _dl_runtime_profile_avx.
*	Save and restore vector registers in x86-64 ld.so	H.J. Lu	2015-08-25	1	-389/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve and _dl_runtime_profile, which save and restore the first 8 vector registers used for parameter passing. elf_machine_runtime_setup selects the proper _dl_runtime_resolve or _dl_runtime_profile based on _dl_x86_cpu_features. It avoids race condition caused by FOREIGN_CALL macros, which are only used for x86-64. Performance impact of saving and restoring 8 vector registers are negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when ld.so is optimized with SSE2. [BZ #15128] * sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add ifuncmain8. (modules-names): Add ifuncmod8. ($(objpfx)ifuncmain8): New rule. * sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and <cpuid.h>. (elf_machine_runtime_setup): Use _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512, _dl_runtime_profile_sse, _dl_runtime_profile_avx, or _dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE. * sysdeps/x86_64/dl-trampoline.S: Rewrite. * sysdeps/x86_64/dl-trampoline.h: Likewise. * sysdeps/x86_64/ifuncmain8.c: New file. * sysdeps/x86_64/ifuncmod8.c: Likewise. * sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE): Removed. * sysdeps/x86_64/nptl/tls.h (__128bits): Removed. (tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1. Change rtld_savespace_sse to __glibc_unused2. (RTLD_CHECK_FOREIGN_CALL): Removed. (RTLD_ENABLE_FOREIGN_CALL): Likewise. (RTLD_PREPARE_FOREIGN_CALL): Likewise. (RTLD_FINALIZE_FOREIGN_CALL): Likewise.
*	Improve bndmov encoding with zero displacement	H.J. Lu	2015-07-09	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	If x86-64 assembler doesn't support MPX, we encode bndmov instruction by hand. When displacement is zero, assembler generates shorter encoding. This patch improves bndmov encoding with zero displacement so that ld.so is identical when using assemblers with and without MPX support. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve): Improve bndmov encoding with zero displacement.
*	Preserve bound registers for pointer pass/return	Igor Zamyatin	2015-07-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to save/restore bound registers and add a BND prefix before branches in _dl_runtime_profile so that bound registers for pointer pass and return are preserved when LD_AUDIT is used. [BZ #18134] * sysdeps/i386/configure.ac: Set HAVE_MPX_SUPPORT. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New. (_dl_runtime_profile): Save and restore Intel MPX return bound registers when calling _dl_call_pltexit. Add PRESERVE_BND_REGS_PREFIX before return. * sysdeps/i386/link-defines.sym (LRV_BND0_OFFSET): New. (LRV_BND1_OFFSET): Likewise. * sysdeps/x86/bits/link.h (La_i86_retval): Add lrv_bnd0 and lrv_bnd1. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix typo in bndmov encoding. * sysdeps/x86_64/dl-trampoline.h: Properly save and restore Intel MPX bound registers. Add PRESERVE_BND_REGS_PREFIX before branch instructions to preserve bounds.
*	Preserve bound registers in _dl_runtime_resolve	H.J. Lu	2015-03-16	1	-0/+8
\| \| \| \| \| \| \| \| \|	We need to add a BND prefix before indirect branch at the end of _dl_runtime_resolve to preserve bound registers. [BZ #18134] * sysdeps/x86_64/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New. (_dl_runtime_resolve): Add a BND prefix before indirect branch.
*	Update copyright dates with scripts/update-copyrights.	Joseph Myers	2015-01-02	1	-1/+1
\|
*	Save/restore bound registers for _dl_runtime_profile	Igor Zamyatin	2014-04-16	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch saves and restores bound registers in x86-64 PLT for ld.so profile and LD_AUDIT: * sysdeps/x86_64/bits/link.h (La_x86_64_regs): Add lr_bnd. (La_x86_64_retval): Add lrv_bnd0 and lrv_bnd1. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Save Intel MPX bound registers before _dl_profile_fixup. * sysdeps/x86_64/dl-trampoline.h: Restore Intel MPX bound registers after _dl_profile_fixup. Save and restore bound registers bnd0/bnd1 when calling _dl_call_pltexit. * sysdeps/x86_64/link-defines.sym (BND_SIZE): New. (LR_BND_OFFSET): Likewise. (LRV_BND0_OFFSET): Likewise. (LRV_BND1_OFFSET): Likewise.
*	Save/restore bound registers in _dl_runtime_resolve	Igor Zamyatin	2014-04-09	1	-20/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch saves and restores bound registers in symbol lookup for x86-64: 1. Branches without BND prefix clear bound registers. 2. x86-64 pass bounds in bound registers as specified in MPX psABI extension on hjl/mpx/master branch at https://github.com/hjl-tools/x86-64-psABI https://groups.google.com/forum/#!topic/x86-64-abi/KFsB0XTgWYc Binutils has been updated to create an alternate PLT to add BND prefix when branching to ld.so. * config.h.in (HAVE_MPX_SUPPORT): New #undef. * sysdeps/x86_64/configure.ac: Set HAVE_MPX_SUPPORT. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/dl-trampoline.S (REGISTER_SAVE_AREA): New macro. (REGISTER_SAVE_RAX): Likewise. (REGISTER_SAVE_RCX): Likewise. (REGISTER_SAVE_RDX): Likewise. (REGISTER_SAVE_RSI): Likewise. (REGISTER_SAVE_RDI): Likewise. (REGISTER_SAVE_R8): Likewise. (REGISTER_SAVE_R9): Likewise. (REGISTER_SAVE_BND0): Likewise. (REGISTER_SAVE_BND1): Likewise. (REGISTER_SAVE_BND2): Likewise. (_dl_runtime_resolve): Use them. Save and restore Intel MPX bound registers when calling _dl_fixup.
*	Save and restore AVX-512 zmm registers to x86-64 ld.so	Igor Zamyatin	2014-03-13	1	-18/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AVX-512 ISA adds 512-bit zmm registers. This patch updates _dl_runtime_profile to pass zmm registers to run-time audit. It also changes _dl_x86_64_save_sse and _dl_x86_64_restore_sse to upport zmm registers, which are called when only when RTLD_PREPARE_FOREIGN_CALL is used. Its performance impact is minimum. * config.h.in (HAVE_AVX512_SUPPORT): New #undef. (HAVE_AVX512_ASM_SUPPORT): Likewise. * sysdeps/x86_64/bits/link.h (La_x86_64_zmm): New. (La_x86_64_vector): Add zmm. * sysdeps/x86_64/Makefile (tests): Add tst-audit10. (modules-names): Add tst-auditmod10a and tst-auditmod10b. ($(objpfx)tst-audit10): New target. ($(objpfx)tst-audit10.out): Likewise. (tst-audit10-ENV): New. (AVX512-CFLAGS): Likewise. (CFLAGS-tst-audit10.c): Likewise. (CFLAGS-tst-auditmod10a.c): Likewise. (CFLAGS-tst-auditmod10b.c): Likewise. * sysdeps/x86_64/configure.ac: Set config-cflags-avx512, HAVE_AVX512_SUPPORT and HAVE_AVX512_ASM_SUPPORT. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Add AVX-512 zmm register support. (_dl_x86_64_save_sse): Likewise. (_dl_x86_64_restore_sse): Likewise. * sysdeps/x86_64/dl-trampoline.h: Updated to support different size vector registers. * sysdeps/x86_64/link-defines.sym (YMM_SIZE): New. (ZMM_SIZE): Likewise. * sysdeps/x86_64/tst-audit10.c: New file. * sysdeps/x86_64/tst-auditmod10a.c: Likewise. * sysdeps/x86_64/tst-auditmod10b.c: Likewise.
*	Update copyright notices with scripts/update-copyrights	Allan McRae	2014-01-01	1	-1/+1
\|
*	Fix typos.	Ondřej Bílka	2013-08-30	1	-1/+1
\|
*	Update copyright notices with scripts/update-copyrights.	Joseph Myers	2013-01-02	1	-1/+1
\|
*	Check if RTLD_SAVESPACE_SSE is aligned to 32 bytes	H.J. Lu	2012-05-11	1	-0/+4
\|
*	Replace FSF snail mail address with URLs.	Paul Eggert	2012-02-09	1	-3/+2
\|
*	Simplify AVX check	H.J. Lu	2011-09-07	1	-4/+1
\|
*	Fix minor CFI problem in regular x86-64 trampoline	Ulrich Drepper	2011-08-20	1	-1/+2
\|
*	Fix CFI info in x86-64 trampolines for non-AVX code	Ulrich Drepper	2011-08-20	1	-2/+3
\|
*	One more typo in AVX test	Ulrich Drepper	2011-07-23	1	-2/+2
\|
*	One more change to XSAVE patch	Ulrich Drepper	2011-07-22	1	-2/+4
\|
*	Fix AVX check	Andreas Schwab	2011-07-22	1	-6/+15
\|
*	Fix check for AVX enablement	Ulrich Drepper	2011-07-20	1	-5/+12
\| \| \| \| \|	The AVX bit is set if the CPU supports AVX. But this doesn't mean the kernel does. Add checks according to Intel's documentation.
*	Handle AVX saving on x86-64 in interrupted smbol lookups.	Ulrich Drepper	2009-08-25	1	-1/+0
\| \| \| \| \| \| \| \| \|	If a signal arrived during a symbol lookup and the signal handler also required a symbol lookup, the end of the lookup in the signal handler reset the flag whether restoring AVX/SSE registers is needed. Resetting means in this case that the tail part of the outer lookup code will try to restore the registers and this can fail miserably. We now restore to the previous value which makes nesting calls possible.
*	Support mixed SSE/AVX audit and check AVX only once.	H.J. Lu	2009-08-08	1	-237/+7
\| \| \| \| \| \| \| \| \| \|	This patch fixes mixed SSE/AVX audit and checks AVX only once in _dl_runtime_profile. When an AVX or SSE register value in pltenter is modified, we have to make sure that the SSE part value is the same in both lr_xmm and lr_vector fields so that pltexit will get the correct value from either lr_xmm or lr_vector fields. AVX-enabled pltenter should update both lr_xmm and lr_vector fields to support stacked AVX/SSE pltenter functions.
*	Improve CFI in x86-64 ld.so trampoline code.	Ulrich Drepper	2009-07-29	1	-1/+2
\|
*	Properly restore AVX registers on x86-64.	H.J. Lu	2009-07-29	1	-10/+10
\| \| \| \| \|	tst-audit4 and tst-audit5 fail under AVX emulator due to je instead of jne. This patch fixes them.
*	Preserve SSE registers in runtime relocations on x86-64.	Ulrich Drepper	2009-07-29	1	-0/+82
\| \| \| \| \| \| \| \| \| \|	SSE registers are used for passing parameters and must be preserved in runtime relocations. This is inside ld.so enforced through the tests in tst-xmmymm.sh. But the malloc routines used after startup come from libc.so and can be arbitrarily complex. It's overkill to save the SSE registers all the time because of that. These calls are rare. Instead we save them on demand. The new infrastructure put in place in this patch makes this possible and efficient.
*	Optimize restoring of ymm registers on x86-64.	Ulrich Drepper	2009-07-16	1	-43/+34
\| \| \| \|	The patch mainly reduces the code size but also avoids some jumps.
*	Fix thinko in AVX audit patch.	Ulrich Drepper	2009-07-15	1	-20/+4
\| \| \| \|	Don't use AVX instructions too often.
*	Fix typo in last change.	Ulrich Drepper	2009-07-15	1	-1/+1
\|
*	Secure AVX changes for auditing code.	Ulrich Drepper	2009-07-15	1	-32/+295
\| \| \| \| \| \| \| \|	The original AVX patch used a function pointer to handle the difference between machines with and without AVX support. This is insecure. A well-placed memory exploit could lead to redirection of the execution. Using a variable and several tests is a bit slower but cannot be exploited in this way.
*	Add AVX support to ld.so auditing for x86-64.	H.J. Lu	2009-07-10	1	-124/+55
\|
*	Fix handling of xmm6 in ld.so audit hooks on x86-64.	H.J. Lu	2009-07-02	1	-2/+4
\|
*	[BZ #9881]	Ulrich Drepper	2009-03-15	1	-3/+3
\| \| \| \| \| \| \| \| \|	* inet/inet6_rth.c (inet6_rth_add): Add some error checking. Patch mostly by Yang Hongyang <yanghy@cn.fujitsu.com>. * inet/Makefile (tests): Add tst-inet6_rth. * inet/tst-inet6_rth.c: New file. alignment of La_x86_64_regs. Store xmm parameters.
*	* elf/dl-runtime.c (reloc_offset): Define.	Ulrich Drepper	2009-03-15	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	(reloc_index): Define. (_dl_fixup): Rename reloc_offset parameter to reloc_arg. (_dl_fixup_profile): Likewise. Use reloc_index instead of computing index from reloc_offset. (_dl_call_pltexit): Likewise. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve): Just pass the relocation index to _dl_fixup. (_dl_runtime_profile): Likewise for _dl_fixup_profile and _dl_call_pltexit. * sysdeps/x86_64/dl-runtime.c: New file.
*	[BZ #9893]	Ulrich Drepper	2009-03-14	1	-99/+140
\| \| \| \| \| \|	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix alignement of La_x86_64_regs. Store xmm parameters. Patch mostly by Jiri Olsa <olsajiri@gmail.com>.
*	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Make sure	Ulrich Drepper	2007-10-31	1	-26/+28
\| \| \| \| \|	stack is properly aligned for the target function. Correct unwind info.
*	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Correctly	Ulrich Drepper	2007-08-24	1	-3/+4
\| \| \| \|	align stack for call if pltexit is to be used.
*	* elf/dl-reloc.c [PROF] (_dl_relocate_object): Define	Ulrich Drepper	2005-07-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	consider_profiling always to zero. Don't count of compiler to remove unreached if block. * sysdeps/x86_64/dl-trampoline.S [PROF] (_dl_runtime_profile): Don't compile. * sysdeps/i386/dl-trampoline.S [PROF] (_dl_runtime_profile): Likewise. * sysdeps/ia64/dl-trampoline.S [PROF] (_dl_runtime_profile): Likewise. * sysdeps/s390/s390-64/dl-trampoline.S [PROF] (_dl_runtime_profile): Likewise. * sysdeps/s390/s390-32/dl-trampoline.S [PROF] (_dl_runtime_profile): Likewise. * sysdeps/powerpc/powerpc64/dl-trampoline.S [PROF] (_dl_profile_resolve): Likewise. * sysdeps/powerpc/powerpc32/dl-trampoline.S [PROF] (_dl_profile_resolve): Likewise. * gmon/Makefile: Add rules to build and run tst-profile-static. * gmon/tst-profile-static.c: New file. * Makeconfig (+link-static): Allow passing program-specific flags.
*	* csu/elf-init.c (__libc_csu_fini): Don't do anything here.	Ulrich Drepper	2005-01-06	1	-0/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* sysdeps/generic/libc-start.c: Don't register program destructor here. * dlfcn/Makefile: Add rules to build dlfcn.c. (LDFLAGS-dl.so): Removed. * dlfcn/dlclose.c: _dl_close is now in ld.so, use function pointer table. * dlfcn/dlmopen.c: Likewise for _dl_open. * dlfcn/dlopen.c: Likewise. * dlfcn/dlopenold.c: Likewise. * elf/dl-libc.c: Likewise for _dl_open and _dl_close. * elf/Makefile (routines): Remove dl-open and dl-close. (dl-routines): Add dl-open, dl-close, and dl-trampoline. Add rules to build and run tst-audit1. * elf/tst-audit1.c: New file. * elf/tst-auditmod1.c: New file. * elf/Versions [libc]: Remove _dl_open and _dl_close. * elf/dl-close.c: Change for use inside ld.so instead of libc.so. * elf/dl-open.c: Likewise. * elf/dl-debug.c (_dl_debug_initialize): Allow reinitialization, signaled by nonzero parameter. * elf/dl-init.c: Fix use of r_state. * elf/dl-load.c: Likewise. * elf/dl-close.c: Add auditing checkpoints. * elf/dl-open.c: Likewise. * elf/dl-fini.c: Likewise. * elf/dl-load.c: Likewise. * elf/dl-sym.c: Likewise. * sysdeps/generic/libc-start.c: Likewise. * elf/dl-object.c: Allocate memory for auditing information. * elf/dl-reloc.c: Remove RESOLV. We now always need the map. Correctly initialize slotinfo. * elf/dynamic-link.h: Adjust after removal of RESOLV. * sysdeps/hppa/dl-lookupcfg.h: Likewise. * sysdeps/ia64/dl-lookupcfg.h: Likewise. * sysdeps/powerpc/powerpc64/dl-lookupcfg.h: Removed. * elf/dl-runtime.c (_dl_fixup): Little cleanup. (_dl_profile_fixup): New parameters to point to register struct and variable for frame size. Add auditing checkpoints. (_dl_call_pltexit): New function. Don't define trampoline code here. * elf/rtld.c: Recognize LD_AUDIT. Load modules on startup. Remove all the functions from _rtld_global_ro which only _dl_open and _dl_close needed. Add auditing checkpoints. * elf/link.h: Define symbols for auditing interfaces. * include/link.h: Likewise. * include/dlfcn.h: Define __RTLD_AUDIT. Remove prototypes for _dl_open and _dl_close. Adjust access to argc and argv in libdl. * dlfcn/dlfcn.c: New file. * sysdeps/generic/dl-lookupcfg.h: Remove all content now that RESOLVE is gone. * sysdeps/generic/ldsodefs.h: Add definitions for auditing interfaces. * sysdeps/generic/unsecvars.h: Add LD_AUDIT. * sysdeps/i386/dl-machine.h: Remove trampoline code here. Adjust for removal of RESOLVE. * sysdeps/x86_64/dl-machine.h: Likewise. * sysdeps/generic/dl-trampoline.c: New file. * sysdeps/i386/dl-trampoline.c: New file. * sysdeps/x86_64/dl-trampoline.c: New file. * sysdeps/generic/dl-tls.c: Cleanups. Fixup for dtv_t change. Fix updating of DTV. * sysdeps/generic/libc-tls.c: Likewise. * sysdeps/arm/bits/link.h: Renamed to ... * sysdeps/arm/buts/linkmap.h: ...this. * sysdeps/generic/bits/link.h: Renamed to... * sysdeps/generic/bits/linkmap.h: ...this. * sysdeps/hppa/bits/link.h: Renamed to... * sysdeps/hppa/bits/linkmap.h: ...this. * sysdeps/hppa/i386/link.h: Renamed to... * sysdeps/hppa/i386/linkmap.h: ...this. * sysdeps/hppa/ia64/link.h: Renamed to... * sysdeps/hppa/ia64/linkmap.h: ...this. * sysdeps/hppa/s390/link.h: Renamed to... * sysdeps/hppa/s390/linkmap.h: ...this. * sysdeps/hppa/sh/link.h: Renamed to... * sysdeps/hppa/sh/linkmap.h: ...this. * sysdeps/hppa/x86_64/link.h: Renamed to... * sysdeps/hppa/x86_64/linkmap.h: ...this. 2005-01-06 Ulrich Drepper <drepper@redhat.com> * allocatestack.c (init_one_static_tls): Adjust initialization of DTV entry for static tls deallocation fix. * sysdeps/alpha/tls.h (dtv_t): Change pointer type to be struct which also contains information whether the memory pointed to is static TLS or not. * sysdeps/i386/tls.h: Likewise. * sysdeps/ia64/tls.h: Likewise. * sysdeps/powerpc/tls.h: Likewise. * sysdeps/s390/tls.h: Likewise. * sysdeps/sh/tls.h: Likewise. * sysdeps/sparc/tls.h: Likewise. * sysdeps/x86_64/tls.h: Likewise.
*	(CFLAGS-tst-align.c): Add -mpreferred-stack-boundary=4.	Ulrich Drepper	2004-12-22	1	-189/+0
\|
*	2.5-18.1	Jakub Jelinek	2007-07-12	1	-0/+189