author H.J. Lu <hjl.tools@gmail.com> 2017-03-21 10:59:31 -0700
committer H.J. Lu <hjl.tools@gmail.com> 2017-04-07 10:06:58 -0700
commit 3966298a45782a73739ea31d76ee96b5c1a2788f
tree db420145876e45b6a7e8ab4c23959095c4bdeecc /sysdeps/x86/cpu-features.h
parent cc5dcd88039269bfaeefc0f5b2cf675904f7ee33

x86-64: Improve branch prediction in _dl_runtime_resolve_avx512_opt [BZ #21258]

On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers.  The code layout is

  if only %xmm0 - %xmm7 registers are used
     preserve %xmm0 - %xmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %zmm0 - %zmm7 registers

Branch prediction speculatively executes the fallthrough code path,
which preserves the %zmm0 - %zmm7 registers, even when only the
%xmm0 - %xmm7 registers are used.  This leads to a lower CPU frequency
on Skylake server.  This patch changes the fallthrough code path to
preserve the %xmm0 - %xmm7 registers instead:

  if the full %zmm0 - %zmm7 registers are used
     preserve %zmm0 - %zmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %xmm0 - %xmm7 registers
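
The reordered layout above can be sketched in C as follows.  This is
only an illustration of the branch ordering, not the assembly in
sysdeps/x86_64/dl-trampoline.h: the flag names, the save_width enum and
the choose_save_width helper are all hypothetical, and the flag bits do
not correspond to any real hardware state layout.

```c
/* Hypothetical flags describing which upper register halves are in use.
   These are illustrative only and are not the real state-component bits.  */
enum
{
  UPPER_YMM_IN_USE = 1 << 0,
  UPPER_ZMM_IN_USE = 1 << 1
};

/* Width, in bits, of the registers that must be preserved.  */
enum save_width
{
  SAVE_XMM = 128,
  SAVE_YMM = 256,
  SAVE_ZMM = 512
};

/* The widest case is tested first and the narrowest case is the
   fallthrough, mirroring the layout introduced by this patch.  */
enum save_width
choose_save_width (unsigned int in_use)
{
  if (in_use & UPPER_ZMM_IN_USE)
    return SAVE_ZMM;
  if (in_use & UPPER_YMM_IN_USE)
    return SAVE_YMM;
  /* Fallthrough: only the low 128 bits need saving.  */
  return SAVE_XMM;
}
```

With this ordering, choose_save_width (0) reaches the final return
without taking either branch, so the path that runs when no branch is
taken is the 128-bit one.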

Tested on Skylake server.

	[BZ #21258]
	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
	Define only if _dl_runtime_resolve is defined to
	_dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
	Fallthrough to _dl_runtime_resolve_sse_vex.

(cherry picked from commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00)