about summary refs log tree commit diff
path: root/sysdeps/x86_64/lshift.S
diff options
context:
space:
mode:
authorH.J. Lu <hjl.tools@gmail.com>2017-06-05 11:09:48 -0700
committerH.J. Lu <hjl.tools@gmail.com>2017-06-05 11:09:59 -0700
commitef9c4cb6c7abb6340b52e19de31d2a56c8de5844 (patch)
treee97752428e31d5d595c22a44b1905b5a91948d24 /sysdeps/x86_64/lshift.S
parent9cd30491dd6d9d4c5e9372d7a5c75f78f3a11260 (diff)
downloadglibc-ef9c4cb6c7abb6340b52e19de31d2a56c8de5844.tar.gz
glibc-ef9c4cb6c7abb6340b52e19de31d2a56c8de5844.tar.xz
glibc-ef9c4cb6c7abb6340b52e19de31d2a56c8de5844.zip
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int.  Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:

SSE2 wmemset:
	shl    $0x2,%rdx
	movd   %esi,%xmm0
	mov    %rdi,%rax
	pshufd $0x0,%xmm0,%xmm0
	jmp	entry_from_wmemset

SSE2 memset:
	movd   %esi,%xmm0
	mov    %rdi,%rax
	punpcklbw %xmm0,%xmm0
	punpcklwd %xmm0,%xmm0
	pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:

Since the ERMS versions of wmemset requires "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added.  The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.

	* include/wchar.h (__wmemset_chk): New.
	* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
	to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_CHK_SYMBOL): Likewise.
	(WMEMSET_SYMBOL): Likewise.
	(__wmemset): Add hidden definition.
	(wmemset): Add weak hidden definition.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	wmemset_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
	__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
	__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
	and __wmemset_chk_avx512_unaligned.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
	(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
	(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_SYMBOL): Likewise.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
	(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
	(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_SYMBOL): Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
	(WMEMSET_CHK_SYMBOL): New.
	(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
	(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
	* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
	(libc_hidden_builtin_def): Also define __GI_wmemset and
	__GI___wmemset.
	(weak_alias): New.
	* sysdeps/x86_64/multiarch/wmemset.c: New file.
	* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
	* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
	* sysdeps/x86_64/wmemset.c: Likewise.
	* sysdeps/x86_64/wmemset_chk.c: Likewise.
Diffstat (limited to 'sysdeps/x86_64/lshift.S')
0 files changed, 0 insertions, 0 deletions