summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
Commit message (Collapse)AuthorAgeFilesLines
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86: Don't include <init-arch.h> in assembly codesH.J. Lu2018-08-031-1/+0
| | | | | | | | | | | | There is no need to include <init-arch.h> in assembly codes since all x86 IFUNC selector functions are written in C. Tested on i686 and x86-64. There is no code change in libc.so, ld.so and libmvec.so. * sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include <init-arch.h>. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86-64: Implement memset family IFUNC selectors in CH.J. Lu2017-06-151-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement memset family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memset functions within libc. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> Erich Elsen <eriche@google.com> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, and memset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add test for __memset_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memset.h: New file. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.c: Likewise. * sysdeps/x86_64/multiarch/memset.S: Removed. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_chk_erms): New function.
* X86-64: Remove the previous SSE2/AVX2 memsetsH.J. Lu2016-06-081-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
* Add a comment in memset-sse2-unaligned-erms.SH.J. Lu2016-04-051-0/+2
| | | | | * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA.
* Don't put SSE2/AVX/AVX512 memmove/memset in ld.soH.J. Lu2016-04-031-8/+10
| | | | | | | | | | | | | | Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
* Add x86-64 memset with unaligned store and rep stosbH.J. Lu2016-03-311-0/+16
Implement x86-64 memset with unaligned store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same codes when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.