diff options
author | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-10-29 15:19:59 -0500 |
---|---|---|
committer | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-11-08 19:22:08 -0800 |
commit | 2d2493a644092dd3d32dd3b8f6aa3adad351db3c (patch) | |
tree | 563bf95352ce800331014c64c5270b5effa6812e /sysdeps/x86_64/wcsncmp.S | |
parent | 419c832aba43276e285586998261d1db06033193 (diff) | |
download | glibc-2d2493a644092dd3d32dd3b8f6aa3adad351db3c.tar.gz glibc-2d2493a644092dd3d32dd3b8f6aa3adad351db3c.tar.xz glibc-2d2493a644092dd3d32dd3b8f6aa3adad351db3c.zip |
x86: Use VMM API in memcmpeq-evex.S and minor changes
Changes to generated code are: 1. In a few places use `vpcmpeqb` instead of `vpcmpneq` to save a byte of code size. 2. Add a branch for length <= (VEC_SIZE * 6) as opposed to doing the entire block of [VEC_SIZE * 4 + 1, VEC_SIZE * 8] in a single basic-block (the space to add the extra branch without changing code size is bought with the above change). Change (2) has roughly a 20-25% speedup for sizes in [VEC_SIZE * 4 + 1, VEC_SIZE * 6] and negligible to no-cost for [VEC_SIZE * 6 + 1, VEC_SIZE * 8] From N=10 runs on Tigerlake: align1,align2 ,length ,result ,New Time ,Cur Time ,New Time / Old Time 0 ,0 ,129 ,0 ,5.404 ,6.887 ,0.785 0 ,0 ,129 ,1 ,5.308 ,6.826 ,0.778 0 ,0 ,129 ,18446744073709551615 ,5.359 ,6.823 ,0.785 0 ,0 ,161 ,0 ,5.284 ,6.827 ,0.774 0 ,0 ,161 ,1 ,5.317 ,6.745 ,0.788 0 ,0 ,161 ,18446744073709551615 ,5.406 ,6.778 ,0.798 0 ,0 ,193 ,0 ,6.804 ,6.802 ,1.000 0 ,0 ,193 ,1 ,6.950 ,6.754 ,1.029 0 ,0 ,193 ,18446744073709551615 ,6.792 ,6.719 ,1.011 0 ,0 ,225 ,0 ,6.625 ,6.699 ,0.989 0 ,0 ,225 ,1 ,6.776 ,6.735 ,1.003 0 ,0 ,225 ,18446744073709551615 ,6.758 ,6.738 ,0.992 0 ,0 ,256 ,0 ,5.402 ,5.462 ,0.989 0 ,0 ,256 ,1 ,5.364 ,5.483 ,0.978 0 ,0 ,256 ,18446744073709551615 ,5.341 ,5.539 ,0.964 Rewriting with VMM API allows for memcmpeq-evex to be used with evex512 by including "x86-evex512-vecs.h" at the top. Complete check passes on x86-64.
Diffstat (limited to 'sysdeps/x86_64/wcsncmp.S')
0 files changed, 0 insertions, 0 deletions