x86: Optimize L(less_vec) case in memcmpeq-evex.S - mirror/glibc - mirror of git://sourceware.org/git/glibc.git

author	Noah Goldstein <goldstein.w.n@gmail.com>	2021-12-24 18:54:53 -0600
committer	Noah Goldstein <goldstein.w.n@gmail.com>	2021-12-27 03:18:58 -0600
commit	cca457f9c51a90cf82cae75432ed3de20942519c (patch)
tree	84222287827e96165605200016965b48ef0d5928 /sysdeps/x86_64/multiarch/wmemchr.c
parent	abddd61de090ae84e380aff68a98bd94ef704667 (diff)

x86: Optimize L(less_vec) case in memcmpeq-evex.S

No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change control flow so that the L(less_vec) case gets the fall-through
   path.
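
The masked-load idea in 1) can be illustrated with a portable scalar
model. This is only a sketch, not the EVEX assembly from the patch: the
names len_mask and memcmpeq_less_vec_model are hypothetical, and the
32-byte vector width is assumed from the [0, 32] range mentioned here.
A mask with one bit per byte marks which lanes lie inside the requested
length. Lanes outside the mask are never read, so there is no need for
a separate page-cross check or for special-casing lengths 0 and 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit i is set iff byte i lies within the first
   len bytes.  Requires len <= 32.  The shift is done in 64 bits so
   that len == 32 is well defined.  */
static uint32_t
len_mask (size_t len)
{
  assert (len <= 32);
  return (uint32_t) ((UINT64_C (1) << len) - 1);
}

/* Hypothetical scalar model of a masked 32-byte equality compare.
   Returns 0 if the first len bytes of a and b are equal, nonzero
   otherwise.  Only bytes selected by the mask are ever touched, which
   models a masked vector load suppressing inactive lanes.  */
static int
memcmpeq_less_vec_model (const unsigned char *a, const unsigned char *b,
                         size_t len)
{
  uint32_t mask = len_mask (len);
  uint32_t neq = 0;

  for (unsigned int i = 0; i < 32; i++)
    if ((mask >> i) & 1)
      neq |= (uint32_t) (a[i] != b[i]) << i;

  return neq != 0;
}
```

With len == 0 the mask is empty, no byte is read, and the result is 0,
so the zero-length case needs no branch of its own in this model.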

Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to be the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>


