Aarch64: Add memcpy for qualcomm's oryon-1 core - mirror/glibc - mirror of git://sourceware.org/git/glibc.git


author	Andrew Pinski <quic_apinski@quicinc.com>	2024-06-12 15:53:35 -0700
committer	Andreas K. Hüttel <dilfridge@gentoo.org>	2024-06-30 13:46:33 +0200
commit	4dc83cac78a92a99cdd1ae808890083461597b82 (patch)
tree	be643ae0ee9f98518d8b6ea39980375b1a1e34cf /sysdeps/aarch64/fpu/log1p_sve.c
parent	4228baef1a94e8bde84ad74f2e0358120a2bcac7 (diff)
download	glibc-4dc83cac78a92a99cdd1ae808890083461597b82.tar.gz glibc-4dc83cac78a92a99cdd1ae808890083461597b82.tar.xz glibc-4dc83cac78a92a99cdd1ae808890083461597b82.zip

Aarch64: Add memcpy for qualcomm's oryon-1 core

Qualcomm's new core (oryon-1) has different performance characteristics
from other cores. For memcpy at large sizes, it is faster to do the
copy through the GPRs (2x faster). For even larger sizes, it is better
to use the nontemporal load/store instructions so that the copy does
not pollute the L1/L2 caches.

For smaller sizes, the characteristics are very similar to those of
other cores. I used the thunderx memcpy as a starting point and
expanded from there.
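
The size-based strategy described above can be sketched in C. This is
only an illustration, not the code added by this commit (which is
AArch64 assembly). The two thresholds and all sketch_* names are
hypothetical, since the message does not state the actual cut-over
sizes, and portable C cannot express the nontemporal ldnp/stnp path,
so that tier is only classified, not implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical thresholds for illustration only; the commit message
   does not give the real cut-over sizes.  */
#define SKETCH_LARGE_THRESHOLD        ((size_t) 128)
#define SKETCH_NONTEMPORAL_THRESHOLD  ((size_t) 32 * 1024)

enum copy_strategy
{
  COPY_SMALL,        /* Same approach as on other cores.  */
  COPY_GPR_LOOP,     /* Large: copy through general-purpose registers.  */
  COPY_NONTEMPORAL   /* Very large: nontemporal loads/stores (ldnp/stnp).  */
};

/* Pick a strategy from the copy size alone.  */
static enum copy_strategy
sketch_pick_strategy (size_t n)
{
  if (n < SKETCH_LARGE_THRESHOLD)
    return COPY_SMALL;
  if (n < SKETCH_NONTEMPORAL_THRESHOLD)
    return COPY_GPR_LOOP;
  return COPY_NONTEMPORAL;
}

/* Copy 16 bytes per iteration through two 64-bit integer temporaries,
   mimicking an ldp/stp pair on GPRs.  The fixed-size memcpy calls are
   the portable way to do unaligned 8-byte accesses; a compiler may
   still choose vector registers, so this shows the shape of the loop
   rather than guaranteeing the instruction selection.  */
static void *
sketch_copy_gpr (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= 16)
    {
      uint64_t a, b;
      memcpy (&a, s, 8);
      memcpy (&b, s + 8, 8);
      memcpy (d, &a, 8);
      memcpy (d + 8, &b, 8);
      s += 16;
      d += 16;
      n -= 16;
    }

  /* Byte-wise tail for the remaining 0..15 bytes.  */
  while (n-- > 0)
    *d++ = *s++;

  return dst;
}
```

A caller would first consult sketch_pick_strategy and then branch to
the matching copy routine. In the real implementation that choice
would be made with size comparisons at the top of the assembly
routine.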

Changes since v1:
* v2: Fix ordering in Makefile.
* v3: Fix comment grammar about the ldnp/stnp instructions.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

Diffstat (limited to 'sysdeps/aarch64/fpu/log1p_sve.c')

0 files changed, 0 insertions, 0 deletions

