author | Siddhesh Poyarekar <siddhesh@sourceware.org> | 2018-05-11 00:08:01 +0530 |
---|---|---|
committer | Siddhesh Poyarekar <siddhesh@sourceware.org> | 2018-05-11 00:08:02 +0530 |
commit | 70c97f8493ab2a215c2543d78f212abb23f151ed (patch) | |
tree | fc07055d4b4221040496e590104da4ea05964db7 /ChangeLog | |
parent | 8f5b00d375dbd7f5e15e57b24fec3bd5a4b1e98d (diff) | |
aarch64,falkor: Ignore prefetcher hints for memmove tail
The tails of the copy loops are unable to train the falkor hardware prefetcher because they load from a different base compared to the hot loop. In this case, avoid serializing the instructions by loading the data into different registers. Also peel the last iteration of the loop into the tail (and have them use different registers) since it gives better performance for medium sizes.

This results in performance improvements of between 3% and 20% over the current falkor implementation for sizes between 128 bytes and 1K on the memmove-walk benchmark, thus mostly covering the regressions seen against the generic memmove.

* sysdeps/aarch64/multiarch/memmove_falkor.S (__memmove_falkor): Use multiple registers to move data in loop tail.
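The idea of a peeled tail with independent temporaries can be sketched in C. This is only an illustration under stated assumptions, not the code in memmove_falkor.S: the names `block16`, `load16`, `store16` and `copy_peeled_tail` are hypothetical, the buffers are assumed not to overlap, and C cannot pin hardware registers, so distinct local variables merely stand in for the distinct registers the commit message refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte block standing in for one 128-bit register. */
typedef struct { uint64_t lo, hi; } block16;

static block16 load16(const unsigned char *p)
{
    block16 b;
    memcpy(&b, p, sizeof b);
    return b;
}

static void store16(unsigned char *p, block16 b)
{
    memcpy(p, &b, sizeof b);
}

/* Forward copy of n >= 64 bytes between NON-overlapping buffers
   (an assumption of this sketch; a real memmove must handle overlap). */
void copy_peeled_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n >= 64);

    const unsigned char *s = src;
    unsigned char *d = dst;
    size_t left = n;

    /* Hot loop: one temporary, constant stride from a single base.
       It stops while up to 64 bytes remain, so the final iteration
       is peeled into the tail below. */
    while (left > 64) {
        block16 t = load16(s);
        store16(d, t);
        s += 16;
        d += 16;
        left -= 16;
    }

    /* Tail: addressed from the END of the buffers, i.e. a different
       base than the hot loop. Four independent temporaries are loaded
       before any store, so no load has to wait on a reused temporary. */
    const unsigned char *send = src + n;
    unsigned char *dend = dst + n;
    block16 a = load16(send - 64);
    block16 b = load16(send - 48);
    block16 c = load16(send - 32);
    block16 e = load16(send - 16);
    store16(dend - 64, a);
    store16(dend - 48, b);
    store16(dend - 32, c);
    store16(dend - 16, e);
}
```

The tail always covers the last 64 bytes, so it may rewrite a few bytes the loop has already copied; with non-overlapping buffers this is harmless and avoids a size-dependent branch.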
Diffstat (limited to 'ChangeLog')
-rw-r--r-- | ChangeLog | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/ChangeLog b/ChangeLog
index b2b66020e8..c3b2e03c3b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2018-05-11 Siddhesh Poyarekar <siddhesh@sourceware.org>
+
+	* sysdeps/aarch64/multiarch/memmove_falkor.S
+	(__memmove_falkor): Use multiple registers to move data in
+	loop tail.
+
 2018-05-10 Joseph Myers <joseph@codesourcery.com>
 
 	* math/math-underflow.h: New file.