author | Krzysztof Koch <Krzysztof.Koch@arm.com> | 2020-06-08 14:06:15 +0100 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2020-06-08 14:13:05 +0100 |
commit | d1f75e964484504e4f30f4623569d5889a97ac18 (patch) | |
tree | 5b123eeec64573bb9fa690b3f0186b66bb53ae6a /math/s_ldexp_template.c | |
parent | f112dcc506a6ec0aac5c34891736eec3c4f5dad6 (diff) | |
AArch64: Merge Falkor memcpy and memmove implementations
Falkor's memcpy and memmove share some implementation details; therefore, the two routines are moved to a single source file for code reuse. The two routines now share code for small and medium copies (up to and including 128 bytes). Large copies in memcpy do not handle overlap correctly; consequently, the loops for moving/copying more than 128 bytes stay separate for memcpy and memmove.

To increase code reuse, a number of small modifications were made:

1. The old implementation of memcpy copied the first 16 bytes as soon as the size of data was determined to be greater than 32 bytes. For memcpy code to also work when copying small/medium overlapping data, the first load and store was moved to the large copy case.
2. Medium memcpy case no longer assumes that 16 bytes were already copied and uses 8 registers to copy up to 128 bytes.
3. Small case for memmove was enlarged to that of memcpy, which is less than or equal to 32 bytes.
4. Medium case for memmove was enlarged to that of memcpy, which is less than or equal to 128 bytes.

Other changes include:

1. Improve alignment of existing loop bodies.
2. 'Delouse' memmove and memcpy input arguments. Make sure that upper 32 bits of input registers are zeroed if unused.
3. Do one more iteration in memmove loops and reduce the number of copies made from the start/end of the buffer, depending on the direction of the memmove loop.

Benchmarking:

Looking at the results from bench-memcpy-random.out, we can see that now memmove_falkor is about 5% faster than memcpy_falkor_old, while memmove_falkor_old was more than 15% slower. The memcpy implementation remained largely unmodified, so there is no significant performance change.

The reason for such a significant memmove performance gain is the increase of the upper bound on the small copy case to 32 bytes and the increase of the upper bound on the medium copy case to 128 bytes.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
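
The size-class structure described in the commit message can be illustrated with a rough C sketch. This is not the Falkor assembly and makes no claim about its register usage or loop layout. It only shows why a copy of up to 128 bytes can be shared between memcpy and memmove (all loads complete before any store, so overlap is harmless), while copies above 128 bytes need a direction-aware loop. The names `sketch_memmove`, `copy_small_medium`, `MEDIUM_MAX` and `CHUNK` are hypothetical, and the small (up to 32 bytes) and medium (up to 128 bytes) cases are folded into one helper for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants for illustration: the 128-byte bound comes from the
   commit message; the 16-byte chunk size is an arbitrary choice here. */
enum { MEDIUM_MAX = 128, CHUNK = 16 };

/* Copy n <= MEDIUM_MAX bytes. Every byte is loaded into a temporary
   before anything is stored, so the result is correct even when the
   source and destination overlap. */
static void copy_small_medium(unsigned char *dst, const unsigned char *src,
                              size_t n)
{
    unsigned char tmp[MEDIUM_MAX];
    memcpy(tmp, src, n);   /* all loads first */
    memcpy(dst, tmp, n);   /* then all stores */
}

/* Hypothetical memmove-like routine built on the shared helper. */
void *sketch_memmove(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    if (n <= MEDIUM_MAX) {
        /* Shared small/medium path: safe for both memcpy and memmove. */
        copy_small_medium(dst, src, n);
        return dstv;
    }

    size_t tail = n % CHUNK;
    if ((uintptr_t)dst - (uintptr_t)src >= n) {
        /* dst is below src, or the buffers do not overlap:
           a forward loop never overwrites unread source bytes. */
        size_t i = 0;
        for (; i + CHUNK <= n; i += CHUNK)
            copy_small_medium(dst + i, src + i, CHUNK);
        copy_small_medium(dst + i, src + i, tail);
    } else {
        /* dst lies inside [src, src + n): copy from the end backwards. */
        size_t i = n;
        for (; i >= tail + CHUNK; i -= CHUNK)
            copy_small_medium(dst + i - CHUNK, src + i - CHUNK, CHUNK);
        copy_small_medium(dst, src, tail);
    }
    return dstv;
}
```

A memcpy-only variant could keep just the forward loop for the large case, which is why, per the commit message, the large-copy loops stay separate for the two routines.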
Diffstat (limited to 'math/s_ldexp_template.c')
0 files changed, 0 insertions, 0 deletions