x86: Optimize memcmp-evex-movbe.S for frontend behavior and size - mirror/glibc - mirror of git://sourceware.org/git/glibc.git


author	Noah Goldstein <goldstein.w.n@gmail.com>	2021-09-21 18:45:03 -0500
committer	Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-12 12:02:12 -0500
commit	1bd8b8d58fc9967cc073d2c13bfb6befefca2faa (patch)
tree	056a0dcdfc1c587b9e4fc279ce8573bbcb0ee218 /sysdeps/x86_64/fpu/s_fminf.S
parent	8faa1e04493f23b16f473d21a3a5bc49b781ed2a (diff)
download	glibc-1bd8b8d58fc9967cc073d2c13bfb6befefca2faa.tar.gz glibc-1bd8b8d58fc9967cc073d2c13bfb6befefca2faa.tar.xz glibc-1bd8b8d58fc9967cc073d2c13bfb6befefca2faa.zip

x86: Optimize memcmp-evex-movbe.S for frontend behavior and size

No bug.

The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
   the same cache line or adjacent cache lines.
2. Avoid cases where basic blocks unnecessarily cross cache lines.
3. Try to 32-byte align basic blocks wherever possible without
   sacrificing code size. Smaller / less hot basic blocks are used
   as padding for this.

Overall code size shrank by 168 bytes. This should make up for any
extra cost of aligning the entry point to 64 bytes.

In general, performance previously varied a great deal depending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make the current implementation at least equal to the
best alignment of the original for any arguments.
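
As a rough illustration of why entry alignment % 64 matters, the
sketch below checks whether a basic block crosses a cache line
boundary. It assumes 64-byte cache lines; the helper name
crosses_cache_line and the 40-byte block size are hypothetical and
not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on current x86-64 parts. */
#define CACHE_LINE 64

/* Hypothetical helper: returns 1 if a block of `size` bytes starting
   at address `start` touches more than one cache line, else 0. */
static int
crosses_cache_line (uintptr_t start, size_t size)
{
  if (size == 0)
    return 0;
  /* Compare the line index of the first and last byte of the block. */
  return (start / CACHE_LINE) != ((start + size - 1) / CACHE_LINE);
}
```

With these assumptions a 40-byte block fits in one line when it starts
at offset 0 or 16, but crosses into the next line when it starts at
offset 32 or 48.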

The only additional optimization is in the page cross case. The
branch on the equal case was removed from the size == [4, 7] case.
In addition, the [4, 7] and [2, 3] cases were swapped, as [4, 7] is
likely the hotter argument size.
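
For readers unfamiliar with the technique, here is a minimal C sketch
of comparing 4 to 7 bytes without a separate branch for the equal
case. It is not the assembly from the patch: it models the idea with
two overlapping big-endian 4-byte loads per buffer combined into one
64-bit value. The names load_be32 and cmp_4_to_7 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian integer so that unsigned integer order
   matches lexicographic byte order (the role movbe plays in asm). */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: memcmp-style compare for 4 <= n <= 7.
   The first 4 bytes go in the high half and the last 4 bytes (which
   may overlap the first 4) go in the low half.  Overlapping bytes are
   only reached when the first 4 bytes are equal, so they cannot
   change the result.  */
static int
cmp_4_to_7 (const void *a, const void *b, size_t n)
{
  const unsigned char *pa = a;
  const unsigned char *pb = b;
  uint64_t va = ((uint64_t) load_be32 (pa) << 32) | load_be32 (pa + n - 4);
  uint64_t vb = ((uint64_t) load_be32 (pb) << 32) | load_be32 (pb + n - 4);
  /* Yields -1, 0, or 1 with no dedicated test for equality. */
  return (va > vb) - (va < vb);
}
```

The return value has the same sign as memcmp would for the same
inputs; the equal case falls out of the same arithmetic as the
unequal cases.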

test-memcmp and test-wmemcmp are both passing.

Diffstat (limited to 'sysdeps/x86_64/fpu/s_fminf.S')

0 files changed, 0 insertions, 0 deletions

