author: Rich Felker <dalias@aerifal.cx> 2015-02-26 01:51:39 -0500
committer: Rich Felker <dalias@aerifal.cx> 2015-02-26 01:51:39 -0500
commit: 69858fa93107aa7485b143c54137e745a7b7ad72
tree: dcc3bcbf9fa71af0227341d8a305c626d429e0d9
parent: 20cbd607759038dca57f84ef7e7b5d44a3088574

overhaul optimized i386 memset asm

on most cpu models, "rep stosl" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size 62,
and shrink-wraps this code path. in addition, "rep stosl" is very
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 to 4 times slower when
the destination address is not aligned mod 16. the new code thus
ensures alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.
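
the alignment step described above can be sketched in C. this is only an illustration under stated assumptions, not the i386 asm from this commit: the name sketch_memset is hypothetical, the short path is written as a plain byte loop where the real code uses a minimal-branch sequence of stores, and a loop of 4-byte copies stands in for "rep stosl". what it aims to show is how writing the first and last 16 bytes unconditionally lets the bulk fill start at an address aligned mod 16, and why a destination that is already more strictly aligned is left untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical helper name; a C illustration, not musl's i386 asm */
void *sketch_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	unsigned char b = (unsigned char)c;
	size_t i, k;

	/* short sizes (up to 62, per the message) never reach the bulk
	 * fill. a byte loop is used here only to keep the sketch short. */
	if (n <= 62) {
		for (i = 0; i < n; i++) s[i] = b;
		return dest;
	}

	/* n >= 63 here: cover the first and last 16 bytes unconditionally,
	 * so the bulk fill only has to handle whole aligned 16-byte blocks. */
	for (i = 0; i < 16; i++) {
		s[i] = b;
		s[n - 16 + i] = b;
	}

	/* round the start up to a multiple of 16. k is 0 when the
	 * destination is already aligned, so stricter alignment survives. */
	k = -(uintptr_t)s & 15;
	s += k;
	n = (n - k) & ~(size_t)15;

	/* bulk fill in 4-byte units, standing in for "rep stosl" */
	uint32_t pattern = b * 0x01010101u;
	for (i = 0; i < n; i += 4)
		memcpy(s + i, &pattern, sizeof pattern);

	return dest;
}
```

the head store covers the at most 15 bytes skipped by the round-up, and the tail store covers the fewer than 16 bytes left after the block count is rounded down, so no separate cleanup loop is needed.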

this version is based in part on changes to the x86_64 memset asm
proposed by Denys Vlasenko.