Use strspn/strcspn/strpbrk ifunc in internal calls. - mirror/glibc - mirror of git://sourceware.org/git/glibc.git


author	Ondřej Bílka <neleai@seznam.cz>	2014-03-18 11:01:38 +0100
committer	Ondřej Bílka <neleai@seznam.cz>	2015-05-12 20:18:51 +0200
commit	0f4840be2528b3e3f2ecea009ab08e753701e9be (patch)
tree	fc4eafdf024c8eccba2724f48153a6ccc7166121 /gen-locales.mk
parent	7327b333e56926d7d79bb9e01b839d3618bf750f (diff)

Use strspn/strcspn/strpbrk ifunc in internal calls.

To make strtok faster, and to improve performance in general, one
additional change is needed.

The existing code carries this comment:

/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
   The speedup we get from using SSE4.2 instruction is likely eaten away
   by the indirect call in the PLT.  */

This reasoning does not hold, because nobody bothered to check it. The
gap between these implementations is large: even when the haystack is
empty, the SSE2 version is around 40 cycles slower because it needs to
populate a lookup table, and the difference only grows with size. That
is much larger than the PLT slowdown, which is a few cycles.
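Why a table-driven version pays a fixed cost even for an empty string can be seen in a plain-C sketch of a lookup-table strspn. This is an illustration under assumptions, not glibc code: table_strspn is a hypothetical name, and the real __strspn_sse2 is hand-written assembly whose details may differ. The point is only that the 256-entry table must be cleared and filled from the accept set before the first byte of the string is examined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table-driven strspn, for illustration only. */
size_t
table_strspn (const char *s, const char *accept)
{
  /* Fixed setup cost, paid on every call even when s is empty:
     clear a 256-entry table, then mark each byte of accept.  */
  unsigned char table[256];
  memset (table, 0, sizeof table);
  for (const unsigned char *a = (const unsigned char *) accept; *a != '\0'; ++a)
    table[*a] = 1;

  /* Scan one byte at a time.  table[0] is never set, so the loop
     always stops at the terminating NUL.  */
  const unsigned char *p = (const unsigned char *) s;
  while (table[*p])
    ++p;
  return (size_t) (p - (const unsigned char *) s);
}
```

After the setup, the scan adds work proportional to the length of the matching prefix, one byte per step, which is consistent with the gap growing with size when the competing version can examine several bytes per instruction.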

Even the benchtests show a gap, although branch misprediction may
distort it; my internal benchmark showed the same (times in cycles,
lower is better):

                                        simple_strspn  stupid_strspn  __strspn_sse42  __strspn_sse2
Length    0, alignment  0, acc len  6:  18.6562        35.2344        17.0469         61.6719
Length    6, alignment  0, acc len  6:  59.5469        72.5781        16.4219         73.625
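The selection between variants such as __strspn_sse2 and __strspn_sse42 is done with GNU indirect functions (ifunc): a resolver runs once, when the symbol is bound, and returns the variant that all later calls use. A minimal sketch with GCC's ifunc attribute, assuming an ELF target and GCC or Clang; my_strspn, resolve_my_strspn and both variants are hypothetical stand-ins written in plain C, and glibc's own resolvers check its internal CPU-feature data rather than calling __builtin_cpu_supports.

```c
#include <stddef.h>
#include <string.h>

typedef size_t strspn_fn (const char *, const char *);

/* Hypothetical baseline variant: byte at a time, no SIMD.  */
static size_t
my_strspn_generic (const char *s, const char *accept)
{
  size_t n = 0;
  /* Check for the terminator first, because strchr would also
     match the NUL at the end of accept.  */
  while (s[n] != '\0' && strchr (accept, s[n]) != NULL)
    ++n;
  return n;
}

/* Hypothetical "fast" variant; here it simply defers to the C library.  */
static size_t
my_strspn_fast (const char *s, const char *accept)
{
  return strspn (s, accept);
}

/* Resolver: runs once when my_strspn is bound, not on every call.  */
static strspn_fn *
resolve_my_strspn (void)
{
#if defined (__x86_64__) || defined (__i386__)
  /* The resolver may run before constructors, so initialize the
     CPU model data explicitly before querying it.  */
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("sse4.2"))
    return my_strspn_fast;
#endif
  return my_strspn_generic;
}

/* Every call to my_strspn goes to the variant the resolver chose.  */
size_t my_strspn (const char *s, const char *accept)
  __attribute__ ((ifunc ("resolve_my_strspn")));
```

In these terms, the change lets libc-internal callers go through the same selection as external ones instead of being bound directly to the SSE2 variant.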

This patch also handles strpbrk, which is implemented by including the
x86_64/multiarch/strcspn.S file.
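strpbrk can share strcspn's code because both compute the same quantity: strcspn returns the length of the initial segment containing no byte from the set, and strpbrk returns a pointer to the byte just past that segment, or NULL if that byte is the terminator. A portable C sketch of the relationship, where my_strpbrk is a hypothetical name; the x86_64 multiarch code achieves the sharing at the assembly level by including strcspn.S, not through a wrapper like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strpbrk expressed through strcspn.  */
char *
my_strpbrk (const char *s, const char *accept)
{
  /* Length of the prefix of s that contains no byte from accept.  */
  size_t n = strcspn (s, accept);
  /* If the scan stopped at the terminator, nothing matched.  */
  return s[n] != '\0' ? (char *) s + n : NULL;
}
```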

	* sysdeps/x86_64/multiarch/strspn.S: Remove plt indirection.
	* sysdeps/x86_64/multiarch/strcspn.S: Likewise.


