path: root/src/string
Commit message (Author, Age, Files, Lines)
* adapt build of arm memcpy asm not to use .sub files (Rich Felker, 2016-01-20, 4 files, -2/+7)
| | | | | | | | | this depends on commit 9f5eb77992b42d484d69e879d24ef86466f20f21, which made it possible to use a .c file for arch-specific replacements, and on commit 2f853dd6b9a95d5b13ee8f9df762125e0588df5d, the out-of-tree build support, which made it so that src/*/$(ARCH)/* 'replacement' files get used even if they don't match the base name of a .c file in the parent directory.
* remove non-working pre-armv4t support from arm asm (Rich Felker, 2015-11-09, 1 file, -4/+0)
| | | | | | | | | | | | | | | the idea of the three-instruction sequence being removed was to be able to return to thumb code when used on armv4t+ from a thumb caller, but also to be able to run on armv4 without the bx instruction available (in which case the low bit of lr would always be 0). however, without compiler support for generating such a sequence from C code, which does not exist and which there is unlikely to be interest in implementing, there is little point in having it in the asm, and it would likely be easier to add pre-armv4t support via enhanced linker handling of R_ARM_V4BX than at the compiler level. removing this code simplifies adding support for building libc in thumb2-only form (for cortex-m).
* convert arm memcpy asm to UAL, remove .word hacks (Rich Felker, 2015-11-05, 1 file, -22/+24)
| | | | | contrary to commit 9367fe926196f407705bb07cd29c6e40eb1774dd, all relevant gas versions actually do support .syntax unified.
* reimplement strverscmp to fix corner cases (Rich Felker, 2015-06-23, 1 file, -32/+25)
| | | | | | | | | | | | | | | | | this interface is non-standardized and is a GNU invention, and as such, our implementation should match the behavior of the GNU function. one peculiarity the old implementation got wrong was the handling of all-zero digit sequences: they are supposed to compare greater than digit sequences of which they are a proper prefix, as in 009 < 00. in addition, high bytes were treated with char signedness rather than as unsigned. this was wrong regardless of what the GNU function does since the resulting order relation varied by arch. the new strverscmp implementation makes explicit the cases where the order differs from what strcmp would produce, of which there are only two.
* remove potentially PIC-incompatible relocations from x86_64 and x32 asm (Rich Felker, 2015-04-18, 2 files, -1/+5)
| | | | analogous to commit 8ed66ecbcba1dd0f899f22b534aac92a282f42d5 for i386.
* remove the last of possible-textrels from i386 asm (Rich Felker, 2015-04-18, 2 files, -1/+5)
| | | | | | | | | | | | none of these are actual textrels because of ld-time binding performed by -Bsymbolic-functions, but I'm changing them with the goal of making ld-time binding purely an optimization rather than relying on it for semantic purposes. in the case of memmove's call to memcpy, making it explicit that the memmove asm is assuming the forward-copying behavior of the memcpy asm is desirable anyway; in case memcpy is ever changed, the semantic mismatch would be apparent while editing memcpy.s.
* overhaul optimized x86_64 memset asm (Rich Felker, 2015-02-26, 1 file, -26/+55)
| | | | | | | | | | | | | | on most cpu models, "rep stosq" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 126, and shrink-wraps this code path. in addition, "rep stosq" is sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial. this version is based in part on changes proposed by Denys Vlasenko.
* overhaul optimized i386 memset asm (Rich Felker, 2015-02-26, 1 file, -32/+61)
| | | | | | | | | | | | | | | on most cpu models, "rep stosl" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 62, and shrink-wraps this code path. in addition, "rep stosl" is very sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 to 4 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial. this version is based in part on changes to the x86_64 memset asm proposed by Denys Vlasenko.
* x86_64/memset: avoid performing final store twice (Denys Vlasenko, 2015-02-10, 1 file, -1/+1)
| | | | | | | | | | | | | | The code does a potentially misaligned 8-byte store to fill the tail of the buffer. Then it fills the initial part of the buffer which is a multiple of 8 bytes. Therefore, if size is divisible by 8, we were storing last word twice. This patch decrements byte count before dividing it by 8, making one less store in "size is divisible by 8" case, and not changing anything in all other cases. All at the cost of replacing one MOV insn with LEA insn. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
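The store-count arithmetic above can be sketched in C. This is only an illustration of the idea, not the x86_64 asm itself; the name `fill_tail_first` is hypothetical, and the sketch assumes n >= 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not musl's code: fill n >= 8 bytes at dest with c.
 * One possibly misaligned 8-byte store covers the tail of the buffer; the
 * loop then writes the first (n-1)/8 words. Using (n-1)/8 rather than n/8
 * means the last word is not stored twice when n is divisible by 8. */
void fill_tail_first(unsigned char *dest, unsigned char c, size_t n)
{
	uint64_t pattern = 0x0101010101010101ULL * c;
	size_t words = (n - 1) / 8;

	memcpy(dest + n - 8, &pattern, 8);         /* tail store */
	for (size_t i = 0; i < words; i++)
		memcpy(dest + 8 * i, &pattern, 8); /* leading words */
}
```

For n = 16 the loop runs once (bytes 0..7) and the tail store covers bytes 8..15; with n/8 the loop would have written bytes 8..15 a second time.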
* x86_64/memset: simple optimizations (Denys Vlasenko, 2015-02-10, 1 file, -14/+16)
| | | | | | | | | | | | | | | | "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead. 64-bit imul is slow, move it as far up as possible so that the result (rax) has more time to be ready by the time we start using it in mem stores. There is no need to shuffle registers in preparation to "rep stos" if we are not going to take that code path. Thus, patch moves "jump if len < 16" instructions up, and changes alternate code path to use rdx and rdi instead of rcx and r8. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
* fix tabs/spaces in memcpy.s (Rich Felker, 2014-11-23, 1 file, -279/+279)
| | | | | | this file had been a mess that went unnoticed ever since it was imported. some lines used spaces for indentation while others used tabs, and tabs were used for alignment.
* fix build regression in arm asm for memcpy (Rich Felker, 2014-11-23, 1 file, -30/+30)
| | | | | | | | | | | | | | | | | commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility with clang's internal assembler, but broke compatibility with gas and the traditional arm asm syntax by switching to the arm "unified assembler language" (UAL). recent versions of gas also support UAL, but require the .syntax directive to be used to switch to it. clang on the other hand defaults to UAL. and old versions of gas (still relevant) don't support UAL at all. for the conditional ldm/stm instructions, "ia" is default and can just be omitted, resulting in a mnemonic that's compatible with both traditional and UAL syntax. but for byte/halfword loads and stores, there seems to be no mnemonic compatible with both, and thus .word is used to produce the desired opcode explicitly. the .inst directive is not used because it is not compatible with older assemblers.
* arm assembly changes for clang compatibility (Joakim Sindholt, 2014-11-23, 1 file, -30/+30)
|
* fix handling of odd lengths in swab function (Rich Felker, 2014-10-04, 1 file, -1/+1)
| | | | | | | | this function is specified to leave the last byte with "unspecified disposition" when the length is odd, so for the most part correct programs should not be calling swab with odd lengths. however, doing so is permitted, and should not write past the end of the destination buffer.
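A minimal sketch of a swab-like loop that stays in bounds for odd lengths. The name `swab_sketch` is hypothetical and this is not claimed to be musl's exact code; the point is only the `n > 1` loop condition.

```c
#include <stddef.h>

/* Hypothetical swab-like routine: copies n bytes from src to dest with each
 * adjacent pair swapped. With an odd n the final source byte is simply not
 * copied, so nothing is written past dest[n-1]. */
void swab_sketch(const void *restrict src_v, void *restrict dest_v, ptrdiff_t n)
{
	const unsigned char *src = src_v;
	unsigned char *dest = dest_v;

	for (; n > 1; n -= 2) { /* n > 1, not n > 0: a lone trailing byte is skipped */
		dest[0] = src[1];
		dest[1] = src[0];
		dest += 2;
		src += 2;
	}
}
```

With a condition of `n > 0`, an odd length would run one extra iteration and write `dest[n]`, one byte past the end of the destination.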
* add support for LC_TIME and LC_MESSAGES translations (Rich Felker, 2014-07-26, 1 file, -2/+3)
| | | | | | | | | | | | | | | | | for LC_MESSAGES, translation of strerror and similar literal message functions is supported. for messages in other places (particularly the dynamic linker) that use format strings, translation is not yet supported. in order to make it possible and safe, such messages will need to be refactored to separate the textual content from the format. for LC_TIME, the day and month names and strftime-style format strings provided by nl_langinfo are supported for translation. however there may be limitations, as some of the original C-locale nl_langinfo strings are non-unique and thus perhaps non-suitable as keys. overall, the locale support activated by this commit should not be seen as complete and polished but as a basis for beginning to test locale functionality and implement locales.
* consolidate str[n]casecmp_l into str[n]casecmp source files (Rich Felker, 2014-07-02, 2 files, -0/+16)
| | | | | this is mainly done for consistency with the ctype functions and to declutter the src/locale directory.
* fix incorrect comparison loop condition in memmem (Rich Felker, 2014-06-19, 1 file, -2/+2)
| | | | | | | | | | the logic for this loop was copied from null-terminated-string logic in strstr without properly adapting it to work with explicit lengths. presumably this error could result in false negatives (wrongly comparing past the end of the needle/haystack), false positives (stopping comparison early when the needle contains null bytes), and crashes (from runaway reads past the end of mapped memory).
* fix false negatives with periodic needles in strstr, wcsstr, and memmem (Rich Felker, 2014-04-18, 3 files, -3/+3)
| | | | | | | | in cases where the memorized match range from the right factor exceeded the length of the left factor, it was wrongly treated as a mismatch rather than a match. issue reported by Yves Bastide.
* fix search past the end of haystack in memmem (Timo Teräs, 2014-04-09, 1 file, -0/+1)
| | | | | | | | to optimize the search, memchr is used to find the first occurrence of the first character of the needle in the haystack before switching to a search for the full needle. however, the number of characters skipped by this first step was not subtracted from the haystack length, causing memmem to search past the end of the haystack.
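The length bookkeeping described above can be sketched as follows. This is a naive search under the hypothetical name `find_bytes`, not musl's two-way implementation; it only shows where the skipped prefix has to be subtracted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical memmem-like search using a naive comparison loop. */
void *find_bytes(const void *hay, size_t hlen, const void *needle, size_t nlen)
{
	const unsigned char *h = hay, *n = needle;

	if (!nlen) return (void *)h;
	if (hlen < nlen) return NULL;

	/* Skip ahead to the first occurrence of the needle's first byte. */
	const unsigned char *first = memchr(h, n[0], hlen);
	if (!first) return NULL;

	hlen -= (size_t)(first - h); /* account for the bytes just skipped */
	h = first;

	for (; hlen >= nlen; h++, hlen--)
		if (!memcmp(h, n, nlen)) return (void *)h;
	return NULL;
}
```

Without the subtraction, searching the first 5 bytes of "xxabcd" for "bcd" would compare against the byte at index 5, which lies outside the stated haystack.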
* include cleanups: remove unused headers and add feature test macros (Szabolcs Nagy, 2013-12-12, 22 files, -18/+7)
|
* strcmp: Remove unnecessary check for *r (Michael Forney, 2013-11-23, 1 file, -1/+1)
| | | | If *l == *r && *l, then by transitivity, *r.
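A sketch of a comparison loop that relies on this observation; the name `strcmp_sketch` is hypothetical. Once `*l == *r` and `*l` is nonzero, `*r` is necessarily nonzero, so the loop needs no third test.

```c
/* Hypothetical strcmp-like function. The loop advances while the bytes are
 * equal and the left byte is not the terminator; no separate *r check is
 * needed because equality with a nonzero *l already implies *r != 0. */
int strcmp_sketch(const char *l, const char *r)
{
	for (; *l == *r && *l; l++, r++);
	/* Compare as unsigned char so high bytes order the same on every arch. */
	return *(const unsigned char *)l - *(const unsigned char *)r;
}
```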
* optimized C memcpy (Rich Felker, 2013-08-28, 1 file, -16/+111)
| | | | | | | | | | | | | | | | unlike the old C memcpy, this version handles word-at-a-time reads and writes even for misaligned copies. it does not require that the cpu support misaligned accesses; instead, it performs bit shifts to realign the bytes for the destination. essentially, this is the C version of the ARM assembly language memcpy. the ideas are all the same, and it should perform well on any arch with a decent number of general-purpose registers that has a barrel shift operation. since the barrel shifter is an optional cpu feature on microblaze, it may be desirable to provide an alternate asm implementation on microblaze, but otherwise the C code provides a competitive implementation for "generic risc-y" cpu archs that should alleviate the urgent need for arch-specific memcpy asm.
* optimized C memset (Rich Felker, 2013-08-27, 1 file, -12/+77)
| | | | | | | | | | | | | | | | this version of memset is optimized both for small and large values of n, and makes no misaligned writes, so it is usable (and near-optimal) on all archs. it is capable of filling up to 52 or 56 bytes without entering a loop and with at most 7 branches, all of which can be fully predicted if memset is called multiple times with the same size. it also uses the attribute extension to inform the compiler that it is violating the aliasing rules, unlike the previous code which simply assumed it was safe to violate the aliasing rules since translation unit boundaries hide the violations from the compiler. for non-GNUC compilers, 100% portable fallback code in the form of a naive loop is provided. I intend to eventually apply this approach to all of the string/memory functions which are doing word-at-a-time accesses.
* add arm-optimized memcpy implementation from bionic libc (Rich Felker, 2013-08-14, 3 files, -0/+383)
| | | | | | | | | | | | | | | | | | | | the approach of this implementation was heavily investigated prior to adopting it. attempts to obtain similar performance with pure C code were capping out at about 75% of the performance of the asm, with considerably larger code size, and were fragile in that the compiler would sometimes compile part of memcpy into a call to itself. therefore, just using the asm seems to be the best option. this commit is the first to make use of the new subarch-specific asm framework. the new armel directory is the location for arm asm that should not be used for all arm subarchs, only the default one. armhf is the name of the little-endian hardfloat-ABI subarch, which can use the exact same asm. in both cases, the build system finds the asm by following a memcpy.sub file. the other two subarchs, armeb and armebhf, would need a big-endian variant of this code. it would not be hard to adapt the code to big endian, but I will hold off on doing so until there is demand for it.
* optimized memset asm for i386 and x86_64 (Rich Felker, 2013-08-01, 2 files, -0/+88)
| | | | | | | | | | | | | | | | | | | | the concept of both versions is the same; they differ only in details. for long runs, they use "rep stosl" or "rep stosq", and for small runs, they use a trick, writing from both ends towards the middle, that reduces the number of branches needed. in addition, if memset is called multiple times with the same length, all branches will be predicted; there are no loops. for larger runs, there are likely faster approaches than "rep", at least on some cpu models. for 32-bit, it's unlikely that there is any faster approach that does not require non-baseline instructions; doing anything fancier would require inspecting cpu capabilities. for 64-bit, there may very well be faster versions that work on all models; further optimization could be explored in the future. with these changes, memset is anywhere between 50% faster and 6 times faster, depending on the cpu model and the length and alignment of the destination buffer.
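The "both ends towards the middle" trick for short fills can be illustrated in C. This is a sketch of the idea only, not a translation of the asm; the name `small_fill` and the limit of 8 bytes are assumptions chosen for the example.

```c
#include <stddef.h>

/* Hypothetical short-fill helper for n <= 8. Stores are issued from the
 * front and from the back; overlapping stores are harmless, and each
 * branch only decides whether the two ends have already met. */
void small_fill(unsigned char *s, unsigned char c, size_t n)
{
	if (!n) return;
	s[0] = c;
	s[n - 1] = c;
	if (n <= 2) return;
	s[1] = c;
	s[2] = c;
	s[n - 2] = c;
	s[n - 3] = c;
	if (n <= 6) return;
	s[3] = c;
	s[n - 4] = c; /* n is 7 or 8 here, so every byte is now covered */
}
```

There is no loop, so a caller that repeatedly uses the same length takes the same three branches each time.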
* fix a couple misleading/wrong signal descriptions in strsignal (Rich Felker, 2013-07-09, 1 file, -2/+2)
| | | | | | | there are still several more that are misleading, but SIGFPE (integer division error misdescribed as floating point) and SIGCHLD (possibly non-exit status change events described as exiting) were the worst offenders.
* add realtime signals to strsignal (Rich Felker, 2013-07-09, 1 file, -3/+19)
| | | | | the name format RTnn/RTnnn was chosen to minimize bloat while uniquely identifying the signal.
* fix off-by-one array bound in strsignal (Rich Felker, 2013-07-09, 1 file, -1/+1)
|
* Add ABI compatibility aliases. (Isaac Dunham, 2013-04-05, 1 file, -0/+3)
| | | | | | | | GNU used several extensions that were incompatible with C99 and POSIX, so they used alternate names for the standard functions. The result is that we need these to run standards-conformant programs that were linked with glibc.
* fix integer type issue in strverscmp (Rich Felker, 2013-02-26, 1 file, -1/+3)
| | | | | | | | | lenl-lenr is not a valid expression for a signed int return value from strverscmp, since after implicit conversion from size_t to int this difference could have the wrong sign or might even be zero. using the difference for char values works since they're bounded well within the range of differences representable by int, but it does not work for size_t values.
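One way to turn a comparison of two size_t values into an int result without subtracting them is sketched below. The helper name `cmp_len` is hypothetical and the source does not say which expression the fix actually used.

```c
#include <stddef.h>

/* Hypothetical helper: three-way comparison of two lengths.
 * Returning (int)(a - b) would be wrong: the size_t difference wraps around
 * when a < b, and converting it to int can give the wrong sign or even 0.
 * Comparing instead of subtracting always yields -1, 0, or 1. */
int cmp_len(size_t a, size_t b)
{
	return (a > b) - (a < b);
}
```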
* implement non-stub strverscmp (Rich Felker, 2013-02-26, 1 file, -2/+35)
| | | | patch by Isaac Dunham.
* replace stub with working strcasestr (Rich Felker, 2013-02-21, 1 file, -2/+4)
|
* fix wrong return value from wmemmove on forward copies (Rich Felker, 2013-02-21, 1 file, -1/+2)
|
* fix alignment logic in strlcpy (Rich Felker, 2012-12-26, 1 file, -1/+1)
|
* simplify logic in stpcpy; avoid copying first aligned byte twice (Rich Felker, 2012-10-22, 1 file, -4/+4)
| | | | | gcc seems to be generating identical or near-identical code for both versions, but the newer code is more expressive of what it's doing.
* add memmem function (gnu extension) (Rich Felker, 2012-10-15, 1 file, -0/+148)
| | | | based on strstr. passes gnulib tests and a few quick checks of my own.
* optimize strchrnul/strcspn not to scan string twice on no-match (Rich Felker, 2012-09-27, 3 files, -25/+29)
| | | | | | | | | when strchr fails, an important piece of information already computed, the string length, is thrown away. have strchrnul (with namespace protection) be the underlying function so this information can be kept, and let strchr be a wrapper for it. this also allows strcspn to be considerably faster in the case where the match set has a single element that's not matched.
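The wrapper arrangement can be sketched with a plain byte loop. The names `strchrnul_sketch` and `strchr_sketch` are hypothetical, and the real functions are more optimized than this.

```c
#include <stddef.h>

/* Hypothetical strchrnul-like function: returns a pointer to the first
 * occurrence of c, or to the terminating null byte if c does not occur. */
char *strchrnul_sketch(const char *s, int c)
{
	unsigned char uc = (unsigned char)c;
	while (*s && *(const unsigned char *)s != uc) s++;
	return (char *)s;
}

/* strchr as a thin wrapper: one scan, then a single check of where it ended.
 * Searching for 0 works too, since the scan stops at the terminator. */
char *strchr_sketch(const char *s, int c)
{
	char *r = strchrnul_sketch(s, c);
	return *(unsigned char *)r == (unsigned char)c ? r : NULL;
}
```

A caller that needs the length on a miss, as a strcspn with a one-element set would, can use the strchrnul-style result directly instead of scanning the string again.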
* slightly cleaner strlen, also seems to compile to better code (Rich Felker, 2012-09-27, 1 file, -6/+4)
| | | | | | | testing with gcc 4.6.3 on x86, -Os, the old version does a duplicate null byte check after the first loop. this is purely the compiler being stupid, but the old code was also stupid and unintuitive in how it expressed the check.
* asm for memmove on i386 and x86_64 (Rich Felker, 2012-09-10, 2 files, -0/+36)
| | | | | | | for the sake of simplicity, I've only used rep movsb rather than breaking up the copy for using rep movsd/q. on all modern cpus, this seems to be fine, but if there are performance problems, there might be a need to go back and add support for rep movsd/q.
* reenable word-at-a-time copying in memmove (Rich Felker, 2012-09-10, 1 file, -4/+27)
| | | | | | | | | before restrict was added, memmove called memcpy for forward copies and used a byte-at-a-time loop for reverse copies. this was changed to avoid invoking UB now that memcpy has an undefined copying order, making memmove considerably slower. performance is still rather bad, so I'll be adding asm soon.
* use restrict everywhere it's required by c99 and/or posix 2008 (Rich Felker, 2012-09-06, 20 files, -20/+20)
| | | | | | | | to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
* remove dependency of wmemmove on wmemcpy direction (Rich Felker, 2012-09-06, 1 file, -4/+4)
| | | | | | unlike the memmove commit, this one should be fine to leave in place. wmemmove is not performance-critical, and even if it were, it's already copying whole 32-bit words at a time instead of bytes.
* remove dependency of memmove on memcpy direction (Rich Felker, 2012-09-06, 1 file, -5/+4)
| | | | | | | | this commit introduces a performance regression in many uses of memmove, which will need to be addressed before the next release. i'm making it as a temporary measure so that the restrict patch can be committed without invoking undefined behavior when memmove calls memcpy with overlapping regions.
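A memmove that does not depend on memcpy's copy order has to carry both directions itself. The sketch below uses byte loops under the hypothetical name `memmove_sketch`; it is the simple, slow shape, not the later word-at-a-time or asm versions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-like function with explicit copy directions.
 * Addresses are compared as integers, which assumes a flat address space. */
void *memmove_sketch(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (d == s) return dest;
	if ((uintptr_t)d < (uintptr_t)s) {
		while (n--) *d++ = *s++; /* forward: safe when dest is below src */
	} else {
		while (n--) d[n] = s[n]; /* backward: safe when dest is above src */
	}
	return dest;
}
```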
* memcpy asm for i386 and x86_64 (Rich Felker, 2012-08-11, 2 files, -0/+51)
|
* remove unused but buggy code from strstr.c (Rich Felker, 2012-08-11, 1 file, -10/+0)
|
* remove buggy short-string wcsstr implementation; always use twoway (Rich Felker, 2012-08-11, 1 file, -9/+0)
| | | | | | since this interface is rarely used, it's probably best to lean towards keeping code size down anyway. one-character needles will still be found immediately by the initial wcschr call anyway.
* optimize mempcpy to minimize need for data saved across the call (Rich Felker, 2012-07-31, 1 file, -2/+1)
|
* make strerror_r behave nicer on failure (Rich Felker, 2012-06-20, 1 file, -2/+8)
| | | | | | | if the buffer is too short, at least return a partial string. this is helpful if the caller is lazy and does not check for failure. care is taken to avoid writing anything if the buffer length is zero, and to always null-terminate when the buffer length is non-zero.
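The truncation behavior described above can be sketched with a small copy helper. The name `copy_message` is hypothetical; it stands in for the part of a strerror_r-style function that runs after the message text has been looked up.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy msg into buf of size buflen.
 * On a short buffer it still stores as much of the message as fits and
 * null-terminates it, writes nothing when buflen is 0, and reports ERANGE. */
int copy_message(const char *msg, char *buf, size_t buflen)
{
	size_t len = strlen(msg);

	if (len >= buflen) {
		if (buflen) {
			memcpy(buf, msg, buflen - 1);
			buf[buflen - 1] = 0;
		}
		return ERANGE;
	}
	memcpy(buf, msg, len + 1);
	return 0;
}
```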
* fix overrun (n essentially ignored) in wcsncmp (Rich Felker, 2012-05-26, 1 file, -1/+1)
| | | | bug report and solution by Richard Pennington
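A sketch of a bounded wide-string comparison in which n actually limits the scan. The name `wcsncmp_sketch` is hypothetical, and the source does not show what the broken loop looked like.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical wcsncmp-like function: n is decremented on every step and
 * tested first, so at most n wide characters are ever read from each side. */
int wcsncmp_sketch(const wchar_t *l, const wchar_t *r, size_t n)
{
	for (; n && *l == *r && *l; n--, l++, r++);
	/* If n reached 0 the compared prefixes are equal; otherwise order them. */
	return n ? (*l < *r ? -1 : *l > *r) : 0;
}
```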
* fix failure of strrchr(str, 0) (Rich Felker, 2012-05-26, 1 file, -1/+1)
| | | | bug report and solution by Richard Pennington
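Searching for the null byte should return a pointer to the terminator. One way to get that right, sketched under the hypothetical name `strrchr_sketch`, is to include the terminator in the range that is scanned backwards; the source does not say how the original bug arose.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strrchr-like function: scans strlen(s) + 1 bytes from the
 * end, so a search for 0 finds the terminating null byte itself. */
char *strrchr_sketch(const char *s, int c)
{
	size_t n = strlen(s) + 1; /* include the terminator in the scan */
	unsigned char uc = (unsigned char)c;

	while (n--)
		if ((unsigned char)s[n] == uc) return (char *)(s + n);
	return NULL;
}
```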