path: root/src/string
* fix misleading comment in strstr (Rich Felker, 2020-12-09; 1 file, -1/+1)
  the intent here is just to scan at least l bytes forward for the end of the haystack and at least some decent minimum to avoid doing it over and over if the needle is short, with no need to be precise. the comment erroneously stated this as an estimate for MIN when it's actually an estimate for MAX.
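  the code in question has roughly this shape; l|63 is at least l, at least 63, and at most l+63, making it a cheap over-estimate of MAX(l,63) (a sketch of the relevant lines, not the exact diff):

      /* grow the known-terminated region of the haystack by at least
       * MAX(l,63) bytes; l|63 cheaply over-estimates that maximum */
      size_t grow = l | 63;
      const unsigned char *z2 = memchr(z, 0, grow);
      if (z2) z = z2;
      else z += grow;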
* add optimized aarch64 memcpy and memset (Rich Felker, 2020-06-26; 2 files, -0/+301)
  these are based on the ARM optimized-routines repository v20.05 (ef907c7a799a), with macro dependencies flattened out and memmove code removed from memcpy.

  this change is somewhat unfortunate, since keeping the branch for memmove support in the large-n case of memcpy is the performance-optimal and size-optimal way to do both; but retaining it makes memcpy alone (static-linked) about 40% larger and suggests a policy that use of memcpy as memmove is supported. tabs used for alignment have also been replaced with spaces.
* add big-endian support to ARM assembler memcpy (Andre McCurdy, 2020-06-25; 2 files, -7/+97)
  Allow the existing ARM assembler memcpy implementation to be used for both big and little endian targets.
* handle possibility that SIGEMT replaces SIGSTKFLT in strsignal (Rich Felker, 2020-05-21; 1 file, -0/+10)
  presently all archs define SIGSTKFLT but this is not correct. change strsignal as a prerequisite for fixing that.
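  a heavily hedged sketch of the idea (musl's real table is a compact map plus string blob, not a pointer array like this): whichever of the two macros the arch defines selects the entry:

      #include <signal.h>

      static const char *const signames[] = {
      #ifdef SIGSTKFLT
          [SIGSTKFLT] = "Stack fault",
      #elif defined(SIGEMT)
          [SIGEMT] = "Emulator trap",  /* illustrative description string */
      #endif
      };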
* fix undefined behavior from signed overflow in strstr and memmem (Rich Felker, 2020-04-30; 2 files, -8/+8)
  unsigned char promotes to int, which can overflow when shifted left by 24 bits or more. this has been reported multiple times but then forgotten. it's expected to be benign UB, but can trap when built with explicit overflow catching (ubsan or similar). fix it now.

  note that promotion to uint32_t is safe and portable even outside of the assumptions usually made in musl, since either uint32_t has rank at least unsigned int, so that no further default promotions happen, or int is wide enough that the shift can't overflow. this is a desirable property to have in case someone wants to reuse the code elsewhere.
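  the problematic pattern and its fix, in sketch form (pack is an illustrative stand-in for the window/hash setup in these functions, not the literal diff):

      #include <stdint.h>

      static uint32_t pack(const unsigned char *p)
      {
          /* UB: p[0] promotes to int, and p[0]<<24 overflows a 32-bit
           * int whenever p[0] >= 0x80 */
          /* return p[0]<<24 | p[1]<<16 | p[2]<<8 | p[3]; */

          /* safe: promote through uint32_t before shifting */
          return (uint32_t)p[0]<<24 | (uint32_t)p[1]<<16
               | (uint32_t)p[2]<<8  | p[3];
      }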
* remove redundant condition in memccpy (Alexander Monakov, 2020-03-20; 1 file, -1/+1)
  Commit d9bdfd164 ("fix memccpy to not access buffer past given size") correctly added a check for 'n' nonzero, but made the pre-existing test '*s==c' redundant: n!=0 implies *s==c. Remove the unnecessary check. Reported by Alexey Izbyshev.
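  for reference, a simplified byte-at-a-time model of the function after both this commit and d9bdfd164 (musl's real version adds a word-at-a-time path):

      #include <stddef.h>

      void *my_memccpy(void *restrict dest, const void *restrict src,
                       int c, size_t n)
      {
          unsigned char *d = dest;
          const unsigned char *s = src;
          c = (unsigned char)c;
          for (; n && (*d = *s) != c; n--, s++, d++);
          /* the loop can only stop with n nonzero if the byte just
           * copied was c, so no separate *s==c test is needed */
          if (n) return d+1;
          return 0;
      }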
* add thumb2 support to arm assembler memcpy (Andre McCurdy, 2020-01-16; 2 files, -6/+9)
  For Thumb2 compatibility, replace two instances of a single "orr with a variable shift" instruction with the two-instruction equivalent. Neither of the replacements is in a performance-critical loop.
* fix memccpy to not access buffer past given size (Quentin Rameau, 2018-12-02; 1 file, -1/+1)
  memccpy would return a pointer past the given size when c is not found in the source buffer and n reaches 0.
* optimize two-way strstr and memmem bad character shift (Rich Felker, 2018-11-08; 2 files, -2/+2)
  first, the condition (mem && k < p) is redundant, because mem being nonzero implies the needle is periodic with period exactly p, in which case any byte that appears in the needle must appear in the last p bytes of the needle, bounding the shift (k) by p.

  second, the whole point of replacing the shift k by mem (=l-p) is to prevent shifting by less than mem when discarding the memory on shift, in which case linear time could not be guaranteed. but as written, the check also replaced shifts greater than mem by mem, reducing the benefit of the shift. there is no possible benefit to this reduction of the shift; since mem is being cleared, the full shift is valid and more optimal. so only replace the shift by mem when it would be less than mem.
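  in sketch form, the corrected logic in the search loop (k is the bad-character shift, mem the memorized-prefix length; illustrative, not the literal diff):

      if (k < mem) k = mem;  /* bump only too-small shifts up to mem;
                                larger shifts are kept in full, since
                                the memory is being discarded anyway */
      h += k;
      mem = 0;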
* remove commented-out debug printf from strstr (Rich Felker, 2018-11-02; 1 file, -1/+0)
  this was leftover from before the initial commit.
* fix spuriously slow check in twoway strstr/memmem cores (Rich Felker, 2018-11-02; 2 files, -2/+2)
  mem0 && mem && ... is redundant since mem can only be nonzero when mem0 is nonzero.
* fix aliasing-based undefined behavior in string functions (Rich Felker, 2018-09-26; 8 files, -19/+46)
  use the GNU C may_alias attribute if available, and fall back to naive byte-by-byte loops if __GNUC__ is not defined. this patch has been written to minimize changes so that history remains reviewable; it does not attempt to bring the affected code into a more consistent or elegant form.
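  the pattern looks like this: a may_alias word type lets a word-at-a-time scan coexist with the aliasing rules, with a plain byte loop as the portable fallback (a sketch using musl-style zero-byte detection, not the literal patch):

      #include <stddef.h>
      #include <stdint.h>

      #ifdef __GNUC__
      /* reads through this type are exempt from strict-aliasing rules */
      typedef size_t __attribute__((__may_alias__)) word;
      #endif

      size_t scan_to_nul(const char *s)
      {
          const char *a = s;
      #ifdef __GNUC__
      #define ONES  ((size_t)-1/255)    /* 0x01 repeated */
      #define HIGHS (ONES * (255/2+1))  /* 0x80 repeated */
          for (; (uintptr_t)s % sizeof(word); s++)
              if (!*s) return s-a;
          const word *w = (const void *)s;
          for (; !((*w - ONES) & ~*w & HIGHS); w++);
          s = (const void *)w;
      #endif
          for (; *s; s++);
          return s-a;
      }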
* optimize nop case of wmemmove (Rich Felker, 2018-09-23; 1 file, -0/+1)
* fix undefined pointer comparison in wmemmove (Rich Felker, 2018-09-23; 1 file, -1/+2)
* fix undefined pointer comparison in memmove (Rich Felker, 2018-09-23; 1 file, -1/+1)
  the comparison must take place in the address space model as an integer type, since comparing pointers that are not pointing into the same array is undefined. the subsequent d<s comparison however is valid, because it's only reached in the case where the source and dest overlap, in which case they are necessarily pointing to parts of the same array.

  to make the comparison, use an unsigned range check for dist(s,d)>=n, algebraically !(-n<s-d<n). subtracting n yields !(-2*n<s-d-n<0), which mapped into unsigned modular arithmetic is !(-2*n<s-d-n) or rather -2*n>=s-d-n.
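  concretely, the check can be written like this (essentially the shape of the C memmove after the fix):

      #include <stdint.h>
      #include <string.h>

      void *my_memmove(void *dest, const void *src, size_t n)
      {
          char *d = dest;
          const char *s = src;

          if (d == s) return d;
          /* dist(s,d) >= n, i.e. no overlap, tested as derived above:
           * -2*n >= s-d-n in unsigned modular arithmetic */
          if ((uintptr_t)s - (uintptr_t)d - n <= -2*n)
              return memcpy(d, s, n);

          /* overlapping: s and d point into the same array, so the
           * d<s comparison below is well defined */
          if (d < s) for (; n; n--) *d++ = *s++;
          else while (n) n--, d[n] = s[n];
          return dest;
      }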
* reduce spurious inclusion of libc.h (Rich Felker, 2018-09-12; 9 files, -9/+0)
  libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it.

  remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases.

  in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h.

  declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
* remove or make static various unused __-prefixed symbols (Rich Felker, 2018-09-12; 1 file, -4/+1)
* overhaul internally-public declarations using wrapper headers (Rich Felker, 2018-09-12; 5 files, -10/+0)
  commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfaces between subsystems of libc.

  rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers.

  parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to.

  along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features.

  over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
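  as an illustration of the scheme (hypothetical contents; the real wrapper declares many more symbols), src/include/string.h has roughly this shape:

      #ifndef STRING_H
      #define STRING_H

      /* pull in the public header this wrapper shadows */
      #include "../../include/string.h"

      /* namespace-safe names for use by other libc subsystems; the
       * hidden macro comes from the features.h wrapper */
      hidden void *__memrchr(const void *, int, size_t);
      hidden char *__stpcpy(char *restrict, const char *restrict);

      #endif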
* remove unused code from strcpy.c (Rich Felker, 2018-09-12; 1 file, -7/+0)
* optimize explicit_bzero for size (Alexander Monakov, 2018-07-02; 1 file, -1/+1)
  Avoid saving/restoring the incoming argument by reusing memset return value.
* add explicit_bzero implementation (David Carlier, 2018-06-26; 1 file, -0/+8)
  maintainer's note: past sentiment was that, despite being imperfect and unable to force clearing of all possible copies of sensitive data (e.g. in registers, register spills, signal contexts left on the stack, etc.) this function would be added if major implementations agreed on it, which has happened -- several BSDs and glibc all include it.
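  a minimal sketch of such an implementation: the empty asm with a "memory" clobber makes the compiler assume the zeroed buffer is observed, so the memset cannot be dead-store-eliminated (the d = memset(...) reuse is the size optimization from the entry above):

      #define _BSD_SOURCE
      #include <string.h>

      void explicit_bzero(void *d, size_t n)
      {
          d = memset(d, 0, n);
          __asm__ __volatile__ ("" : : "r"(d) : "memory");
      }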
* fix OOB reads in Xbyte_memmem (Alexander Monakov, 2017-09-04; 1 file, -9/+9)
  Reported by Leah Neukirchen.
* fix undefined behavior in memset due to missing sequence points (Rich Felker, 2017-08-29; 1 file, -4/+8)
  patch by Pascal Cuoq.
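  the hazard in sketch form: when n==1, s[0] and s[n-1] are the same byte, and a chained assignment modifies it twice with no intervening sequence point. splitting the stores sequences them:

      /* before (UB when the indices alias, e.g. n==1): */
      /* s[0] = s[n-1] = c; */

      /* after: two statements, two sequenced side effects */
      s[0] = c;
      s[n-1] = c;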
* fix arm run-time abi string functions (Szabolcs Nagy, 2017-06-22; 6 files, -36/+76)
  in arm rtabi these __aeabi_* functions have a special abi (they are only allowed to clobber r0,r1,r2,r3,ip,lr,cpsr), so they cannot be simple wrappers around normal string functions (which may clobber other registers). the safest solution is to write them in asm; a minimalistic implementation works because these are not supposed to be emitted by compilers or used in general.
* disable use of arm memcpy asm if building as thumb code (Rich Felker, 2016-12-17; 2 files, -2/+2)
  the thumb incompatibilities in the asm are probably only minor and should be fixable, but for now just use the C version.
* fix read past end of haystack buffer for short needles in memmem (Rich Felker, 2016-04-01; 1 file, -0/+1)
  the two/three/four byte memmem specializations are not prepared to handle haystacks shorter than the needle; they unconditionally read at least up to the needle length and subtract from the haystack length. if the haystack is shorter, the remaining haystack length underflows and produces an unbounded search which will eventually either crash or find a spurious match.

  the top-level memmem function attempted to avoid this case already by checking for haystack shorter than needle, but it failed to re-check after using memchr to remove the maximal prefix not containing the first byte of the needle.
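  the corrected prefix-skip has roughly this shape (h0/k are the haystack and its length, n0/l the needle and its length; the subtraction line is the 2014 fix further down this log, the k<l re-check is this commit):

      h = memchr(h0, *n0, k);
      if (!h) return 0;
      k -= h - (const unsigned char *)h0;  /* account for skipped prefix */
      if (k < l) return 0;  /* remainder may now be shorter than the
                               needle; re-check before the specialized
                               searches subtract l from k */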
* move arm-specific translation units out of arch/arm/src, to src/*/arm (Rich Felker, 2016-01-22; 4 files, -0/+36)
  this is possible with the new build system that allows src/*/$(ARCH)/* files which do not shadow a file in the parent directory, and yields a more logical organization. eventually it will be possible to remove arch/*/src from the build system.
* adapt build of arm memcpy asm not to use .sub files (Rich Felker, 2016-01-20; 4 files, -2/+7)
  this depends on commit 9f5eb77992b42d484d69e879d24ef86466f20f21, which made it possible to use a .c file for arch-specific replacements, and on commit 2f853dd6b9a95d5b13ee8f9df762125e0588df5d, the out-of-tree build support, which made it so that src/*/$(ARCH)/* 'replacement' files get used even if they don't match the base name of a .c file in the parent directory.
* remove non-working pre-armv4t support from arm asm (Rich Felker, 2015-11-09; 1 file, -4/+0)
  the idea of the three-instruction sequence being removed was to be able to return to thumb code when used on armv4t+ from a thumb caller, but also to be able to run on armv4 without the bx instruction available (in which case the low bit of lr would always be 0).

  however, without compiler support for generating such a sequence from C code, which does not exist and which there is unlikely to be interest in implementing, there is little point in having it in the asm, and it would likely be easier to add pre-armv4t support via enhanced linker handling of R_ARM_V4BX than at the compiler level.

  removing this code simplifies adding support for building libc in thumb2-only form (for cortex-m).
* convert arm memcpy asm to UAL, remove .word hacks (Rich Felker, 2015-11-05; 1 file, -22/+24)
  contrary to commit 9367fe926196f407705bb07cd29c6e40eb1774dd, all relevant gas versions actually do support .syntax unified.
* reimplement strverscmp to fix corner cases (Rich Felker, 2015-06-23; 1 file, -32/+25)
  this interface is non-standardized and is a GNU invention, and as such, our implementation should match the behavior of the GNU function.

  one peculiarity the old implementation got wrong was the handling of all-zero digit sequences: they are supposed to compare greater than digit sequences of which they are a proper prefix, as in 009 < 00.

  in addition, high bytes were treated with char signedness rather than as unsigned. this was wrong regardless of what the GNU function does since the resulting order relation varied by arch.

  the new strverscmp implementation makes explicit the cases where the order differs from what strcmp would produce, of which there are only two.
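  illustrative consequences of the ordering described above:

      #define _GNU_SOURCE
      #include <string.h>
      #include <assert.h>

      int main(void)
      {
          assert(strverscmp("009", "00") < 0); /* all-zero "00" is greater */
          assert(strverscmp("2", "10") < 0);   /* digit runs compare numerically */
          assert(strcmp("2", "10") > 0);       /* unlike plain strcmp */
          return 0;
      }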
* remove potentially PIC-incompatible relocations from x86_64 and x32 asm (Rich Felker, 2015-04-18; 2 files, -1/+5)
  analogous to commit 8ed66ecbcba1dd0f899f22b534aac92a282f42d5 for i386.
* remove the last of possible-textrels from i386 asm (Rich Felker, 2015-04-18; 2 files, -1/+5)
  none of these are actual textrels because of ld-time binding performed by -Bsymbolic-functions, but I'm changing them with the goal of making ld-time binding purely an optimization rather than relying on it for semantic purposes.

  in the case of memmove's call to memcpy, making it explicit that the memmove asm is assuming the forward-copying behavior of the memcpy asm is desirable anyway; in case memcpy is ever changed, the semantic mismatch would be apparent while editing memcpy.s.
* overhaul optimized x86_64 memset asm (Rich Felker, 2015-02-26; 1 file, -26/+55)
  on most cpu models, "rep stosq" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 126, and shrink-wraps this code path.

  in addition, "rep stosq" is sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial.

  this version is based in part on changes proposed by Denys Vlasenko.
* overhaul optimized i386 memset asm (Rich Felker, 2015-02-26; 1 file, -32/+61)
  on most cpu models, "rep stosl" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 62, and shrink-wraps this code path.

  in addition, "rep stosl" is very sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 to 4 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial.

  this version is based in part on changes to the x86_64 memset asm proposed by Denys Vlasenko.
* x86_64/memset: avoid performing final store twice (Denys Vlasenko, 2015-02-10; 1 file, -1/+1)
  The code does a potentially misaligned 8-byte store to fill the tail of the buffer. Then it fills the initial part of the buffer, which is a multiple of 8 bytes. Therefore, if the size is divisible by 8, we were storing the last word twice.

  This patch decrements the byte count before dividing it by 8, making one less store in the "size is divisible by 8" case, and not changing anything in all other cases. All at the cost of replacing one MOV insn with a LEA insn.

  Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
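  the arithmetic, modeled in C with hypothetical helpers: with an 8-byte store already covering the tail at s+n-8, filling n/8 words from the start rewrites the final word whenever 8 divides n, while (n-1)/8 words never does:

      store8(s + n - 8);           /* hypothetical misaligned 8-byte store */
      size_t words = (n - 1) / 8;  /* was n/8: one word too many exactly
                                      when n is divisible by 8 */
      fill_words(s, words);        /* covers bytes 0 .. 8*words-1 */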
* x86_64/memset: simple optimizations (Denys Vlasenko, 2015-02-10; 1 file, -14/+16)
  "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00); the four-byte "movzbl %sil,%esi" (40 0f b6 f6) can be used instead.

  64-bit imul is slow, so move it as far up as possible so that the result (rax) has more time to be ready by the time we start using it in mem stores.

  There is no need to shuffle registers in preparation for "rep stos" if we are not going to take that code path. Thus, the patch moves the "jump if len < 16" instructions up, and changes the alternate code path to use rdx and rdi instead of rcx and r8.

  Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
* fix tabs/spaces in memcpy.s (Rich Felker, 2014-11-23; 1 file, -279/+279)
  this file had been a mess that went unnoticed ever since it was imported. some lines used spaces for indentation while others used tabs, and tabs were used for alignment.
* fix build regression in arm asm for memcpy (Rich Felker, 2014-11-23; 1 file, -30/+30)
  commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility with clang's internal assembler, but broke compatibility with gas and the traditional arm asm syntax by switching to the arm "unified assembler language" (UAL). recent versions of gas also support UAL, but require the .syntax directive to be used to switch to it. clang on the other hand defaults to UAL. and old versions of gas (still relevant) don't support UAL at all.

  for the conditional ldm/stm instructions, "ia" is default and can just be omitted, resulting in a mnemonic that's compatible with both traditional and UAL syntax. but for byte/halfword loads and stores, there seems to be no mnemonic compatible with both, and thus .word is used to produce the desired opcode explicitly. the .inst directive is not used because it is not compatible with older assemblers.
* arm assembly changes for clang compatibility (Joakim Sindholt, 2014-11-23; 1 file, -30/+30)
* fix handling of odd lengths in swab function (Rich Felker, 2014-10-04; 1 file, -1/+1)
  this function is specified to leave the last byte with "unspecified disposition" when the length is odd, so for the most part correct programs should not be calling swab with odd lengths. however, doing so is permitted, and should not write past the end of the destination buffer.
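  a byte-loop sketch of the corrected function; the n>1 condition (rather than just n) is what stops the final odd byte from being touched:

      #include <unistd.h>

      void my_swab(const void *restrict src, void *restrict dest, ssize_t n)
      {
          const char *s = src;
          char *d = dest;
          /* for odd n the last byte's disposition is unspecified, but
           * writing d[1] on a final one-byte iteration would store
           * past dest+n -- hence n>1, not n */
          for (; n > 1; n -= 2, s += 2, d += 2) {
              d[0] = s[1];
              d[1] = s[0];
          }
      }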
* add support for LC_TIME and LC_MESSAGES translations (Rich Felker, 2014-07-26; 1 file, -2/+3)
  for LC_MESSAGES, translation of strerror and similar literal message functions is supported. for messages in other places (particularly the dynamic linker) that use format strings, translation is not yet supported. in order to make it possible and safe, such messages will need to be refactored to separate the textual content from the format.

  for LC_TIME, the day and month names and strftime-style format strings provided by nl_langinfo are supported for translation. however there may be limitations, as some of the original C-locale nl_langinfo strings are non-unique and thus perhaps non-suitable as keys.

  overall, the locale support activated by this commit should not be seen as complete and polished but as a basis for beginning to test locale functionality and implement locales.
* consolidate str[n]casecmp_l into str[n]casecmp source files (Rich Felker, 2014-07-02; 2 files, -0/+16)
  this is mainly done for consistency with the ctype functions and to declutter the src/locale directory.
* fix incorrect comparison loop condition in memmem (Rich Felker, 2014-06-19; 1 file, -2/+2)
  the logic for this loop was copied from null-terminated-string logic in strstr without properly adapting it to work with explicit lengths. presumably this error could result in false negatives (wrongly comparing past the end of the needle/haystack), false positives (stopping comparison early when the needle contains null bytes), and crashes (from runaway reads past the end of mapped memory).
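  for example, the two-byte specialization with an explicit-length loop has this shape (a sketch close to the current code; the caller guarantees k >= 2):

      #include <stdint.h>

      static char *twobyte_memmem(const unsigned char *h, size_t k,
                                  const unsigned char *n)
      {
          uint16_t nw = n[0]<<8 | n[1], hw = h[0]<<8 | h[1];
          /* k, not a null terminator, bounds the scan */
          for (h += 2, k -= 2; k; k--, hw = hw<<8 | *h++)
              if (hw == nw) return (char *)h - 2;
          return hw == nw ? (char *)h - 2 : 0;
      }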
* fix false negatives with periodic needles in strstr, wcsstr, and memmem (Rich Felker, 2014-04-18; 3 files, -3/+3)
  in cases where the memorized match range from the right factor exceeded the length of the left factor, it was wrongly treated as a mismatch rather than a match. issue reported by Yves Bastide.
* fix search past the end of haystack in memmem (Timo Teräs, 2014-04-09; 1 file, -0/+1)
  to optimize the search, memchr is used to find the first occurrence of the first character of the needle in the haystack before switching to a search for the full needle. however, the number of characters skipped by this first step was not subtracted from the haystack length, causing memmem to search past the end of the haystack.
* include cleanups: remove unused headers and add feature test macros (Szabolcs Nagy, 2013-12-12; 22 files, -18/+7)
* strcmp: Remove unnecessary check for *r (Michael Forney, 2013-11-23; 1 file, -1/+1)
  If *l == *r && *l, then by transitivity *r is also nonzero.
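  the function then reduces to (a sketch matching the description):

      int my_strcmp(const char *l, const char *r)
      {
          /* if *l == *r and *l is nonzero, *r is nonzero too */
          for (; *l == *r && *l; l++, r++);
          return *(const unsigned char *)l - *(const unsigned char *)r;
      }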
* optimized C memcpy (Rich Felker, 2013-08-28; 1 file, -16/+111)
  unlike the old C memcpy, this version handles word-at-a-time reads and writes even for misaligned copies. it does not require that the cpu support misaligned accesses; instead, it performs bit shifts to realign the bytes for the destination.

  essentially, this is the C version of the ARM assembly language memcpy. the ideas are all the same, and it should perform well on any arch with a decent number of general-purpose registers that has a barrel shift operation. since the barrel shifter is an optional cpu feature on microblaze, it may be desirable to provide an alternate asm implementation on microblaze, but otherwise the C code provides a competitive implementation for "generic risc-y" cpu archs that should alleviate the urgent need for arch-specific memcpy asm.
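  the realignment idea for a single misalignment case, as a hedged little-endian sketch (not musl's code, which handles every offset and alignment): read aligned source words and recombine them with shifts so all stores are aligned:

      #include <stdint.h>
      #include <stddef.h>

      #ifdef __GNUC__
      typedef uint32_t __attribute__((__may_alias__)) u32;

      /* assumes: d is 4-byte aligned, s == (aligned address)+1, n is a
       * multiple of 4, and the final over-read stays inside the
       * source's last aligned word */
      static void copy_shift1(unsigned char *d, const unsigned char *s,
                              size_t n)
      {
          const u32 *ws = (const u32 *)(s - 1);  /* aligned down */
          u32 w = *ws++;
          for (; n; n -= 4, d += 4) {
              u32 x = *ws++;
              *(u32 *)d = w >> 8 | x << 24;  /* realign via shifts */
              w = x;
          }
      }
      #endif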
* optimized C memset (Rich Felker, 2013-08-27; 1 file, -12/+77)
  this version of memset is optimized both for small and large values of n, and makes no misaligned writes, so it is usable (and near-optimal) on all archs. it is capable of filling up to 52 or 56 bytes without entering a loop and with at most 7 branches, all of which can be fully predicted if memset is called multiple times with the same size.

  it also uses the attribute extension to inform the compiler that it is violating the aliasing rules, unlike the previous code which simply assumed it was safe to violate the aliasing rules since translation unit boundaries hide the violations from the compiler. for non-GNUC compilers, 100% portable fallback code in the form of a naive loop is provided.

  I intend to eventually apply this approach to all of the string/memory functions which are doing word-at-a-time accesses.
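  the head/tail filling described above begins like this (a sketch of the opening; musl continues with aligned word-sized stores through a may_alias type rather than the byte-loop fallback shown):

      #include <stddef.h>

      void *my_memset(void *dest, int c, size_t n)
      {
          unsigned char *s = dest;

          /* fill from both ends; each group extends the loop-free
           * coverage by a few sizes at the cost of one branch */
          if (!n) return dest;
          s[0] = c;
          s[n-1] = c;
          if (n <= 2) return dest;
          s[1] = c; s[2] = c;
          s[n-2] = c; s[n-3] = c;
          if (n <= 6) return dest;
          s[3] = c;
          s[n-4] = c;
          if (n <= 8) return dest;

          for (size_t i = 4; i < n-4; i++) s[i] = c;  /* fallback body */
          return dest;
      }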