path: root/src/regex
Commit message | Author | Age | Files | Lines
* implement FNM_LEADING_DIR extension flag in fnmatch | Rich Felker | 2013-12-02 | 1 | -2/+9
  previously this flag was defined and accepted as a no-op, possibly breaking some software that uses it. given the choice between removing the definition, and possibly breaking applications that were already working, or simply implementing the feature, the latter turned out to be easy enough to settle the decision. in the case where the FNM_PATHNAME flag is also set, this implementation is clean and essentially optimal. otherwise, it's an inefficient "brute force" implementation. at some point, when cleaning up and refactoring this code, I may add a more direct code path for handling FNM_LEADING_DIR in the non-FNM_PATHNAME case, but at this point my main interest is avoiding introducing new bugs in the code that implements the standard fnmatch features specified by POSIX.
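A minimal sketch of how a caller might use the flag. It assumes a libc that exposes FNM_LEADING_DIR once _GNU_SOURCE is defined (some libcs require the define, since the flag is an extension and not POSIX). `matches_leading_dir` is a hypothetical helper, not part of any library.

```c
#define _GNU_SOURCE /* FNM_LEADING_DIR is an extension, not POSIX */
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if `pattern` matches `path` itself or
 * an initial portion of `path` that is followed by a '/'. */
static int matches_leading_dir(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, FNM_PATHNAME | FNM_LEADING_DIR) == 0;
}
```

With this helper, `matches_leading_dir("src/*", "src/regex/fnmatch.c")` is expected to succeed because `src/*` matches the leading `src/regex`, while the same pattern with FNM_PATHNAME alone would not match the full path.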
* fix fnmatch corner cases related to escaping | Rich Felker | 2013-12-01 | 1 | -4/+4
  the FNM_PATHNAME logic for advancing by /-delimited components was incorrect when the / character was escaped (i.e. \/), and a final \ at the end of pattern was not handled correctly.
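The escaped-slash case can be exercised with a small check like the one below. This is only a sketch: `path_match` is a hypothetical wrapper, and the trailing-backslash case is left out because its expected result is not stated in the commit message.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: returns 1 when `pattern` matches `path`
 * with FNM_PATHNAME in effect, 0 otherwise. */
static int path_match(const char *pattern, const char *path)
{
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

The pattern `a\/b` (written `"a\\/b"` in C) should match the string `a/b`, since the backslash only quotes the slash.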
* fix the end of string matching in fnmatch with FNM_PATHNAME | Szabolcs Nagy | 2013-12-01 | 1 | -2/+2
  a '/' in the pattern could be incorrectly matched against the terminating null byte in the string, causing an arbitrarily long sequence of out-of-bounds accesses in fnmatch("/","",FNM_PATHNAME)
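The call quoted in the message makes a natural regression check. A minimal sketch, with `slash_matches_empty` as a hypothetical name:

```c
#include <fnmatch.h>

/* Returns 1 if the pattern "/" matches the empty string under
 * FNM_PATHNAME. A correct fnmatch reports no match here, because
 * there is no '/' in the string for the pattern to consume. */
static int slash_matches_empty(void)
{
    return fnmatch("/", "", FNM_PATHNAME) == 0;
}
```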
* fix allocation sizes in regcomp | Szabolcs Nagy | 2013-10-07 | 1 | -4/+4
  sizeof had an incorrect argument in a few places; the size was always large enough, so the issue was not critical.
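One common way to avoid this class of mistake is to take `sizeof` of the pointee expression instead of naming a type. The sketch below is illustrative only; `struct node` and `node_new` are hypothetical and do not come from regcomp.

```c
#include <stdlib.h>

/* Hypothetical node type, for illustration only. */
struct node {
    int id;
    struct node *next;
};

/* `sizeof *n` ties the allocation size to the pointer's own type, so
 * the size stays correct even if the declaration of `n` changes. */
static struct node *node_new(int id)
{
    struct node *n = calloc(1, sizeof *n);
    if (n)
        n->id = id;
    return n;
}
```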
* revert regex "cleanup" that seems unjustified and may break backtracking | Rich Felker | 2013-02-01 | 1 | -0/+3
  it's not clear to me at the moment whether the code that was removed (and which is now being re-added) is needed, but it's far from being a no-op, and i don't want to risk breaking regex in this release.
* remove unused "params" related code from regex | Szabolcs Nagy | 2013-01-15 | 2 | -21/+11
  some structs and functions had references to the params feature of TRE, which is no longer used by the code
* regex: remove an unused local variable from regexec | Szabolcs Nagy | 2013-01-14 | 1 | -3/+0
  pos_start local variable is not used in tre_tnfa_run_backtrack
* use restrict everywhere it's required by c99 and/or posix 2008 | Rich Felker | 2012-09-06 | 4 | -5/+5
  to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
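The approach can be sketched as follows. The macro name `MY_RESTRICT` and the function `copy_bytes` are hypothetical, chosen so the example does not redefine the reserved identifier `__restrict`; the actual header logic may differ.

```c
#include <stddef.h>

/* Pick a spelling of the qualifier that the compiler accepts. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT /* unsupported: expand to nothing */
#endif

/* Pointer form (`*MY_RESTRICT`), not the array form `[restrict]`. */
static void copy_bytes(unsigned char *MY_RESTRICT dst,
                       const unsigned char *MY_RESTRICT src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}
```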
* fix regex on arm | Rich Felker | 2012-05-25 | 1 | -1/+1
  TRE has a broken assumption that wchar_t is signed, which is a sane expectation, but not required by the standard, and false on ARM's ABI. i leave tre_char_t as wchar_t for now, since a pointer to it is directly passed to functions that need a pointer to wchar_t. it does not seem to break anything. and since the maximum unicode scalar value is 0x10ffff, just use that explicitly rather than using the max value of any particular C type.
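A range check written against the explicit constant behaves the same whether wchar_t is signed or unsigned. The sketch below is an illustration of that idea, not TRE's code; `MAX_CODE_POINT` and `in_unicode_range` are hypothetical names, and the check does not exclude surrogates.

```c
#include <wchar.h>

#define MAX_CODE_POINT 0x10ffffL /* upper bound taken from the commit message */

/* Returns 1 if `wc` lies in 0..0x10ffff. Converting to long first
 * avoids relying on the signedness or maximum value of wchar_t. */
static int in_unicode_range(wchar_t wc)
{
    long c = (long)wc;
    return c >= 0 && c <= MAX_CODE_POINT;
}
```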
* remove some no-op end of string tests from regex parser | Rich Felker | 2012-05-13 | 1 | -4/+0
  these are cruft from the original code which used an explicit string length rather than null termination. i blindly converted all the checks to null terminator checks, without noticing that in several cases, the subsequent switch statement would automatically handle the null byte correctly.
* another BRE fix: in ^*, * is literal | Rich Felker | 2012-05-13 | 1 | -0/+2
  i don't understand why this has to be conditional on being in BRE mode, but enabling this code unconditionally breaks a huge number of ERE test cases.
* fix error checking for \ at end of regex (this was broken previously) | Rich Felker | 2012-05-07 | 1 | -1/+1
* fix copy and paste error in regex code causing mishandling of \) in BRE | Rich Felker | 2012-05-07 | 1 | -1/+1
* fix regex breakage in last commit (failure to handle empty regex, etc.) | Rich Felker | 2012-05-07 | 1 | -4/+1
* fix ugly bugs in TRE regex parser | Rich Felker | 2012-05-07 | 1 | -60/+31
  1. * in BRE is not special at the beginning of the regex or a subexpression. this broke ncurses' build scripts.
  2. \\( in BRE is a literal \ followed by a literal (, not a literal \ followed by a subexpression opener.
  3. the ^ in \\(^ in BRE is a literal ^ only at the beginning of the entire BRE. POSIX allows treating it as an anchor at the beginning of a subexpression, but TRE's code for checking if it was at the beginning of a subexpression was wrong, and fixing it for the sake of supporting a non-portable usage was too much trouble when just removing this non-portable behavior was much easier.

  this patch also moved lots of the ugly logic for empty atom checking out of the default/literal case and into new cases for the relevant characters. this should make parsing faster and make the code smaller. if nothing else it's a lot more readable/logical. at some point i'd like to revisit and overhaul lots of this code...
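Items 1 and 2 above can be checked through the public regcomp/regexec interface. The following is a sketch under the assumption of a conforming POSIX regex implementation; `bre_search` is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the BRE `pattern` matches anywhere
 * in `text`, 0 if it does not, and -1 if the pattern fails to compile. */
static int bre_search(const char *pattern, const char *text)
{
    regex_t re;
    int r;

    if (regcomp(&re, pattern, REG_NOSUB) != 0) /* flags without REG_EXTENDED select BRE */
        return -1;
    r = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return r == 0;
}
```

For example, the BRE `*a` should match the text `x*a` but not `a`, and the BRE `\\(` (written `"\\\\("` in C) should match the two-character text `\(`.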
* new fnmatch implementation | Rich Felker | 2012-04-28 | 1 | -131/+273
  unlike the old one, this one's algorithm does not suffer from potential stack overflow issues or pathologically bad performance on certain patterns. instead of backtracking, it uses a matching algorithm which I have not seen before (unsure whether I invented or re-invented it) that runs in O(1) space and O(nm) time. it may be possible to improve the time to O(n), but not without significantly greater complexity.
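To give a feel for how wildcard matching can run in constant space without recursion, here is a deliberately simplified matcher that handles only `*` and `?`. It is not the algorithm from this commit, whose details are not described in the message; it is a well-known iterative technique that remembers only the most recent `*`, and `wild_match` is a hypothetical name.

```c
/* Simplified matcher: supports `*` and `?` only (no brackets, escapes,
 * or FNM_* flags). Uses O(1) extra space and O(n*m) time worst case. */
static int wild_match(const char *pat, const char *str)
{
    const char *star = 0;   /* position of the most recent `*` in pat */
    const char *resume = 0; /* start of the text that `*` is absorbing */

    while (*str) {
        if (*pat == '*') {
            star = pat++;          /* remember the star, try empty match */
            resume = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;                 /* single-character match */
            str++;
        } else if (star) {
            pat = star + 1;        /* let the last star absorb one more */
            str = ++resume;
        } else {
            return 0;              /* mismatch with no star to fall back on */
        }
    }
    while (*pat == '*')            /* trailing stars match the empty string */
        pat++;
    return *pat == 0;
}
```

Only the latest star needs to be revisited, because any earlier star has already been matched as short as possible, which is why no stack is required.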
* update fnmatch to POSIX 2008 semantics | Rich Felker | 2012-04-26 | 1 | -4/+11
  an invalid bracket expression must be treated as if the opening bracket were just a literal character. this is to fix a bug whereby POSIX left the behavior of the "[" shell command undefined due to it being an invalid bracket expression.
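The rule can be observed with a pattern consisting of a lone opening bracket. A small sketch, with `glob_match` as a hypothetical wrapper:

```c
#include <fnmatch.h>

/* Hypothetical wrapper: returns 1 when `pattern` matches `s` with no
 * flags set, 0 otherwise. */
static int glob_match(const char *pattern, const char *s)
{
    return fnmatch(pattern, s, 0) == 0;
}
```

Under the POSIX 2008 semantics, `glob_match("[", "[")` should succeed because the unterminated `[` is treated as a literal character, while a well-formed expression such as `[ab]` keeps its usual meaning.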
* fix signedness error handling invalid multibyte sequences in regexec | Rich Felker | 2012-04-14 | 1 | -2/+2
  the "< 0" test was always false due to use of an unsigned type. this resulted in infinite loops on 32-bit machines (adding -1U to a pointer is the same as adding -1) and crashes on 64-bit machines (offsetting the string pointer by 4gb-1b when an illegal sequence was hit).
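The pattern behind this bug can be shown in isolation. In the sketch below, `decode_len` is a hypothetical stand-in for a multibyte decoder that returns -1 on an illegal sequence; it is not the function used in regexec.

```c
#include <stddef.h>

/* Hypothetical decoder: 1 for an ASCII byte, -1 for anything else. */
static int decode_len(unsigned char c)
{
    return c < 0x80 ? 1 : -1;
}

/* Counts characters up to the first illegal byte. Keeping the result
 * in a signed int is what makes the `< 0` test meaningful; stored in
 * an unsigned variable, -1 would wrap to a huge positive value and
 * the error branch could never be taken. */
static size_t count_valid_prefix(const unsigned char *s)
{
    size_t n = 0;

    while (*s) {
        int len = decode_len(*s);
        if (len < 0)
            break;
        s += len;
        n++;
    }
    return n;
}
```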
* remove invalid code from TRE | Rich Felker | 2012-04-13 | 1 | -14/+0
  TRE wants to treat + and ? after a +, ?, or * as special; ? means ungreedy and + is reserved for future use. however, this is non-conformant. although redundant, these characters have well-defined (no-op) meaning for POSIX ERE, and are actually _literal_ characters (which TRE is wrongly ignoring) in POSIX BRE mode. the simplest fix is to remove the unneeded nonstandard functionality. as a plus, this shaves off a small amount of bloat.
* fix broken regerror (typo) and missing message | Rich Felker | 2012-04-13 | 1 | -2/+2
* upgrade to latest upstream TRE regex code (0.8.0) | Rich Felker | 2012-03-20 | 5 | -1168/+1037
  the main practical results of this change are
  1. the regex code is no longer subject to LGPL; it's now 2-clause BSD
  2. most (all?) popular nonstandard regex extensions are supported

  I hesitate to call this a "sync" since both the old and new code are heavily modified. in one sense, the old code was "more severely" modified, in that it was actively hostile to non-strictly-conforming expressions. on the other hand, the new code has eliminated the useless translation of the entire regex string to wchar_t prior to compiling, and now only converts multibyte character literals as needed. in the future i may use this modified TRE as a basis for writing the long-planned new regex engine that will avoid multibyte-to-wide character conversion entirely by compiling multibyte bracket expressions specific to UTF-8.
* make glob mark symlinks-to-directories with the GLOB_MARK flag | Rich Felker | 2012-01-23 | 1 | -1/+1
  POSIX is unclear on whether it should, but all historical implementations seem to behave this way, and it seems more useful to applications.
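For reference, a caller sees the effect of GLOB_MARK as a trailing slash on directory matches. The sketch below only checks a plain directory, not a symlink, since creating one would need filesystem setup; `first_match_is_marked` is a hypothetical helper.

```c
#include <glob.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first match for `pattern`
 * ends in '/', 0 if it does not, and -1 if glob() fails or finds
 * nothing. */
static int first_match_is_marked(const char *pattern)
{
    glob_t g;
    int r = -1;

    if (glob(pattern, GLOB_MARK, NULL, &g) == 0) {
        if (g.gl_pathc > 0) {
            const char *p = g.gl_pathv[0];
            size_t len = strlen(p);
            r = len > 0 && p[len - 1] == '/';
        }
        globfree(&g);
    }
    return r;
}
```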
* support GLOB_PERIOD flag (GNU extension) to glob function | Rich Felker | 2012-01-22 | 1 | -1/+2
  patch by sh4rm4
* duplicate re_nsub in LSB/glibc ABI compatible location | Rich Felker | 2011-06-16 | 1 | -1/+1
* fix handling of d_name in struct dirent | Rich Felker | 2011-06-06 | 1 | -3/+2
  basically there are 3 choices for how to implement this variable-size string member:
  1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
  2. old way: length-1 string: generates array bounds warnings in caller.
  3. new way: length-NAME_MAX string. no problems, simplifies all code.

  of course the usable part in the pointer returned by readdir might be shorter than NAME_MAX+1 bytes, but that is allowed by the standard and doesn't hurt anything.
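The three layouts listed above can be written side by side. These struct definitions are hypothetical illustrations with a single placeholder field, not the real `struct dirent`; the third uses NAME_MAX + 1 bytes, following the size mentioned in the closing remark, and the fallback value for NAME_MAX is an assumption for systems that do not define it.

```c
#include <limits.h>

#ifndef NAME_MAX
#define NAME_MAX 255 /* assumed fallback, for illustration only */
#endif

/* 1. C99 flexible array member: not accepted by pre-C99 compilers. */
struct entry_flex {
    long ino;
    char name[];
};

/* 2. Length-1 array: callers indexing past name[0] trigger
 *    array bounds warnings. */
struct entry_one {
    long ino;
    char name[1];
};

/* 3. Fixed-size array large enough for any file name. */
struct entry_fixed {
    long ino;
    char name[NAME_MAX + 1];
};
```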
* safety fix for glob's vla usage: disallow patterns longer than PATH_MAX | Rich Felker | 2011-06-05 | 1 | -0/+2
  this actually inadvertently disallows some valid patterns with redundant / or * characters, but it's better than allowing unbounded vla allocation. eventually i'll write code to move the pattern to the stack and eliminate redundancy to ensure that it fits in PATH_MAX at the beginning of glob. this would also allow it to be modified in place for passing to fnmatch rather than copied at each level of recursion.
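A length guard of this kind might look like the sketch below. It is an assumption about the shape of the check, not the code from the commit; `pattern_too_long` is a hypothetical name, and the PATH_MAX fallback is only for systems that do not define it.

```c
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback, for illustration only */
#endif

/* Returns 1 if `pat` is longer than PATH_MAX bytes. strnlen() stops
 * scanning after PATH_MAX + 1 bytes, so the cost stays bounded even
 * for very long input. */
static int pattern_too_long(const char *pat)
{
    return strnlen(pat, (size_t)PATH_MAX + 1) > PATH_MAX;
}
```

A caller would reject the pattern before sizing any variable-length array from it.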
* eliminate (harmless in this case) vla usage in fnmatch.c | Rich Felker | 2011-06-05 | 1 | -1/+1
* fix bug in TRE found by clang (typo && instead of &) | Rich Felker | 2011-04-07 | 1 | -1/+1
* initial check-in, version 0.5.0 (tag: v0.5.0) | Rich Felker | 2011-02-12 | 7 | -0/+5364