author: Paolo Bonzini <bonzini@gnu.org> 2009-11-17 16:23:24 -0800
committer: Ulrich Drepper <drepper@redhat.com> 2009-11-17 16:23:24 -0800
commit: 815d8147a3418334ffa91e2384c6e159f0809d65
tree: 164ba2a49c0a9287af8894c6c12c12e3d5b33dc8 /posix/regcomp.c
parent: 7443244740724babd575943ee33c45da326afbe7

Fix ranges with multibyte characters as endpoints.
This is another bug in computing the fastmap.  It was reported by a user
of sed because it usually does not happen with _LIBC.  However, it is
there in that case too.

The bug is that whenever a range appears at the beginning of the regex,
the regex must be tested against every possible multibyte character, but
the fastmap did not account for that.  _LIBC masks the problem because
there is generally a collation symbol for each possible
multibyte-character lead byte, so all the lead bytes are usually already
part of the fastmap.

The tests use Cyrillic characters as an example.  With _LIBC, they pass
without the patch too, but you can make them fail by removing the
handling of collation symbols.
Diffstat (limited to 'posix/regcomp.c')
-rw-r--r-- posix/regcomp.c | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/posix/regcomp.c b/posix/regcomp.c
index 446fed5445..6966b5da3c 100644
--- a/posix/regcomp.c
+++ b/posix/regcomp.c
@@ -377,7 +377,7 @@ re_compile_fastmap_iter (regex_t *bufp, const re_dfastate_t *init_state,
 	     applies to multibyte character sets; for single byte character
 	     sets, the SIMPLE_BRACKET again suffices.  */
 	  if (dfa->mb_cur_max > 1
-	      && (cset->nchar_classes || cset->non_match
+	      && (cset->nchar_classes || cset->non_match || cset->nranges
 # ifdef _LIBC
 		  || cset->nequiv_classes
 # endif /* _LIBC */