byte-based C locale, phase 1: multibyte character handling functions

this patch makes the functions which work directly on multibyte characters treat the high bytes as individual abstract code units rather than as multibyte sequences when MB_CUR_MAX is 1. since MB_CUR_MAX is presently defined as a constant 4, all of the new code added is dead code, and optimizing compilers' code generation should not be affected at all. a future commit will activate the new code.

as abstract code units, bytes 0x80 to 0xff are represented by wchar_t values 0xdf80 to 0xdfff, at the end of the surrogates range. this ensures that they will never be misinterpreted as Unicode characters, and that all wctype functions return false for these "characters" without needing locale-specific logic. a high range outside of Unicode such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's char16_t also needs to be able to represent conversions of these bytes, the surrogate range was the natural choice.
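The byte-to-code-unit mapping described in the message can be illustrated with a small sketch. The names `CODEUNIT_BASE`, `byte_to_codeunit`, `is_codeunit` and `codeunit_to_byte` are hypothetical and chosen only for this example; they are not taken from the patch, and the real implementation may express the mapping differently.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not part of the patch. */
#define CODEUNIT_BASE 0xdf80u

/* Map a byte to its wide value: ASCII maps to itself, high bytes
 * 0x80..0xff map to 0xdf80..0xdfff (end of the surrogates range). */
static unsigned byte_to_codeunit(unsigned char b)
{
	return b < 0x80 ? b : CODEUNIT_BASE + (b - 0x80u);
}

/* Nonzero if wc lies in the 0xdf80..0xdfff code unit range.
 * Unsigned wraparound makes this a single comparison. */
static int is_codeunit(unsigned wc)
{
	return wc - CODEUNIT_BASE < 0x80u;
}

/* Inverse mapping for values in the code unit range. */
static unsigned char codeunit_to_byte(unsigned wc)
{
	assert(is_codeunit(wc));
	return (unsigned char)(wc - CODEUNIT_BASE + 0x80u);
}
```

Because every value in this range is a low surrogate, it fits in 16 bits and cannot collide with any Unicode scalar value, which is the property the message relies on for char16_t.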
author: Rich Felker <dalias@aerifal.cx> 2015-06-16 04:44:17 +0000
committer: Rich Felker <dalias@aerifal.cx> 2015-06-16 05:28:48 +0000
commit: 1507ebf837334e9e07cfab1ca1c2e88449069a80 (patch)
tree: 92bad1f861e442f7e2d2fa4e178f471f4371509a /src/regex
parent: 38e2f727237230300fea6aff68802db04625fd23 (diff)
1 files changed, 2 insertions, 1 deletions
diff --git a/src/regex/fnmatch.c b/src/regex/fnmatch.c
index 7f6b65f3..978fff88 100644
--- a/src/regex/fnmatch.c
+++ b/src/regex/fnmatch.c
@@ -18,6 +18,7 @@
 #include <stdlib.h>
 #include <wchar.h>
 #include <wctype.h>
+#include "locale_impl.h"
 
 #define END 0
 #define UNMATCHABLE -2
@@ -229,7 +230,7 @@ static int fnmatch_internal(const char *pat, size_t m, const char *str, size_t n
 	 * On illegal sequences we may get it wrong, but in that case
 	 * we necessarily have a matching failure anyway. */
 	for (s=endstr; s>str && tailcnt; tailcnt--) {
-		if (s[-1] < 128U) s--;
+		if (s[-1] < 128U || MB_CUR_MAX==1) s--;
 		else while ((unsigned char)*--s-0x80U<0x40 && s>str);
 	}
 	if (tailcnt) return FNM_NOMATCH;
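
The effect of the changed condition in the hunk above can be seen in isolation with a standalone sketch of the backward scan. The function name `skip_back` and its `mb_cur_max` parameter are assumptions made for this example (the patch tests `MB_CUR_MAX` directly inside `fnmatch_internal`); passing the value in allows both modes to be exercised, since `MB_CUR_MAX` is still a constant 4 at this commit.

```c
#include <stddef.h>

/* Hypothetical standalone version of the tail scan; not part of the patch.
 * Steps back over cnt characters from end, never going below str.
 * Returns the new position, or NULL if fewer than cnt characters exist. */
static const char *skip_back(const char *str, const char *end,
                             size_t cnt, int mb_cur_max)
{
	const char *s = end;
	for (; s > str && cnt; cnt--) {
		/* ASCII byte, or byte-based locale: one byte is one character. */
		if ((unsigned char)s[-1] < 128U || mb_cur_max == 1) s--;
		/* UTF-8: step back over continuation bytes 0x80..0xbf
		 * until a lead byte (or the start of the string) is reached. */
		else while ((unsigned char)*--s - 0x80U < 0x40 && s > str);
	}
	return cnt ? NULL : s;
}
```

For the three-byte string "a\xc3\xa9", stepping back one character from the end lands on offset 1 when `mb_cur_max` is 4 (the two-byte sequence counts as one character) and on offset 2 when it is 1 (each high byte counts as its own character).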