From ef330a5dfddc763b83fe2406a91c61519279de68 Mon Sep 17 00:00:00 2001 From: Peter Stephenson Date: Sun, 9 Apr 2006 21:47:21 +0000 Subject: 22408: support for multibyte characters in patterns --- Doc/Zsh/expn.yo | 27 ++++++++++++++++++++------- Doc/Zsh/options.yo | 14 ++++++++++++++ 2 files changed, 34 insertions(+), 7 deletions(-) (limited to 'Doc/Zsh') diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo index e4e270f98..71a702809 100644 --- a/Doc/Zsh/expn.yo +++ b/Doc/Zsh/expn.yo @@ -1461,20 +1461,20 @@ tt(LPAR()#)var(X)tt(RPAR()) where var(X) may have one of the following forms: startitem() -item(i)( +item(tt(i))( Case insensitive: upper or lower case characters in the pattern match upper or lower case characters. ) -item(l)( +item(tt(l))( Lower case characters in the pattern match upper or lower case characters; upper case characters in the pattern still only match upper case characters. ) -item(I)( +item(tt(I))( Case sensitive: locally negates the effect of tt(i) or tt(l) from that point on. ) -item(b)( +item(tt(b))( Activate backreferences for parenthesised groups in the pattern; this does not work in filename generation. When a pattern with a set of active parentheses is matched, the strings matched by the groups are @@ -1525,11 +1525,11 @@ start and end indices are set to -1. Pattern matching with backreferences is slightly slower than without. ) -item(B)( +item(tt(B))( Deactivate backreferences, negating the effect of the tt(b) flag from that point on. ) -item(m)( +item(tt(m))( Set references to the match data for the entire string matched; this is similar to backreferencing and does not work in filename generation. The flag must be in effect at the end of the pattern, i.e. not local to a @@ -1550,7 +1550,7 @@ Unlike backreferences, there is no speed penalty for using match references, other than the extra substitutions required for the replacement strings in cases such as the example shown. 
) -item(M)( +item(tt(M))( Deactivate the tt(m) flag, hence no references to match data will be created. ) @@ -1596,6 +1596,19 @@ the latter case the `tt((#b))' is useful for backreferences and the `tt((#q.))' will be ignored. Note that colon modifiers in the glob qualifiers are also not applied in ordinary pattern matching. ) +item(tt(u))( +Respect the current locale in determining the presence of multibyte +characters in a pattern, provided the shell was compiled with +tt(MULTIBYTE_SUPPORT). This overrides the tt(MULTIBYTE) +option; the default behaviour is taken from the option. Compare tt(U). +(Mnemonic: typically multibyte characters are from Unicode in the UTF-8 +encoding, although any extension of ASCII supported by the system +library may be used.) +) +item(tt(U))( +All characters are considered to be a single byte long. The opposite +of tt(u). This overrides the tt(MULTIBYTE) option. +) enditem() For example, the test string tt(fooxx) can be matched by the pattern diff --git a/Doc/Zsh/options.yo b/Doc/Zsh/options.yo index 74f8b4c84..0fb87302e 100644 --- a/Doc/Zsh/options.yo +++ b/Doc/Zsh/options.yo @@ -411,6 +411,20 @@ item(tt(MARK_DIRS) (tt(-8), ksh: tt(-X)))( Append a trailing `tt(/)' to all directory names resulting from filename generation (globbing). ) +pindex(MULTIBYTE) +cindex(characters, multibyte, in expansion and globbing) +cindex(multibyte characters, in expansion and globbing) +item(tt(MULTIBYTE))( +Respect multibyte characters when found during pattern matching. +When this option is set, character strings are examined using the +system library to determine how many bytes form a character, depending +on the current locale. If the option is unset +(or the shell was not compiled with the configuration option +tt(MULTIBYTE_SUPPORT)) a single byte is always treated as a single +character. The option will eventually be extended to cover expansion. 
+Note, however, that it does not affect the shell's editor, which always +uses the locale to determine multibyte characters. +) pindex(NOMATCH) cindex(globbing, no matches) item(tt(NOMATCH) (tt(PLUS()3)) )(