COMMENT(!MOD!zsh/pcre
Interface to the PCRE library.
!MOD!)
cindex(regular expressions, perl-compatible)
The tt(zsh/pcre) module makes some commands available as builtins:

startitem()
findex(pcre_compile)
item(tt(pcre_compile) [ tt(-aimxs) ] var(PCRE))(
Compiles a perl-compatible regular expression.

Option tt(-a) will force the pattern to be anchored.
Option tt(-i) will compile a case-insensitive pattern.
Option tt(-m) will compile a multi-line pattern; that is,
tt(^) and tt($) will match at newlines within the subject string,
not only at its start and end.
Option tt(-x) will compile an extended pattern, wherein
whitespace and tt(#) comments are ignored.
Option tt(-s) makes the dot metacharacter match all characters,
including those that indicate newline.
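
As a minimal sketch of how these options change matching, the following
compiles one pattern with tt(-i) and tt(-x) and another with tt(-a), then
tests each with tt(pcre_match).  The variable names tt(relaxed) and
tt(anchored) are illustrative only and are not part of the module.

```shell
zmodload zsh/pcre

# -i: ignore case; -x: whitespace inside the pattern is not significant,
# so the pattern below is equivalent to 'hello\s+world'.
pcre_compile -ix 'hello \s+ world'
if pcre_match -- 'Hello   WORLD'; then relaxed=yes; else relaxed=no; fi

# -a: the pattern may only match at the start of the string.
pcre_compile -a 'world'
if pcre_match -- 'hello world'; then anchored=yes; else anchored=no; fi

print -r -- "relaxed=$relaxed anchored=$anchored"
```

Under these assumptions the first test should succeed and the second
should fail, because tt(world) does not occur at the start of the string.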
)
findex(pcre_study)
item(tt(pcre_study))(
Studies the previously-compiled PCRE, which may result in faster
matching.
)
findex(pcre_match)
item(tt(pcre_match) [ tt(-v) var(var) ] [ tt(-a) var(arr) ] \
[ tt(-n) var(offset) ] [ tt(-b) ] var(string))(
Returns successfully if var(string) matches the previously-compiled
PCRE.

Upon successful match,
if the expression captures substrings within parentheses,
tt(pcre_match) will set the array var(match) to those
substrings, unless the tt(-a) option is given, in which
case it will set the array var(arr).  Similarly, the variable
var(MATCH) will be set to the entire matched portion of the
string, unless the tt(-v) option is given, in which case the variable
var(var) will be set.
No variables are altered if there is no successful match.
The tt(-n) option starts searching for a match from the
byte var(offset) position in var(string).  If the tt(-b) option is given,
the variable var(ZPCRE_OP) will be set to an offset pair string,
representing the byte offset positions of the entire matched portion
within var(string).  For example, a var(ZPCRE_OP) set to "32 45" indicates
that the matched portion began at byte offset 32 and ended at byte offset 44.
Here, byte offset 45 is the position directly after the matched
portion.  Keep in mind that the byte position is not necessarily the same
as the character position when multibyte characters such as those in UTF-8
are involved.
Consequently, the byte offset positions should only be relied on
when using them for subsequent searches on var(string), passing an offset
position as the argument to the tt(-n) option.  This is mostly
used to implement the "find all non-overlapping matches" functionality.

A simple example of "find all non-overlapping matches":

example(
string="The following zip codes: 78884 90210 99513"
pcre_compile -m "\d{5}"
accum=()
pcre_match -b -- $string
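while [[ $? -eq 0 ]]; do
    # ZPCRE_OP holds "start end"; split it into a two-element array.
    b=($=ZPCRE_OP)
    accum+=($MATCH)
    # Resume the search at the byte just past the previous match.
    pcre_match -b -n $b[2] -- $string
done
print -l $accum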
)
)
enditem()

The tt(zsh/pcre) module makes available the following test condition:

startitem()
findex(pcre-match)
item(var(expr) tt(-pcre-match) var(pcre))(
Matches a string against a perl-compatible regular expression.

For example,

example([[ "$text" -pcre-match ^d+$ ]] &&
print text variable contains only "d's".)

pindex(REMATCH_PCRE)
pindex(NO_CASE_MATCH)
If the tt(REMATCH_PCRE) option is set, the tt(=~) operator is equivalent to
tt(-pcre-match), and the tt(NO_CASE_MATCH) option may be used.  Note that
tt(NO_CASE_MATCH) never applies to the tt(pcre_match) builtin; instead, use
the tt(-i) switch of tt(pcre_compile).
)
enditem()