This manual provides comprehensive technical documentation for regular expression support in the Parasol Framework. It covers pattern syntax, behaviour, and all supported features for both Fluid and C++ developers.
For information on how to use the Regex class and its methods in your code, please refer to the Regex module API documentation.
Parasol's regex support is based on the regular expression syntax defined in the ECMAScript Specification. The implementation provides full Unicode support with UTF-8 encoding enabled by default.
Key features include:
Regular expressions are compiled into pattern objects that can be reused for efficient matching operations.
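The compile-once idea can be illustrated in C++. The sketch below uses the C++ standard library's `std::regex` rather than Parasol's own regex class, whose API is documented separately and is not assumed here. `count_with_digits` is a hypothetical helper. The pattern object is built on the first call and reused for every later search.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: counts the strings that contain at least one digit.
// The function-local static is compiled on the first call only; every later
// call reuses the same compiled pattern object.
std::size_t count_with_digits(const std::vector<std::string>& items)
{
   static const std::regex digits("\\d+");
   std::size_t count = 0;
   for (const auto& item : items) {
      if (std::regex_search(item, digits)) ++count;
   }
   return count;
}
```

For example, `count_with_digits({"abc", "a1", "42", ""})` returns 2.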
Regular expressions match characters in the target text based on pattern specifications. The following table describes all character matching forms:
| Pattern | Description | Example |
|---|---|---|
| `.` | Matches any character except line terminators (U+000A, U+000D, U+2028, U+2029). With the dotall flag, matches every code point. | `a.c` matches "abc", "aXc" |
| `\0` | Matches the NULL character (U+0000) | `\0` matches a null byte |
| `\t` | Matches Horizontal Tab (U+0009) | `a\tb` matches "a", a tab, then "b" |
| `\n` | Matches Line Feed (U+000A) | `a\nb` matches "a", a line feed, then "b" |
| `\v` | Matches Vertical Tab (U+000B) | `\v` matches a vertical tab |
| `\f` | Matches Form Feed (U+000C) | `\f` matches a form feed |
| `\r` | Matches Carriage Return (U+000D) | `\r\n` matches a Windows line ending |
| `\cX` | Matches a control character, where X is A-Z or a-z. The value is (code point of X) & 0x1F | `\cA` matches Ctrl-A (U+0001) |
| `\\` | Matches the backslash character (U+005C) | `\\` matches `\` |
| `\xHH` | Matches the character with hexadecimal code HH (00-FF) | `\x41` matches "A" |
| `\uHHHH` | Matches the character with Unicode code point HHHH | `\u0041` matches "A" |
| `\u{H...}` | Matches the character with the Unicode code point given by the hex digits (up to 10FFFF) | `\u{1F600}` matches 😀 |
| `\X` | When X is one of `^ $ . * + ? ( ) [ ] { } \| /`, matches X literally | `\(` matches "(" |
| Character | Any character not listed above matches itself | `abc` matches "abc" |
Line terminator code points are: U+000A (Line Feed), U+000D (Carriage Return), U+2028 (Line Separator), and U+2029 (Paragraph Separator).
All escape sequences must be complete and valid. An `error_escape` exception is thrown in any of these cases:

- `\c` is not followed by a letter A-Z or a-z.
- `\x` is not followed by two hexadecimal digits.
- `\u` is not followed by four hexadecimal digits.
- `\u{...}` does not contain valid hexadecimal, or its value exceeds U+10FFFF.

In character classes (see Character Classes), the hyphen `-` can also be escaped as `\-`. The character `]` must always be escaped as `\]` to be matched literally.
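Several of these escapes also exist in the ECMAScript grammar that the C++ standard library's `std::regex` uses by default. That makes it a convenient stand-in for experimenting. It is not Parasol's API, and `std::regex` follows an older ECMAScript edition, so `\u{...}` is not available there.

The hypothetical helpers below do two things:

- `full_match` checks whether a whole string matches a pattern.
- `is_escape_error` reports whether a pattern is rejected with `std::regex_constants::error_escape`, the standard library's counterpart of the `error_escape` exception described above.

```cpp
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern` (std::regex defaults to the ECMAScript grammar).
bool full_match(const std::string& text, const std::string& pattern)
{
   return std::regex_match(text, std::regex(pattern));
}

// True if compiling `pattern` fails because of a malformed escape sequence.
bool is_escape_error(const std::string& pattern)
{
   try {
      std::regex re(pattern);
      (void)re; // only compilation is being tested
   }
   catch (const std::regex_error& e) {
      return e.code() == std::regex_constants::error_escape;
   }
   return false;
}
```

In these C++ string literals each regex backslash is doubled, just as in the Fluid examples. `full_match("A", "\\x41")` and `full_match("a\tb", "a\\tb")` are both true, while `"\\x4"` (only one hex digit) is rejected.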
The `|` operator matches one of multiple alternative patterns, evaluated from left to right:

```
A|B|C
```

This matches pattern A, or pattern B, or pattern C. Alternatives are tried in order, and the first one that allows the overall pattern to match is adopted. Later alternatives are tried only if an earlier one fails.

```
local pattern = regex.new('abc|abcdef')
local match = pattern.match('abcdef')
-- match[1] = "abc" (not "abcdef")
```

Even though "abcdef" would match the second alternative completely, the pattern matches "abc" from the first alternative because alternatives are evaluated left to right.
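The same left-to-right rule holds in the C++ standard library's `std::regex`, whose default ECMAScript grammar uses ordered alternation. It can therefore serve as a quick check of the behaviour outside Parasol. This is an illustration only, and `first_match` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if there is none.
std::string first_match(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   std::smatch m;
   return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

With this helper, `first_match("abcdef", "abc|abcdef")` yields "abc", while `first_match("abcdef", "abcdef|abc")` yields "abcdef". Listing the longer alternative first is the usual way to prefer it.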
Multiple alternatives can be combined:

```
local pattern = regex.new('cat|dog|bird|fish')
```

Character classes define sets of characters that can match at a single position in the target text.
A character class is enclosed in square brackets [...] and matches any single character from the set:
| Pattern | Description | Example |
|---|---|---|
| `[ABC]` | Matches any of A, B, or C | `[ABC]` matches "A", "B", or "C" |
| `[^DEF]` | Matches any character except D, E, or F (negated class) | `[^DEF]` matches any character but "D", "E", "F" |
| `[G^H]` | Matches G, ^, or H (`^` is not first, so it is literal) | `[G^H]` matches "G", "^", or "H" |
| `[I-K]` | Matches any character from I to K inclusive (range) | `[I-K]` matches "I", "J", "K" |
| `[-LM]` | Matches -, L, or M (leading hyphen is literal) | `[-LM]` matches "-", "L", "M" |
| `[N-P-R]` | Matches N, O, P, -, or R (trailing hyphen after range is literal) | `[N-P-R]` matches "N", "O", "P", "-", "R" |
| `[S\-U]` | Matches S, -, or U (escaped hyphen) | `[S\-U]` matches "S", "-", "U" |
| `[.({\|]` | Special regex characters lose their special meaning in character classes | `[.({\|]` matches ".", "(", "{", "\|" |
| `[]` | Empty class matches no code points (always fails) | `[]` never matches |
| `[^]` | Complement of the empty class matches any code point | `[^]` matches any character, including line terminators |
- When `^` is the first character in `[]`, the class is negated and matches any character NOT in the set.
- The `]` character must always be escaped as `\]` to be included literally in a character class.
- The `-` character creates a range when it appears between two characters. To match `-` literally, place it first, place it last, or escape it as `\-`.
- Special characters (`.`, `*`, `+`, etc.) lose their special meaning inside character classes.

Ranges define a span of consecutive Unicode code points:

```
local pattern = regex.new('[A-Z]')      -- Matches any uppercase letter A-Z
local pattern = regex.new('[0-9]')      -- Matches any digit
local pattern = regex.new('[a-zA-Z]')   -- Matches any ASCII letter
```

If the range is invalid, an `error_range` exception is thrown. For example, `[b-a]` is invalid because the starting code point is greater than the ending code point.
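Ranges and the reversed-range error can be tried with the C++ standard library's `std::regex`. The C++ standard describes its `error_range` code with the same `[b-a]` example. This sketch uses `std::regex` purely as an illustration, not Parasol's API, and `all_in_class` and `is_range_error` are hypothetical helpers.

```cpp
#include <regex>
#include <string>

// True if every character of `text` belongs to the character class `cls`, e.g. "[A-Z]".
bool all_in_class(const std::string& text, const std::string& cls)
{
   return std::regex_match(text, std::regex(cls + "*"));
}

// True if compiling `pattern` fails with an invalid-range error.
bool is_range_error(const std::string& pattern)
{
   try {
      std::regex re(pattern);
      (void)re; // only compilation is being tested
   }
   catch (const std::regex_error& e) {
      return e.code() == std::regex_constants::error_range;
   }
   return false;
}
```

Here `all_in_class("HELLO", "[A-Z]")` is true, `all_in_class("Hello", "[A-Z]")` is false, and `is_range_error("[b-a]")` is true.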
When case-insensitive matching is enabled (with the icase flag), character classes expand to include case-folded variations:

```
local pattern = regex.new('[E-F]', regex.ICASE)
-- Matches 'E', 'F', 'e', 'f', and any Unicode case variants
```

Note: The range `[E-f]` with the icase flag matches all characters from U+0045 ('E') to U+0066 ('f'), including brackets, backslash, and other punctuation, plus their case-folded variants.
Predefined character classes provide convenient shortcuts for common character sets:
| Pattern | Equivalent | Description |
|---|---|---|
| `\d` | `[0-9]` | Matches any decimal digit |
| `\D` | `[^0-9]` | Matches any non-digit |
| `\s` | `[ \t\n\v\f\r\u00a0\u1680\u2000-\u200a\u2028-\u2029\u202f\u205f\u3000\ufeff]` | Matches any whitespace character (WhiteSpace + LineTerminator) |
| `\S` | `[^ \t\n\v\f\r\u00a0\u1680\u2000-\u200a\u2028-\u2029\u202f\u205f\u3000\ufeff]` | Matches any non-whitespace character |
| `\w` | `[0-9A-Za-z_]` | Matches any word character (alphanumeric + underscore) |
| `\W` | `[^0-9A-Za-z_]` | Matches any non-word character |
| `\p{...}` | (See Unicode Support) | Matches characters with the specified Unicode property |
| `\P{...}` | (See Unicode Support) | Matches characters without the specified Unicode property |
All predefined character classes can be used inside character classes:

```
local pattern = regex.new('[\\d!"#$%&\'\\(\\)]') -- Matches a digit or one of ! " # $ % & ' ( )
```

Note: The `\s` whitespace class automatically expands when new code points are added to Unicode category Zs.
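For plain ASCII text, the shorthand classes behave the same way in the C++ standard library's `std::regex`, though it does not implement the full Unicode whitespace set listed above. The sketch below is an illustration rather than Parasol's API, and `count_matches` is a hypothetical helper. It counts non-overlapping matches so the classes can be compared on one sample string, including a shorthand used inside a bracket expression.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` in `text`.
std::ptrdiff_t count_matches(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                        std::sregex_iterator());
}
```

On "Order 66 shipped 3 days ago":

- `\d+` finds 2 matches ("66" and "3").
- `\w+` finds the 6 words.
- `\s` finds the 5 spaces.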
Character classes support advanced set operations for precise character matching. These operations are always available as standard features.
The intersection operator `&&` matches characters that belong to both sets:

```
[A&&B]
```

Examples:

```
-- Match lowercase Latin letters only
local pattern = regex.new('[\\p{sc=Latin}&&\\p{Ll}]')
-- Matches: a, b, c, ..., z, ñ, ø, etc. (lowercase Latin)
-- Does NOT match: A, B, C, ... (not lowercase)

-- Match ASCII letters only (not extended Latin)
local pattern = regex.new('[\\p{sc=Latin}&&[A-Za-z]]')
```

The subtraction operator `--` matches characters in the first set but not in the second:

```
[A--B]
```

Examples:

```
-- Match Latin letters that are NOT lowercase
local pattern = regex.new('[\\p{sc=Latin}--\\p{Ll}]')
-- Matches: A, B, C, ..., Z (uppercase, titlecase, etc.)
-- Does NOT match: a, b, c, ... (lowercase excluded)

-- Match letters except vowels (the ranges are nested, because a union
-- cannot be mixed with an operator at the same level)
local pattern = regex.new('[[A-Za-z]--[AEIOUaeiou]]')
-- Matches: consonants only
```

The `\q{...}` syntax allows character classes to match multi-character sequences:
```
[a-z\q{ch|th|ph}]
```

This matches either:

- a single character in the range `a-z`, OR
- "ch", OR
- "th", OR
- "ph"

Longest Match Priority: When strings are included in a character class, the longest strings are always tried first:

```
local pattern = regex.new('[a-z\\q{ch|chocolate}]')
-- When matching "chocolate", matches the full word "chocolate"
-- Not "ch" followed by "ocolate"
```

The sequence `[a-z\q{ch|th|ph}]` is functionally equivalent to `(?:ch|th|ph|[a-z])`.
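The equivalence can be sketched in an engine without `\q{...}`, here the C++ standard library's `std::regex`. This is an illustration, not Parasol's API. Ordinary alternation is tried left to right, so the emulation only reproduces the longest-match priority when the longer strings are listed first. `first_match`, `kLongestFirst` and `kShortestFirst` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if there is none.
std::string first_match(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   std::smatch m;
   return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// Emulates [a-z\q{ch|chocolate}]: strings longest first, then the single-character class.
const std::string kLongestFirst = "(?:chocolate|ch|[a-z])";

// Same alternatives in the wrong order: "ch" succeeds before "chocolate" is tried.
const std::string kShortestFirst = "(?:ch|chocolate|[a-z])";
```

`first_match("chocolate", kLongestFirst)` returns "chocolate", while `first_match("chocolate", kShortestFirst)` returns only "ch".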
Examples:

```
-- Match common digraphs or single letters
local pattern = regex.new('[a-z\\q{ch|sh|th|ph}]+')

-- Match emoticon sequences or uppercase letters
-- ('-', '(' and ')' must be escaped inside \q{...})
local pattern = regex.new('[A-Z\\q{:\\-\\)|:\\-\\(|:\\-D}]')
```

String sequences can be used with all set operators (union, intersection, subtraction).
Character classes can be nested as operands for set operations:

```
-- Valid: nested classes with operators
local pattern = regex.new('[\\p{sc=Latin}--[a-z]]')

-- Valid: nested union and subtraction
local pattern = regex.new('[A[B--C]D]')
```

Operator Restriction: Only one type of operator can be used per level of nesting:
```
-- INVALID: mixing union and subtraction at the same level
[AB--CD]       -- Error: union (AB) then subtraction (--)

-- VALID: operators in different nesting levels
[[AB]--[CD]]   -- OK: separate nesting levels
[A[B--C]D]     -- OK: subtraction inside union
```

Multiple uses of the same operator are permitted:
```
-- Valid: multiple subtractions at same level
[\p{sc=Latin}--\p{Lu}--[a-z]]
```

The following characters must be escaped with `\` when used literally in character classes:

- `(`, `)`, `[`, `{`, `}`, `/`, `-`, `|`
- `]` must always be escaped (even outside character classes)

```
-- Correct
local pattern = regex.new('[\\(\\)\\[\\]\\{\\}]')

-- Incorrect (throws error_noescape)
local pattern = regex.new('[(]')
```

The following 18 double-character sequences are reserved for future use and cannot appear in character classes:

```
!! ## $$ %% ** ++ ,, ..
:: ;; << == >> ?? @@ ^^
`` ~~
```
If any of these appear in a character class, an error_operator exception is thrown.
Quantifiers specify how many times a pattern element must match. Each quantifier has a greedy and non-greedy form.
| Quantifier | Non-Greedy | Matches | Description |
|---|---|---|---|
| `*` | `*?` | 0 or more | Repeats the preceding element zero or more times |
| `+` | `+?` | 1 or more | Repeats the preceding element one or more times |
| `?` | `??` | 0 or 1 | Makes the preceding element optional |
| `{n}` | N/A | Exactly n | Repeats the preceding element exactly n times |
| `{n,}` | `{n,}?` | n or more | Repeats the preceding element at least n times |
| `{n,m}` | `{n,m}?` | n to m | Repeats the preceding element between n and m times (inclusive) |
Greedy quantifiers (default) match as many characters as possible while still allowing the overall pattern to succeed:

```
local pattern = regex.new('a.*b')
local match = pattern.match('axxxbxxxb')
-- match[1] = "axxxbxxxb" (matches up to the last 'b')
```

Non-greedy quantifiers (with a `?` suffix) match as few characters as possible while still allowing the overall pattern to succeed:

```
local pattern = regex.new('a.*?b')
local match = pattern.match('axxxbxxxb')
-- match[1] = "axxxb" (stops at the first 'b')
```

Quantifiers must have a preceding expression to quantify. Using a quantifier without a preceding element (e.g., `*` at the start of a pattern) throws `error_badrepeat`.

If a quantifier range is invalid (e.g., `{3,2}` where n > m), an `error_badbrace` exception is thrown.

Mismatched `{` or `}` characters throw `error_brace`.
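Greedy versus non-greedy matching, and the rejection of malformed quantifiers at compile time, can be reproduced with the C++ standard library's `std::regex`. This is an illustration only, not Parasol's API, and `first_match` and `rejects` are hypothetical helpers. `std::regex` reports these problems as `std::regex_error`, but its specific error codes are not guaranteed to match the Parasol exceptions named above. The sketch therefore only checks that compilation fails.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if there is none.
std::string first_match(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   std::smatch m;
   return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// True if `pattern` cannot be compiled.
bool rejects(const std::string& pattern)
{
   try {
      std::regex re(pattern);
      (void)re; // only compilation is being tested
   }
   catch (const std::regex_error&) {
      return true;
   }
   return false;
}
```

Here `first_match("axxxbxxxb", "a.*b")` returns the whole string and `first_match("axxxbxxxb", "a.*?b")` returns "axxxb". Both `"*abc"` and `"a{3,2}"` are rejected.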
```
-- Match one or more digits
local pattern = regex.new('\\d+')

-- Match optional sign followed by digits
local pattern = regex.new('[+-]?\\d+')

-- Match exactly 3 letters
local pattern = regex.new('[A-Za-z]{3}')

-- Match 2 to 4 word characters (greedy)
local pattern = regex.new('\\w{2,4}')

-- Match 2 to 4 word characters (non-greedy)
local pattern = regex.new('\\w{2,4}?')

-- Match at least 5 digits
local pattern = regex.new('\\d{5,}')
```

When a capturing group is quantified, its captures are reset at the start of each iteration. Only the captures made during the last iteration are preserved:

```
local pattern = regex.new('(?:(a)|(b))+')
local match = pattern.match('ab')
-- match[1] = "ab" (full match)
-- match[2] = "" (empty: the last iteration matched 'b', so group 1 captured nothing)
-- match[3] = "b" (last iteration captured 'b')
```

Parentheses create groups for capturing matches and controlling operator precedence.
Capturing groups are created with `(...)` and are numbered starting from 1:

```
local pattern = regex.new('(\\d{3})-(\\d{3})-(\\d{4})')
local match = pattern.match('555-123-4567')
-- match[1] = "555-123-4567" (full match, always at index 1)
-- match[2] = "555" (first capturing group)
-- match[3] = "123" (second capturing group)
-- match[4] = "4567" (third capturing group)
```

Group Numbering: Groups are numbered by the position of their opening `(` parenthesis from left to right:

```
local pattern = regex.new('((a)(b))c')
-- Group 1: ((a)(b))
-- Group 2: (a)
-- Group 3: (b)
```

Non-capturing groups `(?:...)` group expressions without creating a capture:

```
local pattern = regex.new('(?:tak(?:e|ing))')
-- Matches "take" or "taking" without capturing
```

Use non-capturing groups to:

- apply a quantifier to a sequence without capturing it, e.g. `(?:ab)+`
- limit the scope of alternation, e.g. `(?:cat|dog)`

Named groups associate a name with a captured substring:
```
(?<name>...)
```

Example:

```
local pattern = regex.new('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})')
local match = pattern.match('2025-10-14')
-- match[1] = "2025-10-14"
-- match[2] = "2025" (group 1, also accessible as 'year')
-- match[3] = "10" (group 2, also accessible as 'month')
-- match[4] = "14" (group 3, also accessible as 'day')
```

Named groups are also assigned a number and can be accessed by both name and number.
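The C++ standard library's `std::regex` has no named groups. Its numbered groups follow the same left-to-right numbering by opening parenthesis, though, so numbered access can be illustrated with it. This is not Parasol's API, and `captures` is a hypothetical helper. Note that `std::smatch` stores the full match at index 0, one lower than the Fluid indices used in these examples.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the full match followed by every capturing group (index 0 = full match),
// or an empty vector if the whole of `text` does not match `pattern`.
std::vector<std::string> captures(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   std::smatch m;
   std::vector<std::string> out;
   if (std::regex_match(text, m, re)) {
      for (std::size_t i = 0; i < m.size(); ++i) out.push_back(m.str(i));
   }
   return out;
}
```

For the phone number example earlier, `captures("555-123-4567", "(\\d{3})-(\\d{3})-(\\d{4})")` returns the full string followed by "555", "123" and "4567".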
Named groups can be reused if they appear in different alternatives:

```
local pattern = regex.new('(?<year>\\d{4})-\\d{1,2}|\\d{1,2}-(?<year>\\d{4})')
-- Matches "2025-10" or "10-2025"
-- 'year' captures the 4-digit year from either position
```

This feature was introduced in ES2025.
A backreference `\N` (where N is a positive integer starting from 1) matches the same text that was captured by group N:

```
local pattern = regex.new('(TO|to)..\\1')
-- Matches "TOMATO" or "tomato" but not "Tomato"
-- \1 refers to captured text from group 1
```

Example:

```
local pattern = regex.new('(["\']).*?\\1')
-- Matches a string in quotes: "hello" or 'hello'
-- But not mixed quotes: "hello'
```

A backreference `\k<name>` matches the text captured by a named group:

```
local pattern = regex.new('(?<quote>["\']).*?\\k<quote>')
-- Same as above, but using a named group
```

Forward References: Backreferences can appear before their corresponding group:

```
local pattern = regex.new('\\1(abc)') -- Valid in ECMAScript
```

Undefined Matches: A backreference to a group that hasn't captured anything matches the empty string:

```
local pattern = regex.new('(a)?b\\1')
-- Matches "b" (group 1 didn't capture, so \1 matches empty string)
```

Invalid Groups: If a backreference refers to a non-existent group number, an `error_backref` exception is thrown:

```
local pattern = regex.new('\\5') -- Error: no group 5 exists
```

When a capturing group is inside a quantified expression, captures are cleared on each iteration:

```
local pattern = regex.new('(?:(a)|(b))+')
local match = pattern.match('ab')
-- Only the last iteration's captures are retained
-- match[2] = "" (group 1's last iteration matched nothing)
-- match[3] = "b" (group 2's last iteration matched "b")
```

Flag modifiers allow inline control of matching behaviour within specific parts of a pattern.
Bounded flag modifiers enable or disable flags only within a specific group:

```
(?ims-ims:...)
```

Available Flags:

| Flag | Meaning |
|---|---|
| `i` | Case-insensitive matching (icase) |
| `m` | Multiline mode (`^` and `$` match line boundaries) |
| `s` | Dotall mode (`.` matches line terminators) |
| `-i` | Disable case-insensitive matching |
| `-m` | Disable multiline mode |
| `-s` | Disable dotall mode |
Examples:

```
-- Case-insensitive only for middle section
local pattern = regex.new('hello(?i:world)THERE')
-- Matches: "helloworldTHERE", "helloWORLDTHERE", "helloWoRlDTHERE"
-- Does NOT match: "HELLOworldthere" (case-sensitive outside group)

-- Combine multiple flags
local pattern = regex.new('(?ims:.*)')
-- Case-insensitive + multiline + dotall for entire group

-- Disable flags: compile with ICASE, then switch it off for one group
local pattern = regex.new('hello(?-i:world)', regex.ICASE)
-- "hello" is case-insensitive, "world" is case-sensitive
```

Single Use per Flag: Each flag letter can only appear once per modifier group:

```
-- INVALID: 'i' appears twice
(?ii:...)    -- Throws error_modifier
(?i-i:...)   -- Throws error_modifier
```

Scope: Flag modifiers affect only the expressions inside their group.

ES2025 Feature: Bounded flag modifiers were introduced in ES2025 and are enabled by default.
Assertions test conditions at the current position without consuming characters (zero-width).
| Assertion | Description |
|---|---|
| `^` | Matches at the start of the string. With the multiline flag, also matches immediately after line terminators. |
| `$` | Matches at the end of the string. With the multiline flag, also matches immediately before line terminators. |
Examples:

```
-- Match lines starting with "#"
local pattern = regex.new('^#.*', regex.MULTILINE)

-- Match lines ending with ";"
local pattern = regex.new('.*;$', regex.MULTILINE)
```

| Assertion | Description |
|---|---|
| `\b` | Matches at a word boundary: a position with a `\w` character on one side and either a `\W` character or the start/end of the text on the other |
| `\B` | Matches at a non-word boundary (any position that is not a word boundary) |
Examples:

```
-- Match "cat" as a whole word
local pattern = regex.new('\\bcat\\b')
-- Matches: "cat in hat"
-- Does NOT match: "concatenate"

-- Match "cat" not as a whole word
local pattern = regex.new('\\Bcat\\B')
-- Matches: "concatenate"
-- Does NOT match: "cat in hat"
```

Note: Inside a character class `[...]`, `\b` matches the backspace character (U+0008), not a word boundary. Using `\B` inside a character class throws `error_escape`.
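Word boundaries can be checked with the C++ standard library's `std::regex`, which follows the same ECMAScript rules for these assertions. That includes the different meaning of `\b` inside a character class. This is an illustration only, not Parasol's API, and `contains` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// True if `pattern` matches anywhere in `text`.
bool contains(const std::string& text, const std::string& pattern)
{
   return std::regex_search(text, std::regex(pattern));
}
```

`contains("cat in hat", "\\bcat\\b")` is true and `contains("concatenate", "\\bcat\\b")` is false. Inside brackets, `"[\\b]"` matches a backspace character rather than a boundary.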
Lookahead assertions check if a pattern matches ahead without consuming characters:
| Assertion | Description |
|---|---|
| `(?=...)` | Positive lookahead: succeeds if the pattern matches ahead |
| `(?!...)` | Negative lookahead: succeeds if the pattern does NOT match ahead |
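The C++ standard library's `std::regex` supports lookahead but not lookbehind. The lookahead examples below, and the combined word-boundary example later in this section, can therefore be sketched with it. This is an illustration rather than Parasol's API, and `escape_bare_ampersands` and `count_short_vowel_words` are hypothetical helpers.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Replaces each '&' that does not already start &amp;, &lt;, &gt; or a numeric entity (&#...).
std::string escape_bare_ampersands(const std::string& text)
{
   static const std::regex bare_amp("&(?!amp;|lt;|gt;|#)");
   return std::regex_replace(text, bare_amp, "&amp;");
}

// Counts words of 3 to 6 word characters that contain at least one vowel (case-insensitive).
std::ptrdiff_t count_short_vowel_words(const std::string& text)
{
   static const std::regex word("\\b(?=\\w*[aeiou])\\w{3,6}\\b",
                                std::regex::ECMAScript | std::regex::icase);
   return std::distance(std::sregex_iterator(text.begin(), text.end(), word),
                        std::sregex_iterator());
}
```

`escape_bare_ampersands("a & b &amp; c")` only rewrites the first `&`. In "The sky was dry but the tree grew", the words without a vowel ("sky", "dry") are not counted.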
Examples:

```
-- Match "a" only if followed by "bc" or "def"
local pattern = regex.new('a(?=bc|def)')
-- Matches: "abc", "adef" (the match itself is only "a")
-- Does NOT match: "axyz"

-- Match "a" only if NOT followed by "bc" or "def"
local pattern = regex.new('a(?!bc|def)')
-- Matches: "axyz" (the match itself is only "a")
-- Does NOT match: "abc", "adef"

-- Find & symbols that are not HTML entities
local pattern = regex.new('&(?!amp;|lt;|gt;|#)')
-- Matches a bare "&" but not the "&" that starts "&amp;", "&lt;", "&gt;" or "&#..."
```

Lookbehind assertions check if a pattern matches behind without consuming characters:
| Assertion | Description |
|---|---|
| `(?<=...)` | Positive lookbehind: succeeds if the pattern matches behind |
| `(?<!...)` | Negative lookbehind: succeeds if the pattern does NOT match behind |
Examples:

```
-- Match "a" only if preceded by "bc" or "de"
local pattern = regex.new('(?<=bc|de)a')
-- Matches: "bca", "dea" (the match itself is only "a")
-- Does NOT match: "xa"

-- Match "a" only if NOT preceded by "bc" or "de"
local pattern = regex.new('(?<!bc|de)a')
-- Matches: "xa" (the match itself is only "a")
-- Does NOT match: "bca", "dea"
```

Assertions can be combined for complex matching:
```
-- Match words of 3-6 letters containing at least one vowel
local pattern = regex.new('\\b(?=\\w*[aeiou])\\w{3,6}\\b', regex.ICASE)

-- Match integer strings that are not part of larger numbers
local pattern = regex.new('(?<!\\d)\\d+(?!\\d)')
```

Parasol's regex implementation provides full Unicode support with UTF-8 encoding enabled by default.
Unicode properties match characters based on their Unicode characteristics using \p{...} and \P{...}:
| Pattern | Description |
|---|---|
| `\p{Property}` | Matches characters with the specified Unicode property |
| `\P{Property}` | Matches characters without the specified Unicode property |
Match characters from specific writing systems:
```
-- Match Latin characters
local pattern = regex.new('\\p{sc=Latin}+')

-- Match Greek characters
local pattern = regex.new('\\p{Script=Greek}+')

-- Match characters whose Script_Extensions include Latin
-- (this also covers some characters shared between Latin and other scripts)
local pattern = regex.new('\\p{scx=Latin}+')
```

Common scripts: Latin, Greek, Cyrillic, Han, Arabic, Hebrew, Hiragana, Katakana, etc.
Match characters by their general category:
| Property | Description | Examples |
|---|---|---|
| `\p{Lu}` | Uppercase letter | A, B, Z, À, Ω |
| `\p{Ll}` | Lowercase letter | a, b, z, à, ω |
| `\p{Lt}` | Titlecase letter | ǅ, ǈ, ǋ |
| `\p{L}` | Any letter (Lu, Ll, Lt, Lm, Lo) | All letters |
| `\p{Nd}` | Decimal number | 0-9, ০-৯ |
| `\p{N}` | Any number (Nd, Nl, No) | All numbers |
| `\p{P}` | Punctuation | ., !, ?, ; |
| `\p{S}` | Symbol | $, +, =, © |
| `\p{Z}` | Separator | Space, non-breaking space |
| `\p{C}` | Other (control, format, etc.) | Control characters |
Examples:

```
-- Match any letter in any script
local pattern = regex.new('\\p{L}+')

-- Match digits in any script
local pattern = regex.new('\\p{Nd}+')

-- Match all punctuation
local pattern = regex.new('\\p{P}+')
```

Binary properties have true/false values:
```
-- Match whitespace characters
local pattern = regex.new('\\p{White_Space}+')

-- Match emoji
local pattern = regex.new('\\p{Emoji}')

-- Match an identifier (an ID_Start character followed by ID_Continue characters)
local pattern = regex.new('\\p{ID_Start}\\p{ID_Continue}*')
```

Properties can be specified in several formats:
```
-- Short form
\p{Lu}             -- Uppercase letter
\p{sc=Latin}       -- Latin script

-- Long form
\p{Script=Latin}
\p{General_Category=Uppercase_Letter}

-- Binary properties
\p{Emoji}
\p{White_Space}
```

For a complete list of available properties, see the ECMAScript Unicode Property Table.
Some Unicode properties match sequences of multiple characters (string properties). These can be used in character classes, except negated classes:

```
-- Valid: string property in a positive class
local pattern = regex.new('[\\p{RGI_Emoji}]')

-- INVALID: string property with negation
local pattern = regex.new('[^\\p{RGI_Emoji}]') -- Throws error_complement

-- INVALID: string property with \P{...}
local pattern = regex.new('\\P{RGI_Emoji}') -- Throws error_complement
```

When case-insensitive matching is enabled with the icase flag, Unicode case folding rules apply:
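```
local pattern = regex.new('café', regex.ICASE)
-- Matches: "café", "CAFÉ", "Café", "cAfÉ", etc.
```

Following the ECMAScript specification, case-insensitive matching uses Unicode simple (one-to-one) case folding. This can match characters that plain ASCII uppercasing/lowercasing would miss, but it never maps one character to several:

```
local pattern = regex.new('ß', regex.ICASE)
-- Matches: "ß" and "ẞ" (U+1E9E, capital sharp S)
-- Does NOT match: "SS" (that would require full, one-to-many case folding)
```

Character classes operate on Unicode code points: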
```
-- Match all characters in the Basic Multilingual Plane
local pattern = regex.new('[\\u0000-\\uFFFF]+')

-- Match emoji range (partial)
local pattern = regex.new('[\\u{1F600}-\\u{1F64F}]+')
```

The regex engine validates UTF-8 sequences:
- Trailing bytes must be in the range 0x80-0xBF. Invalid trailing bytes cause matching to fail at that position.
- Code points must be ≤ 0x10FFFF. Values exceeding this cause matching to fail.
- Non-shortest forms are rejected. For example, U+0030 (digit '0') must be encoded as 0x30, not as the longer forms 0xC0 0xB0 or 0xE0 0x80 0xB0.

At pattern compile time, invalid UTF-8 throws `error_utf8`. At matching time, invalid UTF-8 leads to match failure at that position.
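The three rules can be illustrated with a small decoder. This is a sketch of the rules as stated above, not Parasol's own implementation. `decode_utf8` and `is_valid_utf8` are hypothetical names. The rules listed do not mention surrogate code points (U+D800 to U+DFFF), so the sketch does not treat them specially.

```cpp
#include <cstddef>
#include <string>

// Decodes one UTF-8 sequence starting at s[pos]. On success, stores the code point
// in `cp`, advances `pos` past the sequence and returns true. Returns false for a
// stray or invalid lead byte, a truncated sequence, a trailing byte outside
// 0x80-0xBF, a non-shortest (overlong) form, or a code point above 0x10FFFF.
bool decode_utf8(const std::string& s, std::size_t& pos, char32_t& cp)
{
   if (pos >= s.size()) return false;
   const unsigned char lead = static_cast<unsigned char>(s[pos]);
   std::size_t len = 0;
   char32_t min_cp = 0; // smallest code point that needs this many bytes

   if (lead < 0x80) { cp = lead; ++pos; return true; } // ASCII
   else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
   else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
   else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
   else return false; // trailing byte in lead position, or 0xF8-0xFF

   if (pos + len > s.size()) return false; // truncated sequence
   for (std::size_t i = 1; i < len; ++i) {
      const unsigned char b = static_cast<unsigned char>(s[pos + i]);
      if (b < 0x80 || b > 0xBF) return false; // invalid trailing byte
      cp = (cp << 6) | (b & 0x3F);
   }
   if (cp < min_cp) return false;   // non-shortest form, e.g. 0xC0 0xB0 for '0'
   if (cp > 0x10FFFF) return false; // beyond the Unicode code space
   pos += len;
   return true;
}

// True if the whole string is valid UTF-8 under the rules above.
bool is_valid_utf8(const std::string& s)
{
   std::size_t pos = 0;
   char32_t cp = 0;
   while (pos < s.size()) {
      if (!decode_utf8(s, pos, cp)) return false;
   }
   return true;
}
```

With this sketch, the overlong encodings of '0' from the list above are rejected, while the 4-byte encoding of U+1F600 is accepted.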
Compilation flags affect how a regex pattern is compiled and interpreted. These flags are specified when creating a regex object.
| Flag | Effect |
|---|---|
| `ICASE` | Case-insensitive matching. Matches characters regardless of case using Unicode case-folding rules. |
| `MULTILINE` | Multiline mode. The `^` and `$` anchors match at line boundaries (after/before line terminators) in addition to string boundaries. |
| `DOTALL` | Dotall (singleline) mode. The `.` metacharacter matches line terminators (U+000A, U+000D, U+2028, U+2029) in addition to all other characters. |
The exact syntax for specifying flags depends on the language binding:

Fluid:

```
local pattern = regex.new('hello', regex.ICASE)
local pattern = regex.new('.*', regex.DOTALL)
local pattern = regex.new('^line', regex.MULTILINE + regex.ICASE)
```

C++:

```cpp
auto icase_pattern = pf::regex("hello", pf::regex::ICASE);
auto dotall_pattern = pf::regex(".*", pf::regex::DOTALL);
auto line_pattern = pf::regex("^line", pf::regex::MULTILINE | pf::regex::ICASE);
```

ICASE makes pattern matching case-insensitive using Unicode case folding:
```
local pattern = regex.new('hello', regex.ICASE)
-- Matches: "hello", "HELLO", "Hello", "HeLLo", etc.

local pattern = regex.new('[a-z]+', regex.ICASE)
-- Matches: "abc", "ABC", "aBc", etc.
```

MULTILINE changes the behaviour of the `^` and `$` anchors so that they also match at line boundaries:
```
local pattern = regex.new('^\\w+', regex.MULTILINE)
-- Without MULTILINE: matches word at start of string only
-- With MULTILINE: matches word at start of string AND after each line terminator

local text = "first line\nsecond line\nthird line"
local pattern = regex.new('^\\w+', regex.MULTILINE)
-- Matches: "first", "second", "third"
```

DOTALL makes `.` match line terminators in addition to all other characters:
```
local pattern = regex.new('.*', regex.DOTALL)
-- Without DOTALL: .* matches up to (but not including) line terminators
-- With DOTALL: .* matches everything including line terminators

local text = "line 1\nline 2\nline 3"
local pattern = regex.new('.*', regex.DOTALL)
local match = pattern.match(text)
-- match[1] = "line 1\nline 2\nline 3" (entire string)
```

Note: When DOTALL is set, `.*` will match all remaining characters in the subject string.
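The C++ standard library's `std::regex` has no dotall option, which makes it a convenient way to see the difference DOTALL makes:

- Its `.` stops at line terminators.
- The common ECMAScript workaround `[\s\S]` (any whitespace or non-whitespace character) matches everything.

This is an illustration only, with a hypothetical `first_match` helper. In Parasol, the DOTALL flag is the direct way to get this behaviour.

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` in `text`, or an empty string if there is none.
std::string first_match(const std::string& text, const std::string& pattern)
{
   const std::regex re(pattern);
   std::smatch m;
   return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

`first_match("line 1\nline 2\nline 3", ".*")` returns "line 1", whereas the pattern `"[\\s\\S]*"` returns the whole string.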
Match flags modify the behaviour of matching operations at runtime, after a pattern has been compiled. These flags are passed to matching functions (test, match, search, replace, split).
| Flag | Effect |
|---|---|
| `NOT_BEGIN_OF_LINE` | Do not treat the beginning of the text as the start of a line (affects `^` in multiline mode) |
| `NOT_END_OF_LINE` | Do not treat the end of the text as the end of a line (affects `$` in multiline mode) |
| `NOT_BEGIN_OF_WORD` | Do not treat the beginning of the text as the start of a word (affects `\b`) |
| `NOT_END_OF_WORD` | Do not treat the end of the text as the end of a word (affects `\b`) |
| `NOT_NULL` | Do not match empty sequences |
| `CONTINUOUS` | Only match at the beginning of the text (anchored search) |
| `PREV_AVAILABLE` | Indicates that the previous character position is available for lookbehind assertions |
| `REPLACE_NO_COPY` | In replace operations, do not copy non-matching parts of the text |
| `REPLACE_FIRST_ONLY` | In replace operations, replace only the first occurrence |
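Most of these names closely parallel flags in the C++ standard library's `std::regex_constants`: `match_not_bol`, `match_not_null`, `match_continuous`, `format_no_copy` and `format_first_only`. Assuming the behaviour is comparable, the sketch below reproduces several of the Fluid examples that follow with `std::regex`. It is an illustration only and does not use Parasol's `pf::regex`. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// NOT_BEGIN_OF_LINE ~ match_not_bol: the start of the text is not treated as a line start.
bool starts_with_hello(const std::string& text, rc::match_flag_type flags = rc::match_default)
{
   static const std::regex re("^hello");
   return std::regex_search(text, re, flags);
}

// NOT_NULL ~ match_not_null: an empty match does not count as a match.
bool has_a_run(const std::string& text, rc::match_flag_type flags = rc::match_default)
{
   static const std::regex re("a*");
   return std::regex_search(text, re, flags);
}

// CONTINUOUS ~ match_continuous: the match must begin at the first character.
bool has_digits(const std::string& text, rc::match_flag_type flags = rc::match_default)
{
   static const std::regex re("\\d+");
   return std::regex_search(text, re, flags);
}

// REPLACE_NO_COPY ~ format_no_copy, REPLACE_FIRST_ONLY ~ format_first_only.
std::string replace_digits(const std::string& text, const std::string& with,
   rc::match_flag_type flags = rc::match_default)
{
   static const std::regex re("\\d+");
   return std::regex_replace(text, re, with, flags);
}
```

For example, `replace_digits("a123b456c", "X", rc::format_no_copy)` returns "XX", and `replace_digits("123 456 789", "X", rc::format_first_only)` returns "X 456 789".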
Fluid:

```
local pattern = regex.new('\\w+')

-- Replace only the first occurrence
local result = pattern.replace('hello world', 'goodbye', regex.REPLACE_FIRST_ONLY)
-- result = "goodbye world"

-- Match only at the beginning
local match = pattern.match('hello world', regex.CONTINUOUS)
-- Succeeds (starts at the beginning)

local match = pattern.match(' hello', regex.CONTINUOUS)
-- Fails (does not start at the beginning)
```

NOT_BEGIN_OF_LINE is useful when matching in the middle of a larger text:
```
local pattern = regex.new('^hello', regex.MULTILINE)

-- Normal matching
pattern.test('hello') -- true (at beginning)

-- With NOT_BEGIN_OF_LINE
pattern.test('hello', regex.NOT_BEGIN_OF_LINE) -- false (not treated as line start)
```

NOT_NULL prevents matching empty strings:
```
local pattern = regex.new('a*')

-- Normal: matches empty string
pattern.test('') -- true

-- With NOT_NULL: rejects empty match
pattern.test('', regex.NOT_NULL) -- false
```

CONTINUOUS forces the match to start at the beginning of the text:
```
local pattern = regex.new('\\d+')

-- Normal: finds "123" anywhere
pattern.match(' 123') -- Matches "123"

-- With CONTINUOUS: must start at position 0
pattern.match(' 123', regex.CONTINUOUS) -- Fails
pattern.match('123', regex.CONTINUOUS) -- Succeeds
```

REPLACE_NO_COPY affects replace operations by excluding non-matching text:
```
local pattern = regex.new('\\d+')

-- Normal replace: keeps non-matching text
pattern.replace('a123b456c', 'X') -- "aXbXc"

-- With REPLACE_NO_COPY: only includes replacements
pattern.replace('a123b456c', 'X', regex.REPLACE_NO_COPY) -- "XX"
```

REPLACE_FIRST_ONLY limits replacement to the first match:
```
local pattern = regex.new('\\d+')

-- Normal replace: replaces all
pattern.replace('123 456 789', 'X') -- "X X X"

-- With REPLACE_FIRST_ONLY: replaces only first
pattern.replace('123 456 789', 'X', regex.REPLACE_FIRST_ONLY) -- "X 456 789"
```

Parasol's regex implementation is based on the ECMAScript specification and provides the following characteristics:
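The implementation supports expressions defined in the ECMAScript Specification (latest draft). Syntax from other regex flavours that ECMAScript does not define is not supported:

- `\Q...\E` literal sequences: use explicit escaping instead
- Comments: `(?#...)` is not supported
- Inline flags: use `(?ims:...)` instead of `(?imnsx-imnsx:...)`
- POSIX classes: use Unicode properties instead (e.g., `\p{Alpha}` instead of `[[:alpha:]]`)
- Collating elements: `[.ch.]` is not supported
- Equivalence classes: `[=e=]` is not supported

Backreferences can appear before their corresponding groups:

```
local pattern = regex.new('\\1(abc)') -- Valid
```

This is valid in ECMAScript but may fail or behave differently in other engines.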
Backreferences to groups that haven't captured anything match the empty string:

```
local pattern = regex.new('(a)?b\\1')
-- Matches "b": group 1 captured nothing, so \1 matches the empty string
```

The ECMAScript specification does not define octal escape sequences like `\ooo` or `\0ooo` (except `\0` for NULL):
```
-- Valid
local pattern = regex.new('\\0') -- Matches NULL (U+0000)

-- Invalid (not defined by ECMAScript)
local pattern = regex.new('\\101') -- Error: invalid escape
```

Use hexadecimal or Unicode escapes instead:

```
local pattern = regex.new('\\x41') -- 'A' in hexadecimal
local pattern = regex.new('\\u0041') -- 'A' in Unicode
```

Some operations can be expressed through alternative patterns:

- Intersection and subtraction are supported directly. The lookahead forms below remain useful when a pattern must also work in engines without set operations.
- Atomic groups have no direct ECMAScript syntax and must always be emulated.

Intersection (Alternative Method):

```
-- Direct: [\p{sc=Latin}&&\p{Ll}]
-- Alternative: using lookahead
(?=\p{sc=Latin})\p{Ll}
```

Subtraction (Alternative Method):

```
-- Direct: [\p{sc=Latin}--\p{Ll}]
-- Alternative: using negative lookahead
(?!\p{Ll})\p{sc=Latin}
```

Atomic Groups:

```
-- Perl/PCRE: (?>pattern)
-- ECMAScript equivalent: (?=(pattern))\1
```

Regex patterns should be compiled once and reused:
Inefficient:

```
for i = 1, 10000 do
   local pattern = regex.new('\\d+') -- Compiles pattern 10,000 times
   pattern.test(data[i])
end
```

Efficient:

```
local pattern = regex.new('\\d+') -- Compiles pattern once
for i = 1, 10000 do
   pattern.test(data[i]) -- Reuses compiled pattern
end
```

Store frequently used patterns in variables (local or global) rather than recreating them:
```
-- Compiled patterns
local emailPattern = regex.new('[\\w._%+-]+@[\\w.-]+\\.[A-Za-z]{2,}')
local phonePattern = regex.new('\\d{3}-\\d{3}-\\d{4}')
local datePattern = regex.new('\\d{4}-\\d{2}-\\d{2}')

-- Use patterns multiple times efficiently
for _, contact in ipairs(contacts) do
   if emailPattern.test(contact.email) then
      processEmail(contact)
   end
   if phonePattern.test(contact.phone) then
      processPhone(contact)
   end
end
```

Non-greedy quantifiers can improve performance in some cases:
```
-- Greedy: matches as much as possible, then backtracks
local pattern = regex.new('<.*>')
-- Matches "<tag>content</tag>" as one match (.* runs to the end, then backtracks to the last '>')

-- Non-greedy: stops at the first opportunity
local pattern = regex.new('<.*?>')
-- Matches "<tag>" and "</tag>" separately (.*? extends one character at a time instead of backtracking from the end)
```

For HTML/XML tag extraction, non-greedy matching is typically faster:

```
-- Extract tag content efficiently
local pattern = regex.new('<([^>]+)>(.*?)</\\1>')
```

Certain patterns can cause exponential time complexity:
Dangerous Pattern:

```
-- Exponential backtracking on non-match
local pattern = regex.new('(a+)+b')
local text = 'aaaaaaaaaaaaaaaaaac' -- No 'b' at end
-- Matching time grows exponentially with the number of 'a' characters in the input
```

Solutions:

Use possessive-like behaviour:

```
-- Prevent backtracking with atomic group simulation
local pattern = regex.new('(?=(a+))\\1+b')
```

Use negated character classes:

```
-- Clearer intent, better performance
local pattern = regex.new('[^b]+b')
```

Be specific about what you're matching:
```
-- Instead of: .*
-- Use: [^<]+ (if not matching '<')
-- Use: \\w+ (if matching word characters)
```

Use predefined classes when possible:

```
-- Faster
local pattern = regex.new('\\d+')

-- Slower (equivalent but not optimised)
local pattern = regex.new('[0-9]+')
```

Simplify complex classes:

```
-- Complex
local pattern = regex.new('[A-Za-z0-9_]+')

-- Simpler and equivalent
local pattern = regex.new('\\w+')
```

Anchor patterns to reduce search space:
```
-- Unanchored: searches entire string
local pattern = regex.new('\\d+')

-- Anchored: only checks from beginning
local pattern = regex.new('^\\d+')

-- Anchored both ends: exact match only
local pattern = regex.new('^\\d+$')
```

Unicode properties are optimised internally, but broad categories are faster than specific scripts:

```
-- Faster: general category
local pattern = regex.new('\\p{L}+') -- All letters

-- Slower: specific script
local pattern = regex.new('\\p{sc=Latin}+') -- Latin letters only
```

- Anchor patterns where possible (`^`, `$`)
- Prefer predefined classes (`\d`, `\w`, `\s`)

This section provides practical regex patterns for common use cases.
Basic email pattern:

```lua
local pattern = regex.new('[\\w._%+-]+@[\\w.-]+\\.[A-Za-z]{2,}')
-- Matches: user@example.com, first.last@sub.domain.co.uk
```

Explanation:

- `[\w._%+-]+` - Username: word characters, dots, underscores, percent, plus, hyphen
- `@` - Literal @ symbol
- `[\w.-]+` - Domain name: word characters, dots, hyphens
- `\.` - Literal dot
- `[A-Za-z]{2,}` - Top-level domain: 2 or more letters

Stricter pattern (ASCII-only classes, anchored):

```lua
local pattern = regex.new('^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')
-- Anchored to match entire string
```

Basic URL pattern:

```lua
local pattern = regex.new('(https?)://([^/\\s]+)([^\\s]*)')
-- Captures: protocol, domain, path
local match = pattern.match('https://example.com/path?query=value')
-- match[1] = "https://example.com/path?query=value" (full match)
-- match[2] = "https" (protocol)
-- match[3] = "example.com" (domain)
-- match[4] = "/path?query=value" (path)
```

With named captures:

```lua
local pattern = regex.new('(?<protocol>https?)://(?<domain>[^/\\s]+)(?<path>[^\\s]*)')
local match = pattern.match('https://example.com/path')
-- Access by name: match.domain (language binding dependent)
-- Access by number: match[3]
```

US phone number:
```lua
-- Format: 555-123-4567
local pattern = regex.new('\\d{3}-\\d{3}-\\d{4}')
-- With optional country code: +1-555-123-4567
local pattern = regex.new('(\\+1-)?\\d{3}-\\d{3}-\\d{4}')
-- With optional separators (-, ., space, or none)
local pattern = regex.new('\\d{3}[-. ]?\\d{3}[-. ]?\\d{4}')
```

International E.164 format:

```lua
-- '+' followed by 1 to 15 digits, e.g. +1234567890 or +123456789012345
local pattern = regex.new('\\+\\d{1,15}')
```

ISO 8601 date (YYYY-MM-DD):
```lua
local pattern = regex.new('\\d{4}-\\d{2}-\\d{2}')
-- Matches: 2025-10-14
-- With validation (basic):
local pattern = regex.new('\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])')
-- Validates month (01-12) and day (01-31), but not days per month (2025-02-31 still matches)
```

US date format (MM/DD/YYYY):

```lua
local pattern = regex.new('(0[1-9]|1[0-2])/(0[1-9]|[12]\\d|3[01])/\\d{4}')
-- Matches: 10/14/2025
```

Flexible date format:

```lua
local pattern = regex.new('\\d{1,2}[-/]\\d{1,2}[-/]\\d{2,4}')
-- Matches: 10/14/2025, 10-14-25, 1/5/2025
```

24-hour time (HH:MM):

```lua
local pattern = regex.new('([01]?\\d|2[0-3]):[0-5]\\d')
-- Matches: 09:30, 23:59, 8:05
-- With optional seconds:
local pattern = regex.new('([01]?\\d|2[0-3]):[0-5]\\d(:[0-5]\\d)?')
-- Matches: 09:30, 09:30:45
```

12-hour time with AM/PM:

```lua
local pattern = regex.new('(0?[1-9]|1[0-2]):[0-5]\\d\\s*([AaPp][Mm])')
-- Matches: 9:30 AM, 12:45 PM, 9:30AM
```

Password with minimum requirements (8+ characters, at least 1 uppercase letter, 1 lowercase letter and 1 digit):
```lua
local pattern = regex.new('^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$')
```

Explanation:

- `^` - Start of string
- `(?=.*[a-z])` - Lookahead: at least one lowercase letter
- `(?=.*[A-Z])` - Lookahead: at least one uppercase letter
- `(?=.*\d)` - Lookahead: at least one digit
- `.{8,}` - At least 8 characters
- `$` - End of string

With special character requirement:

```lua
local pattern = regex.new('^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&]).{8,}$')
```

IPv4 address:
```lua
local pattern = regex.new('\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b')
-- Matches: 192.168.1.1, 10.0.0.1 (but also out-of-range values such as 999.1.1.1)
-- With validation (0-255 per octet):
local pattern = regex.new('\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\b')
```

IPv6 address (simplified):

```lua
local pattern = regex.new('(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}')
-- Matches full IPv6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
-- Does not match compressed forms such as ::1 or 2001:db8::1
```

Match opening and closing tags:
```lua
local pattern = regex.new('<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>(.*?)</\\1>')
-- Matches: <div>content</div>, <span class="x">text</span>
-- Captures: tag name (group 1), content (group 2)
```

Extract tag content:

```lua
local pattern = regex.new('<[^>]+>(.*?)</[^>]+>')
-- Captures content between any tags
```

Match self-closing tags:

```lua
local pattern = regex.new('<[a-zA-Z][a-zA-Z0-9]*\\b[^>]*/>')
-- Matches: <br/>, <img src="x" />
```

Basic CSV field:
```lua
local pattern = regex.new('([^,]+),?')
-- Matches fields separated by commas (empty fields are skipped)
```

CSV with quoted fields:

```lua
local pattern = regex.new('(?:^|,)(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|([^,]*))')
-- Handles: "quoted field", unquoted, "field with ""quotes"""
-- Group 1 holds quoted content (doubled quotes are not unescaped); group 2 holds unquoted content
```

Extract words:

```lua
local pattern = regex.new('\\b\\w+\\b')
-- Matches: any word (alphanumeric + underscore)
local pattern = regex.new('\\b[A-Za-z]+\\b')
-- Matches: only alphabetic words
```

Extract words with apostrophes:

```lua
local pattern = regex.new('\\b[A-Za-z]+(?:\'[A-Za-z]+)?\\b')
-- Matches: don't, it's, can't, etc.
```

Integer:
```lua
local pattern = regex.new('-?\\d+')
-- Matches: 123, -456
```

Floating point:

```lua
local pattern = regex.new('-?\\d+\\.\\d+')
-- Matches: 123.45, -67.89
-- With optional decimal part:
local pattern = regex.new('-?\\d+(?:\\.\\d+)?')
-- Matches: 123, 123.45, -67.89
```

Scientific notation:

```lua
local pattern = regex.new('-?\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?')
-- Matches: 1.23e10, -4.5E-6, 123
```

Trim leading/trailing whitespace:
```lua
local pattern = regex.new('^\\s+|\\s+$')
-- Replace all matches with an empty string to remove leading/trailing whitespace
```

Collapse multiple spaces:

```lua
local pattern = regex.new('\\s+')
-- Replace with a single space to normalise whitespace
```

Split on whitespace:

```lua
local pattern = regex.new('\\s+')
-- Use with split to separate words
```

Unix/Linux path:
```lua
local pattern = regex.new('^(/[^/]+)+/?$')
-- Matches: /home/user/file.txt, /usr/local/bin/
```

Windows path:

```lua
local pattern = regex.new('^[A-Za-z]:\\\\(?:[^\\\\/:*?\"<>|]+\\\\)*[^\\\\/:*?\"<>|]*$')
-- Matches: C:\Users\Name\file.txt
```

File extension:

```lua
local pattern = regex.new('\\.([A-Za-z0-9]+)$')
-- Matches .txt, .pdf, .jpg at the end of a name; group 1 captures the extension without the dot
```

Semantic versioning:

```lua
local pattern = regex.new('^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$')
-- Matches: 1.0.0, 2.1.3, 1.0.0-alpha.1, 1.0.0+build.123
```

Simple version:

```lua
local pattern = regex.new('\\d+\\.\\d+(?:\\.\\d+)?')
-- Matches: 1.0, 1.0.5, 2.10.1
```

When pattern compilation or matching fails, specific error types indicate the nature of the problem. Understanding these errors helps diagnose and fix pattern issues.
These errors occur when compiling a regex pattern:
| Error | Description | Example |
|---|---|---|
| `error_escape` | Invalid escape sequence | `\q` (undefined escape), `\c` (not followed by a letter), `\x` (not followed by two hex digits), `\u{GGGG}` (invalid hex) |
| `error_brack` | Mismatched square brackets | `[abc`, `abc]`, `[a[b]` (nested class left open) |
| `error_paren` | Mismatched parentheses | `(abc`, `abc)`, `((a)` (unclosed) |
| `error_brace` | Mismatched curly braces | `a{3`, `a3}`, `a{2,` (missing closing brace) |
| `error_badbrace` | Invalid quantifier range | `{3,2}` (n > m), `{-1}` (negative), `{,5}` (missing n) |
| `error_range` | Invalid character range in class | `[z-a]` (reversed), `[\u0100-\u0010]` (start > end) |
| `error_backref` | Invalid backreference | `\9` (group doesn't exist), `\k<name>` (name doesn't exist) |
| `error_modifier` | Invalid flag modifier | `(?ii:...)` (duplicate flag), `(?i-i:...)` (contradictory) |
| `error_operator` | Invalid set operator usage | `[AB--CD]` (mixed operators at same level), `!!` (reserved double punctuator in class) |
| `error_noescape` | Character must be escaped in a character class | `[(]` (should be `[\(]`), `[{]` (should be `[\{]`) |
| `error_complement` | Invalid negation | `[^\p{RGI_Emoji}]` (string property in negated class), `\P{RGI_Emoji}` (string property with `\P`) |
| `error_badrepeat` | Quantifier without preceding expression | `*abc` (starts with quantifier), `a**` (double quantifier) |
| `error_utf8` | Invalid UTF-8 sequence in pattern | Pattern contains invalid UTF-8 bytes, overlong encoding, or code point > U+10FFFF |
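Several names in this table (`error_escape`, `error_brack`, `error_paren`, `error_brace`, `error_badbrace`, `error_range`, `error_backref`, `error_badrepeat`) are also members of `std::regex_constants::error_type` in the C++ standard library. The rest (`error_modifier`, `error_operator`, `error_noescape`, `error_complement`, `error_utf8`) have no standard counterpart. As a hedged illustration of the standard-library side only, the sketch below compiles a pattern with `std::regex` and reports the error code it throws. It is not the Parasol API, and `compile_error` is a hypothetical helper name. The codes in the comments are those reported by libstdc++ (GCC). Other implementations may differ.

```cpp
#include <optional>
#include <regex>
#include <string>

// Compiles `pattern` with std::regex (ECMAScript grammar by default). Returns the
// std::regex_constants::error_type carried by the thrown std::regex_error, or
// std::nullopt if the pattern compiled successfully.
std::optional<std::regex_constants::error_type> compile_error(const std::string& pattern) {
    try {
        std::regex compiled(pattern);
        (void)compiled;  // only the success or failure of compilation matters here
        return std::nullopt;
    } catch (const std::regex_error& e) {
        return e.code();  // e.what() carries a human-readable description
    }
}

// Usage (codes as reported by libstdc++):
//   compile_error("[abc")   -> error_brack    (unclosed bracket)
//   compile_error("(abc")   -> error_paren    (unclosed parenthesis)
//   compile_error("a{3")    -> error_brace    (unclosed brace)
//   compile_error("a{5,3}") -> error_badbrace (n > m)
//   compile_error("[z-a]")  -> error_range    (reversed range)
//   compile_error("(a)\\1") -> std::nullopt   (valid backreference)
```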
`error_escape`:

```lua
-- Invalid: \q is not defined
local pattern = regex.new('\\q') -- Error: invalid escape sequence
-- Invalid: \c not followed by letter
local pattern = regex.new('\\c5') -- Error: expected A-Z or a-z after \c
-- Invalid: \x not followed by two hex digits
local pattern = regex.new('\\xGG') -- Error: expected two hex digits
-- Invalid: code point exceeds maximum
local pattern = regex.new('\\u{110000}') -- Error: code point > U+10FFFF
-- Valid alternatives:
local pattern = regex.new('q') -- Literal q
local pattern = regex.new('\\x71') -- Hex escape for q
local pattern = regex.new('\\u0071') -- Unicode escape for q
```

`error_brack`:

```lua
-- Invalid: unclosed bracket
local pattern = regex.new('[abc') -- Error: missing ]
-- Invalid: extra closing bracket
local pattern = regex.new('abc]') -- Error: unmatched ]
-- Valid:
local pattern = regex.new('[abc]') -- Correct bracket pair
local pattern = regex.new('\\]') -- Escaped bracket (literal)
```

`error_paren`:

```lua
-- Invalid: unclosed parenthesis
local pattern = regex.new('(abc') -- Error: missing )
-- Invalid: extra closing parenthesis
local pattern = regex.new('abc)') -- Error: unmatched )
-- Valid:
local pattern = regex.new('(abc)') -- Correct parenthesis pair
local pattern = regex.new('\\(abc\\)') -- Escaped parentheses (literals)
```

`error_brace`:

```lua
-- Invalid: unclosed brace
local pattern = regex.new('a{3') -- Error: missing }
-- Valid:
local pattern = regex.new('a{3}') -- Correct quantifier
local pattern = regex.new('\\{3\\}') -- Escaped braces (literals)
```

`error_badbrace`:

```lua
-- Invalid: n > m in range
local pattern = regex.new('a{5,3}') -- Error: 5 > 3
-- Invalid: missing n
local pattern = regex.new('a{,5}') -- Error: must specify n
-- Valid:
local pattern = regex.new('a{3,5}') -- n ≤ m
local pattern = regex.new('a{3,}') -- n or more (no maximum)
local pattern = regex.new('a{3}') -- exactly n
```

`error_range`:

```lua
-- Invalid: reversed range
local pattern = regex.new('[z-a]') -- Error: z (U+007A) > a (U+0061)
-- Invalid: reversed range written with Unicode escapes
local pattern = regex.new('[\\u0100-\\u0010]') -- Error: start > end
-- Valid:
local pattern = regex.new('[a-z]') -- Correct range
local pattern = regex.new('[z]') -- Single character (no range)
```

`error_backref`:

```lua
-- Invalid: group doesn't exist
local pattern = regex.new('\\5') -- Error: no group 5
-- Invalid: named group doesn't exist
local pattern = regex.new('\\k<missing>') -- Error: no group named 'missing'
-- Valid:
local pattern = regex.new('(a)\\1') -- Backreference to group 1
local pattern = regex.new('(?<x>a)\\k<x>') -- Named backreference
```

`error_modifier`:

```lua
-- Invalid: duplicate flag
local pattern = regex.new('(?ii:abc)') -- Error: 'i' appears twice
-- Invalid: contradictory flags
local pattern = regex.new('(?i-i:abc)') -- Error: both +i and -i
-- Valid:
local pattern = regex.new('(?i:abc)') -- Single flag
local pattern = regex.new('(?im:abc)') -- Multiple different flags
local pattern = regex.new('(?i-m:abc)') -- Enable and disable flags
```

`error_operator`:

```lua
-- Invalid: mixed operators at same level
local pattern = regex.new('[AB--CD]') -- Error: union (AB) then subtraction
-- Invalid: reserved double punctuator
local pattern = regex.new('[a-z!!]') -- Error: !! is reserved
-- Valid:
local pattern = regex.new('[[AB]--[CD]]') -- Nested classes
local pattern = regex.new('[A[B--C]D]') -- Operator in nested level
local pattern = regex.new('[a-z\\!\\!]') -- Escaped (two separate !)
```

`error_noescape`:

```lua
-- Invalid: ( must be escaped in character class
local pattern = regex.new('[(]') -- Error: must escape (
-- Invalid: { must be escaped
local pattern = regex.new('[{]') -- Error: must escape {
-- Valid:
local pattern = regex.new('[\\(]') -- Escaped (
local pattern = regex.new('[\\{\\}]') -- Escaped braces
```

`error_complement`:

```lua
-- Invalid: string property in negated class
local pattern = regex.new('[^\\p{RGI_Emoji}]') -- Error: cannot negate string property
-- Invalid: string property with \P
local pattern = regex.new('\\P{RGI_Emoji}') -- Error: \P doesn't support string properties
-- Valid:
local pattern = regex.new('[\\p{RGI_Emoji}]') -- String property in positive class
local pattern = regex.new('\\P{Emoji}') -- Character property (not string)
local pattern = regex.new('[^\\p{Emoji}]') -- Character property negated
```

`error_badrepeat`:

```lua
-- Invalid: quantifier at start
local pattern = regex.new('*abc') -- Error: nothing to repeat
-- Invalid: double quantifier
local pattern = regex.new('a**') -- Error: quantifier on quantifier
-- Valid:
local pattern = regex.new('a*bc') -- Quantifier after character
local pattern = regex.new('\\*abc') -- Escaped * (literal)
```

`error_utf8`:

```lua
-- Invalid UTF-8 in pattern
-- (This typically occurs when pattern strings contain invalid byte sequences)
-- Invalid: overlong encoding. '\192\176' uses Lua decimal byte escapes, so the
-- string itself holds the raw bytes C0 B0 (an overlong form of U+0030)
local pattern = regex.new('\192\176') -- Error: overlong form of U+0030
-- Valid:
local pattern = regex.new('0') -- Shortest form
local pattern = regex.new('\\x30') -- Regex hex escape
local pattern = regex.new('\\u0030') -- Unicode escape
```

Fluid:
```lua
-- Using catch for error handling
local err, pattern = catch(function()
   return regex.new('[invalid')
end)
if err then
   print('Pattern compilation failed: ' .. err.message)
   print('Error line: ' .. (err.line or 'unknown'))
else
   -- Use pattern
end
```

C++:
```cpp
#include <exception>
#include <iostream>

try {
   auto pattern = pf::regex("[invalid");
} catch (const std::exception& e) {
   std::cerr << "Pattern compilation failed: " << e.what() << std::endl;
}
```

- Test patterns incrementally: Build complex patterns step by step, testing each addition
- Use online regex testers: Many tools visualise patterns and highlight errors (ensure they support ECMAScript syntax)
- Check bracket matching: Count opening and closing brackets/parentheses/braces
- Validate escape sequences: Ensure all `\` sequences are valid
- Review operator precedence: Verify set operations are properly nested
- Examine Unicode sequences: Confirm `\u{...}` values are valid code points
- Test with edge cases: Try empty strings, very long strings, and strings with special characters
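Testing incrementally and with edge cases can be mechanised with a small table of expected results. The sketch below uses standard C++ `std::regex` (ECMAScript grammar), not the Parasol API. `Case` and `failing_cases` are illustrative names. It compiles the pattern once and returns every input whose whole-string match result differs from the expectation, so an empty result means the pattern behaved as intended.

```cpp
#include <regex>
#include <string>
#include <vector>

// One expectation: should the pattern match the whole of `input`?
struct Case {
    std::string input;
    bool should_match;
};

// Returns the inputs whose whole-string match result differs from the expectation.
// An empty result means every case behaved as expected.
std::vector<std::string> failing_cases(const std::string& pattern,
                                       const std::vector<Case>& cases) {
    const std::regex re(pattern);  // compile once, reuse for every case
    std::vector<std::string> failures;
    for (const Case& c : cases) {
        if (std::regex_match(c.input, re) != c.should_match) {
            failures.push_back(c.input);
        }
    }
    return failures;
}

// Usage: the 24-hour time pattern with typical values plus edge cases
// (out-of-range hour and minute, empty string, very long string).
//   failing_cases(R"(([01]?\d|2[0-3]):[0-5]\d)",
//                 {{"09:30", true}, {"8:05", true}, {"24:00", false},
//                  {"12:60", false}, {"", false}, {std::string(1000, '9'), false}});
```

Adding a case each time the pattern grows keeps earlier behaviour pinned down. A regression then shows up as a named input in the returned list, rather than as a silent change in matching.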
Forgetting to escape special characters:

```lua
-- Wrong: . matches any character
local pattern = regex.new('file.txt')
-- Matches: "file.txt", "file?txt", "fileXtxt"
-- Correct: \. matches a literal dot
local pattern = regex.new('file\\.txt')
-- Matches: "file.txt" only
```

Incorrect bracket nesting:
```lua
-- Wrong: the outer class is never closed
local pattern = regex.new('[[a-z]') -- Error
-- Correct: close every nested class (adjacent nested classes form a union)
local pattern = regex.new('[[a-m][n-z]]') -- Union of two ranges
```

Quantifier on quantifier:
```lua
-- Wrong: double quantifier
local pattern = regex.new('a*+') -- Error
-- Correct: quantify a group
local pattern = regex.new('(a*)+')
-- (Equivalent to a*; prefer the simpler form, as nested quantifiers can backtrack heavily)
```

This manual has covered the complete regular expression syntax and features supported by the Parasol Framework.
For API documentation on the Regex class and its methods, please refer to the Regex module documentation in the Parasol API reference.
This manual documents the regex implementation as of 2025. For updates and the latest specification, refer to the ECMAScript Specification.