Regular Expressions
NQE provides regular expression support similar to that found in other languages, such as Python, Perl, Java, and SQL. In networking, regular expressions are useful for checking whether strings match specific formats and for extracting data from textual sources, such as config files and state tables.
Examples
The following query redacts all IP addresses found in configuration commands:
```
ipRegex = re`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
foreach device in network.devices
foreach command in device.outputs.commands
where command.commandType == CommandType.CONFIG
select {
  Device: device.name,
  RedactedConfig: replaceRegexMatches(command.response, ipRegex, "<REDACTED>")
}
```
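For readers coming from Python, which the introduction lists as a comparable language, the same redaction can be sketched with the standard `re.sub`. This is an analogy only, not how NQE implements replaceRegexMatches, and the `config` text is a made-up sample.

```python
import re

# Same pattern as ipRegex above; a raw string keeps the backslashes intact.
IP_REGEX = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Hypothetical configuration snippet used only for illustration.
config = "interface eth0\n ip address 10.0.0.1 255.255.255.0\n"

# re.sub replaces every non-overlapping match, much like replaceRegexMatches.
redacted = IP_REGEX.sub("<REDACTED>", config)
print(redacted)
```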
The following query returns only the devices whose names follow a naming convention, such as atl123:
```
validNameRegex = re`(?<Location>\w{1,4})(?<ID:Number>\d{3})`; // location code followed by 3-digit ID
foreach device in network.devices
where hasMatch(device.name, validNameRegex)
select { Device: device.name }
```
The above example can be modified slightly to extract the location and ID:
```
validNameRegex = re`(?<Location>\w{1,4})(?<ID:Number>\d{3})`;
foreach device in network.devices
let result = match(device.name, validNameRegex) // Save match result
where isPresent(result)
select {
  Device: device.name,
  Location: result.data.Location, // Use 'Location' capture group
  ID: result.data.ID // Use 'ID' capture group
}
```
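A Python sketch of the same extraction, for comparison only. Python has no typed capture groups, so the :Number annotation is emulated with `int()`, and Python spells named groups `(?P<name>...)`. The sketch uses `re.fullmatch` on the assumption that match requires the whole string to match (the match("3x", ...) example under Using type annotations returns null, which suggests this). The helper `parse_device_name` and the device names are hypothetical.

```python
import re

# Python writes named groups as (?P<name>...) instead of NQE's (?<name>...).
VALID_NAME = re.compile(r"(?P<Location>\w{1,4})(?P<ID>\d{3})")

def parse_device_name(name):
    """Return {'Location': str, 'ID': int}, or None if the name does not match."""
    # fullmatch: assumes whole-string matching, as described in the lead-in.
    m = VALID_NAME.fullmatch(name)
    if m is None:
        return None
    # int() stands in for NQE's :Number type annotation.
    return {"Location": m.group("Location"), "ID": int(m.group("ID"))}

# Hypothetical device names.
for name in ["atl123", "sfo042", "core-switch"]:
    print(name, parse_device_name(name))
```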
The following sections will explain the constructs used in these examples.
Creating Regular Expressions
Regular expressions can be defined using regex literal syntax or the regex function:
```
personRegex1 = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
personRegex2 = regex("Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}");
```
The regex function is useful when the pattern is known only when the query is executed. However, a regular expression created with the regex function does not support data extraction.
See the Regex Syntax section for details of the regular expression syntax.
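The idea behind the regex function, compiling a pattern that exists only as an ordinary string at run time, looks like this in Python's `re.compile`. This is a sketch for comparison: the site codes are hypothetical, and `re.escape` is a Python helper (no NQE counterpart is implied) that keeps characters such as `.` literal.

```python
import re

# Hypothetical site codes that might only be known at run time.
site_codes = ["atl", "sfo", "n.y"]

# Escape each code so "." is matched literally, join the codes into one
# alternation, then require a 3-digit ID.
pattern = "(?:" + "|".join(re.escape(code) for code in site_codes) + r")\d{3}"
site_regex = re.compile(pattern)

print(bool(site_regex.fullmatch("n.y001")))  # True
print(bool(site_regex.fullmatch("nxy001")))  # False: "." was escaped
```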
Data Extraction
Data can be extracted from a piece of text by matching the text against a regular expression.
A regular expression matches a string if the string adheres to the structure the regular expression specifies.
For instance, personRegex, below, specifies text consisting of the literal text "Name: ", one or more word characters,
the literal text ", Location: ", and then four groups of one to three digits separated by periods.
```
peopleList =
  [ "Name: Jasim, Location: 0.0.1.2"
  , "Name: Kathy, Location: 8.8.8.8"
  , "Name: Jane, Location: right here"
  , "Name: Jim, Location: 123.456.7.8"
  ];
personRegex = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
foreach person in peopleList
select { result: hasMatch(person, personRegex) }
```
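The above query returns 4 rows, and only Jane's row is false. Jim's row is true even though 123.456.7.8 is not a valid IP address, because \d{1,3} only limits each part to one to three digits.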
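The same four checks can be reproduced with Python's `re` module as a quick cross-check of the expected output. For these particular strings `re.fullmatch` and `re.search` give identical answers, so the sketch does not depend on whether hasMatch checks the whole string.

```python
import re

people = [
    "Name: Jasim, Location: 0.0.1.2",
    "Name: Kathy, Location: 8.8.8.8",
    "Name: Jane, Location: right here",
    "Name: Jim, Location: 123.456.7.8",
]
PERSON = re.compile(r"Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# One boolean per person, like the result column of the NQE query.
results = [PERSON.fullmatch(p) is not None for p in people]
print(results)  # [True, True, False, True]
```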
Capturing matched text
The above example can be modified slightly to work with multiple people in a single piece of text:
```
// We have all people in a single string
peopleText =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";
personRegex = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
foreach personMatch in regexMatches(peopleText, personRegex) // Save match results
select { result: personMatch.string } // Use matched substring
```
The above query results in 3 rows, with Jane's row omitted since it does not match.
Each personMatch value is a match result: a record with the following fields:
| Field | Type | Description |
|---|---|---|
| string | String | The substring that is matched. |
| start | Integer | Zero-based character offset in the text where the match begins. |
| end | Integer | The offset of the first character after the end of the match. |
| data | T | The capture groups, as a record value. |
For more details, see the regexMatches function.
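Python's `re.finditer` is a close analogue of regexMatches: each match object exposes the matched text and zero-based start and end offsets with the same meaning as the string, start, and end fields above. The three-line text below is a shortened, made-up variant of peopleText.

```python
import re

# Shortened sample text; each line is one person record.
people_text = (
    "Name: Jasim, Location: 0.0.1.2\n"
    "Name: Kathy, Location: 8.8.8.8\n"
    "Name: Jane, Location: right here\n"
)
PERSON = re.compile(r"Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# m.start() is the offset of the first matched character; m.end() is the
# offset just past the last one, so people_text[start:end] is the match.
matches = [(m.group(), m.start(), m.end()) for m in PERSON.finditer(people_text)]
for row in matches:
    print(row)
```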
Using capture groups
The above example can be further modified to extract the name and location of each person:
```
peopleText =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";
// Use groups to capture interesting sub-parts of a regular expression
personRegex = re`Name: (\w+), Location: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;
foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person["1"], Location: person["2"] } // Use capture group values, by positional index
```
The above query results in the same 3 rows.
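Python's `re` numbers unnamed groups the same way, starting from 1, but reads them with `m.group(1)` rather than a record field named "1". A sketch on a shortened, made-up variant of peopleText:

```python
import re

people_text = (
    "Name: Jasim, Location: 0.0.1.2\n"
    "Name: Jane, Location: right here\n"
    "Name: Kathy, Location: 8.8.8.8\n"
)
# Two unnamed groups: group 1 is the name, group 2 is the location.
PERSON = re.compile(r"Name: (\w+), Location: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

# Jane's line does not match, so only two rows are produced.
rows = [(m.group(1), m.group(2)) for m in PERSON.finditer(people_text)]
print(rows)  # [('Jasim', '0.0.1.2'), ('Kathy', '8.8.8.8')]
```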
Using named capture groups
Capture groups can also be given meaningful names, which can often increase the clarity of the query:
```
peopleText =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";
// Use names for each interesting sub-part of the regular expression
personRegex = re`Name: (?<Name>\w+), Location: (?<Location>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;
foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person.Name, Location: person.Location } // Use capture group values, by name
```
In the above example, the name is captured using (?<Name>\w+). As a result, the record in personMatch.data
has a Name field that contains the matched name. The select statement now references the name as person.Name
rather than person["1"], although person["1"] is still a valid way to reference it: named groups can be accessed
either by name or by positional index. This query returns the same 3 rows as the previous one, which used unnamed
capture groups.
Unnamed capture groups are accessed through record fields named after their position in the regex, so adding or removing a group earlier in the regex changes which field holds a given value. Named capture groups avoid this problem, so they are recommended.
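In Python's `re` module, named groups are written `(?P<name>...)` instead of `(?<name>...)`, and, as in NQE, a named group can be read either by name or by its position. A small sketch for comparison:

```python
import re

# Python spells named groups (?P<name>...); NQE spells them (?<name>...).
PERSON = re.compile(
    r"Name: (?P<Name>\w+), Location: (?P<Location>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)

m = PERSON.fullmatch("Name: Kathy, Location: 8.8.8.8")
# A named group still has a number, so both lookups return the same text.
print(m.group("Name"), m.group(1))  # Kathy Kathy
# groupdict() collects the named groups, similar to the data record in NQE.
print(m.groupdict())  # {'Name': 'Kathy', 'Location': '8.8.8.8'}
```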
Named capture groups can be repeated if separated by the | operator. Referencing the match data by name retrieves the
value wherever it was matched, whereas referencing by index retrieves the value at that particular location.
For example:
```
foreach text in ["Jane is 32 years old"]
let person = match(text, re`(?<age:Number>\d+)|(?:(?<name>\w+) is (?<age:Number>\d+) years old)`)
select {
  "The value of capture named 'age'": person.age,
  "The captured value of the first occurrence of 'age'": person["1"],
  "The captured value of the second occurrence of 'age'": person["3"],
  "Aside: 'name' is at index 2": person.name == person["2"]
}
```
The above query results in a single row whose columns have values: 32, null, 32, and finally true.
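The above query uses a shy group so that the | operator chooses between two complete alternatives: the first is
(?<age:Number>\d+) and the second is (?<name>\w+) is (?<age:Number>\d+) years old. The second alternative is the
one that matches the text, which is why the first occurrence of age is null.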
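Python's `re` rejects a pattern that reuses a group name, so this behavior cannot be ported directly. The sketch below approximates it under that limitation: the two age groups get distinct names (age1, age2), and a hypothetical helper, `coalesce_age`, returns whichever one participated in the match, converted with `int()` in place of the :Number annotation.

```python
import re

# Python forbids duplicate group names, so the two "age" groups are renamed
# and merged by hand below.
PERSON = re.compile(
    r"(?P<age1>\d+)|(?:(?P<name>\w+) is (?P<age2>\d+) years old)"
)

def coalesce_age(m):
    """Hypothetical helper: the value of whichever 'age' group matched, as an int."""
    for group in ("age1", "age2"):
        if m.group(group) is not None:
            return int(m.group(group))
    return None

m = PERSON.fullmatch("Jane is 32 years old")
# Groups are numbered 1 (age1), 2 (name), 3 (age2); the shy group is not counted.
print(coalesce_age(m), m.group(1), m.group(3), m.group("name") == m.group(2))
# 32 None 32 True
```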
Using shy groups
The shy group construction (?:x) is like a capture group (x) but it does no capturing:
Text matching the inner regular expression x is not captured into a field of the match.
Shy groups are useful for enforcing order of operations. For example, the following uses a shy group to apply a
quantifier to the choice [A-C]|[D-F].
```
initialsRegex = re`(?<initial>(?:[A-C]|[D-F]){2}) (?<age:Number>\d+)`;
foreach person in ["AF 12", "CD 34", "ZD 34"]
select { isValid: hasMatch(person, initialsRegex) }
```
This query results in 3 rows: true, true, false.
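Python's `re` supports the same (?:x) construction. In this comparison sketch the named groups are placed so that initial holds only the two letters, and `pattern.groups` confirms that the shy group adds no capture.

```python
import re

# (?:...) groups without capturing, so {2} applies to the whole choice.
INITIALS = re.compile(r"(?P<initial>(?:[A-C]|[D-F]){2}) (?P<age>\d+)")

people = ["AF 12", "CD 34", "ZD 34"]
valid = [INITIALS.fullmatch(p) is not None for p in people]
print(valid)  # [True, True, False]

# Only the two named groups count; the shy group is not a capture.
print(INITIALS.groups)  # 2
print(INITIALS.fullmatch("AF 12").groupdict())  # {'initial': 'AF', 'age': '12'}
```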
Using type annotations
Named capture groups can also be given type annotations, which ensure that a capture is not only of the right shape but also of the right datatype.
```
peopleText =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";
// The only change from the previous example is the type annotation on the 'Location' capture group.
personRegex = re`Name: (?<Name>\w+), Location: (?<Location:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;
foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person.Name, Location: person.Location }
```
In this example, the second capture group in personRegex is now declared with the name and type Location:IpAddress.
This says that only valid IP addresses should be matched, that the matching text is captured into Location, and that
the captured text is converted to NQE's IpAddress type. As a result, this query returns only two rows: Jane's row does
not match the pattern at all, and Jim's row matches the pattern, but 123.456.7.8 is not a valid IP address.
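Python has no typed capture groups, but the IpAddress annotation can be approximated by converting each captured location with the standard `ipaddress` module and skipping matches that fail to convert. This is an emulation for illustration on made-up text, not NQE's implementation.

```python
import ipaddress
import re

people_text = (
    "Name: Jasim, Location: 0.0.1.2\n"
    "Name: Jane, Location: right here\n"
    "Name: Jim, Location: 123.456.7.8\n"
)
PERSON = re.compile(
    r"Name: (?P<Name>\w+), Location: (?P<Location>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)

rows = []
for m in PERSON.finditer(people_text):
    try:
        # Emulates :IpAddress: convert the capture, or treat it as no match.
        location = ipaddress.ip_address(m.group("Location"))
    except ValueError:
        continue  # 123.456.7.8 has the right shape, but 456 > 255
    rows.append((m.group("Name"), location))

print(rows)  # [('Jasim', IPv4Address('0.0.1.2'))]
```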
The supported capture group types are String, Number, Float, and IpAddress; a named group without a type annotation captures a String. Below are further examples.
| Example | Result |
|---|---|
match("2357", re`(?<value>\d+)`) | {value: "2357"} |
match("2357", re`(?<value:String>\d+)`) | {value: "2357"} |
match("2357", re`(?<value:Number>\d+)`) | {value: 2357} |
match("3x", re`(?<value:Number>\d+)`) | null; i.e., no match |
match("3.14", re`(?<value:Float>\d+\.?\d*)`) | {value: 3.14} |
match("3 14", re`(?<value:Float>\d+\.?\d*)`) | null; i.e., no match |
match("2.4.6.8", re`(?<value:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`) | {value: ipAddress("2.4.6.8")} |
match("a.b.c.d", re`(?<value:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`) | null; i.e., no match |
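The rows above can be reproduced with a small Python model. The helper `typed_match` and the `CONVERTERS` table are hypothetical: they assume whole-string matching and treat a failed type conversion as no match, which is consistent with the rows shown but is not NQE's actual implementation.

```python
import ipaddress
import re

# Hypothetical converters standing in for NQE's capture-group types.
CONVERTERS = {
    "String": str,
    "Number": int,
    "Float": float,
    "IpAddress": ipaddress.ip_address,
}

def typed_match(text, pattern, group_type="String"):
    """Whole-string match with one group named 'value', converted to group_type.

    Returns {'value': converted} or None, mimicking the rows in the table above.
    """
    m = re.fullmatch(pattern, text)
    if m is None:
        return None
    try:
        return {"value": CONVERTERS[group_type](m.group("value"))}
    except ValueError:
        return None  # right shape, wrong type: treated as no match

IP = r"(?P<value>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
print(typed_match("2357", r"(?P<value>\d+)", "Number"))       # {'value': 2357}
print(typed_match("3x", r"(?P<value>\d+)", "Number"))         # None
print(typed_match("3.14", r"(?P<value>\d+\.?\d*)", "Float"))  # {'value': 3.14}
print(typed_match("2.4.6.8", IP, "IpAddress"))  # {'value': IPv4Address('2.4.6.8')}
```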
Regex Syntax
A regular expression literal starts with re` and ends with a closing backtick (`); the pattern in between may span
any number of lines. The following table summarizes the available regular expression constructs:
| Syntax | Description |
|---|---|
c | The single character c |
. | Any single character |
[abc] | Any of the characters a, b, or c |
[^abc] | Any character except a, b, or c |
\d | Any single digit; equivalent to [0-9] |
\D | Any non-digit; equivalent to [^0-9] |
\t | A horizontal tab character; equivalent to \011 |
\n | A new line character; equivalent to \012 |
\r | A carriage return character; equivalent to \015 |
\f | A form feed character; equivalent to \014 |
\s | Any whitespace character except vertical tab; equivalent to [\t\n\f\r ] |
\S | Any non-whitespace character; equivalent to [^\t\n\f\r ] |
\w | Any word-constituent character; equivalent to [0-9A-Za-z_] |
\W | Any non-word-constituent character; equivalent to [^0-9A-Za-z_] |
^ | Assert at beginning of text |
$ | Assert at end of text |
\b | Assert at ASCII word boundary: Matches the empty string, but only at the beginning or end of a word |
\B | Assert not at ASCII word boundary |
xy | Regex x followed by regex y |
x∣y | Matches x if possible, otherwise try to match y. “Ordered Choice”. |
(?<name>x) | Named capturing group (submatch) |
(?<name:Type>x) | Named and typed capturing group (submatch) |
(x) | Numbered capturing group (submatch) |
(?:x) | Non-capturing group: Text matching the inner regular expression x is not captured into a field of the match. This is useful for enforcing precedence. |
x* | As many instances of x as possible, possibly zero instances |
x*? | Like x*, but matches the shortest possible string so the overall regex matches |
x+ | As many instances of x as possible, asserting at least one instance |
x+? | Like x+, but matches the shortest possible string so the overall regex matches |
x? | Zero or one instances of x, prefer one |
x?? | Like x?, but matches the shortest possible string so the overall regex matches |
x{n,m} | As many instances of x as possible, asserting at least n instances and no more than m instances |
x{n,m}? | Like x{n,m}, but matches the shortest possible string so the overall regex matches |
x{n,} | As many instances of x as possible, asserting at least n instances |
x{n,}? | Like x{n,}, but matches the shortest possible string so the overall regex matches |
x{n} | Exactly n instances of x |
x{n}? | Like x{n}, but matches the shortest possible string so the overall regex matches |
There is no support for back-references such as \1.
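Python's `re` module applies the same greedy, reluctant, and ordered-choice rules to the constructs demonstrated below, so it can be used to experiment with them. The engines are not identical (Python, for example, does support back-references), so treat this as an approximation.

```python
import re

text = "<a><b>"

# Greedy: .+ takes as much as possible while the overall match still succeeds.
greedy = re.search(r"<.+>", text).group()
# Reluctant: .+? takes as little as possible.
lazy = re.search(r"<.+?>", text).group()
# Ordered choice: the left alternative wins when it matches.
choice = re.match(r"a|ab", "ab").group()
# Reluctant bounded repeat: {2,4}? stops at the lower bound when it can.
lazy_range = re.search(r"\d{2,4}?", "12345").group()

print(greedy)      # <a><b>
print(lazy)        # <a>
print(choice)      # a
print(lazy_range)  # 12
```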
Regex Validity
The following kinds of regular expressions are considered invalid:
- Quantifying an expression that involves capture groups with an upper bound greater than 1, such as re`(\w+){1,3}` and re`(?<name>\w+){1,3}`.
  - Fix by switching to a non-capturing group, as in re`(?:\w+){1,3}`.
- Nesting quantifiers, such as re`\d{3}*` or re`\d{3}+`.
  - NQE disallows such nestings for performance reasons.
  - Fix by using a shy group, as in re`(?:\d{3})*` or re`(?:\d{3})+`.
  - You can still nest quantifiers when the outer quantifier is ?; these are reluctant quantifiers. For example, re`\d{3}?` is a valid regular expression.
- Using the same name for multiple capture groups with different types, such as re`(?<age:String>\d+)|(?<age:Number>\d+)`.
  - Fix by giving every capture group named age the same type, as in re`(?<age:String>\d+)|(?<age:String>\d+)` or re`(?<age:Number>\d+)|(?<age:Number>\d+)`.
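The third rule is simple to check mechanically. The sketch below is a hypothetical lint written in Python, not part of NQE: it scans an NQE pattern for (?<name>...) and (?<name:Type>...) groups and reports names declared with more than one type. It treats untyped groups as String (an assumption based on untyped groups yielding string values in the match examples under Using type annotations) and does not handle escaped parentheses.

```python
import re

# Finds NQE-style named groups: (?<name>...) or (?<name:Type>...).
NAMED_GROUP = re.compile(r"\(\?<(\w+)(?::(\w+))?>")

def conflicting_group_types(nqe_pattern):
    """Hypothetical lint: return the set of names declared with more than one type."""
    types_by_name = {}
    # findall yields (name, type) pairs; type is "" when no annotation is present.
    for name, group_type in NAMED_GROUP.findall(nqe_pattern):
        # Assumption: an untyped group counts as String.
        types_by_name.setdefault(name, set()).add(group_type or "String")
    return {name for name, types in types_by_name.items() if len(types) > 1}

print(conflicting_group_types(r"(?<age:String>\d+)|(?<age:Number>\d+)"))  # {'age'}
print(conflicting_group_types(r"(?<age:Number>\d+)|(?<age:Number>\d+)"))  # set()
```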
See Also
Functions
| Function | Description |
|---|---|
| hasMatch | Checks whether a string matches the structure specified by a regex or not. |
| match | Gets a record of information about text that matches the structure specified by a regex, or otherwise null. |
| regexMatches | Gets the list of all substring matches of a given regex. |
| replaceRegexMatches | Replaces all substring matches of a given regex using the provided replacement template. |
| regex | Converts a string into a regex. |
Related Types
The Pattern Type allows word-based pattern matching, and replaceMatches allows character-based matching via globs. However, globs support only a very narrow set of metacharacters. Regular expressions provide expressive power that neither the Pattern Type nor globs offer.