
Regular Expressions

NQE provides regular expression support similar to that found in other languages, like Python, Perl, Java, and SQL. In the networking context, regular expressions are useful for checking whether certain strings match specific formats, or extracting data from textual data sources, such as config files and state tables.

Examples

The following query redacts all IP addresses found in configuration commands:

ipRegex = re`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;

foreach device in network.devices
foreach command in device.outputs.commands
where command.commandType == CommandType.CONFIG
select {
Device: device.name,
RedactedConfig: replaceRegexMatches(command.response, ipRegex, "<REDACTED>")
}
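For comparison, here is the same redaction sketched in Python with `re.sub`, which plays a role similar to replaceRegexMatches. This is an analogy only, not NQE code. The sample config text and the helper name `redact_ips` are made up for illustration.

```python
import re

# Same pattern as ipRegex above, written as a Python raw string.
IP_REGEX = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def redact_ips(config_text: str) -> str:
    """Replace every IPv4-shaped substring with a placeholder."""
    return IP_REGEX.sub("<REDACTED>", config_text)


# Hypothetical config snippet, used only for illustration.
sample = "interface Vlan10\n ip address 10.1.2.3 255.255.255.0\n"
print(redact_ips(sample))
```

The pattern checks only the shape of an address, so strings such as 999.999.999.999 are redacted as well.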

The following query selects only the devices whose names follow a naming convention, such as atl123:

validNameRegex = re`(?<Location>\w{1,4})(?<ID:Number>\d{3})`; // location code followed by 3-digit ID

foreach device in network.devices
where hasMatch(device.name, validNameRegex)
select { Device: device.name }

The above example can be modified slightly to extract the location and ID:

validNameRegex = re`(?<Location>\w{1,4})(?<ID:Number>\d{3})`;

foreach device in network.devices
let result = match(device.name, validNameRegex) // Save match result
where isPresent(result)
select {
Device: device.name,
Location: result.Location, // Use 'Location' capture group
ID: result.ID // Use 'ID' capture group
}
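Readers who know Python can compare the extraction above with the following rough analogue. Python spells named groups `(?P<name>...)` and has no type annotations, so the `:Number` conversion is imitated with `int()`. The sketch assumes that match requires the whole name to match, which is why it uses `re.fullmatch`. The helper `parse_device_name` is hypothetical.

```python
import re
from typing import Optional

# Python analogue of validNameRegex; the ID is converted to int by hand
# because Python has no ":Number" annotation.
VALID_NAME = re.compile(r"(?P<Location>\w{1,4})(?P<ID>\d{3})")


def parse_device_name(name: str) -> Optional[dict]:
    # Assumption: like NQE's match, the whole name must match.
    m = VALID_NAME.fullmatch(name)
    if m is None:
        return None  # analogous to match(...) returning null
    return {"Location": m.group("Location"), "ID": int(m.group("ID"))}


for name in ["atl123", "nyc9", "sfo042"]:
    print(name, parse_device_name(name))
```

Because `\w` also matches digits, a name such as atl1234 is accepted too, with Location "atl1" and ID 234.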

The following sections will explain the constructs used in these examples.

Creating Regular Expressions

Regular expressions can be defined using regex literal syntax or the regex function:

personRegex1 = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;
personRegex2 = regex("Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}");

The regex function is useful when the pattern is known only when the query is executed. However, a regular expression created with the regex function does not support data extraction.
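The idea of building a pattern at run time can be sketched in Python, where a pattern string assembled during execution is compiled with `re.compile`. This is a Python analogy, not NQE. The function name `build_site_regex` is hypothetical, and `re.escape` is a Python facility for which this page documents no NQE equivalent.

```python
import re


def build_site_regex(site_code: str) -> re.Pattern:
    # The site code is only known at run time, so the pattern is built from a
    # string. re.escape stops metacharacters in the input (such as ".")
    # from being treated as regex syntax.
    return re.compile(re.escape(site_code) + r"\d{3}")


atl = build_site_regex("atl")
print(atl.fullmatch("atl123") is not None, atl.fullmatch("sfo123") is not None)
```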

See the Regex Syntax section for details of the regular expression syntax.

Data Extraction

Data can be extracted from a piece of text by matching the text against a regular expression.

A regular expression matches a string if the string adheres to the structure the regular expression specifies. For instance, personRegex, below, specifies text consisting of "Name: ", one or more word characters, ", Location: ", and then four groups of one to three digits separated by periods.

peopleList  =
[ "Name: Jasim, Location: 0.0.1.2"
, "Name: Kathy, Location: 8.8.8.8"
, "Name: Jane, Location: right here"
, "Name: Jim, Location: 123.456.7.8"
];

personRegex = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;

foreach person in peopleList
select { result: hasMatch(person, personRegex) }

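The above query results in 4 rows, one per person, and only Jane's row is false. Jim's row is true because 123.456.7.8 has the required shape, even though it is not a valid IP address.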
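The same shape check can be sketched in Python as a rough analogue of the hasMatch query. This is not NQE. It treats hasMatch as a whole-string check (`re.fullmatch`), which is an assumption. For these four strings, `re.search` would give the same answers.

```python
import re

people_list = [
    "Name: Jasim, Location: 0.0.1.2",
    "Name: Kathy, Location: 8.8.8.8",
    "Name: Jane, Location: right here",
    "Name: Jim, Location: 123.456.7.8",
]

person_regex = re.compile(r"Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# One boolean per person, like the query's result column.
results = [person_regex.fullmatch(p) is not None for p in people_list]
print(results)  # [True, True, False, True]
```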

Capturing matched text

The above example can be modified slightly to work with multiple people in a single piece of text:

// We have all people in a single string
peopleText =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";

personRegex = re`Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;

foreach personMatch in regexMatches(peopleText, personRegex) // Save match results
select { result: personMatch.string } // Use matched substring

The above query results in 3 rows, with Jane's row omitted since it does not match.

Each match result personMatch is a record with the following fields:

  • string: String The substring that was matched.
  • start: Integer The zero-based character offset in the text where the match begins.
  • end: Integer The zero-based character offset of the first character after the match.
  • data: T A record containing the values of the capture groups.
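As a rough Python analogue of these fields, the sketch below builds the same kind of record from `re.finditer` results. It is not NQE. The `data` field here uses `groupdict()`, which holds only named groups, whereas NQE's data record also exposes unnamed groups by position. The Python triple-quoted string starts with a newline, so the first match begins at offset 1.

```python
import re

people_text = """
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
"""

person_regex = re.compile(r"Name: \w+, Location: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# One record per match, using the same field names as NQE's match result.
# groupdict() holds named groups only; this pattern has none, so data is {}.
matches = [
    {"string": m.group(0), "start": m.start(), "end": m.end(), "data": m.groupdict()}
    for m in person_regex.finditer(people_text)
]
for record in matches:
    print(record)
```

As with NQE's end field, `m.end()` is the offset just past the match, so `people_text[start:end]` recovers the matched substring.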

For more details, see the regexMatches function.

Using capture groups

The above example can be further modified to extract the name and location of each person:

peopleText  =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";

// Use groups to capture interesting sub-parts of a regular expression
personRegex = re`Name: (\w+), Location: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;

foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person["1"], Location: person["2"] } // Use capture group values, by positional index

The above query results in the same 3 rows.

Using named capture groups

Capture groups can also be given meaningful names, which often makes a query easier to read:

peopleText  =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";

// Use names for each interesting sub-part of the regular expression
personRegex = re`Name: (?<Name>\w+), Location: (?<Location>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;

foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person.Name, Location: person.Location } // Use capture group values, by name

In the above example, the name is captured using (?<Name>\w+). As a result, the match value in personMatch.data has a Name field that contains the matched name. The select statement now references the name as person.Name rather than person["1"], although person["1"] is still valid: a named group can be accessed either by name or by positional index. This query results in the same 3 rows as the previous query, which used unnamed capture groups.

caution

Because the values of unnamed capture groups are stored in record fields named after each group's position in the regex, adjusting the regex (for example, adding a group before an existing one) can change which field holds a given value. Named capture groups avoid this problem and are recommended.
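The position shift described in this caution can be shown with a small Python sketch. This is a Python illustration, not NQE. Python also numbers groups by their opening parenthesis and also gives named groups a position, so the effect is the same.

```python
import re

line = "Name: Jim, Location: 123.456.7.8"
IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Original regex: the location is capture group 2.
before = re.fullmatch(r"Name: (\w+), Location: (" + IP + ")", line)
# Adjusted regex: a new group in front shifts the location to group 3.
after = re.fullmatch(r"(Name): (\w+), Location: (" + IP + ")", line)
# Named groups keep their names through the same edit (and still have positions).
named = re.fullmatch(r"(Name): (?P<Name>\w+), Location: (?P<Location>" + IP + ")", line)

print(before.group(2))  # 123.456.7.8
print(after.group(2))   # Jim: same index, different value
print(named.group("Location"), named.group(3))  # 123.456.7.8 123.456.7.8
```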

The same capture group name can be used more than once if the occurrences are separated by the | operator. Referencing the match data by name retrieves the value from whichever occurrence matched, whereas referencing it by index retrieves the value captured at that particular position. For example:

foreach text in ["Jane is 32 years old"]
let person = match(text, re`(?<age:Number>\d+)|(?:(?<name>\w+) is (?<age:Number>\d+) years old)`)
select {
"The value of capture named 'age'": person.age,
"The captured value of the first occurrence of 'age'": person["1"],
"The captured value of the second occurrence of 'age'": person["3"],
"Aside: 'name' is at index 2": person.name == person["2"]
}

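The above query results in a single row whose columns have the values 32, null, 32, and true.

The above query uses a shy group so that the | operator chooses between two complete regular expressions: (?<age:Number>\d+) and (?<name>\w+) is (?<age:Number>\d+) years old. The first alternative cannot match at the start of "Jane is 32 years old", so the second one is used, which is why person["1"] is null.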

Using shy groups

The shy group construction (?:x) is like a capture group (x) but it does no capturing: Text matching the inner regular expression x is not captured into a field of the match.

Shy groups are useful for enforcing order of operations. For example, the following uses a shy group to apply a quantifier to the choice [A-C]|[D-F].

initialsRegex = re`(?<initial>(?:[A-C]|[D-F]){2}) (?<age:Number>\d+)`;
foreach person in ["AF 12", "CD 34", "ZD 34"]
select { isValid: hasMatch(person, initialsRegex) }

This query results in 3 rows: true, true, false.
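The effect of the shy group on quantifier binding can be checked in Python, whose `(?:...)` syntax is the same. The sketch below is a close Python analogue of initialsRegex, not NQE. It drops the age group's type annotation, which Python lacks, and assumes whole-string matching (`re.fullmatch`).

```python
import re

# (?:...) groups the two character classes so {2} applies to the whole choice,
# without creating a capture group.
initials_regex = re.compile(r"(?P<initial>(?:[A-C]|[D-F]){2}) (?P<age>\d+)")
results = [initials_regex.fullmatch(p) is not None for p in ["AF 12", "CD 34", "ZD 34"]]
print(results)  # [True, True, False]

# Without the shy group, {2} binds only to [D-F]: the regex means "[A-C]" or "[D-F]{2}".
ungrouped = re.compile(r"[A-C]|[D-F]{2}")
print(ungrouped.fullmatch("AF") is not None)  # False
print(ungrouped.fullmatch("A") is not None)   # True
```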

Using type annotations

Named capture groups can also be given type annotations, which ensure that a capture is not only of the right shape but also of the right datatype.

peopleText  =
"""
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
""";

// The only change from the previous example is the type annotation on the 'Location' capture group.
personRegex = re`Name: (?<Name>\w+), Location: (?<Location:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`;

foreach personMatch in regexMatches(peopleText, personRegex)
let person = personMatch.data
select { Name: person.Name, Location: person.Location }

In this example, the second capture group in personRegex is now declared as (?<Location:IpAddress>...). This specifies three things: only valid IP addresses should be matched, the matching text is captured into Location, and the captured text is converted to NQE's IpAddress type. As a result, this query returns only two rows. Jane's location does not have the right shape, and Jim's location, 123.456.7.8, has the right shape but is not a valid IP address, since 456 is greater than 255.
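A Python sketch can approximate the :IpAddress annotation with a match followed by a conversion through the standard `ipaddress` module. Matches whose text is not a valid address are skipped. This is an analogy for reading the example, not how NQE implements typed captures. Exact NQE behavior on a failed conversion is not described here beyond the resulting row count. The helper `typed_people` is hypothetical.

```python
import ipaddress
import re

people_text = """
Name: Jasim, Location: 0.0.1.2
Name: Kathy, Location: 8.8.8.8
Name: Jane, Location: right here
Name: Jim, Location: 123.456.7.8
"""

person_regex = re.compile(
    r"Name: (?P<Name>\w+), Location: (?P<Location>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)


def typed_people(text: str) -> list:
    rows = []
    for m in person_regex.finditer(text):
        try:
            # Stand-in for the :IpAddress annotation: convert, rejecting bad values.
            location = ipaddress.ip_address(m.group("Location"))
        except ValueError:
            continue  # right shape but not a valid address, e.g. 123.456.7.8
        rows.append({"Name": m.group("Name"), "Location": location})
    return rows


rows = typed_people(people_text)
for row in rows:
    print(row["Name"], row["Location"])
```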

The supported capture group types are String, Number, Float, and IpAddress. As the first two examples below show, an untyped capture group yields a String.

Example → Result
match("2357", re`(?<value>\d+)`) → {value: "2357"}
match("2357", re`(?<value:String>\d+)`) → {value: "2357"}
match("2357", re`(?<value:Number>\d+)`) → {value: 2357}
match("3x", re`(?<value:Number>\d+)`) → null (no match)
match("3.14", re`(?<value:Float>\d+\.?\d*)`) → {value: 3.14}
match("3 14", re`(?<value:Float>\d+\.?\d*)`) → null (no match)
match("2.4.6.8", re`(?<value:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`) → {value: ipAddress("2.4.6.8")}
match("a.b.c.d", re`(?<value:IpAddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})`) → null (no match)
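The "3x" and "3 14" rows suggest that match requires the entire string to match; otherwise `\d+` would simply match the leading 3. The Python sketch below assumes exactly that and uses `re.fullmatch`. It imitates the Number and Float annotations by converting the capture with `int` and `float`. It is a reading aid for the table, not NQE's implementation, and `match_typed` is a hypothetical helper.

```python
import re


def match_typed(text, pattern, convert):
    # Assumption: the whole string must match, as the "3x" row suggests.
    m = re.fullmatch(pattern, text)
    if m is None:
        return None  # corresponds to null in the table
    return {"value": convert(m.group("value"))}


print(match_typed("2357", r"(?P<value>\d+)", int))           # {'value': 2357}
print(match_typed("3x", r"(?P<value>\d+)", int))             # None
print(match_typed("3.14", r"(?P<value>\d+\.?\d*)", float))   # {'value': 3.14}
print(match_typed("3 14", r"(?P<value>\d+\.?\d*)", float))   # None
```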

Regex Syntax

A regular expression literal starts with re`, then continues with any number of lines, and then ends with a final backtick `. The following table summarizes the available regular expression constructs:

Syntax           Description
c                The single character c
.                Any single character
[abc]            Any of the characters a, b, or c
[^abc]           Any character other than a, b, or c
\d               Any single digit; equivalent to [0-9]
\D               Any non-digit; equivalent to [^0-9]
\t               A horizontal tab character; equivalent to \011
\n               A newline character; equivalent to \012
\r               A carriage return character
\f               A form feed character; equivalent to \014
\s               Any whitespace character (tab, newline, form feed, carriage return, or space); equivalent to [\t\n\f\r ]
\S               Any non-whitespace character; equivalent to [^\t\n\f\r ]
\w               Any word-constituent character; equivalent to [0-9A-Za-z_]
\W               Any non-word-constituent character; equivalent to [^0-9A-Za-z_]
^                Assert at beginning of text
$                Assert at end of text
\b               Assert at ASCII word boundary: matches the empty string, but only at the beginning or end of a word
\B               Assert not at ASCII word boundary
xy               Regex x followed by regex y
x|y              Match x if possible; otherwise try to match y (ordered choice)
(?<name>x)       Named capturing group (submatch)
(?<name:Type>x)  Named and typed capturing group (submatch)
(x)              Numbered capturing group (submatch)
(?:x)            Non-capturing (shy) group: text matching the inner regular expression x is not captured into a field of the match; useful for enforcing precedence
x*               As many instances of x as possible, possibly zero
x*?              Like x*, but as few instances as possible while still letting the overall regex match
x+               As many instances of x as possible, at least one
x+?              Like x+, but as few instances as possible while still letting the overall regex match
x?               Zero or one instance of x, preferring one
x??              Like x?, but preferring zero instances if the overall regex can still match
x{n,m}           As many instances of x as possible, at least n and at most m
x{n,m}?          Like x{n,m}, but as few instances as possible while still letting the overall regex match
x{n,}            As many instances of x as possible, at least n
x{n,}?           Like x{n,}, but as few instances as possible while still letting the overall regex match
x{n}             Exactly n instances of x
x{n}?            Like x{n}; because the count is fixed, it matches the same text as x{n}
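Two rows of this table, greedy versus reluctant quantifiers and ordered choice, behave the same way in Python's backtracking `re` module. The short Python sketch below illustrates them. It is offered as an analogy, not as NQE code.

```python
import re

text = "<a><b>"

# Greedy: .* consumes as much as possible, up to the last ">".
greedy = re.match(r"<.*>", text).group()
# Reluctant: .*? stops at the first ">" that lets the regex match.
lazy = re.match(r"<.*?>", text).group()
# Ordered choice: the first alternative that succeeds wins, even if a later one is longer.
choice = re.match(r"a|ab", "ab").group()

print(greedy)  # <a><b>
print(lazy)    # <a>
print(choice)  # a
```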
caution

There is no support for back-references such as \1.

Regex Validity

The following kinds of regular expressions are considered invalid:

  • Applying a quantifier whose upper bound is greater than 1 to an expression that contains a capture group, such as re`(\w+){1,3}` and re`(?<name>\w+){1,3}`.

    • Fix by switching to a non-capturing group construction as in re`(?:\w+){1,3}`.
  • Nesting quantifiers, such as re`\d{3}*` or re`\d{3}+`.

    • NQE disallows such nestings for performance reasons.
    • Fix by using a non-capturing (shy) group, as in re`(?:\d{3})*` or re`(?:\d{3})+`.
    • A ? placed directly after a quantifier is allowed, because it does not add a second quantifier but makes the first one reluctant. For example, re`\d{3}?` is a valid regular expression.
  • Using the same name in multiple capture groups with different types, such as
    re`(?<age:String>\d+)|(?<age:Number>\d+)`.

    • Fix by giving every occurrence of age the same type, as in
      re`(?<age:String>\d+)|(?<age:String>\d+)` or re`(?<age:Number>\d+)|(?<age:Number>\d+)`.
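Python's `re` module rejects a quantifier stacked directly on another one in a similar way (it raises a "multiple repeat" error), although the reason there is syntactic rather than performance-related. The sketch below shows the suggested fix: wrapping the inner repetition in a non-capturing group keeps the intended meaning, "any number of three-digit blocks". This is a Python illustration, not NQE.

```python
import re

# A quantifier applied directly to another quantifier is rejected.
try:
    re.compile(r"\d{3}*")
    nested_ok = True
except re.error:
    nested_ok = False

# Wrapping the inner repetition in a non-capturing group is the usual fix.
groups_of_three = re.compile(r"(?:\d{3})*")

print(nested_ok)                                          # False
print(groups_of_three.fullmatch("123456") is not None)    # True
print(groups_of_three.fullmatch("12345") is not None)     # False
```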

See Also

Functions

Function             Description
hasMatch             Checks whether a string matches the structure specified by a regex.
match                Gets a record of information about text that matches the structure specified by a regex, or null if there is no match.
regexMatches         Gets the list of all substring matches of a given regex.
replaceRegexMatches  Replaces all substring matches of a given regex using the provided replacement template.
regex                Converts a string into a regex.

The Pattern Type allows word-based pattern matching, and replaceMatches allows character-based matching via globs. However, globs support only a very narrow set of metacharacters. Regular expressions provide expressive power beyond what the Pattern Type and globs offer.
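The gap between globs and regular expressions can be illustrated in Python with the standard `fnmatch` module. Its glob metacharacters (`*`, `?`, `[...]`) may not match NQE's glob syntax exactly, so treat this as an analogy. A glob can require "three more characters" but not "three more digits".

```python
import fnmatch
import re

names = ["atl123", "atlabc", "atl12"]

# Glob: "?" stands for any single character, so it cannot insist on digits.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "atl???")]
# Regex: \d{3} requires exactly three digits.
regex_hits = [n for n in names if re.fullmatch(r"atl\d{3}", n)]

print(glob_hits)   # ['atl123', 'atlabc']
print(regex_hits)  # ['atl123']
```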