Regex
For testing and manipulating text in VB.NET, the .NET Regex class (in System.Text.RegularExpressions) is useful. With Regex, we describe patterns in a compact text-processing language that handles string data well.
With the Match function, we search strings. With Replace, we change the text we find. RegexOptions values are often passed to change how these functions evaluate a pattern.
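Replace is not demonstrated in the examples below, so here is a minimal sketch of the shared Regex.Replace Function. The MaskDigits helper and its input string are illustrative assumptions. The pattern "\d+" matches each run of digits, and every run is replaced by a single "#".
```vb
Imports System
Imports System.Text.RegularExpressions

Module Module1
    ' Hypothetical helper: replace every run of digits with one "#".
    Function MaskDigits(value As String) As String
        Return Regex.Replace(value, "\d+", "#")
    End Function

    Sub Main()
        ' Prints "cat # dog #".
        Console.WriteLine(MaskDigits("cat 123 dog 45"))
    End Sub
End Module
```
Because "+" is greedy, "123" becomes one "#" rather than three.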
Match example
This program uses Regex. Notice the Imports System.Text.RegularExpressions statement: this namespace is needed to get started with regular expressions.
Step 1: We create a Regex object. The pattern "\w+" matches one or more word characters together.
Step 2: We call the Match Function on the Regex instance. This returns a Match (which we test next).
Step 3: We test the Success property. If the Match succeeded, we print its value with Console.WriteLine: the string "do".
```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Step 1: create Regex.
        Dim regex As Regex = New Regex("\w+")
        ' Step 2: call Match on Regex.
        Dim match As Match = regex.Match(" do?")
        ' Step 3: test the Success bool.
        ' ... If we have Success, write the Value.
        If match.Success Then
            Console.WriteLine("RESULT: [{0}]", match.Value)
        End If
    End Sub
End Module
```
Output:
```
RESULT: [do]
```
IgnoreCase
Next, we use different syntax, and an option, for Match. We call the shared Regex.Match Function, so no Regex object is needed. We then specify an option, RegexOptions.IgnoreCase.
This enum value, a constant, specifies that lowercase and uppercase letters are treated as equal.
```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Match ignoring case of letters.
        Dim match As Match = Regex.Match("I like that cat", "C.T", RegexOptions.IgnoreCase)
        If match.Success Then
            ' Write value.
            Console.WriteLine(match.Value)
        End If
    End Sub
End Module
```
Output:
```
cat
```
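To see what the option changes, here is a minimal sketch that runs the same pattern on the same input with and without RegexOptions.IgnoreCase. Without the option, "C" and "T" must be uppercase, and the input contains neither, so the first call finds nothing.
```vb
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim input As String = "I like that cat"
        ' Case-sensitive: no uppercase "C" or "T" in the input.
        Dim strict As Match = Regex.Match(input, "C.T")
        ' Case-insensitive: "C.T" also matches "cat".
        Dim relaxed As Match = Regex.Match(input, "C.T", RegexOptions.IgnoreCase)
        Console.WriteLine(strict.Success)  ' False
        Console.WriteLine(relaxed.Value)   ' cat
    End Sub
End Module
```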
This example uses Match and Groups. We specify that the case of letters is unimportant with RegexOptions.IgnoreCase. Finally, we test for Success on the Match object we receive.
We read the captured text with Groups(1) on the Match. With Regex, capture group indexing starts at 1, not 0.
```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' The input string.
        Dim value As String = "/content/alternate-1.aspx"
        ' Invoke the Match method.
        Dim m As Match = Regex.Match(value, _
                                     "content/([A-Za-z0-9\-]+)\.aspx$", _
                                     RegexOptions.IgnoreCase)
        ' If successful, write the group.
        If (m.Success) Then
            Dim key As String = m.Groups(1).Value
            Console.WriteLine(key)
        End If
    End Sub
End Module
```
Output:
```
alternate-1
```
A Regex object takes time to create. Instead of creating one on every call, we can share a Regex object with the Shared keyword. A shared Regex object is usually faster than the shared Regex Functions, which look up the pattern in an internal cache on each call and rebuild it when it is not cached.
Storing a Regex as a field in a module or class therefore often results in a speed boost when Match is called more than once. In a Module, every field is implicitly Shared, so the field below is created only once.
Here Match is an instance function on the Regex object. This program has the same result as the previous program.
```vb
Imports System.Text.RegularExpressions

Module Module1
    ''' <summary>
    ''' Member field regular expression.
    ''' </summary>
    Private _reg As Regex = New Regex("content/([A-Za-z0-9\-]+)\.aspx$", _
                                      RegexOptions.IgnoreCase)

    Sub Main()
        ' The input string.
        Dim value As String = "/content/alternate-1.aspx"
        ' Invoke the Match method.
        ' ... Use the regex field.
        Dim m As Match = _reg.Match(value)
        ' If successful, write the group.
        If (m.Success) Then
            Dim key As String = m.Groups(1).Value
            Console.WriteLine(key)
        End If
    End Sub
End Module
```
Output:
```
alternate-1
```
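In a Class (unlike a Module), the field must be marked Shared explicitly for one Regex to serve every call. A minimal sketch; the PathParser class and its GetKey method are hypothetical names chosen for illustration:
```vb
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper class: the Regex is built once and reused by every call.
Class PathParser
    Private Shared ReadOnly _reg As New Regex("content/([A-Za-z0-9\-]+)\.aspx$",
                                              RegexOptions.IgnoreCase)

    ' Returns the captured key, or Nothing when the path does not match.
    Public Shared Function GetKey(path As String) As String
        Dim m As Match = _reg.Match(path)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function
End Class

Module Module1
    Sub Main()
        Console.WriteLine(PathParser.GetKey("/content/alternate-1.aspx"))  ' alternate-1
    End Sub
End Module
```
The Regex class is immutable and documented as thread safe, so one shared instance can serve calls from multiple threads.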
Match, NextMatch
The Match() Function returns the first match only. But we can call NextMatch() on that returned Match object to get the next match found further on in the text.
NextMatch can be called in a loop. This results in behavior similar to the Matches method (which may be easier to use).
```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Get first match.
        Dim match As Match = Regex.Match("4 and 5", "\d")
        If match.Success Then
            Console.WriteLine(match.Value)
        End If
        ' Get next match.
        match = match.NextMatch()
        If match.Success Then
            Console.WriteLine(match.Value)
        End If
    End Sub
End Module
```
Output:
```
4
5
```
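The NextMatch loop and the Matches alternative can be sketched as follows. The helper names and the input "4 and 5 and 6" are illustrative assumptions; both helpers should collect the same digits, in the same order.
```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    ' Collect every digit by calling NextMatch until Success is False.
    Function DigitsWithNextMatch(value As String) As List(Of String)
        Dim result As New List(Of String)
        Dim current As Match = Regex.Match(value, "\d")
        While current.Success
            result.Add(current.Value)
            current = current.NextMatch()
        End While
        Return result
    End Function

    ' Same result with Matches, which returns a MatchCollection.
    Function DigitsWithMatches(value As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In Regex.Matches(value, "\d")
            result.Add(m.Value)
        Next
        Return result
    End Function

    Sub Main()
        ' Both lines print "4,5,6".
        Console.WriteLine(String.Join(",", DigitsWithNextMatch("4 and 5 and 6")))
        Console.WriteLine(String.Join(",", DigitsWithMatches("4 and 5 and 6")))
    End Sub
End Module
```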
IsMatch
This returns True if a String matches the regular expression. We get a Boolean that tells us whether a pattern matches. If no other results are needed, IsMatch is useful.
IsValid is a Boolean function that returns the result of Regex.IsMatch on its parameter. The pattern accepts a string made up only of lowercase ASCII letters, uppercase ASCII letters, or digits.
```vb
Imports System.Text.RegularExpressions

Module Module1
    Function IsValid(value As String) As Boolean
        Return Regex.IsMatch(value, "^[a-zA-Z0-9]*$")
    End Function

    Sub Main()
        Console.WriteLine(IsValid("dotnetperls0123"))
        Console.WriteLine(IsValid("DotNetPerls"))
        Console.WriteLine(IsValid(":-)"))
    End Sub
End Module
```
Output:
```
True
True
False
```
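Because the pattern uses "*", which allows zero characters, IsValid also returns True for an empty string. If at least one character is required, "+" can be used instead. A minimal sketch; IsValidNonEmpty is a hypothetical name:
```vb
Imports System
Imports System.Text.RegularExpressions

Module Module1
    ' Rule from above: zero or more ASCII letters or digits.
    Function IsValid(value As String) As Boolean
        Return Regex.IsMatch(value, "^[a-zA-Z0-9]*$")
    End Function

    ' Hypothetical variant: "+" requires at least one character.
    Function IsValidNonEmpty(value As String) As Boolean
        Return Regex.IsMatch(value, "^[a-zA-Z0-9]+$")
    End Function

    Sub Main()
        Console.WriteLine(IsValid(""))          ' True
        Console.WriteLine(IsValidNonEmpty(""))  ' False
    End Sub
End Module
```
In .NET, "$" also matches just before a final newline, so IsValid("abc" & vbLf) returns True as well. Anchoring with "\z" instead of "$" rejects that case.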
Matching the start and end of a String is commonly needed. We use the metacharacters "^" and "$" to match the start and end of a string.
IsMatch() evaluates these metacharacters the same way Match (or Matches) does; only the result each function returns is different.
```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim value As String = "XXYY"
        ' Match the start with a "^" char.
        If Regex.IsMatch(value, "^XX") Then
            Console.WriteLine("ISMATCH START")
        End If
        ' Match the end with a "$" char.
        If Regex.IsMatch(value, "YY$") Then
            Console.WriteLine("ISMATCH END")
        End If
    End Sub
End Module
```
Output:
```
ISMATCH START
ISMATCH END
```
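The difference lies in what comes back: IsMatch gives only a Boolean, while Match returns a Match whose Index and Value describe the anchored text. A minimal sketch on the same input:
```vb
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim value As String = "XXYY"
        ' IsMatch answers only whether the pattern matches.
        Console.WriteLine(Regex.IsMatch(value, "YY$"))  ' True
        ' Match also reports where the anchored text starts.
        Dim m As Match = Regex.Match(value, "YY$")
        Console.WriteLine(m.Index)  ' 2
        Console.WriteLine(m.Value)  ' YY
        ' "^" anchors to the start, so "YY" is not found there.
        Console.WriteLine(Regex.IsMatch(value, "^YY"))  ' False
    End Sub
End Module
```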
Compiled Regex
To optimize Regex performance in VB.NET, we can use the RegexOptions.Compiled enum value and store the Regex in a field. Here we test compiled Regexes.
Version 1: This code uses a Regex field in the Module, and calls IsMatch on the field instance.
Version 2: This code calls Regex.IsMatch directly, with no stored Regex instance. It does the same thing as version 1.
Result: The optimizations applied (a field cache, and RegexOptions.Compiled) make regular expression testing much faster, roughly 3.7 times in this run. Because both changes are applied together, this benchmark does not show how much each one contributes.
```vb
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Module1
    Dim _regex As Regex = New Regex("X.+0", RegexOptions.Compiled)

    Sub Version1()
        ' Use compiled regular expression stored as field.
        If _regex.IsMatch("X12340") = False Then
            Throw New Exception
        End If
    End Sub

    Sub Version2()
        ' Do not use compiled Regex.
        If Regex.IsMatch("X12340", "X.+0") = False Then
            Throw New Exception
        End If
    End Sub

    Sub Main()
        Dim m As Integer = 100000
        Dim s1 As Stopwatch = Stopwatch.StartNew
        ' Version 1: use RegexOptions.Compiled.
        For i As Integer = 0 To m - 1
            Version1()
        Next
        s1.Stop()
        Dim s2 As Stopwatch = Stopwatch.StartNew
        ' Version 2: do not compile the Regex.
        For i As Integer = 0 To m - 1
            Version2()
        Next
        s2.Stop()
        ' Convert total milliseconds into nanoseconds per iteration.
        Dim u As Integer = 1000000
        Console.WriteLine(((s1.Elapsed.TotalMilliseconds * u) / m).ToString("0.00 ns"))
        Console.WriteLine(((s2.Elapsed.TotalMilliseconds * u) / m).ToString("0.00 ns"))
    End Sub
End Module
```
Output:
```
131.78 ns    IsMatch, RegexOptions.Compiled
484.66 ns    IsMatch
```
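Because the benchmark above changes two things at once, a variant that times each factor separately may be more informative. The sketch below is a rough harness under its own assumptions: the TimePerCall helper is hypothetical, each call is wrapped in a lambda (which adds the same small overhead to all three cases), and there is no warm-up. Timings depend on the machine and runtime, so no results are shown.
```vb
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module Module1
    ' Same pattern, built once without and once with RegexOptions.Compiled.
    Private ReadOnly _interpreted As New Regex("X.+0")
    Private ReadOnly _compiled As New Regex("X.+0", RegexOptions.Compiled)

    ' Hypothetical helper: runs the test repeatedly, returns nanoseconds per call.
    Function TimePerCall(test As Func(Of Boolean), iterations As Integer) As Double
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 0 To iterations - 1
            If Not test() Then
                Throw New Exception("Pattern unexpectedly failed to match.")
            End If
        Next
        watch.Stop()
        Return watch.Elapsed.TotalMilliseconds * 1000000 / iterations
    End Function

    Sub Main()
        Const iterations As Integer = 100000
        ' Shared Function: uses the internal Regex cache on each call.
        Console.WriteLine("Shared IsMatch:   " &
            TimePerCall(Function() Regex.IsMatch("X12340", "X.+0"), iterations).ToString("0.00 ns"))
        ' Field, not compiled: isolates the effect of storing the Regex.
        Console.WriteLine("Field:            " &
            TimePerCall(Function() _interpreted.IsMatch("X12340"), iterations).ToString("0.00 ns"))
        ' Field plus RegexOptions.Compiled.
        Console.WriteLine("Field + Compiled: " &
            TimePerCall(Function() _compiled.IsMatch("X12340"), iterations).ToString("0.00 ns"))
    End Sub
End Module
```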
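In some programs, a Regex is the easiest way to process text. At its core, the Regex type exposes a text-processing language rooted in the theory of finite automata, although the default .NET engine matches by backtracking rather than by running a deterministic finite automaton.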