
C# Regex.Match Examples: Regular Expressions

Use the Regex class and Regex.Match. Review features from System.Text.RegularExpressions.
Regex. We live in a universe of great complexity. An acorn falls to the ground. A tree grows in its place. From small things big effects come.
Regex details. A regular expression is a tiny program. Much like an acorn, it contains instructions for what comes next. It processes text: it matches and replaces text.
Match example. This program introduces the Regex class. Regex and Match are found in the System.Text.RegularExpressions namespace.

Step 1: We create a Regex. The Regex uses a pattern that indicates one or more digits.

Step 2: Here we invoke the Match method on the Regex. The characters "55" match the pattern specified in step 1.

Step 3: The returned Match object has a bool property called Success. If it equals true, we found a match.

C# program that uses Match, Regex

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Step 1: create new Regex.
        Regex regex = new Regex(@"\d+");

        // Step 2: call Match on Regex instance.
        Match match = regex.Match("Dot 55 Perls");

        // Step 3: test for Success.
        if (match.Success)
        {
            Console.WriteLine("MATCH VALUE: " + match.Value);
        }
    }
}
```

Output

```
MATCH VALUE: 55
```
Static method. We do not need to create a Regex instance to use Match: we can invoke the static Regex.Match. This example adds some complexity, because we access Groups after testing Success.

Part 1: This is the string we are testing. Notice that it contains a file name part between a directory name and an extension.

Part 2: We use the Regex.Match static method. The second argument is the pattern to match. The third argument, RegexOptions.IgnoreCase, makes the match case-insensitive.

Part 3: We test the result of Match with the Success property. When true, a Match occurred and we can access its Value or Groups.

Part 4: We access Groups when Success is true. Groups[0] always holds the entire match, so the first parenthesized group is found at index 1.

C# program that uses Regex.Match

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Part 1: the input string.
        string input = "/content/alternate-1.aspx";

        // Part 2: call Regex.Match.
        Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);

        // Part 3: check the Match for Success.
        if (match.Success)
        {
            // Part 4: get the Group value and display it.
            string key = match.Groups[1].Value;
            Console.WriteLine(key);
        }
    }
}
```

Output

```
alternate-1
```

Pattern details:

```
@"               This starts a verbatim string literal.
content/         The group must follow this string.
[A-Za-z0-9\-]+   One or more alphanumeric characters or hyphens.
(...)            A capturing group (Groups[1]).
\.aspx           This must come after the group.
$                Matches the end of the string.
```
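Groups also has an entry at index 0. This minimal sketch reuses the pattern above to show that Groups[0] holds the whole match while Groups[1] holds the parenthesized part. The class name GroupDemo and the group name "key" are hypothetical additions; a named group can be read by name as well as by number.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Same pattern as above, with the group given the hypothetical name "key".
    static readonly Regex _pattern = new Regex(
        @"content/(?<key>[A-Za-z0-9\-]+)\.aspx$", RegexOptions.IgnoreCase);

    // Groups[0] is the entire matched text.
    public static string WholeMatch(string input) => _pattern.Match(input).Groups[0].Value;

    // The only group in the pattern is numbered 1, even though it is named.
    public static string FirstGroup(string input) => _pattern.Match(input).Groups[1].Value;

    // The same group, looked up by its name.
    public static string NamedGroup(string input) => _pattern.Match(input).Groups["key"].Value;
}

class Program
{
    static void Main()
    {
        string input = "/content/alternate-1.aspx";
        Console.WriteLine(GroupDemo.WholeMatch(input)); // content/alternate-1.aspx
        Console.WriteLine(GroupDemo.FirstGroup(input)); // alternate-1
        Console.WriteLine(GroupDemo.NamedGroup(input)); // alternate-1
    }
}
```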
NextMatch. More than one match may be found. We can call NextMatch() to search for a match that comes after the current one in the text. NextMatch can be used in a loop.

Step 1: We call Regex.Match. The input contains two matches, but this call returns only the first Match.

Step 2: NextMatch returns another Match object; it does not modify the current one. We assign the result to our variable.

C# program that uses NextMatch

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "4 AND 5";

        // Step 1: get first match.
        Match match = Regex.Match(value, @"\d");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }

        // Step 2: get second match.
        match = match.NextMatch();
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}
```

Output

```
4
5
```
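Here is a sketch of the loop form mentioned above: start with Regex.Match, then keep calling NextMatch until Success is false. The names NextMatchDemo and AllValues are hypothetical helpers for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NextMatchDemo
{
    // Collects every match by walking the chain of Match objects with NextMatch.
    public static List<string> AllValues(string input, string pattern)
    {
        var values = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            values.Add(m.Value);
        }
        return values;
    }
}

class Program
{
    static void Main()
    {
        // Prints 4, 5 and 6 on separate lines.
        foreach (string value in NextMatchDemo.AllValues("4 AND 5 AND 6", @"\d"))
        {
            Console.WriteLine(value);
        }
    }
}
```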
Preprocess. Sometimes we can preprocess strings before using Match() on them. This can be faster and clearer. Experiment: I found that using ToLower to normalize chars was a good choice. After lowercasing, the pattern only needs the lowercase range a-z.

C# program that uses ToLower, Match

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // This is the input string.
        string input = "/content/alternate-1.aspx";

        // Here we lowercase our input first.
        input = input.ToLower();

        // The input is lowercase, so "A-Z" is no longer needed in the pattern.
        Match match = Regex.Match(input, @"content/([a-z0-9\-]+)\.aspx$");
        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}
```

Output

```
alternate-1
```
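Static. A stored Regex instance is often faster than the static Regex.Match. The static methods keep a small cache of recently used patterns, but every call still has to look the pattern up. For performance, we should usually use an instance object, which can be shared throughout an entire project.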

Sometimes: We only need to call Match once in a program's execution. A Regex object does not help here.

Class: Here a static class stores an instance Regex that can be used project-wide. We initialize it inline.

C# program that uses static Regex

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The input string again.
        string input = "/content/alternate-1.aspx";

        // This calls the static method specified.
        Console.WriteLine(RegexUtil.MatchKey(input));
    }
}

static class RegexUtil
{
    static readonly Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");

    /// <summary>
    /// This returns the key that is matched within the input.
    /// </summary>
    public static string MatchKey(string input)
    {
        Match match = _regex.Match(input.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}
```

Output

```
alternate-1
```
Numbers. A common requirement is extracting a number from a string. We can do this with Regex.Match. To get further numbers, consider Matches() or NextMatch.

Digits: We extract a group of digit characters and access the Value string representation of that number.

Parse: To parse the number, use int.Parse or int.TryParse on the Value here. This will convert it to an int.

C# program that matches numbers

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // ... Input string.
        string input = "Dot Net 100 Perls";

        // ... One or more digits.
        Match m = Regex.Match(input, @"\d+");

        // ... Write value.
        Console.WriteLine(m.Value);
    }
}
```

Output

```
100
```
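To finish the Parse step, a minimal sketch might check Success and then call int.TryParse on the Value. TryParse is the safer choice here: in .NET, \d also matches non-ASCII Unicode digits, and a long digit run may not fit in an int. NumberDemo and FirstNumber are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberDemo
{
    // Returns the first run of digits as an int, or null when there is none
    // (or when the digits cannot be parsed as an int).
    public static int? FirstNumber(string input)
    {
        Match m = Regex.Match(input, @"\d+");
        if (m.Success && int.TryParse(m.Value, out int number))
        {
            return number;
        }
        return null;
    }
}

class Program
{
    static void Main()
    {
        int? n = NumberDemo.FirstNumber("Dot Net 100 Perls");
        Console.WriteLine(n + 1); // 101
        Console.WriteLine(NumberDemo.FirstNumber("no digits").HasValue); // False
    }
}
```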
Value, length, index. A Match object, returned by Regex.Match, has a Value, Length and Index. These describe the matched text (a substring of the input).

Value: This is the matched text, represented as a separate string. This is a substring of the original input.

Length: This is the length of the Value string. Here, the Length of "Axxxxy" is 6.

Index: The index where the matched text begins within the input string. The character "A" starts at index 4 here.

C# program that shows value, length, index

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Match m = Regex.Match("123 Axxxxy", @"A.*y");
        if (m.Success)
        {
            Console.WriteLine("Value  = " + m.Value);
            Console.WriteLine("Length = " + m.Length);
            Console.WriteLine("Index  = " + m.Index);
        }
    }
}
```

Output

```
Value  = Axxxxy
Length = 6
Index  = 4
```
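Index and Length together describe a span of the input, so the matched text can be rebuilt with Substring. This sketch (SpanDemo and FromSpan are hypothetical names) shows that input.Substring(Index, Length) equals Value.

```csharp
using System;
using System.Text.RegularExpressions;

static class SpanDemo
{
    // Rebuilds the matched text from the input using Index and Length.
    public static string FromSpan(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? input.Substring(m.Index, m.Length) : null;
    }
}

class Program
{
    static void Main()
    {
        string input = "123 Axxxxy";
        Match m = Regex.Match(input, @"A.*y");
        // Both lines print Axxxxy: Value covers the same characters as [Index, Index + Length).
        Console.WriteLine(m.Value);
        Console.WriteLine(SpanDemo.FromSpan(input, @"A.*y"));
    }
}
```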
IsMatch. This method tests for a matching pattern. It does not return captured groups. It only reports whether the pattern occurs somewhere in the input string.

Bool: IsMatch returns a bool value. Every overload receives an input string that is searched for matches.

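Internals: When we use the static Regex.IsMatch method, .NET looks up the pattern in an internal cache of recently used Regex objects (its size is set by Regex.CacheSize). On a miss, a new Regex is created in the same way as any instance Regex, and it is added to the cache.

And: When an entry falls out of the cache, that Regex instance is no longer referenced and will be cleaned up by the garbage collector.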

C# program that uses Regex.IsMatch method

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    /// <summary>
    /// Test string using Regex.IsMatch static method.
    /// </summary>
    static bool IsValid(string value)
    {
        return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
    }

    static void Main()
    {
        // Test the strings with the IsValid method.
        Console.WriteLine(IsValid("dotnetperls0123"));
        Console.WriteLine(IsValid("DotNetPerls"));
        Console.WriteLine(IsValid(":-)"));
        // Console.WriteLine(IsValid(null)); // Throws an exception
    }
}
```

Output

```
True
True
False
```
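The anchors in IsValid matter. In this sketch (AnchorDemo and its methods are hypothetical names), dropping "^" and "$" makes the method accept ":-)", because "*" allows an empty match anywhere. The strict variant uses \A and \z, since in .NET "$" also matches just before a final newline.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Anchored: the whole string must consist of letters and digits.
    public static bool IsValidAnchored(string value) =>
        Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");

    // Unanchored: "*" allows zero characters, so an empty match is found in any string.
    public static bool IsValidUnanchored(string value) =>
        Regex.IsMatch(value, @"[a-zA-Z0-9]*");

    // Strict: \A and \z match only the very start and very end of the string.
    public static bool IsValidStrict(string value) =>
        Regex.IsMatch(value, @"\A[a-zA-Z0-9]*\z");
}

class Program
{
    static void Main()
    {
        Console.WriteLine(AnchorDemo.IsValidAnchored(":-)"));   // False
        Console.WriteLine(AnchorDemo.IsValidUnanchored(":-)")); // True
        Console.WriteLine(AnchorDemo.IsValidAnchored("abc\n")); // True: "$" matches before the final "\n"
        Console.WriteLine(AnchorDemo.IsValidStrict("abc\n"));   // False
    }
}
```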
Start, end matching. We can use metacharacters to match the start and end of strings. Use "^" to match the start, and "$" for the end.

Info: We use IsMatch here, but Regex.Match could be used instead. It would return a Match instead of a bool.

C# program that uses IsMatch, start and end

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string test = "xxyy";

        // Use the "^" char to match the start of a string.
        if (Regex.IsMatch(test, "^xx"))
        {
            Console.WriteLine("START MATCHES");
        }

        // Use the "$" char to match the end of a string.
        if (Regex.IsMatch(test, "yy$"))
        {
            Console.WriteLine("END MATCHES");
        }
    }
}
```

Output

```
START MATCHES
END MATCHES
```

Pattern details:

```
^    Match start of string.
xx   Match 2 x chars.
yy   Match 2 y chars.
$    Match end of string.
```
RegexOptions. With the Regex type, the RegexOptions enum is used to modify method behavior. Often I find the IgnoreCase value helpful.

IgnoreCase: Lowercase and uppercase letters are distinct in the Regex text language. IgnoreCase changes this.

Multiline: By default "^" and "$" match only at the start and end of the whole string. With RegexOptions.Multiline, they also match at the start and end of each line.

C# program that uses RegexOptions.IgnoreCase

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        const string value = "TEST";

        // ... This ignores the case of the "TE" characters.
        if (Regex.IsMatch(value, "te..", RegexOptions.IgnoreCase))
        {
            Console.WriteLine(true);
        }
    }
}
```

Output

```
True
```
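The Multiline option can be seen by counting lines that begin with "#". In this sketch (MultilineDemo and CountHeadings are hypothetical names), the same pattern finds one match without the option and two with it.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts the places where "^#" matches, using the given options.
    public static int CountHeadings(string text, RegexOptions options) =>
        Regex.Matches(text, "^#", options).Count;
}

class Program
{
    static void Main()
    {
        string text = "# One\nbody\n# Two";
        // Without Multiline, "^" matches only at the start of the whole string.
        Console.WriteLine(MultilineDemo.CountHeadings(text, RegexOptions.None));      // 1
        // With Multiline, "^" also matches right after each "\n".
        Console.WriteLine(MultilineDemo.CountHeadings(text, RegexOptions.Multiline)); // 2
    }
}
```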
Benchmark, Regex. Consider the performance of Regex.Match. If we use the RegexOptions.Compiled enum, and use a cached Regex object, we can get a performance boost.

Version 1: In this version of the code, we call the static Regex.Match method, without any object caching.

Version 2: Here we access a cached object and call Match() on this instance of the Regex.

Result: By using a static field Regex and RegexOptions.Compiled, our method completed about twice as fast in this benchmark.

Warning: A compiled Regex will cause a program to start up slower, and may use more memory, so only compile hot Regexes.

C# program that benchmarks Match, RegexOptions.Compiled

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    static int Version1()
    {
        string value = "This is a simple 5string5 for Regex.";
        return Regex.Match(value, @"5\w+5").Length;
    }

    static Regex _wordRegex = new Regex(@"5\w+5", RegexOptions.Compiled);

    static int Version2()
    {
        string value = "This is a simple 5string5 for Regex.";
        return _wordRegex.Match(value).Length;
    }

    const int _max = 1000000;

    static void Main()
    {
        // Version 1: use Regex.Match.
        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (Version1() != 8)
            {
                return;
            }
        }
        s1.Stop();

        // Version 2: use Regex.Match, compiled Regex, instance Regex.
        var s2 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            if (Version2() != 8)
            {
                return;
            }
        }
        s2.Stop();

        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.Read();
    }
}
```

Output

```
826.03 ns    Regex.Match
412.09 ns    instance Regex.Match, Compiled
```
Notes, Regex performance. Sadly, C# code that uses Regex is often slower than equivalent imperative loops (like for-loops). But we can optimize Regex usage.

1. Compile. Using the RegexOptions.Compiled argument to a Regex instance will make it execute faster. This however has a startup penalty.


2. Replace with loop. Some Regex method calls can be replaced with a hand-written loop, which is often much faster.

3. Use static fields. You can cache a Regex instance as a static field, as the benchmark above does.
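As a sketch of point 2, here is a loop that finds the same text as Regex.Match(input, "[0-9]+").Value. It uses "[0-9]" rather than "\d" so that both versions accept only ASCII digits. LoopDemo and its method names are hypothetical, and no timings are claimed for it here.

```csharp
using System;
using System.Text.RegularExpressions;

static class LoopDemo
{
    // Hand-written equivalent of Regex.Match(input, "[0-9]+").Value.
    public static string FirstDigitsLoop(string input)
    {
        int start = 0;
        // Skip characters until the first ASCII digit.
        while (start < input.Length && (input[start] < '0' || input[start] > '9'))
        {
            start++;
        }
        int end = start;
        // Extend the span while the characters are digits.
        while (end < input.Length && input[end] >= '0' && input[end] <= '9')
        {
            end++;
        }
        // Returns "" when there is no digit, just like a failed Match's Value.
        return input.Substring(start, end - start);
    }

    public static string FirstDigitsRegex(string input) =>
        Regex.Match(input, "[0-9]+").Value;
}

class Program
{
    static void Main()
    {
        string input = "Dot Net 100 Perls";
        Console.WriteLine(LoopDemo.FirstDigitsLoop(input));  // 100
        Console.WriteLine(LoopDemo.FirstDigitsRegex(input)); // 100
    }
}
```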

Matches. Sometimes one match is not enough. Here we use Matches instead of Match: it returns multiple Match objects at once. These are returned in a MatchCollection.
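A minimal sketch of Matches: loop over the MatchCollection with foreach and read each Match's groups. MatchesDemo and Describe are hypothetical names, and the input string is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchesDemo
{
    // Formats each "word number" pair found in the input as "word=number".
    public static string Describe(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(\w+) (\d+)"))
        {
            // Groups[1] is the word, Groups[2] is the number.
            parts.Add(m.Groups[1].Value + "=" + m.Groups[2].Value);
        }
        return string.Join(";", parts);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(MatchesDemo.Describe("cat 10, dog 20, bird 30")); // cat=10;dog=20;bird=30
    }
}
```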
Replace. Sometimes we need to replace a pattern of text with some other text. Regex.Replace helps. We can replace patterns with a string, or with a value determined by a MatchEvaluator.
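Both forms of Replace are shown in this sketch: a fixed replacement string, and a MatchEvaluator written as a lambda that computes each replacement from the Match. ReplaceDemo and its methods are hypothetical, and DoubleNumbers assumes small ASCII numbers that fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Replaces each run of whitespace with a single space.
    public static string CollapseSpaces(string input) =>
        Regex.Replace(input, @"\s+", " ");

    // The lambda converts to a MatchEvaluator; it is called once per match.
    public static string DoubleNumbers(string input) =>
        Regex.Replace(input, "[0-9]+", m => (int.Parse(m.Value) * 2).ToString());
}

class Program
{
    static void Main()
    {
        Console.WriteLine(ReplaceDemo.CollapseSpaces("a  b\t\tc"));       // a b c
        Console.WriteLine(ReplaceDemo.DoubleNumbers("3 cats, 12 dogs")); // 6 cats, 24 dogs
    }
}
```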
Split. Do you need to extract substrings that contain only certain characters (certain digits, letters)? Regex.Split breaks the input wherever the pattern matches and returns the substrings between the matches as a string array.

Numbers: We can handle certain character types, such as numbers, with the Split method. For example, splitting on runs of non-digit characters (\D+) leaves only the runs of digits.

Caution: The Split method in Regex is more powerful than the one on the string type. But it may be slower in common cases.
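A sketch of the Numbers case: split on \D+ and parse what remains. When the input begins or ends with non-digits, Split returns an empty string at that end, so the loop skips empty parts. SplitDemo and Numbers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Splits on runs of non-digit characters, leaving only the digit runs.
    public static int[] Numbers(string input)
    {
        var numbers = new List<int>();
        foreach (string part in Regex.Split(input, @"\D+"))
        {
            // Skip empty strings produced by leading or trailing separators.
            if (part.Length > 0 && int.TryParse(part, out int n))
            {
                numbers.Add(n);
            }
        }
        return numbers.ToArray();
    }
}

class Program
{
    static void Main()
    {
        // Prints 10, 20 and 30 on separate lines.
        foreach (int n in SplitDemo.Numbers("ten 10, twenty 20, thirty 30"))
        {
            Console.WriteLine(n);
        }
    }
}
```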

Escape. This method changes a string, such as user input, into a valid Regex pattern that matches it literally. It assumes no metacharacters were intended, so characters like \, *, +, ?, |, {, [, (, ), ^, $, ., # and white space are escaped.

Note: Escape is not a cure-all. It only changes the representation of certain characters in a string, so the result should be used where a literal match is wanted.
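In this sketch, "1+1=2" used directly as a pattern means "one or more 1s, then 1=2", so it fails to find the literal text. After Regex.Escape, the "+" is escaped and the search succeeds. EscapeDemo and ContainsLiteral are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Escapes user input so that every character is matched literally.
    public static bool ContainsLiteral(string text, string userInput) =>
        Regex.IsMatch(text, Regex.Escape(userInput));
}

class Program
{
    static void Main()
    {
        string text = "Is 1+1=2?";
        Console.WriteLine(Regex.IsMatch(text, "1+1=2"));             // False: "+" is a quantifier here
        Console.WriteLine(EscapeDemo.ContainsLiteral(text, "1+1=2")); // True
        Console.WriteLine(Regex.Escape("a.b*c"));                    // a\.b\*c
    }
}
```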

Word count. With Regex we can count words in strings. We compare this method with Microsoft Word's implementation. We come close to Word's algorithm.
Files. We often need to process text files. The Regex type and its methods are used for this, but we need to combine a file input type (like StreamReader) with the Regex code.
HTML. Regex can be used to process or extract parts of HTML strings. There are problems with this approach, because HTML is not a regular language. But it works in many situations.
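The Word-compatible algorithm is not shown here, so this is only a rough approximation under a stated assumption: a "word" is any run of non-whitespace characters, counted with Regex.Matches. WordCountDemo and CountWords are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordCountDemo
{
    // Approximation: every run of non-whitespace characters counts as one word.
    public static int CountWords(string text) => Regex.Matches(text, @"\S+").Count;
}

class Program
{
    static void Main()
    {
        Console.WriteLine(WordCountDemo.CountWords("  Hello, regular   expressions! ")); // 3
    }
}
```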
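A sketch of combining a reader with Regex: read line by line and keep the captured part of each matching line. The method accepts any TextReader, so a StreamReader over a file works the same way as the StringReader used in Main. The "ERROR: " line format and the names FileDemo and ErrorMessages are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class FileDemo
{
    // Hypothetical log format: lines that start with "ERROR: ".
    static readonly Regex _error = new Regex(@"^ERROR: (.+)$");

    // Reads lines from any TextReader and collects the captured messages.
    public static List<string> ErrorMessages(TextReader reader)
    {
        var messages = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = _error.Match(line);
            if (m.Success)
            {
                messages.Add(m.Groups[1].Value);
            }
        }
        return messages;
    }
}

class Program
{
    static void Main()
    {
        // For a real file, pass a StreamReader opened on its path instead.
        using (TextReader reader = new StringReader("INFO: ok\nERROR: disk full\nERROR: timeout"))
        {
            foreach (string message in FileDemo.ErrorMessages(reader))
            {
                Console.WriteLine(message);
            }
        }
    }
}
```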
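As a sketch of the tag-removal case, this naive approach deletes everything from "<" to the next ">". It is not a real HTML parser: comments, scripts, and ">" inside attribute values can confuse it. HtmlDemo and StripTags are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class HtmlDemo
{
    // Naive tag stripper: [^>]* also spans newlines, unlike ".".
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);
}

class Program
{
    static void Main()
    {
        Console.WriteLine(HtmlDemo.StripTags("<p>Hello <b>world</b></p>")); // Hello world
    }
}
```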
Research. A regular expression can describe any "regular" language. These are the languages that a machine with a finite amount of memory (a fixed number of states) can recognize, even though the languages themselves may contain infinitely many strings. .NET patterns also support features, such as backreferences, that go beyond regular languages.
Automaton. Regular expressions are closely tied to finite state machines. These automata encode states and possible transitions to new states.
Operators. Regular expressions come from formal language theory and are a core part of compiler design. A regex engine transforms a pattern into a tiny program that processes text.
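To make the link concrete, here is a sketch pairing a regular expression with a hand-written two-state automaton for the same language: binary strings with an even number of '0' characters. The example language and the names AutomatonDemo, ByRegex and ByAutomaton are illustrative choices.

```csharp
using System;
using System.Text.RegularExpressions;

static class AutomatonDemo
{
    // Binary strings containing an even number of '0' characters.
    static readonly Regex _evenZeros = new Regex(@"\A(1*01*0)*1*\z");

    public static bool ByRegex(string s) => _evenZeros.IsMatch(s);

    // Equivalent DFA: the single bool is the current state.
    public static bool ByAutomaton(string s)
    {
        bool even = true; // start state, also the only accepting state
        foreach (char c in s)
        {
            if (c == '0') even = !even;      // '0' moves to the other state
            else if (c != '1') return false; // any other character is rejected
        }
        return even;
    }
}

class Program
{
    static void Main()
    {
        foreach (string s in new[] { "", "0", "010", "1001", "000" })
        {
            Console.WriteLine(s + ": " + AutomatonDemo.ByRegex(s) + " " + AutomatonDemo.ByAutomaton(s));
        }
    }
}
```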

Quote: These expressions are commonly used to describe patterns. Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator (Compilers: Principles, Techniques and Tools).

A summary. Regular expressions are a concise way to process text data. We use Regex.Match, Matches, and IsMatch to check a pattern (evaluating its metacharacters) against an input string.
© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.