C# Regex.Split Examples

Split strings based on patterns with the Regex.Split method in the System.Text.RegularExpressions namespace.


This method separates a string wherever a pattern matches. The delimiter is specified as a regular expression, such as \D+, which matches one or more non-digit characters.

Benefits, Regex.

Using a Regex gives more flexibility than string.Split, because the delimiter can be any pattern rather than a fixed set of characters or strings. The syntax is more complicated, and performance may be worse.
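As a minimal sketch of that flexibility, assume fields separated by a comma or semicolon with an optional amount of whitespace after it. The class name FlexibleSplitDemo, its methods, and the sample string are hypothetical. string.Split removes only the exact separator characters and leaves the spaces attached to the fields, while the single pattern [,;]\s* absorbs them.

```csharp
using System;
using System.Text.RegularExpressions;

class FlexibleSplitDemo
{
    // string.Split: only the exact characters ',' and ';' are removed,
    // so any spaces after them stay in the fields.
    public static string[] SplitWithString(string input) =>
        input.Split(new[] { ',', ';' });

    // Regex.Split: a comma or semicolon plus any following whitespace
    // counts as one delimiter.
    public static string[] SplitWithRegex(string input) =>
        Regex.Split(input, @"[,;]\s*");

    static void Main()
    {
        string input = "red, green;blue;  yellow";

        foreach (string part in SplitWithString(input))
        {
            Console.WriteLine("[" + part + "]");
        }
        foreach (string part in SplitWithRegex(input))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

The brackets in the output make the leftover spaces from string.Split visible.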

First example.

We use Regex.Split to split on every run of non-digit characters in the input string. We then loop through the resulting strings with a foreach-loop and use int.TryParse to convert each one to an integer.

Input: The input string contains the numbers 10, 20, 40 and 1. The static Regex.Split method is called with two arguments: the input string and the pattern.

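Pattern: @"\D+" is a verbatim string literal, so the backslash does not need to be escaped. \D matches any non-digit character, and + means one or more. An escaped uppercase letter like \D means NOT: it negates its lowercase form (\d matches a digit).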

C# program that uses Regex.Split

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // String containing numbers.
        string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";

        // Get all digit sequences as strings.
        string[] digits = Regex.Split(sentence, @"\D+");

        // Now we have each number string.
        foreach (string value in digits)
        {
            // Parse the value to get the number.
            int number;
            if (int.TryParse(value, out number))
            {
                Console.WriteLine(value);
            }
        }
    }
}
```

Output

```
10
20
40
1
```
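Regex.Split returns an empty string when the input begins or ends with a match. In the sentence above, the final " programmer." is a run of non-digits at the end, so the array has a fifth, empty element, and int.TryParse quietly rejects it. The sketch below makes that visible and then filters empties with LINQ's Where, an alternative to the TryParse check rather than the approach used above. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class EmptyEntriesDemo
{
    // Raw result of Regex.Split, including empty strings at the edges.
    public static string[] RawSplit(string sentence) =>
        Regex.Split(sentence, @"\D+");

    // Same split, with the empty strings removed.
    public static string[] NonEmptySplit(string sentence) =>
        RawSplit(sentence).Where(s => s.Length > 0).ToArray();

    static void Main()
    {
        string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";
        Console.WriteLine(RawSplit(sentence).Length);      // 5: the last element is ""
        Console.WriteLine(NonEmptySplit(sentence).Length); // 4
    }
}
```

An input that starts with a non-digit, such as "a1", produces a leading empty string in the same way.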


Here we extract all substrings that are separated by whitespace. We could also use string.Split, but the \s+ pattern treats any run of spaces, tabs, or newlines as a single delimiter, and it can be more easily extended.

Note: The example gets all operands and operators from an equation string. An operand is a value like 3 or 5. An operator is a symbol like * that acts on operands.

Tokens: With Regex, we implement a simple tokenizer. Lexical analysis, or tokenization, is done in many programs, such as compilers and interpreters.

Warning: This may be an effective way to parse computer languages or program output, but it is not the fastest way.

C# program that tokenizes

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The equation.
        string operation = "3 * 5 = 15";

        // Split it on whitespace sequences.
        string[] tokens = Regex.Split(operation, @"\s+");

        // Now we have each token: operands and operators alike.
        foreach (string token in tokens)
        {
            Console.WriteLine(token);
        }
    }
}
```

Output

```
3
*
5
=
15
```
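Splitting on whitespace only works when every token is surrounded by spaces. A sketch of a tokenizer that also handles input like "3*5" relies on documented Regex.Split behavior: when the pattern contains capturing parentheses, the captured text is included in the result array. The operator set [-+*/=] and the class and method names are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class CaptureTokenizerDemo
{
    // The parentheses form a capturing group, so Regex.Split keeps each
    // matched operator in the result instead of discarding it.
    // \s* on both sides absorbs optional spaces around the operator.
    public static string[] Tokenize(string equation) =>
        Regex.Split(equation, @"\s*([-+*/=])\s*");

    static void Main()
    {
        foreach (string token in Tokenize("3*5 = 15"))
        {
            Console.WriteLine(token);
        }
    }
}
```

This remains a simple sketch: it does not handle negative numbers or parentheses, which a real tokenizer would need.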


Here we get all the words in a string that begin with an uppercase letter. The Regex.Split call gets all the words, and the foreach-loop checks each word's first letter.

Tip: It is often useful to combine regular expressions and manual looping and string operations.

C# program that collects uppercase words

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // String containing uppercased words.
        string sentence = "Bob and Michelle are from Indiana.";

        // Get all words. Splitting on each single non-word char can yield empty strings.
        string[] words = Regex.Split(sentence, @"\W");

        // Get all uppercased words.
        var list = new List<string>();
        foreach (string value in words)
        {
            // Skip empty strings, then check the first letter.
            if (!string.IsNullOrEmpty(value) && char.IsUpper(value[0]))
            {
                list.Add(value);
            }
        }

        // Write all proper nouns.
        foreach (var value in list)
        {
            Console.WriteLine(value);
        }
    }
}
```

Output

```
Bob
Michelle
Indiana
```


For performance, consider the string.Split method instead of regular expressions. It is more appropriate when the input is precise and predictable, such as fields separated by a fixed character.
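For the whitespace tokenizer, a string.Split version might look like the sketch below. It assumes tokens are separated only by spaces and tabs; the class and method names are hypothetical. StringSplitOptions.RemoveEmptyEntries drops the empty strings that runs of consecutive separators would otherwise produce.

```csharp
using System;

class StringSplitDemo
{
    // Split on spaces and tabs. RemoveEmptyEntries discards the empty strings
    // created by consecutive separators or by leading and trailing separators.
    public static string[] Tokens(string operation) =>
        operation.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

    static void Main()
    {
        foreach (string token in Tokens("3  *\t5 = 15"))
        {
            Console.WriteLine(token);
        }
    }
}
```

One behavioral difference: Regex.Split with \s+ returns a leading empty string for input such as " 3 * 5", while this version drops it.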

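Also: You can create a Regex instance once and call its Split method instead of the static Regex.Split. Reusing one instance avoids looking up the pattern on each call, which can improve performance when the same pattern is used many times.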

Further: You can pass RegexOptions.Compiled, a value of the RegexOptions enumeration, to compile the pattern to IL. Construction becomes slower, but repeated matching can be faster.
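A sketch combining both ideas, assuming the pattern is reused across many inputs: one static readonly Regex, created with RegexOptions.Compiled, replaces repeated static Regex.Split(sentence, @"\D+") calls. The field NonDigits, the method Numbers, and the sample lines are hypothetical. The static methods already keep a small cache of recently used patterns (see Regex.CacheSize), so it is worth measuring before assuming a speedup.

```csharp
using System;
using System.Text.RegularExpressions;

class CompiledRegexDemo
{
    // Created once and reused for every call. RegexOptions.Compiled trades a
    // higher one-time construction cost for faster matching afterward.
    static readonly Regex NonDigits = new Regex(@"\D+", RegexOptions.Compiled);

    // Instance Split: same pattern as the static Regex.Split(sentence, @"\D+").
    public static string[] Numbers(string sentence) => NonDigits.Split(sentence);

    static void Main()
    {
        string[] lines =
        {
            "10 cats 20",
            "40 fish and 1",
        };
        foreach (string line in lines)
        {
            Console.WriteLine(string.Join(",", Numbers(line)));
        }
    }
}
```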


A summary.

We extracted strings with the Regex.Split method. We used patterns of non-digit characters, whitespace characters, and non-word characters.

We processed the string array returned by Regex.Split, for example by parsing the integers in a sentence. Looping over the results of Regex.Split is an easy way to further filter them.
Dot Net Perls
© 2007-2019 Sam Allen. All rights reserved. Written by Sam Allen,