Split strings on different characters, including more complex cases such as multi-character delimiters. We need to split on multiple characters when a string contains \r\n sequences (as in Windows text files). We also look at performance pitfalls and other gotchas.
First, let's examine the basics of the Split methods. You probably already know the general way to do this, but it is worth reviewing the basic syntax before we move on to more complex delimiters. The following code shows the most basic way of splitting a string on a single character.
using System;

/// <summary>
/// Example class.
/// </summary>
class ExampleClass
{
    /// <summary>
    /// Very simple string split example.
    /// </summary>
    public ExampleClass()
    {
        string exampleString = "there is a cat";
        // Split the string on spaces. This separates all the words in the string.
        string[] words = exampleString.Split(' ');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
        // Output:
        // there
        // is
        // a
        // cat
    }
}
To split on several characters, or on a multi-character delimiter such as \r\n, use either Regex.Split or the Split overloads that take an array. Let's look at some statements that split a string on line breaks. Note that the Split calls below create a new char[] or string[] array explicitly. This is because the overloads used here, which accept StringSplitOptions (the option used to remove empty strings), take the delimiters as an array.
using System;
using System.Text.RegularExpressions;

class TestSplit
{
    // The line breaks inside a verbatim string come from the source file,
    // so this contains \r\n only if the file is saved with Windows line endings.
    string _test = @"cat
dog
animal
person";

    /// <summary>
    /// Demonstrate some methods of splitting strings on multiple lines.
    /// </summary>
    public TestSplit()
    {
        // 1)
        // Split the string _test on line breaks. The return value from
        // Regex.Split is a string[] array. Requires the
        // System.Text.RegularExpressions namespace.
        string[] lines = Regex.Split(_test, "\r\n");

        // 2)
        // Use a new char[] array of two characters (\r and \n) to break
        // lines from _test into separate strings. Each \r\n pair counts as
        // two delimiters with an empty string between them, so use
        // RemoveEmptyEntries to keep empty strings out of the string[] array.
        string[] lines2 = _test.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        // 3)
        // Same as the previous example, but uses a single two-character
        // string as the delimiter. _test has no blank lines, so no empty
        // strings are produced and None is an okay value here. Input with
        // blank lines would yield empty entries.
        string[] lines3 = _test.Split(new string[] { "\r\n" },
            StringSplitOptions.None);
    }
}
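The empty-string behavior mentioned in the comments above is easy to check directly. The following is a minimal sketch, not part of the original examples: the class name LineSplitDemo and the helper SplitLines are hypothetical, and the inputs use explicit \r\n escapes so the result does not depend on how the source file is saved.

```csharp
using System;

public static class LineSplitDemo
{
    // Hypothetical helper: treats both "\r\n" and a lone "\n" as a line break.
    // "\r\n" is listed first so that it is matched as a single delimiter.
    public static string[] SplitLines(string text)
    {
        return text.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string windowsText = "cat\r\ndog";

        // Splitting on the two characters separately treats \r and \n as
        // two delimiters, so an empty string appears between them.
        string[] withEmpty = windowsText.Split(new char[] { '\r', '\n' },
            StringSplitOptions.None);
        Console.WriteLine(withEmpty.Length); // 3: "cat", "", "dog"

        // Mixed line endings are handled by listing both delimiters.
        string[] mixed = SplitLines("cat\r\ndog\nanimal");
        Console.WriteLine(mixed.Length); // 3: "cat", "dog", "animal"
    }
}
```

Listing both delimiters is one way to guard against files whose line endings are not what you expect. Whether you need it depends on where your text comes from.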
Here is an example that separates the words in a string based on punctuation and whitespace. This is a very flexible method. It uses StringSplitOptions.RemoveEmptyEntries to discard the empty strings that would otherwise appear wherever two delimiters are adjacent, such as a comma followed by a space. The following method splits English words out of a string and prints each one.
/// <summary>
/// Take all the words in the input string and separate them.
/// This method takes into account a lot of punctuation.
/// </summary>
public void SplitWordsTest(string inputValue)
{
    // This is really neat. We can split on all punctuation and all spaces,
    // and then we will have an array of all the words in a string.
    string[] inputWords = inputValue.Split(new char[] { ' ', ',', ';', '.',
        '-', '/', '|', '\\', '%', '#', '@', '!', '~', '$', '^',
        '(', ')', '+', '[', ']', '{', '}', '"', ':', '<', '>', '?',
        '\r', '\n', '*' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string inputWord in inputWords)
    {
        Console.WriteLine(inputWord);
    }
}
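To see what RemoveEmptyEntries is doing in the method above, it may help to compare both option values on the same input. This is a small sketch with a deliberately short delimiter list. The class and method names are hypothetical and not part of the original code.

```csharp
using System;

public static class EmptyEntriesDemo
{
    // A short delimiter list, for illustration only.
    private static readonly char[] Delimiters = { ' ', ',', '.' };

    // Hypothetical helper: keeps the empty strings between adjacent delimiters.
    public static string[] SplitKeepingEmpty(string input)
    {
        return input.Split(Delimiters, StringSplitOptions.None);
    }

    // Hypothetical helper: drops the empty strings.
    public static string[] SplitWords(string input)
    {
        return input.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "cat, dog.";

        // ", " is two adjacent delimiters and "." ends the string,
        // so two of the four entries are empty.
        Console.WriteLine(SplitKeepingEmpty(input).Length); // 4: "cat", "", "dog", ""

        // With RemoveEmptyEntries only the words remain.
        Console.WriteLine(SplitWords(input).Length); // 2: "cat", "dog"
    }
}
```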
String splitting functions may show different performance characteristics depending on the strings they work on. The length of the blocks, the number of delimiters, and the total size of the string may all factor into their performance. I tested a short string and a long string (40 and 1200 characters).
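A comparison like this can be timed with System.Diagnostics.Stopwatch. The sketch below is only an illustration of the approach, not the harness used for the measurements in this article: the class name, the TimeMs helper, the input string, and the iteration count are all assumptions, and absolute numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SplitBenchmark
{
    // Hypothetical helper: runs 'split' on 'input' the given number of
    // times and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string, string[]> split, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            split(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Assumed test input and iteration count, chosen for illustration.
        string input = "cat\r\ndog\r\nanimal\r\nperson";
        const int iterations = 100000;

        // Allocate the delimiter arrays once so only the split is timed.
        char[] charDelimiters = { '\r', '\n' };
        string[] stringDelimiters = { "\r\n" };

        Console.WriteLine("1 Regex:               " +
            TimeMs(s => Regex.Split(s, "\r\n"), input, iterations) + " ms");
        Console.WriteLine("2 Split(new char[]):   " +
            TimeMs(s => s.Split(charDelimiters, StringSplitOptions.RemoveEmptyEntries), input, iterations) + " ms");
        Console.WriteLine("3 Split(new string[]): " +
            TimeMs(s => s.Split(stringDelimiters, StringSplitOptions.None), input, iterations) + " ms");
    }
}
```

To compare short and long inputs, the same helper can be called with strings of different lengths.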
Here are the results of my benchmarks of the three string splitting methods. Going in, I expected the second or third method to be the best, because I have observed performance problems with regular expressions before. The results are shown in the table below.
| String used | Method used | Time in ms |
| --- | --- | --- |
| Short | 1 Regex | 3853 |
| Short | 2 Split(new char[]) | 843 |
| Short | 3 Split(new string[]) | 951 |
| Long | 1 Regex | 9968 |
| Long | 2 Split(new char[]) | 10483 |
| Long | 3 Split(new string[]) | 12873 |
On the short strings, the three methods differ sharply. As a reminder, method 1 is the Regex method, and it is by far the slowest on the short strings, taking roughly four times as long as either Split overload (3853 ms versus 843 ms and 951 ms). This may be because the compilation time eclipses the actual splitting time.
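If setup cost is indeed what hurts Regex on short strings, one thing to try is building the Regex object once and reusing it for every call. The sketch below shows that pattern. The class and field names are hypothetical, and its effect on the timings above was not measured here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CachedRegexSplit
{
    // Constructed once and reused across calls. RegexOptions.Compiled
    // trades a higher one-time construction cost for faster matching.
    private static readonly Regex LineBreak =
        new Regex("\r\n", RegexOptions.Compiled);

    // Hypothetical helper: splits text on \r\n using the shared instance.
    public static string[] SplitLines(string text)
    {
        return LineBreak.Split(text);
    }

    public static void Main()
    {
        foreach (string line in SplitLines("cat\r\ndog\r\nanimal"))
        {
            Console.WriteLine(line);
        }
    }
}
```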
On the long strings, the results are much more even: Regex took 9968 ms, against 10483 ms for the char array and 12873 ms for the string array. It may be that for very long strings, such as entire files, the Regex method is equivalent or even faster. In this test, then, Regex is the slowest method on short strings but competitive, and marginally the fastest, on long ones.
The conclusion here depends on the scale of your program. For programs that use shorter strings, the non-Regex methods that split based on arrays are the fastest and simplest, and they will also avoid any Regex compilations. For somewhat longer strings or files that contain more lines, Regex is appropriate, although not significantly better than the second char array method.