Dot Net Perls

C# Split String Examples

by Sam Allen

Problem

You want to split strings on different characters, including more complex cases such as multiple-character delimiters. This comes up when a string contains \r\n sequences, as text files created on Windows do. You also want to know whether there are any performance pitfalls or other gotchas.

C# Solution

First, let's examine the basics of the Split method. You probably already know the general approach, but it is worth reviewing the syntax before moving on to more complex delimiters. The following code shows the simplest way to split a string on a single character.

using System;

/// <summary>
/// Example class.
/// </summary>
class ExampleClass
{
    /// <summary>
    /// Very simple string split example.
    /// </summary>
    public ExampleClass()
    {
        string exampleString = "there is a cat";
        // Split the string on spaces. This separates all the words in the string.
        string[] words = exampleString.Split(' ');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
        // Output:
        // there
        // is
        // a
        // cat
    }
}

How can I split on multiple characters (such as newlines)?

You can use either the Regex.Split method or the string Split method with an explicit array of delimiters. Let's look at some statements that split strings on line breaks. Note that a new char[] or string[] array is created in the second and third methods below. This is because the Split overloads that accept StringSplitOptions, which is used to remove empty strings, take an array argument rather than a variable-length parameter list.

using System;
using System.Text.RegularExpressions;

class TestSplit
{
    // A verbatim string takes its line breaks from the source file, so this
    // contains "\r\n" only if the file is saved with Windows line endings.
    string _test = @"cat
dog
animal
person";

    /// <summary>
    /// Demonstrate some methods of splitting strings on multiple lines.
    /// </summary>
    public TestSplit()
    {
        // 1)
        // Split the string _test on line breaks with Regex.Split. The return
        // value is a string[] array. This requires the
        // System.Text.RegularExpressions namespace.
        string[] lines = Regex.Split(_test, "\r\n");

        // 2)
        // Use a new char[] array of two characters (\r and \n) to break
        // lines from _test into separate strings. Each \r\n pair produces an
        // empty string between the two delimiters, so use RemoveEmptyEntries
        // to keep empty strings out of the string[] array.
        string[] lines2 = _test.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        // 3)
        // Like the previous example, but uses a new string[] array holding
        // the single two-character string "\r\n". The pair is treated as one
        // delimiter, so no empty strings are returned unless the text has
        // blank lines, and None is an acceptable value for StringSplitOptions.
        string[] lines3 = _test.Split(new string[] { "\r\n" },
            StringSplitOptions.None);
    }
}
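
One caveat with splitting on "\r\n" alone is that text saved with Unix line endings contains only "\n", so the call returns the whole string as a single element. The following is a minimal sketch, not part of the original benchmark, that passes both delimiters to Split. The names LineSplitter and SplitLines are hypothetical and chosen only for illustration.

```csharp
using System;

static class LineSplitter
{
    // Hypothetical helper: splits on Windows ("\r\n") or Unix ("\n") line
    // breaks. "\r\n" is listed first so that, where both delimiters match
    // at the same position, the pair is consumed as one delimiter.
    public static string[] SplitLines(string text)
    {
        return text.Split(new string[] { "\r\n", "\n" },
            StringSplitOptions.None);
    }

    static void Main()
    {
        // Mixed line endings, including one blank line.
        string mixed = "cat\r\ndog\nanimal\r\n\r\nperson";
        foreach (string line in SplitLines(mixed))
        {
            // Brackets make the empty entry for the blank line visible.
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

With StringSplitOptions.None the blank line is kept as an empty string. Switching to RemoveEmptyEntries would drop it, which may or may not be what you want when line positions matter.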

How can I separate words in a string?

Here is an example that separates words in a string based on punctuation and whitespace. This is a very flexible method. It passes StringSplitOptions.RemoveEmptyEntries so that the empty strings produced by adjacent delimiters (for example, a comma followed by a space) are dropped from the result. The following method splits English words out of a string and prints each one.

/// <summary>
/// Take all the words in the input string and separate them.
/// This method takes into account a lot of punctuation.
/// </summary>
public void SplitWordsTest(string inputValue)
{
    // This is really neat. We can split on all punctuation and all spaces,
    // and then we will have an array of all the words in a string.
    string[] inputWords = inputValue.Split(new char[] { ' ', ',', ';', '.',
        '-', '/', '|', '\\', '%', '#', '@', '!', '~', '$', '^',
        '(', ')', '+', '[', ']', '{', '}', '"', ':', '<', '>', '?',
        '\r', '\n', '*' },
        StringSplitOptions.RemoveEmptyEntries);

    foreach (string inputWord in inputWords)
    {
        Console.WriteLine(inputWord);
    }
}
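
To see why RemoveEmptyEntries matters here, the sketch below splits the same input twice, once with each option. It is an illustration only: the class name EmptyEntriesDemo, its method names, and the shortened delimiter list are assumptions and not part of the original code.

```csharp
using System;

static class EmptyEntriesDemo
{
    // Shortened, hypothetical delimiter list for the demonstration.
    static readonly char[] Delimiters = { ' ', ',', '.', '!' };

    // Keeps the empty strings that appear between adjacent delimiters.
    public static string[] SplitKeepEmpty(string input)
    {
        return input.Split(Delimiters, StringSplitOptions.None);
    }

    // Drops those empty strings, leaving only the words.
    public static string[] SplitDropEmpty(string input)
    {
        return input.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string input = "Hello, world!";
        // The comma followed by a space, and the trailing '!', each
        // produce an empty entry when None is used.
        Console.WriteLine(SplitKeepEmpty(input).Length); // 4
        Console.WriteLine(SplitDropEmpty(input).Length); // 2
    }
}
```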

Which strings did you compare?

String splitting functions may show different performance characteristics depending on the strings they work on. The length of the blocks, the number of delimiters, and the total size of the string may all factor into their performance. I tested a short string and a long string (40 and 1200 characters, respectively).
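
The article does not include its benchmark code, so the following is only a sketch of how such a comparison might be timed with Stopwatch. The iteration count, the stand-in input string, and all names (SplitBenchmark, Time, SplitRegex, SplitChars, SplitStrings) are assumptions, and the numbers it prints will not match the published results.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class SplitBenchmark
{
    // Assumed iteration count; the article does not state the one it used.
    const int Iterations = 100000;

    // Runs one splitting method repeatedly and returns elapsed milliseconds.
    public static long Time(Func<string, string[]> split, string input)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            split(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Method 1: Regex.Split on the two-character sequence.
    public static string[] SplitRegex(string s)
    {
        return Regex.Split(s, "\r\n");
    }

    // Method 2: char[] delimiters, dropping empty entries.
    public static string[] SplitChars(string s)
    {
        return s.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    // Method 3: a string[] holding the single delimiter "\r\n".
    public static string[] SplitStrings(string s)
    {
        return s.Split(new string[] { "\r\n" }, StringSplitOptions.None);
    }

    static void Main()
    {
        // Stand-in data with explicit "\r\n" so the result does not
        // depend on the source file's line endings.
        string input = "cat\r\ndog\r\nanimal\r\nperson";
        Console.WriteLine("Regex:               " + Time(SplitRegex, input) + " ms");
        Console.WriteLine("Split(new char[]):   " + Time(SplitChars, input) + " ms");
        Console.WriteLine("Split(new string[]): " + Time(SplitStrings, input) + " ms");
    }
}
```

Timings from a loop like this vary with the runtime version and hardware, so it is worth running each method a few times before drawing conclusions.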

What were the benchmark results?

Here I show the results of my benchmarks of the various string splitting methods. Beforehand, I expected the second or third method to be the best, because I have observed performance problems with regular expressions before. The results are given first as a table and then as charts.

String used   Method   Method used           Time in ms
Short         1        Regex                 3853
Short         2        Split(new char[])     843
Short         3        Split(new string[])   951
Long          1        Regex                 9968
Long          2        Split(new char[])     10483
Long          3        Split(new string[])   12873

Performance graphs

The following chart shows the three methods compared to each other on short strings. As a reminder, method 1 is the Regex method, and it is by far the slowest on the short strings (3853 ms, against 843 ms and 951 ms for the two array-based methods). This may be because the compilation time eclipses the actual splitting time.

Split strings benchmark 1.

Long string results. The benchmark for the methods on the long strings is much more even: Regex took 9968 ms, the char array method 10483 ms, and the string array method 12873 ms. It may be that for very long strings, such as entire files, the Regex method is equivalent or even faster. In this test, then, Regex is the slowest method on short strings but competitive, and slightly ahead, on long strings.

Split strings benchmark 2.

Discussion

The conclusion here depends on the scale of your program. For programs that use shorter strings, the non-Regex methods that split based on arrays are the fastest and simplest, and they will also avoid any Regex compilations. For somewhat longer strings or files that contain more lines, Regex is appropriate, although not significantly better than the second char array method.
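
If Regex construction cost is indeed what hurts the short-string case, one option is to build the Regex object once and reuse it. The sketch below shows that pattern with RegexOptions.Compiled. It was not part of the benchmark above, so whether it closes the gap is untested here, and the names CachedRegexSplit and SplitLines are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CachedRegexSplit
{
    // Constructed once and shared by every call. RegexOptions.Compiled
    // trades extra startup cost for faster matching afterwards.
    static readonly Regex LineBreak =
        new Regex("\r\n", RegexOptions.Compiled);

    // Hypothetical helper that splits on "\r\n" using the shared instance.
    public static string[] SplitLines(string text)
    {
        return LineBreak.Split(text);
    }

    static void Main()
    {
        string input = "cat\r\ndog\r\nanimal\r\nperson";
        foreach (string line in SplitLines(input))
        {
            Console.WriteLine(line);
        }
    }
}
```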

Dot Net Perls is dedicated to sharing code and knowledge.
© 2007-2008 Sam Allen. All rights reserved.
