Dot Net Perls

Regex Improvement - C#

by Sam Allen

Problem

You want to improve the readability and performance of your C# regular expressions. You can store a Regex as a field on a class, use RegexOptions.Compiled, and avoid the static Regex methods. This article takes benchmarks and shows examples.

Solution: Regex code in C#

First we will review some of the other work done by experts in C# and MSDN's resources. There is an excellent overview at the BCL Team blog at MSDN.

Microsoft's David Gutierrez states that there are three major options regarding regular expression performance: interpreted, compiled (RegexOptions.Compiled), and precompiled (CompileToAssembly). They are ordered from usually least efficient to most efficient. [Regular Expression performance - blogs.msdn.com]

Next, we look at MSDN, which doesn't have extensive documentation here. It warns not to specify RegexOptions.Compiled when calling CompileToAssembly. This means you can't combine compiled and precompiled code. [RegexOptions Enumeration - MSDN]

Example 1: using the static Regex Split method

For the next three examples, I use Split, but other methods such as Matches, Match, and Replace have similar characteristics.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        string[] c = Regex.Split(s, @"\W+");
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
        // This
        // is
        // a
        // simple
        // string
        // for
        // Regex
        // (empty line: the trailing "." leaves an empty last element)
    }
}

This code uses the static Regex.Split method. The static method takes the pattern string on every call instead of reusing a Regex object that you hold, which is slower in cases like this where keeping the object would save CPU cycles.

It demonstrates a simple Regex that splits the input string into separate words. The \W+ means one or more non-word characters. It is the delimiter.
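One detail of the \W+ delimiter is easy to miss: because the input ends with a period, Split returns an empty string as the last element. The following is a minimal sketch of that behavior and of one way to drop the empty entries. The SplitDemo class and SplitWords helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits on runs of non-word characters and
    // drops the empty entries that Split can produce at the edges.
    public static string[] SplitWords(string input)
    {
        return Regex.Split(input, @"\W+")
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";

        // The trailing "." matches \W+, so the last element is empty.
        string[] raw = Regex.Split(s, @"\W+");
        Console.WriteLine(raw.Length);   // 8

        string[] words = SplitWords(s);
        Console.WriteLine(words.Length); // 7
    }
}
```

Filtering after the split keeps the pattern itself unchanged, so the same \W+ expression can be used in all three examples.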

Example 2: using an instance Regex with Split

Next we see the instance Regex approach. It works the same as Example 1, but has better performance. It stores the Regex in a local variable inside the method.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        Regex r = new Regex(@"\W+");
        string[] c = r.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
        // This
        // is
        // a
        // simple
        // string
        // for
        // Regex
        // (empty line: the trailing "." leaves an empty last element)
    }
}

Example 3: using a class-level compiled Regex

We see two new approaches here. The Regex is stored as a static field, meaning it can be reused throughout the application without recreating it.

Additionally, we see the RegexOptions.Compiled flag, which doesn't affect matching behavior, but slows startup and improves runtime speed.

using System;
using System.Text.RegularExpressions;

class Program
{
    static Regex _wordRegex = new Regex(@"\W+", RegexOptions.Compiled);

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        string[] c = _wordRegex.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
        // This
        // is
        // a
        // simple
        // string
        // for
        // Regex
        // (empty line: the trailing "." leaves an empty last element)
    }
}
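To illustrate the point that RegexOptions.Compiled changes speed but not results, here is a small sketch that keeps both an interpreted and a compiled Regex as static fields and checks that they split the same input identically. The WordSplitter class and its method names are assumptions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordSplitter
{
    // Each Regex is built once, when the class is first used,
    // and then shared by every call in the application.
    private static readonly Regex Interpreted = new Regex(@"\W+");
    private static readonly Regex Compiled = new Regex(@"\W+", RegexOptions.Compiled);

    public static string[] SplitInterpreted(string input)
    {
        return Interpreted.Split(input);
    }

    public static string[] SplitCompiled(string input)
    {
        return Compiled.Split(input);
    }

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";

        // Same pattern, same input: only the execution strategy differs.
        bool same = SplitInterpreted(s).SequenceEqual(SplitCompiled(s));
        Console.WriteLine(same); // True
    }
}
```

Marking the fields readonly is a design choice, not something the examples above require. It simply guards against the shared Regex being replaced by accident.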

Task: benchmarking the three Regex approaches

The three approaches are compared here over 1 million iterations each, using method-level objects and the same Split call as in the 3 examples.

1 - Static Regex method

    string s = "..."; // [omitted]
    string[] a = Regex.Split(s, @"\W+");

2 - Instance Regex object

    string s = "...";
    Regex r = new Regex(@"\W+");
    string[] a = r.Split(s);

3 - Instance compiled Regex

    string s = "...";
    Regex r = new Regex(@"\W+", RegexOptions.Compiled);
    string[] a = r.Split(s);

What I found was that there is a small performance improvement moving from Method 1, to Method 2, to Method 3. [RegexPerformance in C# - dotnetperls.com]

The most concerning result is that Method 3 takes about 10x longer at startup than the other two methods.

My recommendation here is to carefully assess whether the 21% runtime improvement is worth a 10x slowdown at startup. If it is not, you can use Method 2, which has a 5% performance boost with no startup penalty.
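If you want to weigh this tradeoff on your own machine, a harness along the following lines may help. It is only a sketch, not the benchmark code behind the figures above: the RegexBenchmark class, the TimeLoop helper, and the choice of Stopwatch are assumptions. It times Regex construction (the startup cost) separately from the Split loop (the runtime cost).

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexBenchmark
{
    const string Input = "This is a simple /string/ for Regex.";

    // Hypothetical helper: runs the action the given number of times
    // and returns the elapsed time in milliseconds.
    public static long TimeLoop(Action action, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 1000000;

        // Startup cost: time to construct each Regex once.
        Stopwatch sw = Stopwatch.StartNew();
        Regex interpreted = new Regex(@"\W+");
        sw.Stop();
        long interpretedSetupTicks = sw.ElapsedTicks;

        sw = Stopwatch.StartNew();
        Regex compiled = new Regex(@"\W+", RegexOptions.Compiled);
        sw.Stop();
        long compiledSetupTicks = sw.ElapsedTicks;

        // Runtime cost: the same Split call, one million times per method.
        long method1 = TimeLoop(() => Regex.Split(Input, @"\W+"), iterations);
        long method2 = TimeLoop(() => interpreted.Split(Input), iterations);
        long method3 = TimeLoop(() => compiled.Split(Input), iterations);

        Console.WriteLine("Setup ticks, interpreted: " + interpretedSetupTicks);
        Console.WriteLine("Setup ticks, compiled:    " + compiledSetupTicks);
        Console.WriteLine("Method 1 (static) ms:     " + method1);
        Console.WriteLine("Method 2 (instance) ms:   " + method2);
        Console.WriteLine("Method 3 (compiled) ms:   " + method3);
    }
}
```

The numbers will vary with the runtime version, the hardware, and the pattern, so treat any single run as a rough indication and repeat it a few times in a release build.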

Summary: Regex performance examples

We encountered a situation where runtime performance can be enhanced by sacrificing startup time. As stated, I feel that using an instance Regex, as in Method 2, is best for most situations.

We didn't cover the details of using CompileToAssembly for precompiled Regexes. That approach is expected to surpass the methods here, because the compilation cost is paid ahead of time rather than at startup.

For most projects Method 2 with the instance Regex strikes a good balance between performance at runtime and startup. I intend to replace my RegexOptions.Compiled regexes with interpreted ones.

© 2008 Sam Allen. All rights reserved.