Dot Net Perls

Regex Replace With MatchEvaluator - C#

by Sam Allen

Problem

How can you use Regex and MatchEvaluator for complex pattern replacements? Here the goal is to capitalize the first letter of each lowercase word in a string. You need exact control over what you change, and a plain replacement string isn't powerful enough, because it cannot transform the text it matched.

Input           Output
samuel allen    Samuel Allen
dot net perls   Dot Net Perls
Mother teresa   Mother Teresa

Solution: Regex MatchEvaluator and C#

Here we use Regex and MatchEvaluator. When researching the problem, I found a good article at MSDN. However, that solution has some weaknesses: it isn't easy to call from elsewhere in your program, and it has some extra branches. [Regex.Replace Method (String, MatchEvaluator) - MSDN]

With regular expressions, you can specify a MatchEvaluator. This is a delegate that Regex.Replace calls once for each match; the string it returns is substituted for the matched text. Here we see how you can use MatchEvaluator to uppercase the first letter of each match.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Input strings.
        const string s1 = "samuel allen";
        const string s2 = "dot net perls";
        const string s3 = "Mother teresa";

        // Write output strings.
        Console.WriteLine(CapitalizeFirstLetters(s1));
        Console.WriteLine(CapitalizeFirstLetters(s2));
        Console.WriteLine(CapitalizeFirstLetters(s3));

        // Samuel Allen
        // Dot Net Perls
        // Mother Teresa
    }

    /// <summary>
    /// Uppercase first letters of all words in the string.
    /// </summary>
    static string CapitalizeFirstLetters(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w+", new MatchEvaluator(CapitalizeInner));
    }

    /// <summary>
    /// Delegate method to perform uppercase on the match.
    /// </summary>
    static string CapitalizeInner(Match m)
    {
        string v = m.ToString();
        return char.ToUpper(v[0]) + v.Substring(1);
    }
}

\b      Word boundary:
        Matches between a word character and a non-word character.
        Here it anchors the match where a word starts.

[a-z]   Matches any lowercase ASCII letter.
        We only need to match words with lowercase first letters.
        This is a character range expression.

\w+     Word characters:
        Matches one or more letters, digits, or underscores.

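Because [a-z] consumes one character and \w+ requires at least one more, the method here only matches words of 2 or more characters. Single-letter words such as "a" are left unchanged. It also means the delegate can index v[0] and call Substring(1) without an if check on the length.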
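If single-letter words should be capitalized too, one option is to relax the quantifier from + to *. The sketch below is a variation on the method above, not part of the original solution, and the class name SingleLetterCapitalizer is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SingleLetterCapitalizer
{
    // Same idea as CapitalizeFirstLetters, but \w* (zero or more)
    // also lets one-letter words such as "a" match.
    public static string Capitalize(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w*", new MatchEvaluator(CapitalizeInner));
    }

    static string CapitalizeInner(Match m)
    {
        string v = m.Value;
        // Substring(1) returns an empty string for a one-letter match.
        return char.ToUpper(v[0]) + v.Substring(1);
    }
}
```

With this pattern, "a tale of two cities" becomes "A Tale Of Two Cities", while the original pattern leaves the leading "a" in lowercase.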

The regular expression-based method above has a few key advantages over the string-based C# one. It can be modified to accommodate different rules much more easily. If you wanted to consider different characters as word breaks, you could easily add a character range.
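As one example of changing the word-break rule, \b treats an apostrophe as a break, so the original pattern turns "they've" into "They'Ve". The sketch below swaps \b for a negative lookbehind with a character class, so a letter that follows an apostrophe no longer starts a word. The class name and the choice of the apostrophe are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class ApostropheAwareCapitalizer
{
    // (?<![\w']) : the letter must not be preceded by a word character
    //              or an apostrophe, so "ve" in "they've" is not a new word.
    public static string Capitalize(string v)
    {
        return Regex.Replace(v, @"(?<![\w'])[a-z]\w+", new MatchEvaluator(CapitalizeInner));
    }

    static string CapitalizeInner(Match m)
    {
        string v = m.Value;
        return char.ToUpper(v[0]) + v.Substring(1);
    }
}
```

Here "they've arrived" becomes "They've Arrived". More characters can be added inside the square brackets if other punctuation should also stop being treated as a word break.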

Question: what other uses does MatchEvaluator have?

MSDN indicates you can use it when you need to perform validation. "You can use MatchEvaluator to perform custom verifications or operations at each Replace operation." [MatchEvaluator Delegate - MSDN]
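A custom operation does not have to involve casing at all. As a minimal sketch, the delegate below parses each run of digits and replaces it with twice its value. The NumberDoubler class is hypothetical, and the code assumes every number fits in a long.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class NumberDoubler
{
    // Replaces each run of ASCII digits with twice its numeric value.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, @"[0-9]+", new MatchEvaluator(DoubleInner));
    }

    static string DoubleInner(Match m)
    {
        // Assumption: the matched number fits in a long.
        long value = long.Parse(m.Value, CultureInfo.InvariantCulture);
        return (value * 2).ToString(CultureInfo.InvariantCulture);
    }
}
```

For the input "3 apples and 12 pears" this returns "6 apples and 24 pears". The same shape works for verification: the delegate can inspect m.Value and return it unchanged when it should not be replaced.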

Question: how could I enhance this capitalization algorithm?

You could store a Dictionary of words that need special-casing, such as McCain, McCartney, and DeGeneres. I have used code like that before, and it requires a bit of manual work to find most of the names using different rules.
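A rough sketch of that idea follows. The delegate looks each match up in a Dictionary first and falls back to plain capitalization. The class name and the three entries are illustrative, not code from a real project.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NameCapitalizer
{
    // Hypothetical lookup table: word (any casing) -> correctly cased form.
    static readonly Dictionary<string, string> SpecialCases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "mccain", "McCain" },
            { "mccartney", "McCartney" },
            { "degeneres", "DeGeneres" }
        };

    public static string Capitalize(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w+", new MatchEvaluator(CapitalizeInner));
    }

    static string CapitalizeInner(Match m)
    {
        string word = m.Value;
        string special;
        // Special-cased names win over the default rule.
        if (SpecialCases.TryGetValue(word, out special))
        {
            return special;
        }
        return char.ToUpper(word[0]) + word.Substring(1);
    }
}
```

With this, "paul mccartney" becomes "Paul McCartney". Note that the pattern still only matches words that start with a lowercase letter, so an input such as "Mccain" is left as it is.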

Summary: using MatchEvaluator

Generally I would recommend the string-based C# method, but sometimes this approach would be superior. Regular expressions offer a very fine degree of control, and by basing the uppercase method on them, we can change the matching rules much more easily.
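For comparison, a string-based version might look something like the sketch below. This is only one possible loop, written as an assumption about what such a method could be; it treats whitespace as the only word break and, unlike the Regex version, also capitalizes single-letter words.

```csharp
using System;

static class StringCapitalizer
{
    // Uppercases the first character after each run of whitespace.
    public static string CapitalizeFirstLetters(string s)
    {
        char[] chars = s.ToCharArray();
        bool startOfWord = true;
        for (int i = 0; i < chars.Length; i++)
        {
            if (char.IsWhiteSpace(chars[i]))
            {
                startOfWord = true;
            }
            else if (startOfWord)
            {
                chars[i] = char.ToUpper(chars[i]);
                startOfWord = false;
            }
        }
        return new string(chars);
    }
}
```

Changing what counts as a word break here means editing the loop's conditions, whereas the Regex version only needs a different pattern.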

© 2008 Sam Allen. All rights reserved.