How can you use Regex and MatchEvaluator for complex pattern replacements? Suppose you want to use Regex to replace lowercase words with capitalized ones. You need exact control over what you change, and a plain replacement string isn't powerful enough for that.
| Input | Output |
| samuel allen | Samuel Allen |
| dot net perls | Dot Net Perls |
| Mother teresa | Mother Teresa |
Here we use Regex and MatchEvaluator. When researching the problem, I found a good article on MSDN. However, the solution there has some weaknesses: it isn't easy to call from elsewhere in your program, and it has some extra branches. [Regex.Replace Method (String, MatchEvaluator) - MSDN]
With regular expressions, you can specify a MatchEvaluator. This is a delegate that Regex.Replace calls once for each match; the string it returns replaces the matched text. Here we see how you can use MatchEvaluator to uppercase the first letter of each match.
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Input strings.
        const string s1 = "samuel allen";
        const string s2 = "dot net perls";
        const string s3 = "Mother teresa";

        // Write output strings.
        Console.WriteLine(CapitalizeFirstLetters(s1));
        Console.WriteLine(CapitalizeFirstLetters(s2));
        Console.WriteLine(CapitalizeFirstLetters(s3));
        // Samuel Allen
        // Dot Net Perls
        // Mother Teresa
    }

    /// <summary>
    /// Uppercase first letters of all words in the string.
    /// </summary>
    static string CapitalizeFirstLetters(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w+", new MatchEvaluator(CapitalizeInner));
    }

    /// <summary>
    /// Delegate method to perform uppercase on the match.
    /// </summary>
    static string CapitalizeInner(Match m)
    {
        string v = m.ToString();
        return char.ToUpper(v[0]) + v.Substring(1);
    }
}

\b Word break:
Matches at a word boundary; here, where a word starts.
[a-z] Matches any lowercase ASCII letter.
We only need to match words with lowercase first letters.
This is a character range expression.
\w+ Word characters:
Matches one or more word characters.

The pattern here only matches words of two or more characters: [a-z] consumes the first letter and \w+ requires at least one more. Single-letter words such as "a" are therefore left unchanged. It also lets CapitalizeInner call Substring with a single argument and skip an if check.
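To make the two-character minimum concrete, here is a small sketch that runs the same pattern over a title containing a one-letter word. The class name ShortWordDemo is hypothetical, and the lambda with char.ToUpperInvariant is a substitution for the named delegate and char.ToUpper used above, chosen so the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

class ShortWordDemo
{
    // Same pattern as the article: a lowercase ASCII letter
    // followed by at least one more word character.
    public static string Capitalize(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w+",
            m => char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1));
    }

    static void Main()
    {
        // The one-letter word "a" has no second character for \w+,
        // so it is never matched and stays lowercase.
        Console.WriteLine(Capitalize("a tale of two cities"));
        // a Tale Of Two Cities
    }
}
```

If one-letter words should be capitalized too, changing \w+ to \w* would match them, and Substring(1) would simply return an empty string for them.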
The regular expression-based method above has a few key advantages over the C# string-based one. It can be modified to accommodate different rules much more easily. If you wanted to treat different characters as word breaks, you could easily add a character class.
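As one possible illustration of changing the word-break rule, the sketch below treats the underscore as a separator. By default \w includes the underscore, so \b does not break there. The pattern, the class name UnderscoreBreakDemo, and the choice to count only ASCII letters and digits as word characters are all assumptions made for this example, not part of the original method.

```csharp
using System;
using System.Text.RegularExpressions;

class UnderscoreBreakDemo
{
    // Assumption: only ASCII letters and digits are word characters,
    // so an underscore separates words instead of joining them.
    // The lookbehind replaces \b: a match may not start mid-word.
    const string Pattern = @"(?<![A-Za-z0-9])[a-z][A-Za-z0-9]+";

    public static string Capitalize(string v)
    {
        return Regex.Replace(v, Pattern,
            m => char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1));
    }

    static void Main()
    {
        Console.WriteLine(Capitalize("dot_net perls")); // Dot_Net Perls
        Console.WriteLine(Capitalize("Mother teresa")); // Mother Teresa
    }
}
```

With the article's original pattern, "dot_net" is a single word and would become "Dot_net"; only the character classes changed here, not the replacement logic.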
MSDN indicates you can use it when you need to perform validation. "You can use MatchEvaluator to perform custom verifications or operations at each Replace operation." [MatchEvaluator Delegate - MSDN]
You could store a Dictionary of words that need special casing, such as McCain, McCartney, and DeGeneres. I have used code like that before, and it takes a bit of manual work to find most of the names that follow different rules.
Generally I would recommend the C# string-based method, but sometimes this approach is superior. Regular expressions offer a very fine degree of control, and by basing the uppercase method on them, we can change the matching rules much more easily.
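A rough sketch of the Dictionary idea follows. The class name NameCasing, the SpecialCases table, and its three entries are hypothetical and only illustrate the lookup; a real list would have to be collected by hand. The pattern is the one used above, so a word that already starts with an uppercase letter (for example "Mccain") is not matched and would not be corrected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class NameCasing
{
    // Hypothetical exception list; keys are compared case-insensitively.
    static readonly Dictionary<string, string> SpecialCases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "mccain", "McCain" },
            { "mccartney", "McCartney" },
            { "degeneres", "DeGeneres" },
        };

    public static string Capitalize(string v)
    {
        return Regex.Replace(v, @"\b[a-z]\w+", m =>
        {
            // Prefer the special-cased spelling when the word is listed.
            string special;
            if (SpecialCases.TryGetValue(m.Value, out special))
            {
                return special;
            }
            // Otherwise fall back to uppercasing the first letter.
            return char.ToUpperInvariant(m.Value[0]) + m.Value.Substring(1);
        });
    }

    static void Main()
    {
        Console.WriteLine(Capitalize("john mccain"));     // John McCain
        Console.WriteLine(Capitalize("ellen degeneres")); // Ellen DeGeneres
    }
}
```

Because the lookup lives inside the MatchEvaluator, the exceptions are applied in the same pass as the general rule.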