Is using Regex better than IndexOf for capturing substrings? For example, you have a 'before' fragment and an 'after' fragment, and you need to find the string between the two. Here we compare basic string-handling methods with more sophisticated Regex options.
| Input | Output |
|---|---|
| beforeSTRINGafter | STRING |
| path/content/cat.aspx | cat |
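The table's general case uses arbitrary 'before' and 'after' fragments. One way to handle it is to escape both fragments with Regex.Escape and capture the text between them with a lazy group. This is a minimal sketch, assuming the first match is the one you want. The class and method names are hypothetical.

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class SubstringBetween
{
    // Hypothetical helper: returns the text between the first 'before' and the
    // nearest 'after' that follows it, or null when no such pair exists.
    public static string? Between(string input, string before, string after)
    {
        // Regex.Escape makes characters such as '.' match literally.
        // The lazy group (.*?) stops at the first 'after' it reaches.
        string pattern = Regex.Escape(before) + "(.*?)" + Regex.Escape(after);
        // Singleline lets '.' also match '\n', so the captured text may span lines.
        Match match = Regex.Match(input, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

The lazy group is a deliberate choice. With a greedy "(.*)" instead, "beforeAafterBafter" would yield "AafterB" rather than "A".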
Everything the Regex does here can also be done with IndexOf and LastIndexOf. However, that approach needs manual bounds checks that are easy to get wrong, and every rule in the pattern has to be rewritten by hand.
Regexes are frightening to many developers (including myself at times). However, they are ideal here because they offer very precise control over what matches. Look carefully at the pattern passed to the Regex constructor in the next example.
```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

//
// This value ('input') is used in both examples.
//
string input = "path/content/cat.aspx";

//
// 1.
// Use Regex to find the string between two parts.
// It must come after "/content/" and before ".aspx" at the end of the input.
//
Regex regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");
Match match = regex.Match(input);
if (match.Success)
{
    //
    // Groups[1] holds the text captured by the parentheses.
    //
    string between = match.Groups[1].Value;
    Debug.WriteLine("Between 1: " + between);
    //
    // Reliable method shown.
    //
}
```

Here is how we can use the IndexOf family of methods to find a substring between two other parts. The code below calls LastIndexOf twice instead of IndexOf, so that it finds the last ".aspx" and the last "/content/" in the input. That approximates the Regex, where "$" pins ".aspx" to the end and the captured text cannot contain a slash. Notice how much manual index arithmetic the code needs, and how easily it goes wrong.
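```csharp
//
// 2.
// Use string methods to find the string between two parts.
// Ordinal comparisons match characters exactly, without culture rules.
//
int lastExtension = input.LastIndexOf(".aspx", StringComparison.Ordinal);
if (lastExtension != -1)
{
    int lastContent = input.LastIndexOf("/content/", StringComparison.Ordinal);
    if (lastContent != -1)
    {
        int lastSlash = lastContent + "/content/".Length;
        //
        // Without this check, an input such as "a.aspx/content/" gives a
        // negative length and Substring throws ArgumentOutOfRangeException.
        //
        if (lastSlash <= lastExtension)
        {
            string between = input.Substring(lastSlash, lastExtension - lastSlash);
            Debug.WriteLine("Between 2: " + between);
        }
    }
}
//
// [ALERT] Even with these checks, this is not equivalent to the Regex. It also
// accepts "path/content/cat.aspx.bak" (giving "cat") and "path/content/My Cat.aspx".
//
```

Expected output. When you put the two code blocks above (1 and 2) into one method, both produce the same result. For the input "path/content/cat.aspx", each finds "cat", so the debug output shows "Between 1: cat" and "Between 2: cat". That's what we want.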
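To make the string-method version accept exactly what the Regex accepts, you have to re-implement each rule of the pattern as a separate check. The sketch below is one way to do it, written as a hypothetical ManualBetween.Between method. It assumes the rules of the pattern in example 1: ".aspx" at the very end, and one or more lowercase ASCII letters, digits, or hyphens after the last "/content/". It differs from the Regex in one known way. "$" also matches before a final "\n", which this version does not accept.

```csharp
#nullable enable
using System;

public static class ManualBetween
{
    // Hypothetical hand-written counterpart of @"/content/([a-z0-9\-]+)\.aspx$".
    // Returns null when the input does not fit that shape.
    public static string? Between(string input)
    {
        const string Before = "/content/";
        const string After = ".aspx";

        // Rule 1: ".aspx" must be at the very end (the '$' anchor).
        if (!input.EndsWith(After, StringComparison.Ordinal))
            return null;
        string prefix = input.Substring(0, input.Length - After.Length);

        // Rule 2: "/content/" must come before it. The captured text cannot
        // contain '/', so only the last occurrence can start a valid match.
        int start = prefix.LastIndexOf(Before, StringComparison.Ordinal);
        if (start == -1)
            return null;
        string between = prefix.Substring(start + Before.Length);

        // Rule 3: one or more characters from [a-z0-9-], and nothing else.
        if (between.Length == 0)
            return null;
        foreach (char c in between)
        {
            bool allowed = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-';
            if (!allowed)
                return null;
        }
        return between;
    }
}
```

Each short piece of the one-line pattern becomes its own block of checks here, and that is the maintenance cost the Regex avoids.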
Prefer Regex for precise, pattern-based matching. A single pattern captures a substring that lies between two other substrings and also enforces what that substring may contain and where it must appear. Regex is ideal for more complicated (delicate) situations where that kind of control is needed.
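Regex options add more control without extra code. This is a sketch, assuming mixed-case paths such as "path/Content/Cat.ASPX" should also match. The pattern from example 1 gets RegexOptions.IgnoreCase and a named group. It is stored in a static readonly field, so it is constructed once and reused. The class, method, and group names are illustrative.

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class FileNameFromPath
{
    // IgnoreCase lets "/Content/" and ".ASPX" match, and [a-z] then accepts A-Z too.
    // CultureInvariant avoids culture-specific casing rules.
    private static readonly Regex Pattern = new Regex(
        @"/content/(?<name>[a-z0-9\-]+)\.aspx$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the text captured by the named group "name", or null on no match.
    public static string? GetName(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["name"].Value : null;
    }
}
```

A named group keeps the lookup readable if more groups are added to the pattern later, because the code no longer depends on group numbers.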