Isolate part of a string based on the patterns that surround it. A regular expression lets you be restrictive and precise about what you match, which makes Regex a clear and fast way to take input strings and separate out the part you need.
| Input string | Required string |
| --- | --- |
| /Content/Some-Page.aspx | some-page |
| /content/alternate-1.aspx | alternate-1 |
| /images/something.png | (no match) |
Here we know that one substring comes before the part we want to extract and another substring comes after it. We could approach this with IndexOf and LastIndexOf, but that approach is fraught with complexity and takes many lines of code.
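To make the comparison concrete, here is a minimal sketch of what the IndexOf and LastIndexOf approach might look like. The class and method names are hypothetical and do not come from the original code, and the sketch assumes the same "/content/" prefix and ".aspx" suffix used throughout this article.

```csharp
using System;

// Hypothetical helper for illustration; not part of the article's code.
static class StringMethodSketch
{
    // Returns the text between "/content/" and a trailing ".aspx", or null.
    public static string ExtractKey(string path)
    {
        const string prefix = "/content/";
        const string suffix = ".aspx";

        if (path == null)
        {
            return null;
        }

        string lower = path.ToLowerInvariant();

        // Locate the prefix; the key starts right after it.
        int start = lower.IndexOf(prefix, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }
        start += prefix.Length;

        // Locate the suffix; it must be the very end of the path.
        int end = lower.LastIndexOf(suffix, StringComparison.Ordinal);
        if (end <= start || end + suffix.Length != lower.Length)
        {
            return null;
        }

        return lower.Substring(start, end - start);
    }
}
```

Even with these checks, the sketch does not validate which characters appear inside the key (a slash or a space would pass through), which is the kind of precision the Regex pattern provides.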
My first approach to this problem (after giving up on IndexOf) was simply to call the static Regex.Match method. However, this causes an unnecessary performance drain. The following example shows how I used it.
string path = "/content/alternate-1.aspx";
Match match = Regex.Match(path, @"content/([A-Za-z0-9\-]+)\.aspx$",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    // Groups[1] holds the text captured by the first parenthesized group.
    string key = match.Groups[1].Value;
}
This is an annoyance to me, but the first captured group sits at index 1 of the Groups collection on Match objects, not index 0. Index 0 holds the entire matched text, and the parenthesized groups are numbered from 1. Most indexing in C# starts at 0, so we must remember this.
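To see what each index of the Groups collection holds, a small sketch like the following can help. The class and method names are hypothetical; the pattern and options are the ones used in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the article's code.
static class GroupsSketch
{
    // Returns { whole match, first capture }, or null when nothing matches.
    public static string[] Describe(string path)
    {
        Match match = Regex.Match(path, @"content/([A-Za-z0-9\-]+)\.aspx$",
            RegexOptions.IgnoreCase);
        if (!match.Success)
        {
            return null;
        }
        return new[]
        {
            match.Groups[0].Value, // entire matched text
            match.Groups[1].Value  // text captured by the parentheses
        };
    }
}
```

For the input "/content/alternate-1.aspx", Describe returns "content/alternate-1.aspx" at index 0 and "alternate-1" at index 1.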
I found that calling ToLower on the input, instead of passing RegexOptions.IgnoreCase to the Regex, yielded a 10% or higher improvement. Clearly, using RegexOptions.IgnoreCase is not always worthwhile, and since I needed a lowercase result anyway, calling the C# string ToLower method first was a win.
// Lowercase our input first for a performance boost.
string path = pathInput.ToLower();
Match match = Regex.Match(path, @"content/([A-Za-z0-9\-]+)\.aspx$");
A Regex instance is faster than the static Regex.Match method, so in performance-critical places in your code, prefer an instance that is created once and reused. For my project, I created a static class that can be used throughout the project. This version performed nearly twice as well.
using System;
using System.Text.RegularExpressions;
/// <summary>
/// Regexes for use on the site.
/// </summary>
static class RegexUtil
{
    static Regex _regex;
    static RegexUtil()
    {
        _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");
    }
    /// <summary>
    /// Return the key that is matched within the path.
    /// </summary>
    public static string MatchKey(string path)
    {
        Match match = _regex.Match(path.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
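}
Next, I modified this code to use the RegexOptions.Compiled flag, which means that the Regex is converted to MSIL (intermediate language) at run time instead of being interpreted. I am not 100% clear on how this works, but the general idea is simple. Constructing a compiled Regex costs more up front, so it suits a Regex that is created once and reused, as it is here. By using Compiled, we get a 30% or higher performance improvement.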
// Add compiled flag for 30% boost
_regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$", RegexOptions.Compiled);
Here I added the RegexOptions.RightToLeft flag. Even with the end-of-string anchor ($) in the pattern, this improved performance for my application. (Note that these strings are matched by their ends, which makes RightToLeft a natural fit.)
// Combine Compiled with RightToLeft
_regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$",
    RegexOptions.Compiled | RegexOptions.RightToLeft);
// (Note how the options are combined with the bitwise OR operator |.)
We have seen the iterations of this Regex, and I was happy with the results. The final method is much safer, more precise, and probably easier to maintain than the original method with string methods. It may even improve performance by reducing stray exceptions.
using System;
using System.Text.RegularExpressions;
/// <summary>
/// Regexes for use on the site.
/// </summary>
static class RegexUtil
{
    static Regex _regex;
    static RegexUtil()
    {
        _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$",
            RegexOptions.Compiled | RegexOptions.RightToLeft);
    }
    /// <summary>
    /// Return the key that is matched within the path.
    /// </summary>
    public static string MatchKey(string path)
    {
        Match match = _regex.Match(path.ToLower());
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}
Benchmark results. The progression here shows how you can make a Regex faster and simpler. More optimized string handling could improve the plain string version further, and make it more reliable, but the Regex version is probably best.
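If you want to measure the variants on your own machine, a rough timing harness along these lines may be a starting point. This is only a sketch: the class and method names are hypothetical, the iteration count is up to you, and the numbers it prints will vary with hardware and runtime version, so they should not be expected to match the percentages quoted above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing harness; names are illustrative, not from the article.
static class RegexTimingSketch
{
    const string Pattern = @"/content/([a-z0-9\-]+)\.aspx$";

    // Runs regex.Match(path) 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Regex regex, string path, int iterations)
    {
        regex.Match(path); // warm-up call, so one-time setup is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Match(path);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints one timing line per RegexOptions combination discussed above.
    public static void Run(string path, int iterations)
    {
        RegexOptions[] variants =
        {
            RegexOptions.None,
            RegexOptions.Compiled,
            RegexOptions.Compiled | RegexOptions.RightToLeft
        };

        foreach (RegexOptions options in variants)
        {
            Regex regex = new Regex(Pattern, options);
            TimeSpan elapsed = Time(regex, path, iterations);
            Console.WriteLine("{0}: {1} ms", options, elapsed.TotalMilliseconds);
        }
    }
}
```

Calling RegexTimingSketch.Run("/content/alternate-1.aspx", 1000000) prints one line per variant. The warm-up call keeps the one-time cost of the first match out of the measured loop.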
You can call the above static class method with very simple syntax. What I show next is a snippet of calling code that will return the "key" within two substrings in the input string. This is ideal for URL rewriting on web sites.
string key = RegexUtil.MatchKey(path);
if (key != null)
{
    // key was found and is set
}
The Regex class is ideal for matching patterns in strings. By using Regex.Match here, I greatly simplified my code for matching substrings and made it more foolproof. This is critical for programs that accept user input. Use this method for matching input that is between two substrings.