Your string contains whitespace that needs to be trimmed so that only the important contents remain. Several methods are available, each with pluses and minuses.
This post is the result of about three hours of headache-inducing searching and trial and error. Let's review the requirements in more detail, and I will show the methods that are probably best.
The simple Trim() method is unsurpassed in performance in the simple cases (which I have found to be the most common). If at all possible, use Trim(). Here's how you can use Trim() and what it returns.
string box = " Some text ";
box = box.Trim();
//
// "Some text"
// Also works on tabs and newlines.
//
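Regexes are very powerful, but they are also very good at giving me headaches. One problem I encountered was a multiple-line string. With Regex, you must consider how the engine treats newlines (\n). Two options affect this: RegexOptions.Multiline makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string, and RegexOptions.Singleline makes the . character match \n as well. For trimming a whole string, leave Multiline off so that ^ and $ anchor to the ends of the string.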
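A quick way to see why this matters for trimming is to run the same ^\s+ pattern with and without RegexOptions.Multiline. This is a minimal sketch using C# 9 top-level statements; the two-line sample string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Two lines of sample text; each line starts with two spaces.
string text = "  first line\n  second line  ";

// Default options: ^ matches only at the start of the whole string,
// so only the first line's leading spaces are removed.
string whole = Regex.Replace(text, @"^\s+", "");

// Multiline: ^ also matches right after each \n, so the second
// line's leading spaces are removed as well.
string perLine = Regex.Replace(text, @"^\s+", "", RegexOptions.Multiline);

Console.WriteLine("[" + whole + "]");
Console.WriteLine("[" + perLine + "]");
```

With the default options only the first line loses its leading spaces. With Multiline the second line does too, which is usually not what a trim of the whole string should do.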
Resources on the Internet can help you with this, but I wanted to investigate further and see how these options perform and how they are used. My requirement was to trim the leading and trailing whitespace from a medium-sized string (perhaps several kilobytes). Here's one method that uses two passes.
using System.Text.RegularExpressions;

//
// Example string
//
string source = " Some text ";
//
// Use ^ to match only at the start of the string.
// Then match whitespace characters with \s.
// Use + to match one or more of them.
// Then replace the match with an empty string.
//
source = Regex.Replace(source, @"^\s+", "");
//
// The same as above, but with $ at the end.
// This requires that the match ends at the end of the string.
//
source = Regex.Replace(source, @"\s+$", "");
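Wrapped into a reusable function, the two-pass approach can be checked against Trim(). The sketch below assumes C# 9 top-level statements, and RegexTrim is a hypothetical helper name, not a framework method. The sample inputs use only spaces, tabs, and newlines.

```csharp
using System;
using System.Text.RegularExpressions;

// Two-pass Regex trim: strip leading whitespace, then trailing whitespace.
// No RegexOptions are passed, so ^ and $ anchor to the whole string.
static string RegexTrim(string source)
{
    source = Regex.Replace(source, @"^\s+", "");
    source = Regex.Replace(source, @"\s+$", "");
    return source;
}

string[] samples = { " Some text ", "\tTabbed\t", "\n  Lines\nof text \r\n", "NoWhitespace" };

foreach (string sample in samples)
{
    string trimmed = RegexTrim(sample);
    // Compare the Regex result with the built-in Trim() for the same input.
    bool same = trimmed == sample.Trim();
    Console.WriteLine($"{same}: [{trimmed}]");
}
```

Note that the newline inside "Lines\nof text" survives: only whitespace touching the start or end of the string is removed.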
Readers have commented on compiled Regex objects, and compiling does give a substantial performance improvement. My quick tests showed that running two compiled regexes 1,000 times each was about 47% faster than running the same regexes uncompiled. However, as I will show next, this was a "drop in the bucket."
//
// Use two precompiled Regexes.
//
Regex leading = new Regex(@"^\s+", RegexOptions.Compiled);
Regex trailing = new Regex(@"\s+$", RegexOptions.Compiled);
for (int i = 0; i < 1000; i++) // Example loop: 1,000 iterations.
{
    //
    // Reuse the compiled regex objects over and over again.
    //
    string source = " Some text ";
    source = leading.Replace(source, "");
    source = trailing.Replace(source, ""); // compiled: 3620
}
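To reproduce the compiled-versus-interpreted comparison on your own machine, a Stopwatch harness along these lines could be used. It is a rough sketch (C# 9 top-level statements, hypothetical local function Time), not the harness behind the numbers above; results vary by machine and runtime, and a dedicated benchmarking tool such as BenchmarkDotNet gives more reliable figures.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int Iterations = 1000;
string input = " Some text ";

// Default (interpreted) and compiled versions of the same two patterns.
// Compilation happens in the constructor, outside the timed loop.
var plainLeading = new Regex(@"^\s+");
var plainTrailing = new Regex(@"\s+$");
var compiledLeading = new Regex(@"^\s+", RegexOptions.Compiled);
var compiledTrailing = new Regex(@"\s+$", RegexOptions.Compiled);

// Runs the two-pass trim Iterations times; returns elapsed milliseconds.
double Time(Regex first, Regex second, out string result)
{
    result = input;
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        result = second.Replace(first.Replace(input, ""), "");
    }
    watch.Stop();
    return watch.Elapsed.TotalMilliseconds;
}

// Warm-up runs so one-time JIT costs are not counted against either version.
Time(plainLeading, plainTrailing, out _);
Time(compiledLeading, compiledTrailing, out _);

double plainMs = Time(plainLeading, plainTrailing, out string plainResult);
double compiledMs = Time(compiledLeading, compiledTrailing, out string compiledResult);

Console.WriteLine($"interpreted: {plainMs:F2} ms, compiled: {compiledMs:F2} ms");
Console.WriteLine(plainResult == compiledResult); // True: same trimmed text either way
```

Elapsed.TotalMilliseconds is used instead of whole milliseconds so that very short runs still show a nonzero time.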
What if we combined the two regular expressions above into a single one and then compiled that? We certainly can, and here is the code.
string source = " Some text ";
//
// Use the alternation syntax "|" to combine both regexes.
//
source = Regex.Replace(source, @"^\s+|\s+$", "");
//
// Same as before, but with the two alternatives switched.
//
source = Regex.Replace(source, @"\s+$|^\s+", "");
//
// ... we could compile all of these too.
//
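Here is a minimal sketch of the compiled, combined version (C# 9 top-level statements; CombinedTrim is a hypothetical helper name). It works in one call because Regex.Replace replaces every match, so the leading run and the trailing run are both removed.

```csharp
using System;
using System.Text.RegularExpressions;

// One compiled pattern with two alternatives, built once and reused.
var bothEnds = new Regex(@"^\s+|\s+$", RegexOptions.Compiled);

// Replace() removes every match: the leading run and the trailing run.
string CombinedTrim(string source) => bothEnds.Replace(source, "");

Console.WriteLine("[" + CombinedTrim(" Some text ") + "]");            // [Some text]
Console.WriteLine("[" + CombinedTrim("\t  Tabs and spaces \n") + "]"); // [Tabs and spaces]
Console.WriteLine("[" + CombinedTrim("   ") + "]");                    // []
```

An all-whitespace string is consumed entirely by the ^\s+ alternative, so the result is an empty string, the same as Trim() would return.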
Here we need a balance between development time and how much code you want to write (or not write). If your trimming requirements differ from what the built-in methods provide, the Regex methods will be faster to write. If it is at all reasonable for you to use the built-in Trim(), do so.
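Before reaching for Regex, it is worth checking the built-in variants: TrimStart(), TrimEnd(), and Trim() with a list of characters cover many "different" requirements. The sketch below (C# 9 top-level statements, made-up sample string) shows those, plus one hypothetical requirement the built-ins do not express directly: collapsing each run of inner whitespace to a single space.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "  --Some   text--  ";

// Built-in variants, no Regex needed.
string left = line.TrimStart();         // only leading whitespace
string right = line.TrimEnd();          // only trailing whitespace
string dashes = line.Trim().Trim('-');  // whitespace first, then '-' characters

// Hypothetical extra requirement: trim the ends and also collapse
// every inner whitespace run to a single space.
string collapsed = Regex.Replace(line.Trim(), @"\s+", " ");

Console.WriteLine("[" + left + "]");      // [--Some   text--  ]
Console.WriteLine("[" + right + "]");     // [  --Some   text--]
Console.WriteLine("[" + dashes + "]");    // [Some   text]
Console.WriteLine("[" + collapsed + "]"); // [--Some text--]
```

Calling Trim() before the collapsing Regex keeps the pattern simple, since the ends are already clean when the inner runs are replaced.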
The benchmark was 1,000 iterations of trimming strings ranging from 1 KB to 10 KB. The version with two Regex objects was clearly faster than the combined Regex, but the time for Trim() was not measurable because it took less than 15 ms.
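A benchmark of the same shape can be sketched with Stopwatch. This is an assumption-laden reconstruction, not the original benchmark: the input generator (MakeInput), the fixed random seed, and the "some text " filler are all made up, and no timings are claimed here. Stopwatch uses a high-resolution timer where one is available (see Stopwatch.IsHighResolution), so short runs such as Trim() should still register a nonzero time.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

const int Iterations = 1000;
var random = new Random(42); // fixed seed so the inputs are repeatable

// Builds a string of roughly approxChars characters with whitespace on both ends.
string MakeInput(int approxChars)
{
    var sb = new StringBuilder("  \t");
    while (sb.Length < approxChars) sb.Append("some text ");
    sb.Append(" \n ");
    return sb.ToString();
}

// Inputs between about 1 KB and 10 KB of ASCII characters.
var inputs = new string[Iterations];
for (int i = 0; i < Iterations; i++)
{
    inputs[i] = MakeInput(random.Next(1024, 10 * 1024 + 1));
}

var leading = new Regex(@"^\s+", RegexOptions.Compiled);
var trailing = new Regex(@"\s+$", RegexOptions.Compiled);
var bothEnds = new Regex(@"^\s+|\s+$", RegexOptions.Compiled);

// Warm up each strategy once so JIT compilation is not timed.
inputs[0].Trim();
trailing.Replace(leading.Replace(inputs[0], ""), "");
bothEnds.Replace(inputs[0], "");

// Applies one trimming strategy to every input; returns elapsed milliseconds.
double Measure(Func<string, string> trim, out string last)
{
    last = "";
    var watch = Stopwatch.StartNew();
    foreach (string item in inputs) last = trim(item);
    watch.Stop();
    return watch.Elapsed.TotalMilliseconds;
}

double trimMs = Measure(s => s.Trim(), out string trimLast);
double twoMs = Measure(s => trailing.Replace(leading.Replace(s, ""), ""), out string twoLast);
double oneMs = Measure(s => bothEnds.Replace(s, ""), out string oneLast);

Console.WriteLine($"Trim(): {trimMs:F2} ms, two Regex: {twoMs:F2} ms, combined Regex: {oneMs:F2} ms");
Console.WriteLine(trimLast == twoLast && twoLast == oneLast); // all three strategies agree
```

Checking that all three strategies return the same string guards against a fast but wrong pattern winning the benchmark.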
In projects where I consider performance critical, I don't use Regex. ASP.NET developers need to make their pages feel nearly instant, but for other tasks, such as processing data in the background, Regex is ideal.