You are dealing with strings in your C# program and want to make sure that each string contains only a certain range of characters, or that it does not contain certain characters. The C# language and .NET Framework provide several ways of doing this. Regular expressions are one option, but they can be less clear and far slower at runtime than a simple loop. Here we look at how you can validate strings in the C# language by detecting ranges of characters and specific characters, first using the Regex.IsMatch method with character ranges and then using a loop with a switch statement.
First, the precise goal of this program is to validate that the string parameter specified only contains the characters a - z lowercase and uppercase, and the ten digits 0 - 9. This is of practical use on some websites and programs that parse data that may not be well formed. Some of these programs must check thousands of strings an hour, which are usually fairly short strings (such as user input from the network). The IsValid1 method makes sure these characters are the only ones that occur by using Regex, while the IsValid2 method does the same with a loop.
~~~ Program that validates strings (C#) ~~~
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Console.WriteLine(IsValid1("dotnetperls100")); // Valid: has only a-z and 0-9
        Console.WriteLine(IsValid2("dotnetperls100"));
        Console.WriteLine(IsValid1("$Invalid")); // Invalid: has $
        Console.WriteLine(IsValid2("$Invalid"));
        Console.WriteLine(IsValid1("900DOTNETPERLS")); // Valid: has only A-Z and 0-9
        Console.WriteLine(IsValid2("900DOTNETPERLS"));
        Console.WriteLine(IsValid1(" space ")); // Invalid: has space
        Console.WriteLine(IsValid2(" space "));
    }

    /// <summary>
    /// Test if string contains only the characters a-z, A-Z and 0-9 (regular expression).
    /// </summary>
    public static bool IsValid1(string path)
    {
        // \z anchors at the very end of the string; $ would also accept a trailing newline.
        return Regex.IsMatch(path, @"^[a-zA-Z0-9]*\z");
    }

    /// <summary>
    /// Test if string contains only the characters a-z, A-Z and 0-9 (loop and switch, fast).
    /// </summary>
    public static bool IsValid2(string path)
    {
        for (int i = 0; i < path.Length; i++)
        {
            switch (path[i])
            {
                case 'a': // Lowercase
                case 'b':
                case 'c':
                case 'd':
                case 'e':
                case 'f':
                case 'g':
                case 'h':
                case 'i':
                case 'j':
                case 'k':
                case 'l':
                case 'm':
                case 'n':
                case 'o':
                case 'p':
                case 'q':
                case 'r':
                case 's':
                case 't':
                case 'u':
                case 'v':
                case 'w':
                case 'x':
                case 'y':
                case 'z':
                case 'A': // Uppercase
                case 'B':
                case 'C':
                case 'D':
                case 'E':
                case 'F':
                case 'G':
                case 'H':
                case 'I':
                case 'J':
                case 'K':
                case 'L':
                case 'M':
                case 'N':
                case 'O':
                case 'P':
                case 'Q':
                case 'R':
                case 'S':
                case 'T':
                case 'U':
                case 'V':
                case 'W':
                case 'X':
                case 'Y':
                case 'Z':
                case '0': // Numbers
                case '1':
                case '2':
                case '3':
                case '4':
                case '5':
                case '6':
                case '7':
                case '8':
                case '9':
                    {
                        continue; // Legal character: check the next one
                    }
                default:
                    {
                        return false; // Illegal
                    }
            }
        }
        return true; // Legal
    }
}
~~~ Output of the program ~~~
True
True
False
False
True
True
False
False
Overview of program text. The Program class encloses the Main entry point, which tests the IsValid1 and IsValid2 methods. Each pair of method invocations returns the same Boolean value, which shows that the two methods agree on these string literals.
Method implementations. The two methods IsValid1 and IsValid2 have very different implementations but their results are the same. The IsValid1 method uses the Regex.IsMatch method to return a Boolean that indicates whether the string contains only the range of characters specified. The IsValid2 method uses a for-loop construct to iterate through the character indexes in the string. It employs a switch statement on the char, with a collection of constant cases. With this many constant cases, the compiler typically emits the switch as a jump table, so each character is classified with an indexed jump rather than a long chain of comparisons. This article is based on .NET 3.5 SP1.
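The same loop can also be written with range comparisons instead of one case label per character. The sketch below is only an illustration: the class name RangeCheckSketch and the method name IsValid3 are hypothetical and not part of the program above, and this variant was not included in the benchmark, so no claim is made about its speed relative to the switch.

```csharp
public static class RangeCheckSketch
{
    // Hypothetical helper (not from the article's program): same rule as
    // IsValid2, expressed as three range comparisons per character.
    public static bool IsValid3(string value)
    {
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            bool ok = (c >= 'a' && c <= 'z')   // Lowercase
                   || (c >= 'A' && c <= 'Z')   // Uppercase
                   || (c >= '0' && c <= '9');  // Numbers
            if (!ok)
            {
                return false; // Illegal
            }
        }
        return true; // Legal
    }
}
```

Note that char.IsLetterOrDigit is not a drop-in replacement here, because it also accepts letters and digits outside the ASCII range, such as 'é'.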
Here we look at a benchmark of the two methods shown in the program text. The first two strings in the example are tested in tight loops, and the result is reported as the average number of nanoseconds per method call. The IsValid1 method that uses Regex.IsMatch required about 907 nanoseconds per call, while the IsValid2 method that uses the switch required about 13.5 nanoseconds, meaning that the regular expression took roughly 67 times as long.
~~~ Benchmark description ~~~
1000000 loops with 2 method calls in each iteration.
Numbers reported in nanoseconds per method call.
~~~ Code tested in loops ~~~
if (IsValid1("dotnetperls100")) // Body 1 start
{
}
if (IsValid1("$Invalid"))
{
}
if (IsValid2("dotnetperls100")) // Body 2 start
{
}
if (IsValid2("$Invalid"))
{
}
~~~ Benchmark results ~~~
IsValid1: 906.665 ns (Uses regular expression)
IsValid2: 13.500 ns (Uses switch, faster)
Here we note the Regex.IsMatch method as used in the program text in the example. This method performs the logic of Regex.Match internally, but narrows the result to a Boolean value that indicates whether any matching text was found. The Regex.IsMatch method is commonly used in if-statements to see if the string contains the pattern specified. The closely related Regex.Match method returns a Match object that carries the full details of the match.
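As a small illustration of that relationship, the sketch below puts Regex.IsMatch next to the equivalent Regex.Match(...).Success call. It also contrasts two end anchors, which matters for validation: in .NET, `$` also matches just before a final newline, while `\z` matches only at the very end of the input. The class and member names are hypothetical and chosen only for this example.

```csharp
using System.Text.RegularExpressions;

public static class IsMatchSketch
{
    // \z matches only at the very end of the input.
    private const string Strict = @"^[a-zA-Z0-9]*\z";

    // $ matches at the end of the input or just before a final newline.
    private const string Lenient = @"^[a-zA-Z0-9]*$";

    // Boolean result directly from IsMatch.
    public static bool ViaIsMatch(string input)
    {
        return Regex.IsMatch(input, Strict);
    }

    // Same answer obtained from the Match object that Regex.Match returns.
    public static bool ViaMatch(string input)
    {
        Match m = Regex.Match(input, Strict);
        return m.Success;
    }

    // Uses the $ anchor, so an input such as "abc\n" is accepted.
    public static bool LenientIsMatch(string input)
    {
        return Regex.IsMatch(input, Lenient);
    }
}
```

For input that may end in a line break, such as a line read from a file or the network, the `\z` form is the safer choice when the newline itself should count as an illegal character.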
Here we mention a quote from Jamie Zawinski, one of the most famous hackers of the original Netscape client software, parts of which are now found in the Firefox web browser. He stated: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Regular expressions can be used in ways that cause more complexity in programs than they remove. For example, the regular expression method here costs almost 1 microsecond per method call on a fast computer.
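Per-call costs like these depend on the hardware and the runtime version, so it is worth measuring on your own machine. The article's own timing harness is not shown; what follows is a minimal sketch of one way to take such a measurement, assuming System.Diagnostics.Stopwatch and a Func<string, bool> delegate for the method under test. The class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Returns the average nanoseconds per call of 'method'. Each iteration
    // calls the method twice, once with a valid and once with an invalid string.
    // 'trueCount' reports how many calls returned true, so the results are used.
    public static double NanosecondsPerCall(
        Func<string, bool> method, int iterations, out int trueCount)
    {
        // Warm-up call so that JIT compilation and any one-time setup
        // (such as regex construction) are not included in the timing.
        method("dotnetperls100");

        trueCount = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (method("dotnetperls100"))
            {
                trueCount++;
            }
            if (method("$Invalid"))
            {
                trueCount++;
            }
        }
        watch.Stop();

        // 1 millisecond = 1,000,000 nanoseconds; two calls per iteration.
        return watch.Elapsed.TotalMilliseconds * 1000000.0 / (iterations * 2.0);
    }
}
```

Passing IsValid1 or IsValid2 as the method with 1000000 iterations mirrors the loop shape described in the benchmark. Calling through a delegate adds a small overhead of its own, so the figures will not match the ones reported here exactly.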
This section notes some ways you can enhance the validation methods shown in the program text. If you are trying to filter data for security purposes, it is often best to test each character against a collection of all the valid characters, not against a collection of invalid characters. With a list of invalid characters, it is easy to miss a dangerous character you haven't thought of, and that omission goes unnoticed. With a list of valid characters, an omission only rejects legitimate input, which tends to show up quickly in testing. In the IsValid2 method, you can also add case labels for punctuation and other symbols to fit the requirements of your program.
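One way to make the set of valid characters easy to extend is to build it once as a lookup table and test each character against it. The sketch below assumes ASCII-only input rules; the class name AllowedCharacters and its members are hypothetical and only illustrate the whitelist idea, not a method from the program above.

```csharp
using System;

public sealed class AllowedCharacters
{
    // One flag per ASCII code; true means the character is valid.
    private readonly bool[] _allowed = new bool[128];

    // 'extraAllowed' lists additional valid characters, for example "-_.".
    public AllowedCharacters(string extraAllowed)
    {
        for (char c = 'a'; c <= 'z'; c++) _allowed[c] = true; // Lowercase
        for (char c = 'A'; c <= 'Z'; c++) _allowed[c] = true; // Uppercase
        for (char c = '0'; c <= '9'; c++) _allowed[c] = true; // Numbers

        foreach (char c in extraAllowed)
        {
            if (c >= 128)
            {
                // This sketch deliberately supports ASCII extras only.
                throw new ArgumentException("Only ASCII characters are supported.");
            }
            _allowed[c] = true;
        }
    }

    // Returns true only if every character is in the valid set.
    public bool IsValid(string value)
    {
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c >= 128 || !_allowed[c])
            {
                return false; // Illegal
            }
        }
        return true; // Legal
    }
}
```

For example, new AllowedCharacters("-_.") accepts "file-name_1.txt" but still rejects "a b" because the space was never listed as valid.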
Here we saw how you can test the validity of string input against a set of characters in the C# programming language targeting the .NET Framework. We used the Regex.IsMatch method to check against a range of characters, then a switch statement inside a loop for better performance, and also noted some problems and enhancements with character validation methods such as these. The article shows that shorter code can take much longer to execute: the regular expression took roughly 67 times as long as the switch, even though the switch has many more lines of program source code.