Develop a method that removes punctuation and whitespace from a string and returns a 'cleaned' (sanitized) version. The method must return a series of words separated by single spaces. This output can easily be consumed by full-text indexes and other applications. The following table shows an example input and its expected output.
| When you send this input | You need to receive |
| --- | --- |
| SomeText&--777 And where before. | SomeText 777 And where before |
You will probably have other requirements, but I present this code to show one approach to the problem. The following static method receives a string and returns a copy stripped of unwanted characters. We look at the method and its internals first; my explanations follow.
using System.Text;

/// <summary>
/// Static class containing helper string methods.
/// </summary>
public static class StringUtil
{
    /// <summary>
    /// Strip the input string of characters that are not letters or digits.
    /// Each run of whitespace or punctuation is collapsed into a single
    /// space, and leading or trailing spaces are trimmed.
    /// </summary>
    /// <param name="selText">The string you want to process (sanitize).</param>
    /// <returns>The new sanitized version of the string.</returns>
    public static string SanitizeString(string selText)
    {
        // Build up the new string with a StringBuilder, and keep track of
        // whether the last appended character was a space.
        StringBuilder res = new StringBuilder(selText.Length);
        bool lastWasSpace = false;
        for (int i = 0; i < selText.Length; i++)
        {
            char c = selText[i];
            if (char.IsLetterOrDigit(c))
            {
                res.Append(c);
                lastWasSpace = false;
            }
            else if (char.IsWhiteSpace(c) || char.IsPunctuation(c))
            {
                // Replace any run of whitespace or punctuation characters
                // with a single space.
                if (!lastWasSpace)
                {
                    res.Append(' ');
                    lastWasSpace = true;
                }
            }
            // Any other character (for example a symbol such as '+' or '$')
            // is dropped without inserting a space.
        }
        // Trim the space left by leading or trailing punctuation
        // (such as the final '.' in the example) and return the result.
        return res.ToString().Trim();
    }
}
This code is a simple and effective way of removing unwanted characters. It performs its task in time linear in the length of the string. I want to conclude by saying that regular expressions can be thought of as sledgehammers, and you sometimes want a pair of pliers, like this method.
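For comparison, here is a rough sketch of what the "sledgehammer" might look like. The class `RegexSanitizeDemo`, the field `NonWordRun`, and the method `SanitizeWithRegex` are hypothetical names made up for this illustration; they are not part of `StringUtil`. The pattern assumes that "letter or digit" means the Unicode categories `\p{L}` and `\p{Nd}`, which is my reading of what `char.IsLetterOrDigit` accepts.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical comparison helper; not part of the original StringUtil class.
public static class RegexSanitizeDemo
{
    // Matches one or more consecutive characters that are neither
    // Unicode letters (\p{L}) nor decimal digits (\p{Nd}).
    private static readonly Regex NonWordRun = new Regex(@"[^\p{L}\p{Nd}]+");

    public static string SanitizeWithRegex(string text)
    {
        // Collapse each run into one space, then drop any space
        // left at the start or end of the result.
        return NonWordRun.Replace(text, " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(SanitizeWithRegex("SomeText&--777 And where before."));
        // prints: SomeText 777 And where before
    }
}
```

The two approaches are not strictly equivalent. This pattern turns every run of non-letter, non-digit characters into a space, so symbols such as '+' or '$' become word separators here, whereas the loop above only inserts a space for whitespace and punctuation. Which behavior you want depends on your requirements.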