You need to replace whitespace characters in a C# string, and your requirements are more complex than a single Replace call can handle. This article covers converting specific whitespace characters, individual newline characters, newline sequences (UNIX newlines and Windows line breaks) and runs of sequential whitespace. The example methods below, written in C#, address these whitespace problems in a text string.
Here we see how you can use the static Regex.Replace method to change any of a set of individual characters to a space. The square brackets [ ] in the pattern define a character class, which matches any one of the listed characters. To use this example, you will need to add "using System.Text.RegularExpressions;" to the top of your .cs file.
/// <summary>
/// Converts tabs, carriage returns and newlines in the string to spaces using Regex.
/// </summary>
public static string ConvertWhitespaceToSpacesRegex(string value)
{
    value = Regex.Replace(value, "[\n\r\t]", " ");
    return value;
}
Note on method results. Because this article is long, the results of all of these methods are shown together in a single section near the bottom. That section demonstrates the methods' behavior on specific string data.
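One consequence of the character class used in ConvertWhitespaceToSpacesRegex is that every matched character is replaced separately, so a Windows line break (\r\n) becomes two spaces rather than one. The following is a minimal sketch of that behavior. The class RegexClassSketch and both method names are hypothetical, and the PerRun variant is an assumption about what you might want, not one of the article's methods.

```csharp
using System.Text.RegularExpressions;

public static class RegexClassSketch
{
    // Same pattern as ConvertWhitespaceToSpacesRegex: one space per matched character.
    public static string PerCharacter(string value)
    {
        return Regex.Replace(value, "[\n\r\t]", " ");
    }

    // Hypothetical variant: the "+" quantifier makes a whole run of these
    // characters collapse into a single space.
    public static string PerRun(string value)
    {
        return Regex.Replace(value, "[\n\r\t]+", " ");
    }
}
```

For the input "a\r\nb", PerCharacter returns "a  b" (two spaces) while PerRun returns "a b" (one space).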
You can also replace all line breaks in your string using two string Replace calls. Each call here uses the overload that takes two char parameters. The first parameter is the character you need to replace, and the second parameter is the replacement. When using the char overload, you cannot replace a character with nothing, because there is no empty char; to remove a character entirely, use the string overload with an empty replacement string.
/// <summary>
/// Converts all line break characters in the string to spaces using string Replace.
/// Tabs are left unchanged.
/// </summary>
public static string ConvertWhitespaceToSpacesString(string value)
{
    value = value.Replace('\r', ' ');
    value = value.Replace('\n', ' ');
    return value;
}
The method you see here uses the ToCharArray method on the string parameter, which copies the string's characters into a new char[] array. This allows us to modify the characters of that copy in place and then build a single new string from it. According to the author's measurements, this method performs far better on small strings than the versions that use Replace calls.
/// <summary>
/// Converts all the whitespace in the string to spaces using switch.
/// 3-4x faster than using string replace.
/// Faster than using a new empty array and filling it.
/// </summary>
public static string ConvertWhitespaceToSpaces(string value)
{
    char[] arr = value.ToCharArray();
    for (int i = 0; i < arr.Length; i++)
    {
        switch (arr[i])
        {
            case '\t':
            case '\r':
            case '\n':
                {
                    arr[i] = ' ';
                    break;
                }
        }
    }
    return new string(arr);
}
Note on the switch statement. Internally, this method uses a switch on the char, which the compiler may turn into a jump table. A jump table provides constant lookup time regardless of the number of cases. With only three cases the compiler may instead emit a short series of comparisons, so the advantage over if/else statements is not guaranteed and is worth measuring.
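The same char[] technique can cover more than the three characters in the switch. The sketch below is a hypothetical variant, not one of the article's methods: it keeps the ToCharArray loop but uses char.IsWhiteSpace, so any character that .NET classifies as whitespace (for example a vertical tab) is also converted. The class and method names are assumptions made for illustration, and no performance claim is made for it.

```csharp
public static class CharArraySketch
{
    // Copies the string into a char[], overwrites every whitespace
    // character with a plain space, then builds one new string.
    public static string ConvertAnyWhitespaceToSpaces(string value)
    {
        char[] arr = value.ToCharArray();
        for (int i = 0; i < arr.Length; i++)
        {
            if (char.IsWhiteSpace(arr[i]))
            {
                arr[i] = ' ';
            }
        }
        return new string(arr);
    }
}
```

As with the switch version, each character is converted individually, so "\r\n" still becomes two spaces.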
Here we see a method that converts all Windows newlines, which contain two characters, and all UNIX newlines, which contain one character. The newlines are all converted to single spaces. The important detail of this method is that it converts the Windows newlines first.
/// <summary>
/// Converts all newlines in the string to single spaces.
/// </summary>
public static string ConvertNewlinesToSingleSpaces(string value)
{
    value = value.Replace("\r\n", " ");
    value = value.Replace('\n', ' ');
    return value;
}
Why convert Windows newlines first? You must convert the two-character Windows line breaks first because a UNIX newline ('\n') is the second half of a Windows one ("\r\n"). If you replace UNIX newlines first, the two-character sequence no longer matches and you will be left with '\r' characters you don't want.
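To make the ordering argument concrete, here is a small sketch that applies the same two Replace calls in both orders. The class NewlineOrderSketch and its method names are hypothetical and exist only for this comparison.

```csharp
public static class NewlineOrderSketch
{
    // Correct order: two-character Windows breaks first, then lone UNIX newlines.
    public static string WindowsFirst(string value)
    {
        return value.Replace("\r\n", " ").Replace('\n', ' ');
    }

    // Wrong order: the '\n' half is consumed first, so "\r\n" no longer
    // matches and every '\r' is left behind in the result.
    public static string UnixFirst(string value)
    {
        return value.Replace('\n', ' ').Replace("\r\n", " ");
    }
}
```

For the input "a\r\nb", WindowsFirst returns "a b", while UnixFirst returns "a\r b" with the stray carriage return still in place.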
It is very easy to convert all linebreaks in a string you read in from the disk. On the string, simply use Replace to change all Windows newlines to UNIX newlines. You do not need to change any existing UNIX newlines.
/// <summary>
/// Converts Windows style newlines to UNIX-style newlines.
/// </summary>
public static string ConvertToUnixNewlines(string value)
{
    return value.Replace("\r\n", "\n");
}
You can also convert all the newlines in your string to Windows newlines, providing compatibility with many applications. The trick here is to convert all pre-existing Windows newlines to UNIX newlines first. Then, convert all UNIX newlines to Windows newlines.
/// <summary>
/// Converts all newlines in the string to Windows newlines.
/// </summary>
public static string ConvertToWindowsNewlines(string value)
{
    value = ConvertToUnixNewlines(value);
    value = value.Replace("\n", "\r\n");
    return value;
}
Note on the code example. The code example uses the ConvertToUnixNewlines method in the section directly above. You could simply paste the Replace call instead of using that method.
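The reason for normalizing to UNIX newlines first can be shown with a short comparison. This is a sketch under the assumption that the input may already contain Windows line breaks; the class WindowsNewlineSketch and its method names are hypothetical.

```csharp
public static class WindowsNewlineSketch
{
    // Two-step conversion as described above: normalize to "\n", then expand.
    public static string Normalized(string value)
    {
        return value.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }

    // Naive single step: the '\n' inside an existing "\r\n" is expanded
    // again, producing "\r\r\n".
    public static string Naive(string value)
    {
        return value.Replace("\n", "\r\n");
    }
}
```

For the input "a\r\nb\nc", Normalized returns "a\r\nb\r\nc", while Naive returns "a\r\r\nb\r\nc" with a doubled carriage return.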
Here we see a method that converts any run of consecutive whitespace characters into a single space. This is useful when you are reading in text data from a database or file and are not sure what kind of whitespace it contains.
/// <summary>
/// Converts each run of whitespace characters to a single space.
/// </summary>
public static string ConvertWhitespacesToSingleSpaces(string value)
{
    value = Regex.Replace(value, @"\s+", " ");
    return value;
}
Using this method with HTML. Sometimes, you can use this method on markup such as HTML to reduce the size of the file. The author's experience is that this can reduce the final size by about 1%, even after compression. In HTML, two spaces are usually rendered the same as one space, although whitespace remains significant inside elements such as pre and textarea.
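One detail of the \s+ pattern is that whitespace at the very start or end of the string is collapsed to a single space rather than removed. If that is not wanted, a Trim call can follow the replacement. The sketch below shows this combination; the class CollapseSketch and the method CollapseAndTrim are hypothetical names, not part of the article's NewlineTool.

```csharp
using System.Text.RegularExpressions;

public static class CollapseSketch
{
    // Collapses every run of whitespace to one space, then removes the
    // single leading and trailing space that may remain.
    public static string CollapseAndTrim(string value)
    {
        return Regex.Replace(value, @"\s+", " ").Trim();
    }
}
```

For the input "  a \t\r\n b  ", the Regex step alone yields " a b ", and CollapseAndTrim returns "a b".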
Here we see an example that uses the File.ReadAllText method and calls one of the above methods. Note that the example has the "using System.IO;" line near the top. It reads TextFile1.txt and writes the modified text to the Console window.
using System;
using System.IO;
class Program
{
    static void Main()
    {
        //
        // Read in text with File.ReadAllText.
        //
        string value = File.ReadAllText("TextFile1.txt");
        //
        // Call method and display result.
        //
        value = NewlineTool.ConvertWhitespacesToSingleSpaces(value); // <-- see note
        Console.WriteLine(value);
        //
        // You can now write it with File.WriteAllText.
        //
    }
}
Note on NewlineTool type. The NewlineTool class specified is a static class located in another file. You can create it by creating "NewlineTool.cs" and then looking at the next example and using the code there.
The author prefers to keep static methods, which do not require allocating an object or keeping state, in a separate file. The file should have the name of its class as the filename. To be callable from Program as shown, the class and its methods must be accessible from that code; declaring them public, as here, is the simplest way.
using System.Text.RegularExpressions;
/// <summary>
/// Contains string methods for converting newlines.
/// </summary>
public static class NewlineTool
{
    /// <summary>
    /// Converts tabs, carriage returns and newlines in the string to spaces using Regex.
    /// </summary>
    public static string ConvertWhitespaceToSpacesRegex(string value)
    {
        // ...
    }
}
Finally, we look at how the above methods work on a set of five different input strings. The output is shown right after the code. The Main entry point below simply uses an array of strings as the input, and then loops over each of those strings. It then calls all the methods in this article inside the loop and prints out the result.
using System;
class Program
{
    static void Main()
    {
        string[] arr = new string[]
        {
            "This string\r\ncontains\r\nWindows newlines.",
            "This string\ncontains\nUNIX newlines.",
            "This string\tcontains a tab.",
            "This string\r\ncontains\nmixed newlines.",
            "This  string  contains  double  spaces."
        };
        foreach (string value in arr)
        {
            Console.WriteLine(NewlineTool.ConvertWhitespaceToSpacesRegex(value));
            Console.WriteLine(NewlineTool.ConvertWhitespaceToSpacesString(value));
            Console.WriteLine(NewlineTool.ConvertWhitespaceToSpaces(value));
            Console.WriteLine(NewlineTool.ConvertNewlinesToSingleSpaces(value));
            Console.WriteLine(NewlineTool.ConvertWhitespacesToSingleSpaces(value));
            string valueUnix = NewlineTool.ConvertToUnixNewlines(value);
            Console.WriteLine(valueUnix);
            Console.WriteLine(valueUnix.Length);
            string valueWindows = NewlineTool.ConvertToWindowsNewlines(value);
            Console.WriteLine(valueWindows);
            Console.WriteLine(valueWindows.Length);
            Console.WriteLine(new string('=', 80));
        }
    }
}
Overview of the output. Next is the output from the above program. Each iteration through the loop is separated by a line of '=' characters. For the last two methods, the char count (Length) is printed. This is so you can see that the newlines have different numbers of characters.
This string  contains  Windows newlines.
This string  contains  Windows newlines.
This string  contains  Windows newlines.
This string contains Windows newlines.
This string contains Windows newlines.
This string
contains
Windows newlines.
38
This string
contains
Windows newlines.
40
================================================================================
This string contains UNIX newlines.
This string contains UNIX newlines.
This string contains UNIX newlines.
This string contains UNIX newlines.
This string contains UNIX newlines.
This string
contains
UNIX newlines.
35
This string
contains
UNIX newlines.
37
================================================================================
This string contains a tab.
This string	contains a tab.
This string contains a tab.
This string	contains a tab.
This string contains a tab.
This string	contains a tab.
27
This string	contains a tab.
27
================================================================================
This string  contains mixed newlines.
This string  contains mixed newlines.
This string  contains mixed newlines.
This string contains mixed newlines.
This string contains mixed newlines.
This string
contains
mixed newlines.
36
This string
contains
mixed newlines.
38
================================================================================
This  string  contains  double  spaces.
This  string  contains  double  spaces.
This  string  contains  double  spaces.
This  string  contains  double  spaces.
This string contains double spaces.
This  string  contains  double  spaces.
39
This  string  contains  double  spaces.
39
================================================================================
Notes on the output. The examples here may not be useful to you, but you can develop a small console program that tests strings such as the one shown. You can then use this to make sure your own methods work properly.
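A small test program of the kind described above could compare each result against an expected string instead of relying on visual inspection. The following is a minimal sketch under that assumption. The class WhitespaceSelfCheck, its Run method and the private Collapse helper are hypothetical; Collapse is a local copy of the \s+ replacement so that the sketch compiles without NewlineTool.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceSelfCheck
{
    // Local copy of the article's \s+ replacement so the sketch stands alone.
    static string Collapse(string value)
    {
        return Regex.Replace(value, @"\s+", " ");
    }

    // Runs each input/expected pair and returns the number of mismatches.
    public static int Run()
    {
        string[][] cases =
        {
            new[] { "This string\r\ncontains\r\nWindows newlines.", "This string contains Windows newlines." },
            new[] { "This string\tcontains a tab.", "This string contains a tab." },
            new[] { "This  string  contains  double  spaces.", "This string contains double spaces." }
        };

        int failures = 0;
        foreach (string[] pair in cases)
        {
            string actual = Collapse(pair[0]);
            if (actual != pair[1])
            {
                failures++;
                Console.WriteLine("FAILED: expected [" + pair[1] + "] but got [" + actual + "]");
            }
        }
        return failures;
    }
}
```

Calling WhitespaceSelfCheck.Run() from Main and checking that it returns 0 gives a quick pass or fail signal; more pairs can be added to the cases array for your own methods.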
Here we saw ways to convert newlines, line breaks, spaces, tabs and all whitespace characters into single spaces or other characters. We also looked at UNIX newlines and Windows newlines, and how to Replace these strings. Additionally, we saw an optimized method that can be useful in certain programs. Regular expressions had the most flexibility.