You need to count the characters in a string. Microsoft Word reports two character counts, one including spaces and one excluding them, which makes it a convenient reference for checking a solution. Here we count the letters and other significant characters in C# strings.
Here we look at methods that count characters the way Microsoft Office does. I tested a real-world text file with Word 2007 and developed two methods that closely match its results. The first important point is that, for the character count, a run of consecutive spaces is treated as a single character.
=== Method that counts characters (C#) ===
/// <summary>
/// Returns the number of characters in a string using the same method
/// as Microsoft Word 2007. A run of sequential whitespace is counted once.
/// </summary>
/// <param name="value">String you want to count chars in.</param>
/// <returns>Number of chars in string.</returns>
static int CountChars(string value)
{
    int result = 0;
    bool lastWasSpace = false;
    foreach (char c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            // A.
            // Only count sequential spaces one time.
            if (!lastWasSpace)
            {
                result++;
            }
            lastWasSpace = true;
        }
        else
        {
            // B.
            // Count other characters every time.
            result++;
            lastWasSpace = false;
        }
    }
    return result;
}
Description of the code. The method iterates over the string parameter and examines each character. Word 2007 counts ten spaces in a row as one, so we do the same here. The bool lastWasSpace records whether the previous char was whitespace.
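The behavior described above can be checked with a few small inputs. This is a minimal sketch: the class name CharCountDemo and the sample strings are illustrative assumptions, and the method body is repeated only so the example compiles on its own.

```csharp
using System;

static class CharCountDemo
{
    // Same logic as CountChars above: a run of whitespace counts once.
    internal static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!lastWasSpace)
                {
                    result++;
                }
                lastWasSpace = true;
            }
            else
            {
                result++;
                lastWasSpace = false;
            }
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(CountChars("a b"));                         // 3
        Console.WriteLine(CountChars("a" + new string(' ', 10) + "b")); // 3: ten spaces count once
        Console.WriteLine(CountChars("  a"));                         // 2: a leading run also counts once
        Console.WriteLine(CountChars(""));                            // 0
    }
}
```

Note that a leading or trailing run of whitespace still contributes one character to the total.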
Methods used. char.IsWhiteSpace decides whether a character is whitespace. It handles spaces, newlines, tabs, and other Unicode whitespace. All non-whitespace characters are counted normally.
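To see which characters char.IsWhiteSpace treats as whitespace, a short probe like the following may help. The sample characters and the class name WhiteSpaceDemo are chosen here for illustration and are not part of the original test.

```csharp
using System;

static class WhiteSpaceDemo
{
    static void Main()
    {
        // Space, tab, line feed, carriage return, and no-break space
        // are reported as whitespace; 'a', '1', and '.' are not.
        char[] samples = { ' ', '\t', '\n', '\r', '\u00A0', 'a', '1', '.' };
        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4} -> {1}", (int)c, char.IsWhiteSpace(c));
        }
    }
}
```

Because a Windows line break is the two characters "\r\n" and both are whitespace, CountChars counts such a line break as one character.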
Here we see a method that counts non-whitespace characters. This closely parallels Microsoft Word 2007's results as well. It is simpler and just increments the result for each non-whitespace character.
=== Method that counts non-whitespace characters (C#) ===
/// <summary>
/// Counts the number of non-whitespace characters.
/// It closely matches Microsoft Word 2007.
/// </summary>
/// <param name="value">String you want to count non-whitespaces in.</param>
/// <returns>Number of non-whitespace chars.</returns>
static int CountNonSpaceChars(string value)
{
    int result = 0;
    foreach (char c in value)
    {
        if (!char.IsWhiteSpace(c))
        {
            result++;
        }
    }
    return result;
}
Here we look at the accuracy of these methods compared with Microsoft Word. I tested both against Word 2007 on the same file. The non-whitespace count matched exactly, while the character count was off by a small amount.
File tested:                  decision.txt
Microsoft Word char count:    834
Method count:                 830 [off by 4]

File tested:                  decision.txt
Word non-whitespace count:    667
Method count:                 667 [exact]
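A comparison like the one above could be run with a small console program. This is only a sketch of one possible harness, not the code used for the original test: the class name WordComparison and the command-line handling are assumptions, and the contents of decision.txt are not reproduced here.

```csharp
using System;
using System.IO;

static class WordComparison
{
    // Same logic as CountChars above.
    internal static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!lastWasSpace)
                {
                    result++;
                }
                lastWasSpace = true;
            }
            else
            {
                result++;
                lastWasSpace = false;
            }
        }
        return result;
    }

    // Same logic as CountNonSpaceChars above.
    internal static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    static void Main(string[] args)
    {
        // Use the first argument as the path, or fall back to the file name
        // mentioned in the results above.
        string path = args.Length > 0 ? args[0] : "decision.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        string text = File.ReadAllText(path);
        Console.WriteLine("File tested: " + path);
        Console.WriteLine("Char count: " + CountChars(text));
        Console.WriteLine("Non-whitespace count: " + CountNonSpaceChars(text));
    }
}
```

The printed numbers can then be compared by hand with the counts Word shows for the same file.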
This method pair is helpful because it lets you judge the logical length of a file more accurately. Text that uses two spaces after a period will have the same length as text that uses one space.
Counting bytes in files. If you need to measure the length of text files, these methods could be better than checking the file size in bytes. This of course only applies if you want the logical length, not the physical length.
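The difference between physical and logical length can be illustrated with two versions of the same text. In this sketch the sample sentences and the class name LengthDemo are made up for the example, and UTF-8 is assumed as the encoding used to measure bytes.

```csharp
using System;
using System.Text;

static class LengthDemo
{
    // Same logic as CountChars above.
    internal static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!lastWasSpace)
                {
                    result++;
                }
                lastWasSpace = true;
            }
            else
            {
                result++;
                lastWasSpace = false;
            }
        }
        return result;
    }

    static void Main()
    {
        string oneSpace = "First sentence. Second sentence.";
        string twoSpaces = "First sentence.  Second sentence.";

        // Physical length: the two-space version is one byte longer.
        Console.WriteLine(Encoding.UTF8.GetByteCount(oneSpace));  // 32
        Console.WriteLine(Encoding.UTF8.GetByteCount(twoSpaces)); // 33

        // Logical length: both versions give the same count.
        Console.WriteLine(CountChars(oneSpace) == CountChars(twoSpaces)); // True
    }
}
```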
Note on performance. The methods are fairly fast because they do no StringBuilder appends or other string copying. Scanning through individual characters was very fast in my tests, and the foreach loop was not slower than a for loop.
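If you want to check the loop comparison on your own machine, a rough timing sketch with Stopwatch might look like this. It is not a rigorous benchmark and not the measurement used for this article: the input string, the iteration count, and the names TimingSketch, CountWithForeach, and CountWithFor are all assumptions, and results will vary with the runtime and hardware.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Non-whitespace count using foreach, as in the article.
    internal static int CountWithForeach(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    // The same count using an indexed for loop.
    internal static int CountWithFor(string value)
    {
        int result = 0;
        for (int i = 0; i < value.Length; i++)
        {
            if (!char.IsWhiteSpace(value[i]))
            {
                result++;
            }
        }
        return result;
    }

    static void Main()
    {
        // Hypothetical input: 1000 letters followed by 1000 spaces.
        string text = new string('x', 1000) + new string(' ', 1000);
        const int iterations = 100000;
        long total = 0; // accumulated so the calls are not trivially unused

        Stopwatch foreachTimer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            total += CountWithForeach(text);
        }
        foreachTimer.Stop();

        Stopwatch forTimer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            total += CountWithFor(text);
        }
        forTimer.Stop();

        Console.WriteLine("foreach: {0} ms", foreachTimer.ElapsedMilliseconds);
        Console.WriteLine("for:     {0} ms", forTimer.ElapsedMilliseconds);
        Console.WriteLine("checksum: {0}", total);
    }
}
```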
Here we saw methods that count characters and non-whitespace characters with results similar to Microsoft Word 2007. Use them to find the logical length of text. They closely parallel the counts in Word 2007's dialog, but not exactly.