Count characters in your string. Microsoft Word has an option to count characters including spaces and not including spaces. Test your solution against Microsoft Word to ensure correctness.
| String | # chars - no spaces | # chars - with spaces |
| --- | --- | --- |
| This is a good website. | 19 | 23 |
| This is useful. | 13 | 15 |
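The two columns in the table can be reproduced with a few lines of C#. This is only a minimal sketch: the class name `TableCheck` and its method names are hypothetical, and it counts every space individually, which is fine here because the sample strings contain only single spaces.

```csharp
using System.Linq;

static class TableCheck
{
    // Characters including spaces: simply the length of the string.
    public static int WithSpaces(string value)
    {
        return value.Length;
    }

    // Characters excluding spaces: every character that is not whitespace.
    public static int WithoutSpaces(string value)
    {
        return value.Count(c => !char.IsWhiteSpace(c));
    }
}
```

For "This is a good website." these return 23 and 19, matching the first row of the table.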
I tested a real-world text file with Word 2007 and developed two methods that closely match Word's results. The first important point is that, for the character count, a run of consecutive spaces is treated as a single character.

    /// <summary>
    /// Returns the number of characters in a string, using a method that
    /// closely matches Microsoft Word 2007. A run of consecutive whitespace
    /// characters is counted once.
    /// </summary>
    /// <param name="value">String you want to count chars in.</param>
    /// <returns>Number of chars in string.</returns>
    static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                // A.
                // Count a run of consecutive whitespace only once.
                if (!lastWasSpace)
                {
                    result++;
                }
                lastWasSpace = true;
            }
            else
            {
                // B.
                // Count every other character.
                result++;
                lastWasSpace = false;
            }
        }
        return result;
    }

The other method counts only the non-whitespace characters in your string. It also closely parallels Microsoft Word 2007's results. It is simpler: it just increments the result for each non-whitespace character.

    /// <summary>
    /// Counts the number of non-whitespace characters.
    /// It closely matches Microsoft Word 2007.
    /// </summary>
    /// <param name="value">String you want to count non-whitespace chars in.</param>
    /// <returns>Number of non-whitespace chars.</returns>
    static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

I tested the two methods against Microsoft Word to check their accuracy. They come very close to Word's results, although the character count from the first method is off by a small amount.
| File tested | Microsoft Word char count | C# char count |
| --- | --- | --- |
| decision.txt | 834 | 830 |
Non-whitespace results. Here are the results for the same file when only non-whitespace characters are counted. This uses the second method, CountNonSpaceChars.
| File tested | Microsoft Word non-whitespace count | C# non-whitespace count |
| --- | --- | --- |
| decision.txt | 667 | 667 |
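To repeat this kind of comparison on your own files, something like the following could be used. It is a sketch under stated assumptions: the class `FileCounter` and its `...InFile` helpers are hypothetical names, the counting logic is a compact restatement of the two methods above, and `File.ReadAllText` is assumed to decode the file correctly (it defaults to UTF-8 unless a byte order mark says otherwise).

```csharp
using System.IO;

static class FileCounter
{
    // Same idea as CountChars above: each run of whitespace counts once.
    public static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            bool isSpace = char.IsWhiteSpace(c);
            if (!isSpace || !lastWasSpace)
            {
                result++;
            }
            lastWasSpace = isSpace;
        }
        return result;
    }

    // Same idea as CountNonSpaceChars above.
    public static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    // Hypothetical helpers: read the whole file, then count.
    public static int CountCharsInFile(string path)
    {
        return CountChars(File.ReadAllText(path));
    }

    public static int CountNonSpaceCharsInFile(string path)
    {
        return CountNonSpaceChars(File.ReadAllText(path));
    }
}
```

Calling both helpers with the path of a text file gives the two numbers to compare against Word's Word Count dialog for the same file.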
This pair of methods is helpful because it lets you judge the logical length of a file more accurately. With the first method, text that uses two spaces after a period has the same length as text that uses one space.
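The effect on sentence spacing can be shown with a small comparison. This is a sketch, not part of the tested code: `LogicalLength` and `SameLogicalLength` are hypothetical names, and the counting loop is a compact restatement of CountChars.

```csharp
static class LogicalLength
{
    // Each run of consecutive whitespace counts as one character.
    public static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            bool isSpace = char.IsWhiteSpace(c);
            if (!isSpace || !lastWasSpace)
            {
                result++;
            }
            lastWasSpace = isSpace;
        }
        return result;
    }

    // True when two strings differ only in how much whitespace separates
    // their characters, as far as this count can tell.
    public static bool SameLogicalLength(string a, string b)
    {
        return CountChars(a) == CountChars(b);
    }
}
```

Here `CountChars("One. Two.")` and `CountChars("One.  Two.")` both return 9, while the `Length` property reports 9 and 10 for the same strings.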
If you need a way to measure the length of text files, these methods could be a better fit than the raw string length. This of course only applies if you want the logical length, not the physical length.
Do further testing against Microsoft Word. I think the small disparity I show could be due to trailing spaces in the test files.
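One way to check the trailing-space idea is to measure how much whitespace sits at the end of the text before comparing counts. The helper below is a hypothetical sketch (`TrailingWhitespace` and `CountTrailing` are my own names), and it does not show what Word itself does with trailing spaces.

```csharp
static class TrailingWhitespace
{
    // Number of whitespace characters at the very end of the text,
    // including any final line breaks.
    public static int CountTrailing(string value)
    {
        return value.Length - value.TrimEnd().Length;
    }
}
```

If this returns a nonzero value for a test file, trimming the text with `TrimEnd()` before counting shows whether the disparity changes.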
These methods work for many languages, but not all. They should work on most European languages, but won't work on Asian languages. Some languages have different concepts of words.
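One possible reason for miscounts, which is my assumption and not something I tested against Word, is that `foreach (char c in value)` walks UTF-16 code units, so a character stored as a surrogate pair is counted twice. The sketch below counts non-whitespace text elements instead, using `StringInfo.GetTextElementEnumerator` from `System.Globalization`. The class and method names are hypothetical.

```csharp
using System.Globalization;

static class TextElementCounter
{
    // Counts non-whitespace text elements, so a surrogate pair or a base
    // character plus combining marks is counted once.
    public static int CountNonSpaceTextElements(string value)
    {
        int result = 0;
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(value);
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            // Whitespace is decided by the first char of the element.
            if (!char.IsWhiteSpace(element, 0))
            {
                result++;
            }
        }
        return result;
    }
}
```

For the string `"a \U00020000"`, which contains one character outside the Basic Multilingual Plane, this returns 2, while a per-`char` loop would count 3 non-whitespace chars.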
These methods are fast. No StringBuilder appends or other string copying is done. Scanning through individual characters was very fast in my testing, and the foreach loop is not slower than a for loop.
The foreach loop also makes bugs less likely because it reduces the complexity of the syntax. A version that uses index variables would be more likely to develop bugs over time through typos.
Use these two methods to count the number of characters. They find the logical length of text. Their results closely parallel the counts in Microsoft Word 2007's dialog, but do not always match exactly.