Dot Net Perls

Count Characters in String - C#

by Sam Allen

Problem

Count characters in your string. Microsoft Word has an option to count characters including spaces and not including spaces. Test your solution against Microsoft Word to ensure correctness.

String                       # chars - no spaces   # chars - with spaces
"This is a good website."    19                    23
"This    is useful."         13                    18

Solution: C#

I tested a real-world text file with Word 2007 and developed two methods that closely match Microsoft Office's result. The key point for the first method is that the character count treats a run of consecutive whitespace characters as one character.

/// <summary>
/// Return the number of characters in a string using the same method
/// as Microsoft Word 2007. A run of sequential whitespace is counted once.
/// </summary>
/// <param name="value">String you want to count chars in.</param>
/// <returns>Number of chars in string.</returns>
static int CountChars(string value)
{
    int result = 0;
    bool lastWasSpace = false;

    foreach (char c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            // A.
            // Only count sequential spaces one time.
            if (lastWasSpace == false)
            {
                result++;
            }
            lastWasSpace = true;
        }
        else
        {
            // B.
            // Count other characters every time.
            result++;
            lastWasSpace = false;
        }
    }
    return result;
}
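
As a quick check, the sketch below repeats the method inside a small console program and runs it on the first string from the table plus two strings of my own. The class name CountCharsDemo and the "One. Two." inputs are illustrative assumptions, not part of the original tests.

```csharp
using System;

static class CountCharsDemo
{
    // Same logic as CountChars above: a run of whitespace counts once.
    public static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;

        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                // Count only the first character of a whitespace run.
                if (!lastWasSpace)
                {
                    result++;
                }
                lastWasSpace = true;
            }
            else
            {
                result++;
                lastWasSpace = false;
            }
        }
        return result;
    }

    static void Main()
    {
        // First row of the table: single spaces only.
        Console.WriteLine(CountChars("This is a good website.")); // 23

        // Two spaces after the period collapse into one,
        // so both strings have the same count.
        Console.WriteLine(CountChars("One.  Two.")); // 9
        Console.WriteLine(CountChars("One. Two."));  // 9
    }
}
```

Because the method collapses whitespace runs, its result can be smaller than the string's Length property whenever the text contains consecutive spaces.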

Example: count non-whitespace characters

The other method here counts non-whitespace characters in your string. This closely parallels Microsoft Word 2007's results as well. It is simpler and just increments the result for each non-whitespace character.

/// <summary>
/// Counts the number of non-whitespace characters.
/// It closely matches Microsoft Word 2007.
/// </summary>
/// <param name="value">String you want to count non-whitespaces in.</param>
/// <returns>Number of non-whitespace chars.</returns>
static int CountNonSpaceChars(string value)
{
    int result = 0;
    foreach (char c in value)
    {
        if (!char.IsWhiteSpace(c))
        {
            result++;
        }
    }
    return result;
}
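
A short usage sketch follows, using the two strings from the table at the top. The class name NonSpaceDemo and the tab and newline input are assumptions added for illustration. Note that char.IsWhiteSpace also treats tabs and line breaks as whitespace, so they are skipped as well.

```csharp
using System;

static class NonSpaceDemo
{
    // Same logic as CountNonSpaceChars above.
    public static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    static void Main()
    {
        // Both rows of the table; the number of spaces does not matter here.
        Console.WriteLine(CountNonSpaceChars("This is a good website.")); // 19
        Console.WriteLine(CountNonSpaceChars("This    is useful."));      // 13

        // Tabs and line breaks are whitespace too.
        Console.WriteLine(CountNonSpaceChars("a\tb\r\nc")); // 3
    }
}
```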

Information: testing the two methods

I tested the two methods against Microsoft Word to make sure they are accurate. On my test file, the first method came within four characters of Word's count, and the second matched it exactly.

File tested    Microsoft Word char count   C# char count
decision.txt   834                         830

Non-whitespace results. Here I show the results on the same file with non-whitespace characters. This uses the second method shown in this document.

File tested    Microsoft Word non-whitespace count   C# non-whitespace count
decision.txt   667                                   667

Question: why is this helpful?

This method pair is helpful because it lets you judge the logical length of a file more accurately. Text that uses two spaces after a period will have the same length as text that uses one space.

Information: better than bytes

If you need a way to measure the length of text files, these methods could be better than counting bytes. This of course only applies if you want the logical length, not the physical length.
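
To make the difference between logical and physical length concrete, here is a minimal sketch that compares the UTF-8 byte count of two strings with the result of the first method. The class name LogicalLengthDemo and the sample strings are assumptions for illustration; the byte counts shown hold for UTF-8 and would differ in other encodings.

```csharp
using System;
using System.Text;

static class LogicalLengthDemo
{
    // Same logic as the first method: a run of whitespace counts once.
    public static int CountChars(string value)
    {
        int result = 0;
        bool lastWasSpace = false;
        foreach (char c in value)
        {
            bool isSpace = char.IsWhiteSpace(c);
            if (!isSpace || !lastWasSpace)
            {
                result++;
            }
            lastWasSpace = isSpace;
        }
        return result;
    }

    static void Main()
    {
        string twoSpaces = "One.  Two.  Three.";
        string oneSpace = "One. Two. Three.";

        // Physical length: the byte counts differ.
        Console.WriteLine(Encoding.UTF8.GetByteCount(twoSpaces)); // 18
        Console.WriteLine(Encoding.UTF8.GetByteCount(oneSpace));  // 16

        // Logical length: the counts are the same.
        Console.WriteLine(CountChars(twoSpaces)); // 16
        Console.WriteLine(CountChars(oneSpace));  // 16
    }
}
```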

Question: how can I improve these methods?

Do further testing against Microsoft Word. I think the small disparity I show could be due to trailing spaces in the test files.

Question: does it work for languages other than English?

Yes, but not for all languages. It should work for most European languages, but won't work for Asian languages. Some languages have different concepts of words.
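
One technical reason the counts can differ for some scripts: a C# char is a UTF-16 code unit, so a character outside the Basic Multilingual Plane (some rarer CJK ideographs, for example) is stored as a surrogate pair and counted twice by the methods above. The sketch below is a hypothetical variant, not one of the two tested methods, that skips the second half of each pair. Whether Word counts such characters this way is an assumption I have not verified.

```csharp
using System;

static class CodePointDemo
{
    // Original behavior: every non-whitespace UTF-16 code unit counts.
    public static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    // Hypothetical variant: a surrogate pair counts as one character.
    public static int CountNonSpaceCodePoints(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            // The low surrogate is the second half of a pair; skip it.
            if (char.IsLowSurrogate(c))
            {
                continue;
            }
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    static void Main()
    {
        // U+1D11E (musical symbol G clef) needs two UTF-16 code units.
        string text = "a \U0001D11E b";

        Console.WriteLine(text.Length);                   // 6
        Console.WriteLine(CountNonSpaceChars(text));      // 4
        Console.WriteLine(CountNonSpaceCodePoints(text)); // 3
    }
}
```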

Question: are the methods fast?

Yes. No StringBuilder appends or other string copying is done. Scanning through individual characters was very fast in my tests. The foreach loop over a string is not slower than a for loop.

Question: is foreach better for the loop?

Yes. It makes bugs less likely because it reduces the complexity of the syntax. A version that uses an index variable would be more likely to pick up bugs over time through typos, such as a wrong loop bound.
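
For comparison, here is a sketch of the second method written both ways. The indexed version (the name CountNonSpaceCharsIndexed is mine) returns the same result, but it has more places to make a mistake: the start value, the loop bound, the increment and the indexer.

```csharp
using System;

static class LoopStyleDemo
{
    // foreach version, as used in this article.
    public static int CountNonSpaceChars(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            if (!char.IsWhiteSpace(c))
            {
                result++;
            }
        }
        return result;
    }

    // Index-based version, shown only for comparison.
    public static int CountNonSpaceCharsIndexed(string value)
    {
        int result = 0;
        for (int i = 0; i < value.Length; i++)
        {
            if (!char.IsWhiteSpace(value[i]))
            {
                result++;
            }
        }
        return result;
    }

    static void Main()
    {
        string text = "This    is useful.";
        Console.WriteLine(CountNonSpaceChars(text));        // 13
        Console.WriteLine(CountNonSpaceCharsIndexed(text)); // 13
    }
}
```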

Summary

Use these two methods to count the number of characters. They find the logical length of text. Their results closely parallel the counts in Microsoft Word 2007's Word Count dialog, but not exactly.

© 2008 Sam Allen. All rights reserved.