Dot Net Perls

Line Count - C#

by Sam Allen

Problem

Count the lines in a string or file using Regex and string-handling methods. This is useful when processing web server logs or CSV files. Above all, the method should be accurate, and it should also be fast.

Solution: C# string methods

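Here I show a method that runs 30 times faster than the Regex-based method shown in a popular C# book. On Windows, line breaks are represented by the two invisible characters "\r\n" (carriage return, then line feed). The string methods here look for the "\n" character, so they also handle text that uses "\n" alone.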

Example: count lines in your text file

This next block of code counts the lines in a file on disk. It does this by calling the StreamReader.ReadLine() method from the .NET Framework in a loop until it returns null. The CountLinesInFile method is static because it stores no state.

using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Print the line count for a file in the working directory.
        Console.WriteLine(CountLinesInFile("test.txt"));
    }

    /// <summary>
    /// Count the number of lines in the file specified.
    /// </summary>
    /// <param name="f">The filename to count lines in.</param>
    /// <returns>The number of lines in the file.</returns>
    static long CountLinesInFile(string f)
    {
        long count = 0;
        using (StreamReader r = new StreamReader(f))
        {
            string line;
            while ((line = r.ReadLine()) != null)
            {
                count++;
            }
        }
        return count;
    }
}
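As a point of comparison, here is a minimal sketch of the same count written with File.ReadLines and LINQ. This is not the method from the example above: it assumes .NET Framework 4.0 or later, where File.ReadLines is available, and the names FileLineCounter and CountLinesLazy are hypothetical. It reads the file lazily, so only one line is held in memory at a time.

```csharp
using System;
using System.IO;
using System.Linq;

static class FileLineCounter
{
    // Hypothetical helper (not from the original article).
    // File.ReadLines enumerates lines lazily; LongCount walks the
    // sequence once and returns the number of lines as a long.
    public static long CountLinesLazy(string path)
    {
        return File.ReadLines(path).LongCount();
    }
}

class Program
{
    static void Main()
    {
        // Write a small temporary file so the sketch is self-contained.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "first\r\nsecond\r\nthird");
            Console.WriteLine(FileLineCounter.CountLinesLazy(path)); // 3
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The explicit StreamReader loop above works on older versions of the framework, so it remains the safer choice if the target version is unknown.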

Example: count lines in your string

The C# Cookbook by Jay Hilyard and Stephen Teilhet offers a useful solution that works properly. It uses a regular expression for counting. The following two methods contrast a regular expression method with a string-based method.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        long a = CountLinesInString("This is an\r\nawesome website.");
        Console.WriteLine(a); // 2
        long b = CountLinesInStringSlow("This is an awesome\r\nwebsite.\r\nYeah.");
        Console.WriteLine(b); // 3
    }

    /// <summary>
    /// This method counts the number of lines in a string passed as the argument.
    /// It is the slower method benchmarked in this article. It makes a new Regex that
    /// matches "\n", gets a MatchCollection from it, and returns its Count plus one.
    /// </summary>
    /// <param name="s">The string you want to count lines in.</param>
    /// <returns>The number of lines in the string.</returns>
    static long CountLinesInStringSlow(string s)
    {
        Regex r = new Regex("\n", RegexOptions.Multiline);
        MatchCollection mc = r.Matches(s);
        return mc.Count + 1;
    }

    /// <summary>
    /// This method counts the number of lines in a string passed as the argument.
    /// It uses simple IndexOf and iteration to count the newlines. I start
    /// count at 1 because there is always at least one line in the string.
    /// </summary>
    /// <param name="s">You want to count the lines in this.</param>
    /// <returns>The number of lines in the string.</returns>
    static long CountLinesInString(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }
}
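One detail worth checking is how the "start at 1" rule compares with a ReadLine loop like the one used for files. The sketch below is an illustration, not part of the original methods: CountByIndexOf repeats the IndexOf loop from above, and CountByReadLine is a hypothetical helper that runs a ReadLine loop over an in-memory string with StringReader. The two agree on ordinary text but differ for an empty string and for text that ends with a newline.

```csharp
using System;
using System.IO;

static class LineCountComparison
{
    // Same IndexOf loop as CountLinesInString: newlines plus one.
    public static long CountByIndexOf(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }

    // Hypothetical helper: counts how many times ReadLine returns a line.
    public static long CountByReadLine(string s)
    {
        long count = 0;
        using (StringReader r = new StringReader(s))
        {
            while (r.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}

class Program
{
    static void Main()
    {
        string[] samples = { "", "a", "a\n", "a\r\nb" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} / {1}",
                LineCountComparison.CountByIndexOf(s),
                LineCountComparison.CountByReadLine(s));
        }
        // Prints:
        // 1 / 0
        // 1 / 1
        // 2 / 1
        // 2 / 2
    }
}
```

Which answer is right depends on whether a trailing newline should count as starting a new, empty line. If string counts need to match file counts, it is worth picking one convention and using it in both places.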

Information: my benchmark results

I benchmarked the above methods over 1 million operations to see how differently they perform. The results differ by more than one order of magnitude: as noted earlier, the IndexOf method was 30 times faster than the Regex method.
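A comparison like this could be set up roughly as follows. This is a minimal sketch using Stopwatch, not the harness that produced the numbers in this article; the class LineCountBenchmark and the helper TimeMilliseconds are hypothetical names, and the input string is simply reused from the example above. Timings will vary by machine and runtime version, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class LineCountBenchmark
{
    // Regex approach: builds a new Regex on every call, as in the example above.
    public static long CountWithRegex(string s)
    {
        Regex r = new Regex("\n", RegexOptions.Multiline);
        return r.Matches(s).Count + 1;
    }

    // IndexOf approach: scans for '\n' and counts newlines plus one.
    public static long CountWithIndexOf(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }

    // Hypothetical helper: calls f on input the given number of times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMilliseconds(Func<string, long> f, string input, int iterations)
    {
        long sink = 0; // accumulate results so the calls are not optimized away
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += f(input);
        }
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.ElapsedMilliseconds;
    }
}

class Program
{
    static void Main()
    {
        const int iterations = 1000000;
        string input = "This is an awesome\r\nwebsite.\r\nYeah.";

        // Call each method once first so JIT compilation is not timed.
        LineCountBenchmark.CountWithRegex(input);
        LineCountBenchmark.CountWithIndexOf(input);

        long regexMs = LineCountBenchmark.TimeMilliseconds(
            LineCountBenchmark.CountWithRegex, input, iterations);
        long indexOfMs = LineCountBenchmark.TimeMilliseconds(
            LineCountBenchmark.CountWithIndexOf, input, iterations);

        Console.WriteLine("Regex:   {0} ms", regexMs);
        Console.WriteLine("IndexOf: {0} ms", indexOfMs);
    }
}
```

Run it in a release build outside the debugger for meaningful numbers. Longer inputs may change the ratio, since much of the Regex cost in this sketch is constructing the Regex object on each call.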

Summary: counting lines in your files

Both of the string methods accurately count the number of newlines in text. The Regex version has some performance problems, but for many applications they are not important.

Information: read more

The O'Reilly C# 3.0 Cookbook is a good reference that I have enjoyed reading. The methods presented there are accurate but may not be optimal. This post includes my original work. [O'Reilly C# Cookbook - oreilly.com]

© 2008 Sam Allen. All rights reserved.