Count the lines in a string or file with Regex and string-handling methods. This is useful when processing web server logs or CSV files. The method should be accurate and also fast.
Here I show a method that performs 30 times faster than the method shown in a popular C# book. In Windows, line breaks are represented by the invisible characters "\r\n".
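As a quick illustration of that line-break sequence, here is a minimal sketch. It is not part of the original benchmark, and the CountNewlineChars helper is a name made up for this example. It shows that a Windows line break is two characters, and that searching for '\n' alone finds both Windows ("\r\n") and Unix ("\n") line breaks.

```csharp
using System;

class Program
{
    // Hypothetical helper for this example: counts '\n' characters.
    // Both "\r\n" and "\n" line breaks end with '\n', so both are found.
    public static int CountNewlineChars(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '\n')
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        string windows = "one\r\ntwo";
        string unix = "one\ntwo";
        Console.WriteLine(windows.Length);             // 8
        Console.WriteLine(unix.Length);                // 7
        Console.WriteLine(CountNewlineChars(windows)); // 1
        Console.WriteLine(CountNewlineChars(unix));    // 1
    }
}
```

This is why the counting methods later in this article only look for '\n' and never need to check for '\r'.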
This next block of code counts the lines in a file on the disk. It does this by calling the StreamReader.ReadLine() method in the .NET Framework in a loop until it returns null. The CountLinesInFile method is static because it stores no state.
using System.IO;

class Program
{
    static void Main()
    {
        CountLinesInFile("test.txt");
    }

    /// <summary>
    /// Count the number of lines in the file specified.
    /// </summary>
    /// <param name="f">The filename to count lines in.</param>
    /// <returns>The number of lines in the file.</returns>
    static long CountLinesInFile(string f)
    {
        long count = 0;
        using (StreamReader r = new StreamReader(f))
        {
            string line;
            while ((line = r.ReadLine()) != null)
            {
                count++;
            }
        }
        return count;
    }
}

The C# Cookbook by Jay Hilyard and Stephen Teilhet offers a useful solution that works properly. It uses a regular expression for counting. The following two methods contrast a regular expression method with a string-based method.
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        long a = CountLinesInString("This is an\r\nawesome website.");
        Console.WriteLine(a); // 2
        long b = CountLinesInStringSlow("This is an awesome\r\nwebsite.\r\nYeah.");
        Console.WriteLine(b); // 3
    }

    /// <summary>
    /// This method counts the number of lines in a string passed as the argument.
    /// It is benchmarked in this article. It makes a new Regex, gets a
    /// MatchCollection from it, and returns the Count property plus one.
    /// </summary>
    /// <param name="s">The string you want to count lines in.</param>
    /// <returns>The number of lines in the string.</returns>
    static long CountLinesInStringSlow(string s)
    {
        Regex r = new Regex("\n", RegexOptions.Multiline);
        MatchCollection mc = r.Matches(s);
        return mc.Count + 1;
    }

    /// <summary>
    /// This method counts the number of lines in a string passed as the argument.
    /// It uses simple IndexOf and iteration to count the newlines. I start
    /// count at 1 because there is always at least one line in the string.
    /// </summary>
    /// <param name="s">You want to count the lines in this.</param>
    /// <returns>The number of lines in the string.</returns>
    static long CountLinesInString(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }
}

I benchmarked the above methods for 1 million operations to see how differently they perform. The results differ by more than one order of magnitude.
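The benchmark described above could be reproduced with something like the following sketch. This is not the original harness: the Time helper, the input string, and the warm-up step are assumptions made for illustration, and the timings will vary by machine and .NET version, so no expected numbers are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    // Regex-based count, same logic as CountLinesInStringSlow in the article.
    public static long CountLinesInStringSlow(string s)
    {
        Regex r = new Regex("\n", RegexOptions.Multiline);
        return r.Matches(s).Count + 1;
    }

    // IndexOf-based count, same logic as CountLinesInString in the article.
    public static long CountLinesInString(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }

    // Hypothetical helper: times the given number of calls to a counting method.
    public static TimeSpan Time(Func<string, long> method, string input, int iterations)
    {
        long sink = 0; // accumulate results so the calls have an observable effect
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += method(input);
        }
        sw.Stop();
        if (sink < 0) Console.WriteLine(sink); // uses sink; not expected to print
        return sw.Elapsed;
    }

    static void Main()
    {
        const int iterations = 1000000; // 1 million operations, as in the text
        string input = "This is an awesome\r\nwebsite.\r\nYeah."; // assumed test input

        // Call each method once first so JIT compilation is not timed.
        CountLinesInString(input);
        CountLinesInStringSlow(input);

        Console.WriteLine("IndexOf: " + Time(CountLinesInString, input, iterations).TotalMilliseconds + " ms");
        Console.WriteLine("Regex:   " + Time(CountLinesInStringSlow, input, iterations).TotalMilliseconds + " ms");
    }
}
```

Build in Release mode and run outside the debugger when comparing the two numbers, since debug builds can distort the ratio.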
Both of these methods accurately count the number of newlines in text. The Regex version is slower, but for many applications the difference is not important.
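One detail worth checking is that "number of newlines plus one" and a ReadLine loop do not always agree. The sketch below is an illustration added here, not a result from the original benchmark. It applies the two counting rules from this article to the same strings, using StringReader in place of a file so that no test file is needed. The method names CountByNewlines and CountByReadLine are made up for this example.

```csharp
using System;
using System.IO;

class Program
{
    // Newline count plus one, the same rule as CountLinesInString in the article.
    public static long CountByNewlines(string s)
    {
        long count = 1;
        int start = 0;
        while ((start = s.IndexOf('\n', start)) != -1)
        {
            count++;
            start++;
        }
        return count;
    }

    // ReadLine loop, the same rule as CountLinesInFile in the article,
    // but reading from a string through StringReader instead of a file.
    public static long CountByReadLine(string s)
    {
        long count = 0;
        using (StringReader r = new StringReader(s))
        {
            while (r.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountByNewlines("a\r\nb"));     // 2
        Console.WriteLine(CountByReadLine("a\r\nb"));     // 2
        Console.WriteLine(CountByNewlines("a\r\nb\r\n")); // 3 (trailing newline counts as a new line)
        Console.WriteLine(CountByReadLine("a\r\nb\r\n")); // 2
        Console.WriteLine(CountByNewlines(""));           // 1
        Console.WriteLine(CountByReadLine(""));           // 0
    }
}
```

The two rules agree when the text does not end with a line break. If your data usually ends with a trailing newline, as many log and CSV files do, decide which answer you want before choosing a method.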
The O'Reilly C# 3.0 Cookbook is a good reference that I have enjoyed reading. The methods presented there are accurate but may not be optimal. This post includes my original work. [O'Reilly C# Cookbook - oreilly.com]