Dot Net Perls

File Read Benchmarks - C#

by Sam Allen

Problem

Filesystems are slow. This has led many developers to implement private caches for frequently-used files. ASP.NET uses in-memory output caching. Should you cache static files in memory in C#? Should you use output caching everywhere?

Solution: C#

Recently I researched Windows' file cache. Here I performed benchmarks on 4 files with 10 settings to determine whether in-memory caching is worthwhile.

What did you test?

I tested two classes in C#. The first reads a file once and stores its contents in memory as bytes. The second simply asks Windows for the entire file each time.

More details. Both classes expose a property that returns the file's contents as a byte[] array. The first copies from memory on every access, but the second goes straight to the filesystem on every access.

Copy Read Hits %. My interpretation of Windows' file cache is that a request for a frequently-used file will "hit" the cache instead of going to disk. When I monitored physical disk accesses during testing, the disk was not in continual use, which is consistent with that interpretation.

What the graph shows. The green line is the Copy Read Hits %, which should indicate how the Copy Interface is used in Windows. When the green line is at the top of the chart, it means that the file cache is hit. This occurred when my program was accessing files.

Benchmark setup

I tested four text files: 16 KB, 132 KB, 526 KB, and 1005 KB. The files are read in as bytes, not strings. My code accessed these files in two ways. The first class stores the file data internally in C#.

/// <summary>
/// Uses C# file cache.
/// </summary>
class PerlFileA
{
    /// <summary>
    /// Cache of the bytes.
    /// </summary>
    byte[] _cache;

    /// <summary>
    /// Get bytes from cache.
    /// </summary>
    public byte[] Contents
    {
        get
        {
            // Copy the cached bytes into an array and return.
            int length = _cache.Length;
            byte[] ret = new byte[length];
            Array.Copy(_cache, ret, length);
            return ret;
        }
    }

    /// <summary>
    /// Read in cache.
    /// </summary>
    public PerlFileA(string name)
    {
        _cache = File.ReadAllBytes(name);
    }
}
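
One likely reason Contents copies the array instead of returning _cache directly is that a caller could otherwise modify the cached bytes. The copy also means both classes hand back a fresh array on each access. The following is a minimal sketch of that behavior, not the benchmark code: CopyingCache is a hypothetical stand-in for PerlFileA that takes its bytes from memory, so no file is needed.

```csharp
using System;

// Hypothetical stand-in for PerlFileA that takes bytes directly,
// so the example needs no file on disk.
class CopyingCache
{
    readonly byte[] _cache;

    public CopyingCache(byte[] data)
    {
        // Clone on the way in so the caller's array is not shared.
        _cache = (byte[])data.Clone();
    }

    // Returns a fresh copy on every call, like PerlFileA.Contents.
    public byte[] Contents
    {
        get { return (byte[])_cache.Clone(); }
    }
}

class CopyDemo
{
    static void Main()
    {
        CopyingCache cache = new CopyingCache(new byte[] { 1, 2, 3 });

        byte[] first = cache.Contents;
        first[0] = 99; // Mutates the returned copy only.

        byte[] second = cache.Contents;
        Console.WriteLine(second[0]);                      // prints 1
        Console.WriteLine(ReferenceEquals(first, second)); // prints False
    }
}
```

The cost of this safety is one allocation and one memory copy per access, which is part of what the benchmark measures for the cached class.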

The second class simply does the dumb thing and asks Windows each time for the file data. This way should theoretically use the Windows file cache and also be fast.

/// <summary>
/// Doesn't use C# caching.
/// </summary>
class PerlFileB
{
    /// <summary>
    /// Get byte contents.
    /// </summary>
    public byte[] Contents
    {
        get
        {
            // Return the file's bytes directly.
            return File.ReadAllBytes(_name);
        }
    }

    /// <summary>
    /// Stores name of the file.
    /// </summary>
    string _name;

    /// <summary>
    /// Create new file non-cache.
    /// </summary>
    public PerlFileB(string name)
    {
        _name = name;
    }
}

What did the benchmarks show?

They show that caching files as byte[] arrays in C# is useful only for very small files. For larger files, and for any file needed 3 or fewer times, caching in C# is a premature optimization: it yields no benefit or even a slowdown.

Code that uses the classes

This block of code is what I used to benchmark the two classes. It loops over the four text files. For each file, it constructs each class m times (5000) and reads the Contents property x times (10 in this listing) per instance.

using System;
using System.IO;

class Program
{
    static void Main()
    {
        foreach (string name in new string[]
            { "TextFile0.txt", "TextFile1.txt", "TextFile2.txt", "TextFile3.txt" })
        {
            int m = 5000;
            int x = 10;
            long t1 = Environment.TickCount;
            // Class A
            for (int a = 0; a < m; a++)
            {
                PerlFileA p1 = new PerlFileA(name);
                // How many times the cache is copied.
                for (int i = 0; i < x; i++)
                {
                    byte[] c = p1.Contents;
                }
            }
            long t2 = Environment.TickCount;
            // Class B
            for (int a = 0; a < m; a++)
            {
                PerlFileB p2 = new PerlFileB(name);
                // How many times the file is opened.
                for (int i = 0; i < x; i++)
                {
                    byte[] c = p2.Contents;
                }
            }
            long t3 = Environment.TickCount;
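            // Elapsed milliseconds: class A (C# cache), then class B (no C# cache).
            Console.WriteLine("{0} A: {1} ms", name, t2 - t1);
            Console.WriteLine("{0} B: {1} ms", name, t3 - t2);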
            Console.ReadLine();
        }
    }
}
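
Environment.TickCount has a coarse resolution (typically in the range of 10 to 16 milliseconds), which is acceptable for long runs like the one above but can distort shorter ones. A variant could use System.Diagnostics.Stopwatch instead. The sketch below is an assumption about how that might look, not the code behind the results here. TimeMs is a hypothetical helper, and the workload is an in-memory copy used as a placeholder.

```csharp
using System;
using System.Diagnostics;

class StopwatchTiming
{
    // Hypothetical helper: runs the action 'iterations' times and
    // returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Action action, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        // Placeholder workload: copy a 16 KB in-memory buffer, roughly
        // what PerlFileA.Contents does for the smallest test file.
        byte[] source = new byte[16 * 1024];
        double ms = TimeMs(() =>
        {
            byte[] copy = new byte[source.Length];
            Array.Copy(source, copy, source.Length);
        }, 50000);

        Console.WriteLine("Elapsed: {0:F2} ms", ms);
    }
}
```

To time the real classes, the lambda body would be replaced with a read of p1.Contents or p2.Contents.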

About Windows' file cache

Windows' file cache works very well here but there is some overhead for very small files. Programmers such as myself could be misled by micro-benchmarks here, and end up hurting performance overall.

Kernels and caches. Windows' file cache doesn't have a lot of 'knobs' to turn, and it has been extensively tuned. It is also written in heavily optimized, low-level code. The overhead of C# greatly diminishes the benefits of custom caches.

What about ASP.NET output caching?

My results would indicate that output caching web pages in ASP.NET is not useful unless generating the page involves very expensive operations. For less CPU-intensive pages, it could be more efficient not to cache them at all, as Windows' file cache alone works better.

"Reverse optimizations." A reverse optimization is one that causes a slowdown. My results here show that overly aggressive caching slows down an app and can harm overall system health.

ASP.NET file usage notes

In my article about Response.WriteFile, I indicated that very small files cached in memory are much faster to write than files read from disk. This is true for very small files but would not be in much more demanding scenarios. That benchmark is of very limited use and isn't my best work.

Discussion

So what is a developer to do? Basically my research here indicates that some custom caching code can slow down your app. If any file is only used 1-3 times, it is usually best not to cache it in C# at all. Custom caches can help with very small files used often.
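
One way to act on this is to cache only files below a size cutoff and leave everything else to Windows. The sketch below is a hedged illustration, not code from the benchmark. ThresholdFile and IsCached are hypothetical names, and the 16 KB cutoff is an assumption based on the smallest test file, not a measured break-even point.

```csharp
using System;
using System.IO;

// Hypothetical class: caches small files in C#, defers to Windows for the rest.
class ThresholdFile
{
    // Assumed cutoff; not a measured value from the benchmarks.
    public const long MaxCachedBytes = 16 * 1024;

    readonly string _name;
    readonly byte[] _cache; // Stays null when the file is too large to cache.

    public ThresholdFile(string name)
    {
        _name = name;
        if (new FileInfo(name).Length <= MaxCachedBytes)
        {
            _cache = File.ReadAllBytes(name);
        }
    }

    public bool IsCached
    {
        get { return _cache != null; }
    }

    public byte[] Contents
    {
        get
        {
            if (_cache == null)
            {
                // Large file: read it each time and rely on the Windows file cache.
                return File.ReadAllBytes(_name);
            }
            // Small file: return a copy of the in-memory bytes.
            return (byte[])_cache.Clone();
        }
    }
}

class ThresholdDemo
{
    static void Main()
    {
        string small = Path.GetTempFileName();
        string large = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(small, new byte[1024]);
            File.WriteAllBytes(large, new byte[64 * 1024]);

            Console.WriteLine(new ThresholdFile(small).IsCached); // prints True
            Console.WriteLine(new ThresholdFile(large).IsCached); // prints False
        }
        finally
        {
            File.Delete(small);
            File.Delete(large);
        }
    }
}
```

A sensible cutoff depends on the machine and the workload, so it should be measured instead of copied from this sketch. The class also does not track how often a file is read, which matters for the 1-3 reads case.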

Windows is smart. Windows has been tuned to cache files when it is best for system health. This means that by providing a custom cache in your app, you may be working against system health.

Trying to be clever? I am guilty of this, but my research here shows that in the greater picture of the system and the IIS7 server, it is best not to cache aggressively with elaborate code. Windows' file cache works well.

© 2008 Sam Allen. All rights reserved.