Dot Net Perls

Windows File Cache Usage

by Sam Allen

Problem

How does the file system cache in IIS and Windows Server work? Can and should developers override this cache? Describe the state of the art with OS file system caches and an overview of their internals.

Solution: Windows

I carefully read the Microsoft TechNet article File Cache Performance and Tuning. Here are my notes and observations of this thorough document.

Is the file cache important?

Yes. The file cache is "essential" to the performance of Windows and IIS. It does however introduce another layer of complexity to the operating system. The file system cache operates "transparently" to your applications.

Brief description of file cache

When sections of files on the disk are referenced by applications, they are mapped into virtual memory. This is performed by the Cache Manager in Windows. This is "transparent," which means the application is not told that it is happening.

More on design. The memory used for the file cache is treated the same as other memory sections by Windows. This means the same algorithms and best practices apply to files as to in-memory data structures.

Encapsulation. The file system cache is remarkable because its implementation is hidden from the consumer applications. This is a good example of encapsulation, the same principle used in object-oriented design.

Frequently used files

Frequently accessed files tend to remain in memory longer than files that are not commonly used. This is similar to the concept of sliding expiration in ASP.NET. When an item is accessed, its expiration time is reset.
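The sliding behavior described above can be sketched as a small least-recently-used cache. This only illustrates the idea and is not how the Cache Manager is implemented; the class name `SegmentCache` and its capacity are made up for the example.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy LRU cache: each access moves a segment to the 'most recent' end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        # A hit refreshes the entry, much like a sliding expiration.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert a segment, evicting the least recently used one if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def keys(self):
        """Return keys ordered from least to most recently used."""
        return list(self._items)


cache = SegmentCache(capacity=2)
cache.put("a", b"...")
cache.put("b", b"...")
cache.get("a")          # "a" is now the most recently used
cache.put("c", b"...")  # evicts "b", the least recently used
print(cache.keys())     # ['a', 'c']
```

The segment that was touched most recently survives, while the one left idle is dropped first.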

Memory vs. hard disks

Memory access is orders of magnitude faster than hard disk access. This is the principle that underlies the file cache. If disks were to become extremely fast (such as advanced solid state drives), this might change.

Is the file cache always beneficial?

No. Sometimes a file is only read once sequentially. In that case the file cache would never register a hit. The sections stored in the cache would not do any good. I am unsure if there could be a significant performance hit.

File cache optimizations

Interestingly, the file cache in Windows 2000 and later uses prefetching for file sections. This means that when a file is being read sequentially, the sections that follow the current one can be put in the cache in anticipation of their use. They may never be hit.

Deferred writes. The cache uses an implementation termed "deferred write-back cache." This means that file system writes "accumulate" in memory--they are not individually written to the disk.

Files vs. segments

The file cache in Windows 2000 uses the concept of "active segments" and not active files. Segments are parts of files. This gives the cache a fine-grained view of the data being accessed, and of what to keep in memory.

Can I adjust all these attributes?

No. Windows 2000 and beyond do not expose all of these "knobs" to the administrator or users. The developers at Microsoft probably felt that changing the values would introduce support nightmares, many bugs, and no performance benefit overall.

Similar to UNIX. The article I read seems to indicate that Windows 2000 borrows heavily from commercial UNIX with its file cache. That was probably very smart, as UNIX and Linux clearly had technical advantages.

How do read-aheads work?

Cache read-aheads use heuristics to anticipate which segments to put into virtual storage. When the Cache Manager detects that a file is being read in order, it fetches the segments that follow the current position before the application asks for them. The term "sequentially accessed" refers to this pattern within a file.
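One common read-ahead heuristic is to watch for sequential access within a file and fetch the next segment early. Here is a minimal sketch of that idea. The class `ReadAheadTracker` and its `trigger` value are hypothetical and do not reflect the actual Windows heuristics.

```python
class ReadAheadTracker:
    """Toy heuristic: after a run of sequential reads, prefetch the next segment."""

    def __init__(self, trigger=2):
        self.trigger = trigger      # sequential reads needed before prefetching (assumed)
        self.last_segment = None
        self.run_length = 0

    def on_read(self, segment):
        """Record a read; return the segment to prefetch, or None."""
        if self.last_segment is not None and segment == self.last_segment + 1:
            self.run_length += 1
        else:
            self.run_length = 0     # a jump breaks the sequential run
        self.last_segment = segment
        if self.run_length >= self.trigger:
            return segment + 1
        return None


tracker = ReadAheadTracker()
prefetches = [tracker.on_read(s) for s in [0, 1, 2, 3, 9, 10]]
print(prefetches)  # [None, None, 3, 4, None, None]
```

The jump from segment 3 to segment 9 resets the run, so prefetching stops until a new sequential pattern builds up.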

Image description. The image shows some performance metrics on Windows Vista. It tells you how many Read Aheads, Copy Reads, MDL Reads, Pin Reads, Data Flushes, and Data Maps are occurring and how frequently. Many parts of this article touch on these metrics.

What is meant by "transparent"?

When you develop a Windows application, you write it as though it is directly working on files. You don't invoke the file cache yourself. The term 'transparent' means that the file cache is hidden, but works between the application and the disk.

How much memory does the file cache use?

Usually a lot. In Windows 2000 and higher, this is determined dynamically. In the Performance Monitor, the Memory performance object reports this value as System Cache Resident Bytes.

File section size. The cache maps sections of files into virtual memory rather than whole logical files. The size of each section is 256 KB. (This may change in different Windows OS versions.)
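To make the section size concrete, here is a small sketch of the arithmetic that maps a byte range onto 256 KB sections. The function name `sections_for_read` is made up for illustration.

```python
SECTION_SIZE = 256 * 1024  # 256 KB, the figure quoted above


def sections_for_read(offset, length):
    """Return the indexes of the 256 KB sections a byte range touches."""
    if length <= 0:
        return []
    first = offset // SECTION_SIZE
    last = (offset + length - 1) // SECTION_SIZE
    return list(range(first, last + 1))


print(sections_for_read(0, 4096))          # [0]
print(sections_for_read(260_000, 10_000))  # [0, 1]
```

A small read near a boundary can touch two sections, which is why the cache tracks segments rather than whole files.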

It dominates. On file servers and IIS machines, the file cache is the greatest part of the memory size. However, the size is carefully determined by logic, which negates the need to tweak it yourself.

Is it large enough? The one important thing you can do to improve performance is to monitor the IIS server and make sure it has enough RAM for the file cache.

Can I disable file caching?

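Yes, but it's hard to do. The Win32 CreateFile function accepts a FILE_FLAG_NO_BUFFERING flag that bypasses the file cache, but it requires reads and writes to be aligned to the volume sector size. The managed file APIs do not offer this as a named option, so as a .NET developer you would find it awkward at best in managed code.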
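Part of what makes bypassing the cache hard is that unbuffered I/O on Windows (the `FILE_FLAG_NO_BUFFERING` flag to `CreateFile`) needs sector-aligned offsets and lengths. The sketch below shows only that alignment arithmetic. The 512-byte sector size is an assumption, and `align_request` is a hypothetical helper.

```python
SECTOR_SIZE = 512  # assumed; a real program would query the volume for this


def align_request(offset, length, sector=SECTOR_SIZE):
    """Widen (offset, length) so both ends fall on sector boundaries."""
    start = (offset // sector) * sector              # round the start down
    end = -(-(offset + length) // sector) * sector   # round the end up
    return start, end - start


print(align_request(700, 100))  # (512, 512)
```

A 100-byte request turns into a full 512-byte sector read, and the application then has to pick out the bytes it wanted. The file cache normally hides this work.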

File caching is "everywhere"

File servers like IIS will use the file system cache for every file they serve. Client computers will also use file caches for the files they download. So the same files will be cached in many spots using the same algorithms.

Google Chrome. The article I read does not factor in newer programs like Chrome that use aggressive caching in memory. I expect that Google Chrome and Firefox use many custom caches. So caching is even more prevalent today.

Resource duplication. In a closed system, it would be ideal to eliminate all of the double-caching to save computer resources. Methods of doing this would be interesting to develop and observe.

File cache is global

Interestingly, Windows 2000 and newer versions make it hard to see what individual applications are doing with the cache. As stated at the start, the file cache introduces another level of complexity, and this reflects that.

Logical reads. A logical read is a read request that an application issues against a file. However, the file cache "diverts" the request to the virtual cache and satisfies it from memory when it can. The stats reflect logical reads.

File cache transforms I/O

The file cache works behind the scenes (transparently) and will "transform" what the application assumes is a disk read into a virtual memory read. It can do this because it is encapsulated and sits between the application and the file system, behind the standard I/O interfaces.

How does the file cache affect benchmarking?

It makes benchmarks harder to perform and repeat reliably. This is because the cache holds hidden state: the same read can be fast or slow depending on what is already in memory. To get around this, testers use measurements of "cold start" and "warm start."
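A simple way to see the effect is to time the same read several times. This is a rough sketch and not a rigorous benchmark. The 8 MB file size and the helper `timed_read` are arbitrary choices for the example. Because the file has just been written, even the first read is probably served from the cache, and a true cold start usually needs a reboot or a cache flush.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return len(data), time.perf_counter() - start


# Create a throwaway 8 MB file to read (size chosen arbitrarily).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    path = tmp.name

try:
    runs = [timed_read(path) for _ in range(3)]
    for i, (size, seconds) in enumerate(runs, start=1):
        print(f"run {i}: {size} bytes in {seconds * 1000:.2f} ms")
finally:
    os.remove(path)
```

Timings vary from machine to machine, which is exactly why each measurement should be reported as cold or warm.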

Caveat emptor. The document helpfully provides this warning, which means "buyer beware" (see Wikipedia). It doesn't discourage benchmarking, but cautions you to be careful.

Copy Interface explanation

The Copy Interface is how Microsoft implemented the file cache in a backwards-compatible way. Essentially it means that both the OS and the application have file buffers.

Data exists in two places. The application provides its buffer to the OS, which also has the data in a cache buffer. The OS then copies the data byte for byte into the application buffer. This uses nearly twice the memory that an ideal system would.
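From the application side, the copy model looks like an ordinary read into a buffer that the program owns. A minimal sketch in Python follows. It shows the general pattern only, not the Windows Copy Interface itself, and the file contents are made up for the example.

```python
import os
import tempfile

# Write a small file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello file cache")
    path = tmp.name

try:
    app_buffer = bytearray(5)              # the application's own buffer
    with open(path, "rb", buffering=0) as f:
        copied = f.readinto(app_buffer)    # the OS copies bytes into app_buffer
    print(copied, bytes(app_buffer))       # 5 b'hello'
finally:
    os.remove(path)
```

After the call, the same bytes live both in the operating system's cache and in `app_buffer`.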

Fast Copy Interface explanation

There is also a Fast Copy Interface. This is the same as the Copy Interface, but avoids the "initial call" to the file system. Fast Copy must know that the actual file system won't be needed.

How does Lazy Write work?

It "accumulates" write operations in memory instead of writing to the disk one-by-one. This is similar to buffered output in application code, where many small writes are batched into fewer large ones. It is fascinating and must have been very difficult to implement well.

Lazy write thresholds. When your server is busy, it can accumulate many write requests. It must at some point assert itself and force the writes to be performed. This is called "threshold-triggered lazy write flushes." This is a way to avoid edge cases and problems under severe load.

Dirty cache pages. File sections in the cache that have been written to in memory are termed "dirty cache pages." There is also a way for the OS to write dirty pages to disk immediately (called write-through caching).
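The ideas in this section (accumulated writes, dirty pages, and threshold-triggered flushes) can be sketched in a few lines. This is a toy model. The class `WriteBackCache` and its threshold are invented for illustration and do not correspond to any Windows setting.

```python
class WriteBackCache:
    """Toy deferred-write cache: writes mark pages dirty; flushes hit the 'disk'."""

    def __init__(self, dirty_threshold=4):
        self.dirty_threshold = dirty_threshold  # assumed limit, not a Windows value
        self.pages = {}            # page number -> data held in memory
        self.dirty = set()         # pages changed since the last flush
        self.logical_writes = 0
        self.physical_writes = 0

    def write(self, page, data):
        """A logical write: update memory only, flushing if too much is dirty."""
        self.pages[page] = data
        self.dirty.add(page)
        self.logical_writes += 1
        if len(self.dirty) >= self.dirty_threshold:
            self.flush()           # threshold-triggered flush

    def flush(self):
        """Write each dirty page once, however many times it was updated."""
        self.physical_writes += len(self.dirty)
        self.dirty.clear()


cache = WriteBackCache()
for i in range(100):
    cache.write(i % 2, f"update {i}")   # hammer the same two pages
cache.flush()
print(cache.logical_writes, cache.physical_writes)  # 100 2
```

Repeated updates to the same page collapse into one physical write, which is where the savings come from.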

Mapping Interface in Windows 2000

This is another interface in Windows 2000. It is more efficient than the Copy Interface because it eliminates the need to store 2 copies of the data. An application that uses Mapping Interface receives a pointer to the data.

Mapping Interface problems. This interface presents a different set of problems. For one, the Cache Manager cannot monitor what is happening to the data. This means it cannot purge pages from the cache based on heuristics.

Applications signal usages. The Mapping Interface contract requires the applications to indicate when they are done with the data from the disk. Then the file segments can be trimmed from the virtual store.
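A loose user-level analogy is a memory-mapped file, where the program works on the file contents as memory and then signals that it is done by closing the mapping. The sketch below uses Python's `mmap` module. It is only an analogy, since the Mapping Interface described in the article is an internal Cache Manager interface.

```python
import mmap
import os
import tempfile

# Write a small file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello file cache")
    path = tmp.name

try:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The file contents are addressed as memory; no read() call is issued.
            first_word = view[:5]
            length = len(view)
        # Leaving the 'with' block unmaps the view: the program is done with it.
    print(first_word, length)  # b'hello' 16
finally:
    os.remove(path)
```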

What pinning means. Pinning is a term that refers to a segment of memory being marked as not to be trimmed or rearranged. The word 'pin' is also used in C# programming. When an application is using the Mapping Interface, the data must be pinned so it won't be cleared.

When unpinning happens. When the application is done with the file segment, it is unpinned and therefore able to be removed from the file cache. The lazy writer thread will then flush the dirty pages to the physical disk.
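Pinning and unpinning can be modeled with a simple pin count per segment. This is a sketch of the concept only. The class `PinnableCache` and its methods are hypothetical.

```python
class PinnableCache:
    """Toy cache where pinned segments cannot be evicted."""

    def __init__(self):
        self.segments = {}     # segment id -> data
        self.pin_counts = {}   # segment id -> number of active pins

    def add(self, seg_id, data):
        self.segments[seg_id] = data
        self.pin_counts[seg_id] = 0

    def pin(self, seg_id):
        self.pin_counts[seg_id] += 1

    def unpin(self, seg_id):
        if self.pin_counts[seg_id] == 0:
            raise ValueError("segment is not pinned")
        self.pin_counts[seg_id] -= 1

    def evict(self, seg_id):
        """Remove a segment; return False if it is still pinned."""
        if self.pin_counts[seg_id] > 0:
            return False
        del self.segments[seg_id]
        del self.pin_counts[seg_id]
        return True


cache = PinnableCache()
cache.add("seg0", b"metadata")
cache.pin("seg0")
print(cache.evict("seg0"))   # False: still pinned
cache.unpin("seg0")
print(cache.evict("seg0"))   # True: now removable
```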

What uses Mapping. I was interested to read that the Windows NTFS file system uses the Mapping Interface for file metadata. This refers to data such as file names, attributes, dates modified and created, and file sizes. Mapping "guarantees the integrity" of the metadata, which is critical. It also taps the benefits of the file cache.

Statistic         Description
Pin Reads/sec     Number of reads to pinned data per second.
Pin Read Hit %    How effective the pinned segment cache is.

What other file cache interfaces are there?

There is an interface called MDL (for Memory Descriptor List). My impression from the document is that it is less common and not really important for me to delve into.

Performance tuning notes

The document finishes up by presenting some interesting benchmarks. The actual data are 8 years old, but the methods the authors used are useful.

Benchmark setup. The authors used a program called 'Probe', which ran artificial file I/O tests. It accessed segments of a 16 MB file randomly.
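A workload of this kind is easy to approximate. The sketch below is not the Probe program. It just reads random blocks from a 16 MB scratch file, and the 4 KB request size, the read count, and the function name `random_reads` are all assumptions.

```python
import os
import random
import tempfile

FILE_SIZE = 16 * 1024 * 1024   # 16 MB, as in the benchmark described above
BLOCK_SIZE = 4096              # assumed request size


def random_reads(path, count, seed=0):
    """Read `count` blocks at random block-aligned offsets; return the offsets used."""
    rng = random.Random(seed)
    blocks = FILE_SIZE // BLOCK_SIZE
    offsets = []
    with open(path, "rb") as f:
        for _ in range(count):
            offset = rng.randrange(blocks) * BLOCK_SIZE
            f.seek(offset)
            f.read(BLOCK_SIZE)
            offsets.append(offset)
    return offsets


# Create the zero-filled test file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * FILE_SIZE)
    path = tmp.name

try:
    offsets = random_reads(path, count=1000)
    print(len(offsets), "reads issued")
finally:
    os.remove(path)
```

Random access is a deliberately hard case for the cache, because read-ahead cannot predict the next request.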

Microsoft's assertion. The book Microsoft included with Windows NT claimed that Windows' cache is self-tuning. Later this claim was removed from the software. (Things like this likely have contributed to many programmers' dislike of MS.)

Not self-tuning. The experiments showed that the file cache in Windows 2000 is not self-tuning and that more "knobs" to adjust system parameters could be useful.

How effective is Lazy Write?

It is very effective. The most interesting part of the benchmark results was that when Windows 2000 was subjected to 550 logical disk writes per second, it performed only 8.5 physical writes per second. This shows that the Lazy Write optimization is extremely useful.
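The size of the saving is easy to work out from those two numbers, reading the 8.5 figure as a per-second rate:

```python
logical_per_sec = 550.0    # logical writes per second, from the benchmark
physical_per_sec = 8.5     # physical writes per second, from the benchmark

ratio = logical_per_sec / physical_per_sec
absorbed = 1 - physical_per_sec / logical_per_sec

print(f"{ratio:.1f} logical writes per physical write")   # 64.7
print(f"{absorbed:.1%} of writes absorbed by the cache")  # 98.5%
```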

Discussion

Don't assume the Windows and IIS file caches will meet every need on their own. However, the file cache is a critical optimization and you must be careful not to duplicate work done by it.

Caching static files. One question that I had was whether it is useful to store a static file in memory in C#. My reading of this article indicates that it is not useful to cache static files in C# code.

Don't duplicate the file cache. It is not useful to try to implement your own file cache in C# on top of Windows and IIS. The file cache runs inside the operating system and uses sophisticated low-level algorithms that a cache written in C# is unlikely to beat.

ASP.NET questions. I have seen questions about whether it is worthwhile to cache an entire PDF in ASP.NET. My answer is that it is not. Read the PDF off of the disk, and let IIS cache the file for you.

© 2008 Sam Allen. All rights reserved.