C# Dictionary Size and Performance

Test how Dictionary size affects lookup performance. Larger sizes can improve speed.

Dictionary size influences lookup performance. Smaller Dictionaries are faster than larger Dictionaries. This is true when they are tested for keys that always exist in both. Reducing Dictionary size could help improve performance.Optimization

Benchmark. We see a benchmark that reads in a list of 172820 English words. These are English words so have similar letter frequencies and patterns. A Dictionary containing all the words is compared to a Dictionary that contains the first half.
Then: We measure lookup times for the first half of words, such that all keys exist.
Info: The List is trimmed to half its size with Take, and then the second Dictionary is created. We use the Take and ToDictionary methods from LINQ.
Take, TakeWhileToDictionary
And: Each call to ContainsKey succeeds and returns true. We have a 100% hit rate here.
C# program that benchmarks Dictionary instances using System; using System.Collections.Generic; using System.Diagnostics; using System.IO; using System.Linq; class Program { static void Main() { // Read in list of English words. var lines = new List<string>(File.ReadAllLines("enable1.txt")); // This dictionary contains all strings. [172820 keys] var dictionary1 = lines.ToDictionary(s => s, s => true); // Shrink the List to half its size. lines = lines.Take(lines.Count / 2).ToList(); // This dictionary contains half. [86410 keys] var dictionary2 = lines.ToDictionary(s => s, s => true); const int m = 100; Stopwatch s1 = Stopwatch.StartNew(); for (int i = 0; i < m; i++) { foreach (string s in lines) { if (!dictionary1.ContainsKey(s)) { throw new Exception(); } } } s1.Stop(); Stopwatch s2 = Stopwatch.StartNew(); for (int i = 0; i < m; i++) { foreach (string s in lines) { if (!dictionary2.ContainsKey(s)) { throw new Exception(); } } } s2.Stop(); Console.WriteLine("{0},{1}", s1.ElapsedMilliseconds, s2.ElapsedMilliseconds); Console.Read(); } }

Results. The smaller Dictionary (with half the number of keys) was much faster. In this case, the behavior of both Dictionaries on the input was identical. This means that having unneeded keys in the Dictionary makes it slower.
Dictionary size and performance benchmark Full Dictionary: 791 ms Half-size Dictionary: 591 ms [faster]

My perspective is that you should use separate Dictionaries for separate purposes. If you have two sets of keys, do not store them in the same Dictionary. If you can divide them up, you can enhance lookup performance.

Hashtables usually are implemented with internal arrays of buckets. When the hash function results in a collision with its mathematical index, the runtime will have to loop through an array of the values, comparing strings each time.
Therefore: Reducing total hashtable size enhances average performance with fewer collisions.

Summary. Here we see an example of how you enhance the performance of your Dictionary by subdividing it. Even when the large and small Dictionaries have the identical results on a set of input, the smaller one will perform better on average.

© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.