C# Dictionary Optimization, Increase Capacity

Optimize Dictionary lookups with a capacity. Measure the effect of capacity on speed.

Dictionary optimization. A Dictionary can be optimized with a higher capacity. We sometimes optimize hashtables like Dictionary simply by changing the capacity of the collection to a value higher than the default. This trades space for speed.CapacityDictionary

Benchmark. The implementation of Dictionary is opaque and few tutorials will tell you the tricks to squeeze out nanoseconds from lookups. The example here passes a parameter to the Dictionary constructor to indicate a minimum capacity.
And: This forces the Dictionary to allocate at least that many internal "buckets" and entries.
So: By increasing capacity, we reduce hash collisions and improve performance. A multiplier of 4 yields a speedup of 7% in the example.
Details: The program puts 500 strings as keys into the Dictionary. Capacity will be 500, 1000, 1500, and so on.
GetStrings: This method uses the Path.GetRandomFileName method to generate 500 random file names of 12 characters.
C# program that optimizes Dictionary using System; using System.Collections.Generic; using System.Diagnostics; using System.IO; class Program { static void Main() { // Loop through full capacity multipliers. for (int multiplier = 1; multiplier <= 10; multiplier++) { const int len = 500; // Allocate with multiplied capacity. var dict = new Dictionary<string, bool>(len * multiplier); var arr = GetStrings(len); // Get random keys foreach (string val in arr) { dict[val] = true; // Set keys } const int m = 5000 * 10; Stopwatch s1 = Stopwatch.StartNew(); for (int i = 0; i < m; i++) { for (int j = 0; j < arr.Length; j++) { bool b = dict[arr[j]]; // Lookup element b = dict[arr[0]]; // Lookup first element } } s1.Stop(); // Write timings Console.Write(multiplier.ToString("00")); Console.Write(", "); Console.Write(s1.ElapsedMilliseconds); Console.WriteLine(" ms"); } Console.Read(); } static string[] GetStrings(int len) { // Allocate and return an array of random strings. var arr = new string[len]; for (int i = 0; i < arr.Length; i++) { arr[i] = Path.GetRandomFileName(); } return arr; } } Output 01, 2744 ms (Exact capacity) 02, 2665 ms 03, 2553 ms 04, 2546 ms (7.2% faster than exact capacity with multiplier 1) 05, 2569 ms 06, 2562 ms 07, 2532 ms 08, 2552 ms 09, 2531 ms 10, 2573 ms

Notes, results. We write the current multiplier and the time for the lookups to complete. The program shows that multiplying the full capacity by 4 can improve lookup performance by 7.2% over multiplying it by 1.Multiply

Discussion. We describe the internal data structures in the Dictionary collection in that allows this optimization tip to yield results. Whenever you add or lookup an entry in the Dictionary, a buckets array is accessed (written or read).

The buckets array contains integers that point to the actual data structs in an Entry array. When you have more buckets, you can more closely map the bucket integers to the accurate entry.
And: With fewer buckets you will have to read in the next entries in the chains more.
Int Array

Summary. We looked at a way to optimize a Dictionary. We apply a small multiplier such as 4 to the initial and final capacity of the Dictionary. This optimization allows the Dictionary to have more space to store buckets that point to entries.Optimization

© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.