Collections in .NET have an optional capacity property, which you can specify to adjust the amount of memory allocated. Is it worth the time and effort to adjust this property? Experiment with Dictionary and List capacity values.
I found that when you don't specify a capacity, your collection will be reallocated several times if it even grows to have 100 elements. This makes populating your Dictionary or List twice as slow.
Is it important? Should developers bother with Dictionary or List capacity? What I find might be surprising. Having .NET "resize" the underlying arrays is really expensive and can double the time even in small collections.
I tried to simulate somewhat realistic situations in my programs. Most Dictionaries I have used have a string key and around 100-1000 elements. So I used a 100-element size of the collections.
I benchmarked Dictionaries and Lists. Below is the code I used to test, with the first four test methods (A, B, C and D) using Dictionary.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
class CapacityTest
{
/// <summary>
/// The number of iterations.
/// </summary>
static int _m = 100000;
/// <summary>
/// List of values we use for the test.
/// </summary>
List<string> _values = new List<string>();
public CapacityTest()
{
//
// 100 random strings for benchmarking.
//
for (int i = 0; i < 100; i++)
{
_values.Add(System.IO.Path.GetRandomFileName());
}
long t1 = Environment.TickCount;
A_Dictionary();
long t2 = Environment.TickCount;
B_DictionaryCapacity();
long t3 = Environment.TickCount;
C_DictionaryCapacity();
long t4 = Environment.TickCount;
D_DictionaryCapacity();
long t5 = Environment.TickCount;
E_List();
long t6 = Environment.TickCount;
F_ListCapacity();
long t7 = Environment.TickCount;
//
// Write dictionary times
//
Console.WriteLine(t2 - t1);
Console.WriteLine(t3 - t2);
Console.WriteLine(t4 - t3);
Console.WriteLine(t5 - t4);
//
// Write list times
//
Console.WriteLine(t6 - t5);
Console.WriteLine(t7 - t6);
//
// Finished
//
Console.ReadLine();
}
void A_Dictionary()
{
//
// No capacity
//
for (int i = 0; i < _m; i++)
{
Dictionary<string, int> d = new Dictionary<string, int>();
foreach (string k in _values)
{
d.Add(k, 0);
}
}
}
void B_DictionaryCapacity()
{
//
// Capacity from collection Count
//
for (int i = 0; i < _m; i++)
{
Dictionary<string, int> d = new Dictionary<string, int>(_values.Count);
foreach (string k in _values)
{
d.Add(k, 0);
}
}
}
void C_DictionaryCapacity()
{
//
// Const capacity
//
for (int i = 0; i < _m; i++)
{
Dictionary<string, int> d = new Dictionary<string, int>(100);
foreach (string k in _values)
{
d.Add(k, 0);
}
}
}
void D_DictionaryCapacity()
{
//
// Huge capacity (1 order of magnitude too large)
//
for (int i = 0; i < _m; i++)
{
Dictionary<string, int> d = new Dictionary<string, int>(1000);
foreach (string k in _values)
{
d.Add(k, 0);
}
}
}
void E_List()
{
//
// No capacity
//
for (int i = 0; i < _m * 5; i++)
{
List<string> l = new List<string>();
foreach (string k in _values)
{
l.Add(k);
}
}
}
void F_ListCapacity()
{
//
// Exact capacity
//
for (int i = 0; i < _m * 5; i++)
{
List<string> l = new List<string>(100);
foreach (string k in _values)
{
l.Add(k);
}
}
}
}It is very advantageous to specify capacity in your C# programs. What I found to be the best method was using the List's already-known capacity as the capacity for the new collection. That would be 100.
The graphs. As usual, I present my findings on an Excel chart. On the image, the orange bars are the collections with no capacity. You can see that for the dictionary, B and C are the best.
You can modify your program to print out what the Count property of your collections is after they are used. If you are working with 150 elements, you can simply use the constant 200 for better performance (saving several array resizes).
For my latest programs, I decided to simply define some const global variables and use them through my classes. Make a static class and use consts called ElementCountEstimate or other fields with the word estimate.
//
// Define these globals in this class, and access them as
// GlobalClass.ElementCountEstimate.
// Pass that value to your new collections.
//
public static class GlobalClass
{
public const int ElementCountEstimate = 200;
public const int OtherCountEstimate = 100;
}The memory model of .NET programs is complex and takes care of most of the hard stuff for you, but giving it hints about the sizes of the collections is a substantial optimization. The capacity feature is underused and could speed up the performance of Dictionaries and Lists by 2x.
I could not test the impact of the capacity property in my real-world programs. This is because .NET is doing so many things at once that it gets lost in the noise.
Memory pressure. Another final thing to consider is memory pressure in .NET. When the collections' underlying arrays are resized, this results in more memory pressure due to the reallocation. Therefore many aspects of your program will become slower and less efficient.