C# IEqualityComparer Dictionary

by Sam Allen - Updated January 8, 2010

You want to improve your Dictionary in the C# programming language by using the IEqualityComparer, which allows you to implement a custom GetHashCode method. Here we implement this interface in the C# language and also measure GetHashCode methods on custom string keys, and then discuss implementation issues and hash table theory.

--- Key point: ---
You can implement IEqualityComparer interface for Dictionary.
The built-in C# hash code on string types is not ideal.
Most GetHashCode implementations will be specific to your data.

--- Benchmark: ---
The implementation was tested on 499 string keys.
    Default string IEqualityComparer: 4043 ms
    Custom IEqualityComparer:         2736 ms [faster]

Implementing IEqualityComparer

First, here we see the custom IEqualityComparer The author defined to hash the string keys you see above. It implements the IEqualityComparer interface, and its name is StringIndexKeyComparer. Your comparer can have any name, but it must use the interface inheritance syntax as shown.

public class StringIndexKeyComparer : IEqualityComparer<string>
{
    /// <summary>
    /// Has a good distribution.
    /// </summary>
    const int _multiplier = 89;

    /// <summary>
    /// Whether the two strings are equal
    /// </summary>
    public bool Equals(string x, string y)
    {
        return x == y;
    }

    /// <summary>
    /// Return the hash code for this string.
    /// </summary>
    public int GetHashCode(string obj)
    {
        // Stores the result.
        int result = 0;

        // Don't compute hash code on null object.
        if (obj == null)
        {
            return 0;
        }

        // Get length.
        int length = obj.Length;

        // Return default code for zero-length strings [valid, nothing to hash with].
        if (length > 0)
        {
            // Compute hash for strings with length greater than 1
            char let1 = obj[0];          // First char of string we use [significant to order]
            char let2 = obj[length - 1]; // Final char

            // Compute hash code from two characters
            int part1 = let1 + length;
            result = (_multiplier * part1) + let2 + length;
        }
        return result;
    }
}

Description. This is a custom class that implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode. The class uses the default Equals method, but uses a new GetHashCode implementation.

(Visit msdn.microsoft.com.)

GetHashCode overview. The GetHashCode method above is customized to be optimal on the string keys the author had. For your program, you will need to write your own GetHashCode method. You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.

GetHashCode method. The GetHashCode shown here computes its result from the key[1] character, and from the final character in key. Additionally, it applies a multiplier to the first character, and also adds the length of the key.

Understanding IEqualityComparer

You can declare a generic Dictionary with any type keys and values, but the hashing method used for those types is always the default, unless you pass in a IEqualityComparer instance in the Dictionary constructor. For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.

String keys: 22D-Array-IEnumerable
             22D-Array-Use
             27-Zip
             27-Zip-DEFLATE-Ratio
             27-Zip-Examples
             2About
             2Action-Dictionary
             2Adobe-CS3-Rounded
             2Adobe-Fireworks-CS3-Resampling [...]

String notes. The string keys I have all have the same first character, which is a digit. Because this first character is monotonous, it isn't useful for hashing because its distribution is poor. In the above strings, I decided that the best distribution of hash codes would use the second character, the length, and another character. To reduce collisions, one character would need to be multiplied by a constant.

Using IEqualityComparer with Dictionary

Here we see that to use the custom IEqualityComparer with the Dictionary, you must pass an instance of the IEqualityComparer to the Dictionary constructor. It is a good idea to use a static instance of your IEqualityComparer. The static instance is not shown in this article.

using System;
using System.Collections.Generic;

/// <summary>
/// [see above]
/// </summary>
public class StringIndexKeyComparer : IEqualityComparer<string>
{
    // [see above]
}

class Program
{
    static void Main()
    {
        var custom = new Dictionary<string, bool>(new StringIndexKeyComparer());
        custom.Add("22D-Array-IEnumerable", true);
        custom.Add("22D-Array-Use", true);
        custom.Add("27-Zip", true);
        custom.Add("27-Zip-DEFLATE-Ratio", true);
        custom.Add("27-Zip-Examples", true);
        custom.Add("2About", true);
        custom.Add("2Action-Dictionary", true);
        custom.Add("2Adobe-CS3-Rounded", true);
        custom.Add("2Adobe-Fireworks-CS3-Resampling", true);
    }
}

Description of the Main method. The code about uses the same custom IEqualityComparer shown earlier in this document. You can see it is passed to the Dictionary in the constructor.

Finding excellent GetHashCode

Hashing is a complicated topic but an excellent hash code can really improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.

Multipliers. To better distribute the hash code computations in your program, using a multiplier is useful. A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets. This reduces the number of collisions and improves lookup performance.

Note on the number 89. This constant is one I found to provide a good hashing distribution in the data set I am using. There is a custom constructor I defined on the StringIndexKeyComparer. It is not important to the code style here.

Using custom GetHashCode

Any GetHashCode method you write will have to be custom to your keys. Therefore, for your project just delete the contents of GetHashCode and start over. For certain positions in the industry, such as search engine development, it could benefit you. It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.

Summary

Here we saw some example code, which won't be useful for you, but the explanations may help you understand how to use IEqualityComparer in your project. The most important part of IEqualityComparer here is its implementation of GetHashCode, which can greatly influence the Dictionary's performance.

(Do not copy this page.)

Dot Net Perls | Search
Dictionary | Dictionary Examples, Keys and Values | Dictionary Lookup Comparison | Sort Dictionary Values | ToDictionary Method | TryGetValue Method
C# | Parameter Optimization Tip | SaveFileDialog Tutorial | IntegralHeight Property (Windows Forms) | Array.FindIndex Method
© 2010 Sam Allen. All rights reserved.
Dot Net Perls | Sam Allen