You want to improve your Dictionary in the C# programming language by using the IEqualityComparer, which allows you to implement a custom GetHashCode method. Here we implement this interface in the C# language and also measure GetHashCode methods on custom string keys, and then discuss implementation issues and hash table theory.
--- Key point: ---
You can implement IEqualityComparer interface for Dictionary.
The built-in C# hash code on string types is not ideal.
Most GetHashCode implementations will be specific to your data.
--- Benchmark: ---
The implementation was tested on 499 string keys.
Default string IEqualityComparer: 4043 ms
Custom IEqualityComparer: 2736 ms [faster]First, here we see the custom IEqualityComparer The author defined to hash the string keys you see above. It implements the IEqualityComparer interface, and its name is StringIndexKeyComparer. Your comparer can have any name, but it must use the interface inheritance syntax as shown.
public class StringIndexKeyComparer : IEqualityComparer<string>
{
/// <summary>
/// Has a good distribution.
/// </summary>
const int _multiplier = 89;
/// <summary>
/// Whether the two strings are equal
/// </summary>
public bool Equals(string x, string y)
{
return x == y;
}
/// <summary>
/// Return the hash code for this string.
/// </summary>
public int GetHashCode(string obj)
{
// Stores the result.
int result = 0;
// Don't compute hash code on null object.
if (obj == null)
{
return 0;
}
// Get length.
int length = obj.Length;
// Return default code for zero-length strings [valid, nothing to hash with].
if (length > 0)
{
// Compute hash for strings with length greater than 1
char let1 = obj[0]; // First char of string we use [significant to order]
char let2 = obj[length - 1]; // Final char
// Compute hash code from two characters
int part1 = let1 + length;
result = (_multiplier * part1) + let2 + length;
}
return result;
}
}Description. This is a custom class that implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode. The class uses the default Equals method, but uses a new GetHashCode implementation.
GetHashCode overview. The GetHashCode method above is customized to be optimal on the string keys the author had. For your program, you will need to write your own GetHashCode method. You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.
GetHashCode method. The GetHashCode shown here computes its result from the key[1] character, and from the final character in key. Additionally, it applies a multiplier to the first character, and also adds the length of the key.
You can declare a generic Dictionary with any type keys and values, but the hashing method used for those types is always the default, unless you pass in a IEqualityComparer instance in the Dictionary constructor. For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.
String keys: 22D-Array-IEnumerable
22D-Array-Use
27-Zip
27-Zip-DEFLATE-Ratio
27-Zip-Examples
2About
2Action-Dictionary
2Adobe-CS3-Rounded
2Adobe-Fireworks-CS3-Resampling [...]String notes. The string keys I have all have the same first character, which is a digit. Because this first character is monotonous, it isn't useful for hashing because its distribution is poor. In the above strings, I decided that the best distribution of hash codes would use the second character, the length, and another character. To reduce collisions, one character would need to be multiplied by a constant.
Here we see that to use the custom IEqualityComparer with the Dictionary, you must pass an instance of the IEqualityComparer to the Dictionary constructor. It is a good idea to use a static instance of your IEqualityComparer. The static instance is not shown in this article.
using System;
using System.Collections.Generic;
/// <summary>
/// [see above]
/// </summary>
public class StringIndexKeyComparer : IEqualityComparer<string>
{
// [see above]
}
class Program
{
static void Main()
{
var custom = new Dictionary<string, bool>(new StringIndexKeyComparer());
custom.Add("22D-Array-IEnumerable", true);
custom.Add("22D-Array-Use", true);
custom.Add("27-Zip", true);
custom.Add("27-Zip-DEFLATE-Ratio", true);
custom.Add("27-Zip-Examples", true);
custom.Add("2About", true);
custom.Add("2Action-Dictionary", true);
custom.Add("2Adobe-CS3-Rounded", true);
custom.Add("2Adobe-Fireworks-CS3-Resampling", true);
}
}Description of the Main method. The code about uses the same custom IEqualityComparer shown earlier in this document. You can see it is passed to the Dictionary in the constructor.
Hashing is a complicated topic but an excellent hash code can really improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.
Multipliers. To better distribute the hash code computations in your program, using a multiplier is useful. A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets. This reduces the number of collisions and improves lookup performance.
Note on the number 89. This constant is one I found to provide a good hashing distribution in the data set I am using. There is a custom constructor I defined on the StringIndexKeyComparer. It is not important to the code style here.
Any GetHashCode method you write will have to be custom to your keys. Therefore, for your project just delete the contents of GetHashCode and start over. For certain positions in the industry, such as search engine development, it could benefit you. It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.
Here we saw some example code, which won't be useful for you, but the explanations may help you understand how to use IEqualityComparer in your project. The most important part of IEqualityComparer here is its implementation of GetHashCode, which can greatly influence the Dictionary's performance.