String keys. We compare GetHashCode methods on custom string keys. We use it with the Dictionary type. You can implement the IEqualityComparer interface for the Dictionary collection.
First example. We see the custom IEqualityComparer. StringIndexKeyComparer implements the IEqualityComparer interface. A comparer must use the interface inheritance syntax as shown.
Tip You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.
public class StringIndexKeyComparer : IEqualityComparer<string>
{
/// <summary>
/// Has a good distribution.
/// </summary>
const int _multiplier = 89;
/// <summary>
/// Whether the two strings are equal
/// </summary>
public bool Equals(string x, string y)
{
return x == y;
}
/// <summary>
/// Return the hash code for this string.
/// </summary>
public int GetHashCode(string obj)
{
// Stores the result.
int result = 0;
// Don't compute hash code on null object.
if (obj == null)
{
return 0;
}
// Get length.
int length = obj.Length;
// Return default code for zero-length strings [valid, nothing to hash with].
if (length > 0)
{
// Compute hash for strings with length greater than 1
char let1 = obj[0]; // First char of string we use
char let2 = obj[length - 1]; // Final char// Compute hash code from two characters
int part1 = let1 + length;
result = (_multiplier * part1) + let2 + length;
}
return result;
}
}
Keys, analysis. A Dictionary can have any type keys and values. But the hashing method is the default, unless you pass in an IEqualityComparer in the Dictionary constructor.
Note For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.
And I decided that the best distribution of hash codes would use the second character, the length, and another character.
Benchmark, IEqualityComparer. Often IEqualityComparer can be used as a performance optimization, or to reduce string allocations. We can add logic to GetHashCode without creating a string.
Version 1 The code uses the same custom IEqualityComparer shown earlier in this document. The ContainsKey method is called many times.
Version 2 We use a Dictionary without a custom IEqualityComparer. So each string uses the default GetHashCode method.
Result In 2020, using the custom IEqualityComparer is faster. We avoid some expensive hashing—but this code depends on the key format.
using System;
using System.Diagnostics;
using System.Collections.Generic;
public class StringIndexKeyComparer : IEqualityComparer<string>
{
const int _multiplier = 89;
public bool Equals(string x, string y)
{
return x == y;
}
public int GetHashCode(string obj)
{
int result = 0;
if (obj == null)
{
return 0;
}
int length = obj.Length;
if (length > 0)
{
char let1 = obj[0];
char let2 = obj[length - 1];
int part1 = let1 + length;
result = (_multiplier * part1) + let2 + length;
}
return result;
}
}
class Program
{
const int _max = 100000000;
static void Main()
{
var test = new Dictionary<string, bool>(new StringIndexKeyComparer());
var test2 = new Dictionary<string, bool>();
// Elements.
string[] elements = { "22D-Array-IEnumerable",
"22D-Array-Use", "27-Zip", "27-Zip-DEFLATE-Ratio",
"27-Zip-Examples", "2About", "2Action-Dictionary",
"2Adobe-CS3-Rounded", "2Adobe-Fireworks-CS3-Resampling" };
// Add elements.foreach (string element in elements)
{
test.Add(element, true);
test2.Add(element, true);
}
var s1 = Stopwatch.StartNew();
// Version 1: test custom IEqualityComparer.for (int i = 0; i < _max; i++)
{
if (!test.ContainsKey("2About"))
{
return;
}
}
s1.Stop();
var s2 = Stopwatch.StartNew();
// Version 2: use default dictionary.for (int i = 0; i < _max; i++)
{
if (!test2.ContainsKey("2About"))
{
return;
}
}
s2.Stop();
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
}
}15.54 ns StringIndexKeyComparer
23.34 ns Default
Fuzzy lookups. A Dictionary by default uses exact lookups. It computes a hash code based on a key and if another value matches it exactly, the two keys match. But this can be changed.
Next We implement a custom, fuzzy lookup that ignores characters. This is the custom IEqualityComparer.
Tip It computes hashes (in GetHashCode) and compares keys (in Equals) ignoring all hyphen characters.
So The string "cat-1" is considered equal to the string "cat1." The hyphen (dash) character is simply ignored.
using System.Collections.Generic;
class CustomComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
int xPos = 0;
int yPos = 0;
while (true)
{
// ... Fail if past end.
if (xPos >= x.Length)
{
return false;
}
if (yPos >= y.Length)
{
return false;
}
// ... Skip past hyphens.
if (x[xPos] == '-')
{
xPos++;
continue;
}
if (y[yPos] == '-')
{
yPos++;
continue;
}
// ... Fail if different.
if (x[xPos] != y[yPos])
{
return false;
}
// ... If we have traversed both strings with no error, we match.
if (xPos == x.Length - 1 &&
yPos == y.Length - 1)
{
return true;
}
// ... Increment both places.
xPos++;
yPos++;
}
}
public int GetHashCode(string obj)
{
int code = 0;
// ... Add together all chars.
for (int i = 0; i < obj.Length; i++)
{
if (obj[i] != '-')
{
code += obj[i];
}
}
return code;
}
}
Fuzzy, continued. Next, we use the CustomComparer we created. This Main method shows how hyphens are ignored in the keys of a Dictionary that uses this comparer.
Here All three lookups return the correct value. Some unusual cases, not shown, may not work correctly (such as trailing hyphens).
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
// ... Add data to dictionary.
var dictionary = new Dictionary<string, int>(new CustomComparer());
dictionary["cat-1"] = 1;
dictionary["cat-2"] = 2;
dictionary["dog-bark"] = 10;
dictionary["dog-woof"] = 20;
// ... Lookup values, ignoring hyphens.
Console.WriteLine(dictionary["cat1"]);
Console.WriteLine(dictionary["cat-1"]);
Console.WriteLine(dictionary["dog--bark"]);
}
}1
1
10
A discussion. An excellent hash code can improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.
Tip To better distribute the hash code computations in your program, using a multiplier is useful.
Tip 2 A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.
And This reduces the number of hash collisions and improves lookup performance.
Multiplier. The constant 89 provides a good hashing distribution in many string data sets. There is a custom constructor I defined on the StringIndexKeyComparer.
Tip Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.
But It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.
GetHashCode. This method can be optimized. Try multiplying its result by an important character in the string. Also, there is an easy way to test how selective your hash numbers are.
Detail Increment a static int field each time Equals is called. With a better hash code, Equals is called fewer times.
Tip With a more selective hash, the internal Dictionary methods need to search fewer values with Equals, so it is called less.
A summary. With IEqualityComparer, we look up strings based on the results of a method. We can apply fuzzy matching, or any other transformation before comparing keys.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.