HashSet
This is an optimized C# set collection. It helps eliminate duplicate strings or elements in an array. It is a set that hashes its contents.
With HashSet, we have a simple syntax for taking the union of elements in a set. This is performed in its constructor. More complex methods can be used on the HashSet.
This program calls the HashSet constructor. The HashSet constructor receives a single parameter, which must implement the IEnumerable<string> generic interface.
Part 1: We create an array that contains duplicate strings. The string "cat" is repeated 3 times.
Part 2: We use the HashSet constructor, which takes the union of elements. It internally calls UnionWith to eliminate duplicates.
Part 3: We use ToArray to convert the HashSet into a new array, which may be easier to use elsewhere.
Part 4: We can display string arrays on the console, or as single strings, using the string.Join static method.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Part 1: input array that contains three duplicate strings.
    string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

    // Part 2: use HashSet constructor to ensure unique strings.
    var hash = new HashSet<string>(array1);

    // Part 3: convert to array of strings again.
    string[] array2 = hash.ToArray();

    // Part 4: display the resulting array.
    Console.WriteLine(string.Join(",", array2));

Output:

    cat,dog,leopard,tiger
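The constructor's behavior can also be reproduced by hand. As a minimal sketch (the variable names are illustrative and not from the program above), this builds one set through the constructor and another by starting empty and calling UnionWith, then compares them with SetEquals. It also shows that Add returns false when the element is already present.

```csharp
using System;
using System.Collections.Generic;

string[] animals = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

// Build the set through the constructor.
var fromConstructor = new HashSet<string>(animals);

// Build an equivalent set by starting empty and calling UnionWith.
var fromUnion = new HashSet<string>();
fromUnion.UnionWith(animals);

// SetEquals compares contents only; order does not matter.
Console.WriteLine(fromConstructor.SetEquals(fromUnion));
Console.WriteLine(fromUnion.Count);

// Add returns false when the element is already in the set.
bool added = fromUnion.Add("cat");
Console.WriteLine(added);
```

This prints True, 4 and False: both sets hold the same four unique strings, and adding "cat" again changes nothing.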
Overlaps
This method returns true or false. It tests whether any of the HashSet's elements are contained in the IEnumerable argument's elements. Only one equal element is required.
Here the element 3 is in both array2 and the HashSet. This means Overlaps returns true for array2, but false for array3.

    using System;
    using System.Collections.Generic;

    int[] array1 = { 1, 2, 3 };
    int[] array2 = { 3, 4, 5 };
    int[] array3 = { 9, 10, 11 };

    HashSet<int> set = new HashSet<int>(array1);
    bool a = set.Overlaps(array2);
    bool b = set.Overlaps(array3);

    // Display results.
    Console.WriteLine(a);
    Console.WriteLine(b);

Output:

    True
    False
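To make "only one equal element is required" concrete, here is a rough sketch of the same test written as a loop. OverlapsManually is a hypothetical helper, not part of the .NET API, and it is not meant to mirror how the library implements Overlaps. It only shows the logic: stop at the first shared element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the .NET API): returns true as soon as
// one element of 'other' is found in 'set'.
static bool OverlapsManually(HashSet<int> set, IEnumerable<int> other)
{
    foreach (int value in other)
    {
        if (set.Contains(value))
        {
            return true;
        }
    }
    return false;
}

var numbers = new HashSet<int> { 1, 2, 3 };
int[] shared = { 3, 4, 5 };
int[] disjoint = { 9, 10, 11 };

// Each line compares the hand-written loop with the built-in method.
Console.WriteLine(OverlapsManually(numbers, shared) == numbers.Overlaps(shared));
Console.WriteLine(OverlapsManually(numbers, disjoint) == numbers.Overlaps(disjoint));
```

Both lines print True, meaning the loop and the built-in method agree for these inputs.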
SymmetricExceptWith
HashSet has advanced set logic. SymmetricExceptWith changes the HashSet so that it contains only the elements found in one collection or the other, but not in both.
Here we use the var keyword. This simplifies the syntax of the HashSet declaration statement.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    char[] array1 = { 'a', 'b', 'c' };
    char[] array2 = { 'b', 'c', 'd' };

    var hash = new HashSet<char>(array1);
    hash.SymmetricExceptWith(array2);

    // Write char array.
    Console.WriteLine(hash.ToArray());

Output:

    ad
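One way to read the result is as the union of both collections minus their intersection. The following sketch, with illustrative variable names, computes that by hand with UnionWith, IntersectWith and ExceptWith, and checks that it matches SymmetricExceptWith for this input.

```csharp
using System;
using System.Collections.Generic;

char[] left = { 'a', 'b', 'c' };
char[] right = { 'b', 'c', 'd' };

// Built-in symmetric difference.
var builtIn = new HashSet<char>(left);
builtIn.SymmetricExceptWith(right);

// Same result by hand: (left union right) minus (left intersect right).
var union = new HashSet<char>(left);
union.UnionWith(right);

var intersection = new HashSet<char>(left);
intersection.IntersectWith(right);

union.ExceptWith(intersection);

Console.WriteLine(builtIn.SetEquals(union));
Console.WriteLine(union.Count);
```

This prints True and 2: only 'a' and 'd' remain in either set.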
Is there any performance benefit to using HashSet instead of Dictionary? In the C# language, a Dictionary with bool values can work as a set.
Version 1: This version of the code uses a HashSet<string>. We add strings as keys and see if those keys exist.
Version 2: Here we use a Dictionary generic collection instead of a HashSet, and perform the same steps otherwise.
Result: Dictionary had slightly better performance in this test than did the HashSet. In most tests the Dictionary was faster.
Thus Dictionary may be preferred over HashSet in places where advanced HashSet functionality is not needed, though the difference measured here is small (about 2%).

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    const int _max = 10000000;
    var h = new HashSet<string>(StringComparer.Ordinal);
    var d = new Dictionary<string, bool>(StringComparer.Ordinal);
    var a = new string[] { "a", "b", "c", "d", "longer", "words", "also" };

    var s1 = Stopwatch.StartNew();
    // Version 1: use HashSet.
    for (int i = 0; i < _max; i++)
    {
        foreach (string s in a)
        {
            h.Add(s);
            h.Contains(s);
        }
    }
    s1.Stop();

    var s2 = Stopwatch.StartNew();
    // Version 2: use Dictionary.
    for (int i = 0; i < _max; i++)
    {
        foreach (string s in a)
        {
            d[s] = true;
            d.ContainsKey(s);
        }
    }
    s2.Stop();

    Console.WriteLine(h.Count);
    Console.WriteLine(d.Count);
    // Report average nanoseconds per outer-loop iteration.
    Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));

Output:

    7
    7
    529.99 ns    HashSet
    517.05 ns    Dictionary
Dictionary
Set logic can also be implemented by using a Dictionary instead of a HashSet. With a Dictionary you must specify a value type. This may lead to more confusing code.
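As a small illustration of that difference, the sketch below stores the same words in a HashSet and in a Dictionary used as a set. The variable names and the choice of bool as the value type are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

string[] words = { "cat", "dog", "cat" };

// HashSet: only the element itself is stored.
var set = new HashSet<string>();
foreach (string word in words)
{
    set.Add(word);
}

// Dictionary as a set: a value type (here bool) must be chosen,
// even though the value carries no information.
var dict = new Dictionary<string, bool>();
foreach (string word in words)
{
    dict[word] = true;
}

Console.WriteLine(set.Contains("dog"));
Console.WriteLine(dict.ContainsKey("dog"));
Console.WriteLine(set.Count == dict.Count);
```

All three lines print True. The membership tests behave the same, but the Dictionary version carries a placeholder value that a reader has to learn to ignore.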
Using Dictionary and HashSet results in allocations on the managed heap. For small source inputs, the HashSet and Dictionary can be slower than simple nested loops.
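For reference, a nested-loop approach might look like the sketch below. DistinctNested is a hypothetical helper written for this example. It still allocates a List for the result, but it builds no hash table and computes no hash codes. No timing is claimed here; whether it beats a HashSet for a given input size should be measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: removes duplicates with nested loops and no hashing.
static List<string> DistinctNested(string[] source)
{
    var result = new List<string>();
    foreach (string item in source)
    {
        bool seen = false;
        // Inner loop: linear scan of the elements kept so far.
        foreach (string kept in result)
        {
            if (kept == item)
            {
                seen = true;
                break;
            }
        }
        if (!seen)
        {
            result.Add(item);
        }
    }
    return result;
}

string[] input = { "cat", "dog", "cat", "leopard" };
List<string> unique = DistinctNested(input);
Console.WriteLine(string.Join(",", unique));
```

This prints cat,dog,leopard. The inner scan makes the cost grow quadratically with the number of unique elements, which is why this approach only suits small inputs.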
HashSet can be applied to elegantly eliminate duplicates in an array. Its constructor takes the union of the elements in a collection that implements the IEnumerable generic interface.