HashSet. This is an optimized C# set collection. It helps eliminate duplicate strings or elements in an array. It is a set that hashes its contents.
With HashSet, we have a simple syntax for taking the union of elements in a set. This is performed in its constructor. More advanced set methods, such as Overlaps and SymmetricExceptWith, can also be called on the HashSet.
This program calls the HashSet constructor. The constructor receives a single parameter, which must implement the generic IEnumerable<string> interface.
Part 1 We create an array that contains several duplicated strings: the string "cat" is repeated 3 times.
Part 2 We use the HashSet constructor, which takes the union of elements. It internally calls UnionWith to eliminate duplicates.
Part 3 We invoke ToArray to convert the HashSet into a new array, which may be easier to use elsewhere.
Part 4 The program joins the resulting array into a single string with the static string.Join method and writes it to the console.
using System;
using System.Collections.Generic;
using System.Linq;
// Part 1: input array in which the string "cat" appears three times.
string[] array1 =
{
    "cat",
    "dog",
    "cat",
    "leopard",
    "tiger",
    "cat"
};
// Part 2: use HashSet constructor to ensure unique strings.
var hash = new HashSet<string>(array1);
// Part 3: convert to array of strings again.
string[] array2 = hash.ToArray();
// Part 4: display the resulting array.
Console.WriteLine(string.Join(",", array2));

cat,dog,leopard,tiger
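The claim that the constructor takes a union can be checked directly. The following is a minimal sketch, not the library's internal code: it builds one set through the constructor and another by calling UnionWith on an empty set, then compares them. The variable names are chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;

string[] input = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

// Approach A: pass the array to the constructor.
var fromConstructor = new HashSet<string>(input);

// Approach B: start with an empty set and merge the array in.
var fromUnion = new HashSet<string>();
fromUnion.UnionWith(input);

// Both sets hold the same 4 unique strings.
Console.WriteLine(fromConstructor.Count);                // 4
Console.WriteLine(fromUnion.Count);                      // 4
Console.WriteLine(fromConstructor.SetEquals(fromUnion)); // True
```

UnionWith is also useful on its own when more elements need to be merged into a set that already exists.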
Overlaps. This method returns true or false. It tests to see if any of the HashSet's elements are contained in the IEnumerable argument's elements. Only one equal element is required.
Next The element 3 is in the HashSet. This means Overlaps returns true for array2, but false for array3.
using System;
using System.Collections.Generic;
int[] array1 = { 1, 2, 3 };
int[] array2 = { 3, 4, 5 };
int[] array3 = { 9, 10, 11 };
HashSet<int> set = new HashSet<int>(array1);
bool a = set.Overlaps(array2);
bool b = set.Overlaps(array3);
// Display results.
Console.WriteLine(a);
Console.WriteLine(b);

True
False
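To make the "only one equal element is required" rule concrete, here is a rough sketch that compares Overlaps with a hand-written loop that stops at the first shared element. The loop is an illustration of the idea, not a description of how the method is implemented internally.

```csharp
using System;
using System.Collections.Generic;

int[] array1 = { 1, 2, 3 };
int[] array2 = { 3, 4, 5 };

var set = new HashSet<int>(array1);

// Manual version: stop at the first element of array2 found in the set.
bool manual = false;
foreach (int value in array2)
{
    if (set.Contains(value))
    {
        manual = true;
        break;
    }
}

Console.WriteLine(manual);               // True
Console.WriteLine(set.Overlaps(array2)); // True

// An empty argument shares no elements with the set.
Console.WriteLine(set.Overlaps(new int[0])); // False
```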
SymmetricExceptWith. HashSet has advanced set logic. SymmetricExceptWith modifies the HashSet in place so that it contains only the elements found in one collection or the other, but not in both.
Here The elements 'b' and 'c' occur in both arrays and are removed, which leaves 'a' and 'd'. The example also uses the var keyword, which simplifies the syntax of the HashSet declaration statement.
using System;
using System.Collections.Generic;
using System.Linq;
char[] array1 = { 'a', 'b', 'c' };
char[] array2 = { 'b', 'c', 'd' };
var hash = new HashSet<char>(array1);
hash.SymmetricExceptWith(array2);
// Write char array.
Console.WriteLine(hash.ToArray());

ad
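One way to reason about SymmetricExceptWith is as "the union minus the intersection". The sketch below rebuilds the same result from UnionWith, IntersectWith and ExceptWith and checks that the two sets match. It is meant only as an illustration of the set logic; the variable names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

char[] array1 = { 'a', 'b', 'c' };
char[] array2 = { 'b', 'c', 'd' };

// Direct call.
var symmetric = new HashSet<char>(array1);
symmetric.SymmetricExceptWith(array2);

// Step by step: all elements from both arrays...
var union = new HashSet<char>(array1);
union.UnionWith(array2);

// ...minus the elements the two arrays share.
var common = new HashSet<char>(array1);
common.IntersectWith(array2);
union.ExceptWith(common);

Console.WriteLine(symmetric.Count);            // 2
Console.WriteLine(symmetric.SetEquals(union)); // True
```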
Benchmark. Is there any performance benefit to using HashSet instead of Dictionary? In the C# language, a Dictionary with bool values can work as a set.
Version 1 We test a HashSet<string>. We add strings as keys and see if those keys exist.
Version 2 We use the Dictionary generic collection instead of a HashSet, and perform the same steps otherwise.
Result The Dictionary had slightly better performance in this test than did the HashSet: about 517 ns versus 530 ns per outer loop iteration, a difference of roughly 2.5%. In most tests the Dictionary was faster.
Thus Dictionary may be preferable to HashSet in performance-critical places where advanced HashSet functionality is not needed, although the margin is small.
using System;
using System.Collections.Generic;
using System.Diagnostics;
const int _max = 10000000;
var h = new HashSet<string>(StringComparer.Ordinal);
var d = new Dictionary<string, bool>(StringComparer.Ordinal);
var a = new string[] { "a", "b", "c", "d", "longer", "words", "also" };
var s1 = Stopwatch.StartNew();
// Version 1: use HashSet.
for (int i = 0; i < _max; i++)
{
    foreach (string s in a)
    {
        h.Add(s);
        h.Contains(s);
    }
}
s1.Stop();
var s2 = Stopwatch.StartNew();
// Version 2: use Dictionary.
for (int i = 0; i < _max; i++)
{
    foreach (string s in a)
    {
        d[s] = true;
        d.ContainsKey(s);
    }
}
s2.Stop();
Console.WriteLine(h.Count);
Console.WriteLine(d.Count);
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));

7
7
529.99 ns HashSet
517.05 ns Dictionary
Dictionary. Set logic can also be implemented by using a Dictionary instead of a HashSet. With a Dictionary you must specify a value type. This may lead to more confusing code.
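As a small illustration of the difference, the sketch below removes duplicates once with a Dictionary that has placeholder bool values and once with a HashSet. The names are chosen for this example only. Key order is not compared, because neither collection promises a particular enumeration order.

```csharp
using System;
using System.Collections.Generic;

string[] input = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

// Dictionary used as a set: the bool value is a placeholder and is never read.
var seen = new Dictionary<string, bool>();
foreach (string s in input)
{
    // Assigning to an existing key overwrites it, so duplicates collapse.
    seen[s] = true;
}

// HashSet used for the same job: no placeholder value is needed.
var set = new HashSet<string>(input);

Console.WriteLine(seen.Count);               // 4
Console.WriteLine(set.Count);                // 4
Console.WriteLine(set.SetEquals(seen.Keys)); // True
```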
Allocations. Using Dictionary and HashSet results in allocations on the managed heap. For small source inputs, the HashSet and Dictionary can be slower than simple nested loops.
But When the source input becomes large with thousands of elements, hashed collections are faster.
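For reference, a nested-loop approach to removing duplicates might look like the sketch below. DistinctNested is a hypothetical helper written for this example and is not part of .NET. It compares each candidate against the results collected so far, so its cost grows quadratically with the number of unique elements, which is why it only suits small inputs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: removes duplicates with nested loops and no hashing.
static List<string> DistinctNested(string[] source)
{
    var result = new List<string>();
    foreach (string candidate in source)
    {
        bool seen = false;
        // Inner loop: linear scan over the unique elements found so far.
        foreach (string existing in result)
        {
            if (existing == candidate)
            {
                seen = true;
                break;
            }
        }
        if (!seen)
        {
            result.Add(candidate);
        }
    }
    return result;
}

string[] input = { "cat", "dog", "cat", "leopard", "tiger", "cat" };
List<string> unique = DistinctNested(input);

// The list keeps first-occurrence order.
Console.WriteLine(string.Join(",", unique)); // cat,dog,leopard,tiger
```

Whether this beats a HashSet for a given input size depends on the runtime and the data, so it is worth measuring before choosing one over the other.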
HashSet can be applied to elegantly eliminate duplicates in an array. Its constructor accepts any collection that implements the generic IEnumerable interface and keeps only the unique elements.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on May 12, 2023.