C# Distinct Method, Get Unique Elements Only

Invoke the Distinct extension method from System.Linq to get unique elements.
Distinct. This returns the elements of a collection with all duplicates removed. The source collection is not modified; only distinct (or unique) elements appear in the result. The System.Linq namespace provides this extension method.
Distinct returns an IEnumerable collection. We can loop over the collection returned by Distinct, or invoke other extension methods upon it.
An example. We declare and allocate an array on the managed heap. The array contains 6 elements, but only 4 different numbers: 2 and 4 each appear twice. This fact is key to the program's output.
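Chaining can be sketched as follows. This is a minimal illustration, not part of the original examples; the class name DistinctChaining and the helper UniqueSorted are hypothetical names chosen here. It passes the sequence returned by Distinct to OrderBy, ToArray and Count, which are other System.Linq extension methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctChaining
{
    // Hypothetical helper: returns the unique values, sorted ascending, as an array.
    public static int[] UniqueSorted(IEnumerable<int> source)
    {
        // Distinct returns an IEnumerable<int>, so OrderBy and ToArray can follow it.
        return source.Distinct().OrderBy(x => x).ToArray();
    }

    public static void Main()
    {
        int[] values = { 3, 1, 3, 2, 1 };

        int[] sorted = UniqueSorted(values);
        Console.WriteLine(string.Join(",", sorted)); // 1,2,3

        // Count enumerates the distinct sequence and counts its elements.
        Console.WriteLine(values.Distinct().Count()); // 3
    }
}
```

The array is left unchanged by these calls; each one only reads from it.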

Next: We apply the Distinct extension method to the array reference, and then assign the result to an implicitly typed local variable.


Finally: We loop over the result and display the distinct elements in the processed array.

C# program that removes duplicate elements

    using System;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            // Declare an array with some duplicated elements in it.
            int[] array1 = { 1, 2, 2, 3, 4, 4 };
            // Invoke Distinct extension method.
            var result = array1.Distinct();
            // Display results.
            foreach (int value in result)
            {
                Console.WriteLine(value);
            }
        }
    }

Output

    1
    2
    3
    4

IEqualityComparer. We can specify an IEqualityComparer to compare elements in the Distinct call. This is probably not useful in many programs.

Note: We can "transform" elements in an IEqualityComparer. Here we treat each int as its parity (whether it is even or odd).

C# program that uses IEqualityComparer

    using System;
    using System.Linq;
    using System.Collections.Generic;

    class EqualityParity : IEqualityComparer<int>
    {
        public bool Equals(int x, int y)
        {
            // Consider all even numbers the same, and all odd the same.
            return (x % 2) == (y % 2);
        }

        public int GetHashCode(int obj)
        {
            return (obj % 2).GetHashCode();
        }
    }

    class Program
    {
        static void Main()
        {
            int[] array1 = { 9, 11, 13, 15, 2, 4, 6, 8 };
            // This will remove all except the first even and odd.
            var distinctResult = array1.Distinct(new EqualityParity());
            // Display results.
            foreach (var result in distinctResult)
            {
                Console.WriteLine(result);
            }
        }
    }

Output

    9
    2
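A comparer does not always have to be written by hand. The sketch below is an added illustration, with hypothetical names DistinctIgnoreCase and UniqueIgnoringCase. It passes the built-in StringComparer.OrdinalIgnoreCase, which implements IEqualityComparer<string>, so that strings differing only in letter case count as duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctIgnoreCase
{
    // Hypothetical helper: unique strings, treating differently cased copies as equal.
    public static List<string> UniqueIgnoringCase(IEnumerable<string> source)
    {
        return source.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        string[] names = { "cat", "Cat", "dog", "DOG", "bird" };

        // Three strings remain: one spelling each of cat, dog and bird.
        foreach (string name in UniqueIgnoringCase(names))
        {
            Console.WriteLine(name);
        }
    }
}
```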
Benchmark duplicate methods. Usually a simple loop can be written to remove duplicates. On a short int array, a nested for-loop can execute much faster than the Distinct method.

Version 1: We use the Distinct method. Note how the code is short and easy to read. This is a benefit.

Version 2: A nested loop scans following elements for a duplicate. An element is added only if no following elements are the same, so the last copy of each repeated value is the one kept.

Result: On a short int array, the nested loops are faster. But this will depend on the data given to the methods.

C# program that benchmarks dedupe methods

    using System;
    using System.Linq;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        static IEnumerable<int> Test1(int[] array)
        {
            // Use distinct to check for duplicates.
            return array.Distinct();
        }

        static IEnumerable<int> Test2(int[] array)
        {
            // Use nested loop to check for duplicates.
            List<int> result = new List<int>();
            for (int i = 0; i < array.Length; i++)
            {
                // Check for duplicates in all following elements.
                bool isDuplicate = false;
                for (int y = i + 1; y < array.Length; y++)
                {
                    if (array[i] == array[y])
                    {
                        isDuplicate = true;
                        break;
                    }
                }
                if (!isDuplicate)
                {
                    result.Add(array[i]);
                }
            }
            return result;
        }

        static void Main()
        {
            int[] array1 = { 1, 2, 2, 3, 4, 4 };
            const int _max = 1000000;

            var s1 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                // Version 1: benchmark distinct.
                var result = Test1(array1);
                if (result.Count() != 4)
                {
                    break;
                }
            }
            s1.Stop();

            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                // Version 2: benchmark nested loop.
                var result = Test2(array1);
                if (result.Count() != 4)
                {
                    break;
                }
            }
            s2.Stop();

            Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            Console.Read();
        }
    }

Output

    185.44 ns    Distinct method
     51.11 ns    Nested for-loops
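One detail of the nested loop in Version 2 is worth a sketch: because it scans the following elements, it keeps the last copy of each repeated value, so the order of its output can differ from the order of first appearance. The code below is an added illustration with hypothetical names (NestedLoopOrder, KeepLast, KeepFirst). KeepLast mirrors the Version 2 logic, and KeepFirst is a variant that scans the preceding elements instead.

```csharp
using System;
using System.Collections.Generic;

public static class NestedLoopOrder
{
    // Same idea as Version 2: skip an element if an equal one follows it.
    public static List<int> KeepLast(int[] array)
    {
        var result = new List<int>();
        for (int i = 0; i < array.Length; i++)
        {
            bool isDuplicate = false;
            for (int y = i + 1; y < array.Length; y++)
            {
                if (array[i] == array[y]) { isDuplicate = true; break; }
            }
            if (!isDuplicate) result.Add(array[i]);
        }
        return result;
    }

    // Variant: skip an element if an equal one precedes it.
    public static List<int> KeepFirst(int[] array)
    {
        var result = new List<int>();
        for (int i = 0; i < array.Length; i++)
        {
            bool seenEarlier = false;
            for (int y = 0; y < i; y++)
            {
                if (array[i] == array[y]) { seenEarlier = true; break; }
            }
            if (!seenEarlier) result.Add(array[i]);
        }
        return result;
    }

    public static void Main()
    {
        int[] data = { 2, 1, 2 };
        Console.WriteLine(string.Join(",", KeepLast(data)));  // 1,2
        Console.WriteLine(string.Join(",", KeepFirst(data))); // 2,1
    }
}
```

On the benchmark array { 1, 2, 2, 3, 4, 4 } both variants give 1, 2, 3, 4, because each repeated value sits next to its copy. The difference only shows when duplicates are separated by other values.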
Discussion. The Distinct method is not ideal for all purposes. Internally, the Distinct method is implemented in terms of iterators that are automatically generated by the C# compiler.

Therefore: Heap allocations occur when you invoke Distinct. For optimum performance, you could use loops on small collections.

And: With small data sets, the overhead of using iterators and allocations likely overshadows any asymptotic advantage.
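For larger collections, where the quadratic cost of a nested loop starts to matter, a single loop over a HashSet is one possible middle ground. The sketch below is an added illustration, not something measured in the benchmark above; the names HashSetDedupe and Unique are hypothetical. It relies on HashSet<T>.Add returning false when the value is already present.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDedupe
{
    // Hypothetical helper: one pass over the array, keeping first occurrences.
    public static List<int> Unique(int[] array)
    {
        var seen = new HashSet<int>();
        var result = new List<int>();
        foreach (int value in array)
        {
            // Add returns true only the first time a value is inserted.
            if (seen.Add(value))
            {
                result.Add(value);
            }
        }
        return result;
    }

    public static void Main()
    {
        int[] array1 = { 1, 2, 2, 3, 4, 4 };
        Console.WriteLine(string.Join(",", Unique(array1))); // 1,2,3,4
    }
}
```

This still allocates a set and a list on the heap, so on a six-element array it may not beat the nested loop. Timing it with the same Stopwatch approach would be the way to find out.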

A summary. We used the Distinct extension method from System.Linq. This method provides a declarative, function-oriented syntax for a typically imperative processing task.
The Distinct extension incurs practical performance drawbacks in some programs. For performance, a for-loop is probably better (but may be harder to maintain).
Dot Net Perls
© 2007-2020 Sam Allen.