The Distinct() method removes all duplicate elements in a collection. It returns only distinct (or unique) elements. The System.Linq namespace provides this extension method.
Distinct returns an IEnumerable collection. We can loop over the collection returned by Distinct, or invoke other extension methods upon it.
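As a small illustration of invoking other extension methods on the result, the sketch below chains OrderBy, ToArray and Count onto Distinct(). The array contents and the variable names are made up for this example.

```csharp
using System;
using System.Linq;

int[] values = { 3, 1, 3, 2, 1 };

// Distinct() is evaluated lazily: nothing is computed until the result is enumerated.
var unique = values.Distinct();

// Chain further LINQ extension methods on the returned sequence.
int[] sorted = unique.OrderBy(v => v).ToArray();
int count = unique.Count();

Console.WriteLine(string.Join(",", sorted)); // 1,2,3
Console.WriteLine(count);                    // 3
```

Each call to ToArray() or Count() here enumerates the source again, so for larger inputs it may be worth storing the result of ToArray() once and reusing it.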
We declare and allocate an array on the managed heap. The array contains 6 elements, but only 4 different numbers. Two are repeated, and this fact is key to the program's output.
We apply the Distinct() extension method to the array reference, and then assign the result to an implicitly typed local variable.

using System;
using System.Linq;

// Declare an array with some duplicated elements in it.
int[] array1 = { 1, 2, 2, 3, 4, 4 };
// Invoke Distinct extension method.
var result = array1.Distinct();
// Display results.
foreach (int value in result)
{
    Console.WriteLine(value);
}

1
2
3
4
IEqualityComparer
We can specify an IEqualityComparer to compare elements in the Distinct() call. Most programs do not need this, but it helps when the default equality check does not match what the program considers a duplicate.
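A comparer does not always have to be written by hand. As a minimal sketch, the built-in StringComparer.OrdinalIgnoreCase (which implements IEqualityComparer for strings) can be passed to Distinct() to ignore letter case. The sample strings are invented for this example.

```csharp
using System;
using System.Linq;

string[] names = { "Apple", "apple", "BANANA", "banana", "cherry" };

// The default comparison is case-sensitive, so all five strings are distinct.
int defaultCount = names.Distinct().Count();

// With the built-in case-insensitive comparer, only three remain.
int ignoreCaseCount = names.Distinct(StringComparer.OrdinalIgnoreCase).Count();

Console.WriteLine(defaultCount);    // 5
Console.WriteLine(ignoreCaseCount); // 3
```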
This program uses a custom IEqualityComparer. Here we treat each int as its parity (whether it is even or odd).

using System;
using System.Linq;
using System.Collections.Generic;

class EqualityParity : IEqualityComparer<int>
{
    public bool Equals(int x, int y)
    {
        // Consider all even numbers the same, and all odd the same.
        // Testing the lowest bit also handles negative numbers (-3 % 2 is -1, not 1).
        return (x & 1) == (y & 1);
    }

    public int GetHashCode(int obj)
    {
        // Must agree with Equals: equal parity gives an equal hash code.
        return (obj & 1).GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        int[] array1 = { 9, 11, 13, 15, 2, 4, 6, 8 };
        // This will remove all except the first even and odd.
        var distinctResult = array1.Distinct(new EqualityParity());
        // Display results.
        foreach (var result in distinctResult)
        {
            Console.WriteLine(result);
        }
    }
}

9
2
Usually a simple loop can be written to remove duplicates. On a small int array, a nested for-loop can execute much faster than the Distinct() method.
Version 1 uses the Distinct() method. Note how the code is short and easy to read. This is a benefit.
Version 2 uses nested for-loops to check each element against all following elements.
On a short int array, the nested loops are faster. But this will depend on the data given to the methods.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static IEnumerable<int> Test1(int[] array)
    {
        // Use distinct to check for duplicates.
        return array.Distinct();
    }

    static IEnumerable<int> Test2(int[] array)
    {
        // Use nested loop to check for duplicates.
        List<int> result = new List<int>();
        for (int i = 0; i < array.Length; i++)
        {
            // Check for duplicates in all following elements.
            bool isDuplicate = false;
            for (int y = i + 1; y < array.Length; y++)
            {
                if (array[i] == array[y])
                {
                    isDuplicate = true;
                    break;
                }
            }
            if (!isDuplicate)
            {
                result.Add(array[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        int[] array1 = { 1, 2, 2, 3, 4, 4 };
        const int _max = 1000000;

        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            // Version 1: benchmark distinct.
            var result = Test1(array1);
            if (result.Count() != 4)
            {
                break;
            }
        }
        s1.Stop();

        var s2 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
            // Version 2: benchmark nested loop.
            var result = Test2(array1);
            if (result.Count() != 4)
            {
                break;
            }
        }
        s2.Stop();

        // Convert total milliseconds to average nanoseconds per iteration.
        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    }
}

185.44 ns    Distinct method
 51.11 ns    Nested for-loops
The Distinct() method is not ideal for all purposes. Internally, the Distinct() method is implemented in terms of iterators that are automatically generated by the C# compiler.
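To make the iterator idea concrete, here is a rough sketch of how such a method could be written with yield return and a HashSet. DistinctSketch is a hypothetical helper invented for this example; it is not the library's source code, only an approximation of the general approach.

```csharp
using System;
using System.Collections.Generic;

int[] array1 = { 1, 2, 2, 3, 4, 4 };
List<int> collected = new List<int>(DistinctSketch(array1));
Console.WriteLine(string.Join(" ", collected)); // 1 2 3 4

// Hypothetical helper (an assumption for illustration, not the real implementation).
static IEnumerable<T> DistinctSketch<T>(IEnumerable<T> source)
{
    // The set is one heap allocation; the compiler-generated iterator object is another.
    var seen = new HashSet<T>();
    foreach (T item in source)
    {
        // Add returns false when the item is already in the set.
        if (seen.Add(item))
        {
            yield return item;
        }
    }
}
```

A sketch like this does a constant amount of hashing work per element, which is why a hash-based approach tends to scale better than nested loops on large inputs, even though its fixed overhead is higher on tiny arrays.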
Heap allocations therefore occur when we invoke Distinct(). For optimum performance, you could use loops on small collections.
We used the Distinct() extension method from System.Linq. This method provides a declarative, function-oriented syntax for a typically imperative processing task.
The Distinct extension incurs practical performance drawbacks in some programs. For performance on small collections, a for-loop is probably better (but may be harder to maintain).