C# Remove Duplicates From List

Eliminate duplicate elements from a List with Distinct, HashSet, or for-loops. Benchmark the methods on Lists.
Remove duplicates. In a parallel universe, all things are duplicated. The differences may be subtle, or obvious (whatever is required for a good movie plot).
A List may have duplicate elements as well. To eliminate duplicates, we can use the Distinct() extension method, and then call ToList() to go from the resulting IEnumerable back to a List.
An example. This program uses the System.Linq namespace. A List with 7 int elements is created, but it contains duplicates of the values 3 and 4.

Tip: By using the Distinct() extension method on the List type, we can remove those duplicate elements.

Then: We can optionally invoke the ToList extension to get an actual List with the duplicates removed.

C# program that removes duplicates in List

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // List with duplicate elements.
        List<int> list = new List<int>();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        list.Add(3);
        list.Add(4);
        list.Add(4);
        list.Add(4);

        foreach (int value in list)
        {
            Console.WriteLine("Before: {0}", value);
        }

        // Get distinct elements and convert into a list again.
        List<int> distinct = list.Distinct().ToList();
        foreach (int value in distinct)
        {
            Console.WriteLine("After: {0}", value);
        }
    }
}

Output

Before: 1
Before: 2
Before: 3
Before: 3
Before: 4
Before: 4
Before: 4
After: 1
After: 2
After: 3
After: 4
Unique object fields. Suppose we have a List of objects (class instances). We want to keep only one object for each unique field value (such as the Key field).

Tip: We can use Distinct() with a special IEqualityComparer to treat all objects with an equal field as duplicates of one another.

Thus: The result has one object per distinct key. We remove duplicates by considering only part of each object.

C# program that finds objects with unique fields

using System;
using System.Collections.Generic;
using System.Linq;

class Item
{
    public int Key;
    public string Value;
}

class ItemEqualityComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        // Two items are equal if their keys are equal.
        return x.Key == y.Key;
    }

    public int GetHashCode(Item obj)
    {
        return obj.Key.GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var list = new List<Item>();
        list.Add(new Item() { Key = 5, Value = "Bird" });
        list.Add(new Item() { Key = 5, Value = "Frog" });
        list.Add(new Item() { Key = 10, Value = "Bird" });
        list.Add(new Item() { Key = 10, Value = "Frog" });

        // Get distinct items using IEqualityComparer.
        // ... The Key field is used to see if two items are equal.
        var result = list.Distinct(new ItemEqualityComparer());
        foreach (var item in result)
        {
            Console.WriteLine($"DISTINCT ITEM = {item.Key} {item.Value}");
        }
    }
}

Output

DISTINCT ITEM = 5 Bird
DISTINCT ITEM = 10 Bird
Distinct. Found in the System.Linq namespace, this method can also be invoked on other types of collections, such as arrays. It acts upon any type that implements IEnumerable.

Also: Distinct() can accept an equality comparer, which it uses to determine which elements count as equal and should be collapsed into one.
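
Here is a minimal sketch of both points (the array values are invented for illustration): Distinct() is called directly on arrays, and StringComparer.OrdinalIgnoreCase serves as an example comparer.

C# program that uses Distinct on arrays (illustrative sketch)

using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Arrays implement IEnumerable, so Distinct() works on them too.
        int[] values = { 10, 10, 20, 20, 30 };
        int[] unique = values.Distinct().ToArray();
        Console.WriteLine(string.Join(",", unique));

        // An equality comparer changes what counts as a duplicate.
        string[] names = { "BIRD", "bird", "Frog" };
        var distinctNames = names.Distinct(StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(",", distinctNames));
    }
}

Output

10,20,30
BIRD,Frog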

Iterative method. Sometimes nested loops are a good solution, as they allocate no extra lookup structure. Here we loop through the list, and for each element loop through all previous elements.

And: If an element that comes before the current one is the same as the current one, we have a duplicate.

Warning: This method is quadratic, so it becomes extremely slow on large collections of elements.

C# program that uses iterative removal

using System;
using System.Collections.Generic;

class Program
{
    public static List<string> RemoveDuplicatesIterative(List<string> items)
    {
        List<string> result = new List<string>();
        for (int i = 0; i < items.Count; i++)
        {
            // Assume not duplicate.
            bool duplicate = false;
            for (int z = 0; z < i; z++)
            {
                if (items[z] == items[i])
                {
                    // This is a duplicate.
                    duplicate = true;
                    break;
                }
            }
            // If not duplicate, add to result.
            if (!duplicate)
            {
                result.Add(items[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        // Call method with this input.
        List<string> input = new List<string>() { "x", "x", "y", "y", "y", "z" };
        List<string> output = RemoveDuplicatesIterative(input);
        Console.WriteLine("Input: " + string.Join(",", input));
        Console.WriteLine("Output: " + string.Join(",", output));
    }
}

Output

Input: x,x,y,y,y,z
Output: x,y,z
HashSet method. This method provides predictable, good performance. It uses a HashSet to store all encountered elements, so we always know if we have a duplicate at a certain point.

Note: This is similar to the Distinct method in its implementation. It may be slower for tiny lists than the iterative method.

C# program that uses HashSet for duplicates

using System;
using System.Collections.Generic;

class Program
{
    public static List<string> RemoveDuplicatesSet(List<string> items)
    {
        // Use HashSet to maintain table of duplicates encountered.
        var result = new List<string>();
        var set = new HashSet<string>();
        for (int i = 0; i < items.Count; i++)
        {
            // If not duplicate, add to result.
            if (!set.Contains(items[i]))
            {
                result.Add(items[i]);
                // Record as a future duplicate.
                set.Add(items[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        var input = new List<string>() { "a", "b", "a", "b", "b", "b", "c" };
        var output = RemoveDuplicatesSet(input);
        Console.WriteLine("Input: " + string.Join(",", input));
        Console.WriteLine("Output: " + string.Join(",", output));
    }
}

Output

Input: a,b,a,b,b,b,c
Output: a,b,c
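
As an aside not covered by the benchmark below, and offered only as a sketch: the HashSet constructor can ingest the whole list at once. The caveat is that a HashSet does not guarantee enumeration order, so the explicit loop above is the safer choice when element order must be preserved.

C# program that builds a HashSet from a List (sketch, ordering not guaranteed)

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var input = new List<string>() { "a", "b", "a", "b", "b", "b", "c" };
        // The constructor discards duplicates as it copies the list.
        // Caution: a HashSet does not guarantee enumeration order.
        var output = new HashSet<string>(input).ToList();
        Console.WriteLine("Output: " + string.Join(",", output));
    }
}

Possible output

Output: a,b,c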
Generate duplicates method. For performance testing, we develop a method that generates a list of strings with some duplicates. We specify the unique string count and the duplicate count.

Note: The list has the values placed in an alternating order, and then all the remaining duplicates are placed at the end.

C# program that generates duplicates

using System;
using System.Collections.Generic;

class Program
{
    public static List<string> GetListWithDuplicates(int length, int duplicatesLength)
    {
        const string duplicateString = "bird";
        List<string> result = new List<string>();
        // Add all required strings.
        for (int i = 0; i < length; i++)
        {
            result.Add("cat" + i);
            // Add duplicate strings if required.
            if (duplicatesLength > 0)
            {
                result.Add(duplicateString);
                duplicatesLength--;
            }
        }
        // Add all remaining duplicates.
        for (int i = 0; i < duplicatesLength; i++)
        {
            result.Add(duplicateString);
        }
        // Return result.
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", GetListWithDuplicates(5, 2)));
        Console.WriteLine(string.Join(",", GetListWithDuplicates(1, 0)));
        Console.WriteLine(string.Join(",", GetListWithDuplicates(4, 4)));
    }
}

Output

cat0,bird,cat1,bird,cat2,cat3,cat4
cat0
cat0,bird,cat1,bird,cat2,bird,cat3,bird
Benchmark. Here we test the performance of these methods on lists of size 3, 300 and 3000 elements. The duplicates are about one-third of the elements on the 2 larger lists.

Results: For the tiny lists of 3 elements, the nested for-loop method (RemoveDuplicatesIterative) is fastest.

And: For the larger lists of 300 and 3000 elements, the HashSet method (in Release mode) is fastest.

C# program that benchmarks duplicate removal methods

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        for (int test = 0; test < 3; test++)
        {
            // Get test data.
            var testData = GetTestData(test);
            var max = testData.Item4;

            var s1 = Stopwatch.StartNew();
            for (int i = 0; i < max; i++)
            {
                // Version 1: use Distinct.
                var unique = testData.Item2.Distinct().ToList();
                if (unique.Count != testData.Item3)
                {
                    return;
                }
            }
            s1.Stop();

            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < max; i++)
            {
                // Version 2: use HashSet.
                var unique = RemoveDuplicatesSet(testData.Item2);
                if (unique.Count != testData.Item3)
                {
                    return;
                }
            }
            s2.Stop();

            var s3 = Stopwatch.StartNew();
            for (int i = 0; i < max; i++)
            {
                // Version 3: use nested for-loop.
                var unique = RemoveDuplicatesIterative(testData.Item2);
                if (unique.Count != testData.Item3)
                {
                    return;
                }
            }
            s3.Stop();

            // Write description.
            Console.WriteLine(testData.Item1);
            // Write timings.
            Console.WriteLine(s1.Elapsed.TotalMilliseconds + " ms");
            Console.WriteLine(s2.Elapsed.TotalMilliseconds + " ms");
            Console.WriteLine(s3.Elapsed.TotalMilliseconds + " ms");
        }
    }

    static Tuple<string, List<string>, int, int> GetTestData(int test)
    {
        // Tuple contains description string, list, the unique element count, and iterations for test.
        switch (test)
        {
            default:
            case 0:
                return new Tuple<string, List<string>, int, int>("3 ELEMENT LIST, 0 DUPLICATES",
                    GetListWithDuplicates(3, 0), 3, 100000);
            case 1:
                return new Tuple<string, List<string>, int, int>("300 ELEMENT LIST, 100 DUPLICATES",
                    GetListWithDuplicates(200, 100), 201, 1000);
            case 2:
                return new Tuple<string, List<string>, int, int>("3000 ELEMENT LIST, 1000 DUPLICATES",
                    GetListWithDuplicates(2000, 1000), 2001, 100);
        }
    }

    public static List<string> GetListWithDuplicates(int length, int duplicatesLength)
    {
        const string duplicateString = "bird";
        List<string> result = new List<string>();
        for (int i = 0; i < length; i++)
        {
            result.Add("cat" + i);
            if (duplicatesLength > 0)
            {
                result.Add(duplicateString);
                duplicatesLength--;
            }
        }
        for (int i = 0; i < duplicatesLength; i++)
        {
            result.Add(duplicateString);
        }
        return result;
    }

    public static List<string> RemoveDuplicatesSet(List<string> items)
    {
        var result = new List<string>();
        var set = new HashSet<string>();
        for (int i = 0; i < items.Count; i++)
        {
            if (!set.Contains(items[i]))
            {
                result.Add(items[i]);
                set.Add(items[i]);
            }
        }
        return result;
    }

    public static List<string> RemoveDuplicatesIterative(List<string> items)
    {
        List<string> result = new List<string>();
        for (int i = 0; i < items.Count; i++)
        {
            bool duplicate = false;
            for (int z = 0; z < i; z++)
            {
                if (items[z] == items[i])
                {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate)
            {
                result.Add(items[i]);
            }
        }
        return result;
    }
}

Output

3 ELEMENT LIST, 0 DUPLICATES
37.9524 ms
19.9176 ms
8.0359 ms
300 ELEMENT LIST, 100 DUPLICATES
23.1117 ms
20.499 ms
188.2431 ms
3000 ELEMENT LIST, 1000 DUPLICATES
22.7601 ms
21.4638 ms
1765.6447 ms
Benchmark results, continued. For optimal performance, consider using a nested for-loop if you have just tiny lists of a few elements. A branch could test the list's length and choose a method accordingly, as sketched below.

And: For larger lists, use the RemoveDuplicatesSet method with HashSet. The Distinct() method is almost as fast for larger lists.
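
The following is a minimal sketch of that branching idea, not code from the benchmark: the cutoff of 10 elements is an assumed threshold, and the method reuses RemoveDuplicatesIterative and RemoveDuplicatesSet from the programs above.

C# method that branches on list size (illustrative sketch)

public static List<string> RemoveDuplicatesBranching(List<string> items)
{
    // Assumption: 10 is an illustrative cutoff, not a measured value.
    // Tune it with benchmarks like the one above.
    if (items.Count <= 10)
    {
        // Tiny list: the nested for-loop avoids allocating a HashSet.
        return RemoveDuplicatesIterative(items);
    }
    // Larger list: the HashSet method scales predictably.
    return RemoveDuplicatesSet(items);
}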

Generic method. We have seen the HashSet method is the fastest way to remove duplicates for large lists. We can use a generic method to avoid writing new versions.

Type T: We can use T to mean any type. So we can remove duplicates in a list of strings, ints or anything that has a hash code.

C# program that uses generic method

using System;
using System.Collections.Generic;

class Program
{
    public static List<T> RemoveDuplicatesSet<T>(List<T> items)
    {
        // Use HashSet to remember items seen.
        var result = new List<T>();
        var set = new HashSet<T>();
        for (int i = 0; i < items.Count; i++)
        {
            // Add if needed.
            if (!set.Contains(items[i]))
            {
                result.Add(items[i]);
                set.Add(items[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        var input = new List<string>() { "j", "x", "j", "x", "y" };
        var output = RemoveDuplicatesSet(input);
        Console.WriteLine("Input: " + string.Join(",", input));
        Console.WriteLine("Output: " + string.Join(",", output));
    }
}

Output

Input: j,x,j,x,y
Output: j,x,y
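
As a small usage sketch (the int values are invented), the same generic method handles a List of ints with no changes, since int provides a hash code.

C# snippet that calls the generic method with ints (illustrative)

// Reuses RemoveDuplicatesSet<T> from the program above.
var numbers = new List<int>() { 3, 3, 4, 4, 4, 5 };
var uniqueNumbers = RemoveDuplicatesSet(numbers);
Console.WriteLine(string.Join(",", uniqueNumbers));

Output

3,4,5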
A summary. LINQ extension methods are useful on List collections. Developers often favor the List type over the array type for its resizable behavior.
A unique summary. For erasing duplicate elements in a List and making every element unique, HashSet and Distinct are good options. A for-loop is faster on tiny lists.