HomeSearch

Java Remove Duplicates From ArrayList

Remove duplicates from an ArrayList of strings. Use a HashSet in removeDuplicates.
Remove duplicates. No one else is here. The ancient ruins are abandoned. You see many pillars standing together—they are duplicates. In the late afternoon, you consider these pillars.
Sometimes duplicates are useful. But often they are not. Duplicates in an ArrayList can be removed. A hashing mechanism (like HashSet) that records encountered elements is helpful.
A method. The removeDuplicates method accepts an ArrayList of Strings. In it, we loop through that argument list. If our HashSet contains the String, we do nothing.

Add: But if our HashSet does not contain the String, we add it to the HashSet (so it is detected next time).

Result: We also build up our result ArrayList. This collection will retain the ordering of the original, but not duplicates.

ArrayList

Info: RemoveDuplicates() could be easily adapted to work upon Integers or other types. Try changing the String type to Integer.

Java program that removes duplicates import java.util.ArrayList; import java.util.HashSet; public class Program { static ArrayList<String> removeDuplicates(ArrayList<String> list) { // Store unique items in result. ArrayList<String> result = new ArrayList<>(); // Record encountered Strings in HashSet. HashSet<String> set = new HashSet<>(); // Loop over argument list. for (String item : list) { // If String is not in set, add it to the list and the set. if (!set.contains(item)) { result.add(item); set.add(item); } } return result; } public static void main(String[] args) { ArrayList<String> list = new ArrayList<>(); list.add("dog"); list.add("cat"); list.add("dog"); list.add("dog"); list.add("cat"); list.add("bird"); // Remove duplicates from ArrayList of Strings. ArrayList<String> unique = removeDuplicates(list); for (String element : unique) { System.out.println(element); } } } Output dog cat bird
HashSet constructor. This approach removes all duplicates, but the ordering of the elements is lost. We pass an ArrayList to the HashSet constructor, and then convert back to an ArrayList.HashSet

Note: This code is simpler and shorter. It does not suffer from any huge performance issues.

Reordering: Using this direct and simple approach causes the elements to be reordered. The HashSet will not retain ordering.

Java program that uses HashSet constructor import java.util.ArrayList; import java.util.HashSet; public class Program { public static void main(String[] args) { // Create ArrayList with one duplicate element. ArrayList<String> values = new ArrayList<>(); values.add("cat"); values.add("dog"); values.add("cat"); values.add("bird"); // Create HashSet from ArrayList. HashSet<String> set = new HashSet<>(values); // Create ArrayList from the set. ArrayList<String> result = new ArrayList<>(set); // The result. System.out.println(result.toString()); } } Output [cat, bird, dog]
Has duplicates. With a nested for-loop, we can check an ArrayList (or other array) for duplicate elements. We only search following (not preceding) elements for an exact match.

Tip: If an element and a following element are equal, the ArrayList has a duplicate. Otherwise, no duplicates exist.

Tip 2: This loop-based method is sometimes faster than using a set like HashSet. Its performance is poor on large collections.

Thus: Calling hasDuplicates on small lists where no duplicates are likely is often a good optimization.

Java program that checks for duplicates import java.util.ArrayList; public class Program { public static boolean hasDuplicates(ArrayList<String> list) { for (int i = 0; i < list.size(); i++) { // Loop over all following elements. for (int x = i + 1; x < list.size(); x++) { // If two elements equal, there is a duplicate. if (list.get(i) == list.get(x)) { return true; } } } // No duplicates found. return false; } public static void main(String[] args) { // This ArrayList has a duplicate. ArrayList<String> values = new ArrayList<>(); values.add("green"); values.add("blue"); values.add("red"); values.add("pink"); values.add("pink"); // See if duplicates exist. if (hasDuplicates(values)) { System.out.println(true); } // Remove all pink strings. values.removeIf(element -> element == "pink"); // No more duplicates. System.out.println(hasDuplicates(values)); } } Output true false
Testing versus removal. Here is an optimization. If an ArrayList rarely contains duplicates, and has few elements, we can avoid modifying it. We can just test for duplicates.

Version 1: This code loops over the ArrayList to see if any duplicates exist. It does not try to remove the duplicates.

Version 2: This code removes the duplicates without first checking whether any duplicates are present.

Result: In this benchmark, we find that on a 4-element ArrayList, calling hasDuplicates is much faster than removeDuplicates.

Warning: If lists usually have duplicates, hasDuplicates() will not help performance. And it will be slow on long lists due to looping.

Java program that times duplicate test, duplicate removal import java.util.*; public class Program { static boolean hasDuplicates(ArrayList<String> list) { // See if duplicates exist. for (int i = 0; i < list.size(); i++) { for (int x = i + 1; x < list.size(); x++) { if (list.get(i) == list.get(x)) { return true; } } } return false; } static ArrayList<String> removeDuplicates(ArrayList<String> list) { // Remove Duplicates: place them in new list (see above example). ArrayList<String> result = new ArrayList<>(); HashSet<String> set = new HashSet<>(); for (String item : list) { if (!set.contains(item)) { result.add(item); set.add(item); } } return result; } public static void main(String[] args) { ArrayList<String> elements = new ArrayList<>(); elements.add("one"); elements.add("two"); elements.add("three"); elements.add("four"); long t1 = System.currentTimeMillis(); // Version 1: test to see if duplicates exist. for (int i = 0; i < 10000000; i++) { if (hasDuplicates(elements)) { System.out.println(false); } } long t2 = System.currentTimeMillis(); // Version 2: process list even if no duplicates. for (int i = 0; i < 10000000; i++) { ArrayList<String> copy = removeDuplicates(elements); if (copy == null) { System.out.println(false); } } long t3 = System.currentTimeMillis(); // ... Benchmark results. System.out.println(t2 - t1); System.out.println(t3 - t2); } } Output 128 ms: hasDuplicates() 1409 ms: removeDuplicates()
Performance notes. For small lists, simply looping over elements already added, and not adding duplicate ones, is faster. But this approach would lead to problems with large collections.

Also: Another strategy is to not add duplicates in the first place. A custom method could prevent duplicates from being added.

Sets versus loops. With set-based methods, we can remove duplicate Strings from an ArrayList. Other approaches, which involve nested loops to test elements, are superior in some cases.
For duplicate removal ("dedupe") there are many possible options. This is a good thing. It means we can select the one that is best for our specific use case.
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.
Home
Dot Net Perls