Duplicate Words
Updated Jun 13, 2021
Dot Net Perls
Duplicate words. Strings in C# often contain duplicate words, and the repeats are usually not useful. We can remove them with a short method.
Stopword notes. This is similar to removing stop words (common words that carry little meaning). A lookup table like Dictionary can be used in a loop.
Input and output. Consider a string like "yellow bird blue bird." We want our algorithm to figure out that the word "bird" is repeated, and to remove the second occurrence.
Input:  yellow bird blue bird
Output: yellow bird blue
Example code. We use a Dictionary for constant-time lookup. We will be processing words in a loop, and we need to check each word against all words already encountered.
Note: Using 2 Lists would result in higher complexity, because each List lookup is a linear scan. This could make your program slow on large data sets.
Detail: This method uses StringBuilder for performance. The Dictionary stores words already encountered.
Detail: By passing a char array of delimiters to string Split, we split on spaces and punctuation, so commas and periods are dropped from the words.
Detail: Here var declares the Dictionary variable. The compiler infers its type, which shortens the declaration.
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        string s = "yellow bird, blue bird, yellow sun";
        Console.WriteLine(s);
        Console.WriteLine(RemoveDuplicateWords(s));
    }

    static public string RemoveDuplicateWords(string v)
    {
        // Keep track of words found in this Dictionary.
        var d = new Dictionary<string, bool>();

        // Buildup string into this StringBuilder.
        StringBuilder b = new StringBuilder();

        // Split the input.
        string[] a = v.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        // Loop over each word.
        foreach (string current in a)
        {
            // Lowercase each word.
            string lower = current.ToLower();

            // If we haven't already encountered the word, append it to the result.
            if (!d.ContainsKey(lower))
            {
                b.Append(current).Append(' ');
                d.Add(lower, true);
            }
        }

        // Return a string.
        return b.ToString().Trim();
    }
}
yellow bird, blue bird, yellow sun
yellow bird blue sun
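Split on its own. To see why the commas vanish from the output, it may help to run the Split call in isolation. This is a minimal sketch: the SplitDemo class and the SplitWords helper are hypothetical names used only for illustration, and the delimiters are the same ones the method above passes.

```csharp
using System;

class SplitDemo
{
    // Hypothetical helper: splits on the same delimiters as RemoveDuplicateWords.
    public static string[] SplitWords(string text)
    {
        char[] delimiters = { ' ', ',', ';', '.' };
        // RemoveEmptyEntries drops the empty string found between ", " pairs.
        return text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        string[] words = SplitWords("yellow bird, blue bird, yellow sun");
        // Delimiters are discarded, so "bird," becomes "bird".
        Console.WriteLine(string.Join("|", words)); // yellow|bird|blue|bird|yellow|sun
        Console.WriteLine(words.Length);            // 6
    }
}
```

Because the delimiters are discarded rather than kept, the result string is rebuilt with single spaces and the original punctuation is not restored.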
Stopwords. I used this code, and also a variant that removes stop words, to implement a full-text-search feature in a Windows Forms program. For this kind of feature, a dedicated full-text search database is also useful.
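Stop word sketch. The stop word variant is not shown on this page, so what follows is only a guess at its general shape, not the original code. It assumes a tiny, made-up stop word list stored in a Dictionary, and the names StopWordDemo and RemoveStopWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class StopWordDemo
{
    // Assumed, illustrative stop word list; a real list would be much longer.
    static readonly Dictionary<string, bool> StopWords =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase)
        {
            { "the", true }, { "a", true }, { "an", true }, { "of", true }, { "and", true }
        };

    // Hypothetical helper: keeps only the words not found in the lookup table.
    public static string RemoveStopWords(string text)
    {
        var builder = new StringBuilder();
        string[] words = text.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // The comparer ignores case, so "The" matches "the".
            if (!StopWords.ContainsKey(word))
            {
                builder.Append(word).Append(' ');
            }
        }
        return builder.ToString().Trim();
    }

    static void Main()
    {
        Console.WriteLine(RemoveStopWords("The color of the bird and the sun"));
        // color bird sun
    }
}
```

The loop has the same structure as the duplicate-word method. The difference is that the lookup table is filled once up front instead of growing as words are encountered.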
A summary. We combined Dictionary with StringBuilder to develop a method that removes duplicate English words efficiently. The code does a lookup on each word as it encounters it.
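HashSet variation. The same idea can also be sketched with a HashSet, since the Dictionary values are never read. This is an alternative technique, not the method used on this page. The HashSetDemo class name is hypothetical, and it assumes that an ordinal case-insensitive comparison is an acceptable substitute for the ToLower call, which may behave differently for some cultures.

```csharp
using System;
using System.Collections.Generic;

class HashSetDemo
{
    public static string RemoveDuplicateWords(string text)
    {
        // The comparer makes "Bird" and "bird" count as the same word.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        string[] words = text.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Add returns false when the word is already in the set.
            if (seen.Add(word))
            {
                kept.Add(word);
            }
        }

        // Join places a single space between words, so no Trim is needed.
        return string.Join(" ", kept);
    }

    static void Main()
    {
        Console.WriteLine(RemoveDuplicateWords("yellow bird, blue bird, yellow sun"));
        // yellow bird blue sun
    }
}
```

Here the List only collects the kept words in order. The membership test still goes through the hash-based set, so the lookup cost stays constant on average.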
© 2007-2025 Sam Allen