Duplicate Words
This page was last reviewed on Jun 13, 2021.
Dot Net Perls
Duplicate words. Strings in C# often contain duplicate words, and these duplicates are often not useful. It is possible to remove them with a short method.
Stopword notes. This is similar to removing stop words, which are common words that carry little meaning. In both cases, a lookup table like Dictionary can be checked inside a loop.
Input and output. Consider a string like "yellow bird blue bird." We want our algorithm to detect that the word "bird" is repeated, and to remove the second occurrence.
Input:  yellow bird blue bird
Output: yellow bird blue
Example code. We use a Dictionary because its lookups take constant time on average. We process the words in a loop, and we need to check each word against all words already encountered.
Note Using 2 Lists would require a linear search of the words already seen for each new word. This results in quadratic complexity overall, potentially making your program slow on large data sets.
Detail This method uses StringBuilder for performance. The Dictionary stores words already encountered.
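Detail By passing a char array of delimiters (space, comma, semicolon, period) to string Split, we can deal with punctuation. StringSplitOptions.RemoveEmptyEntries discards the empty strings that adjacent delimiters would otherwise produce.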
Detail Here var declares the Dictionary variable. The compiler infers the type, which simplifies the syntax of the program.
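The effect of the delimiter array and RemoveEmptyEntries can be seen in isolation. This is a minimal sketch: the class name SplitSketch and the helper SplitWords are hypothetical names used only for illustration, and the delimiters are the same four characters the example method uses.

```csharp
using System;

public static class SplitSketch
{
    // Delimiters assumed for illustration: space, comma, semicolon, period.
    static readonly char[] Delimiters = { ' ', ',', ';', '.' };

    public static string[] SplitWords(string text)
    {
        // RemoveEmptyEntries drops the empty strings that appear
        // when two delimiters are adjacent, as in ", ".
        return text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = "yellow bird, blue bird.";

        string[] words = SplitWords(input);
        Console.WriteLine(string.Join("|", words)); // yellow|bird|blue|bird

        // Without the option, the empty entries are kept.
        string[] raw = input.Split(Delimiters);
        Console.WriteLine(raw.Length); // 6
    }
}
```

Without RemoveEmptyEntries, the empty strings would reach the loop and an empty "word" would be stored in the lookup table.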
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        string s = "yellow bird, blue bird, yellow sun";
        Console.WriteLine(s);
        Console.WriteLine(RemoveDuplicateWords(s));
    }

    public static string RemoveDuplicateWords(string v)
    {
        // Keep track of words found in this Dictionary.
        var d = new Dictionary<string, bool>();

        // Build up the result string in this StringBuilder.
        StringBuilder b = new StringBuilder();

        // Split the input.
        string[] a = v.Split(new char[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        // Loop over each word.
        foreach (string current in a)
        {
            // Lowercase each word.
            string lower = current.ToLower();

            // If we haven't already encountered the word, append it to the result.
            if (!d.ContainsKey(lower))
            {
                b.Append(current).Append(' ');
                d.Add(lower, true);
            }
        }

        // Return a string.
        return b.ToString().Trim();
    }
}
yellow bird, blue bird, yellow sun
yellow bird blue sun
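As a possible variation, the bool values in the Dictionary are never read, so a HashSet can play the same role. The sketch below is not the method from this page. It swaps in HashSet with StringComparer.OrdinalIgnoreCase, and the class name HashSetSketch is a hypothetical name. Note that OrdinalIgnoreCase and ToLower can treat some non-English characters differently, so the two versions are only assumed to agree on plain English text.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSketch
{
    public static string RemoveDuplicateWords(string text)
    {
        // The comparer makes "Bird" and "bird" count as the same word,
        // so no lowercased copy of each word is needed.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        string[] words = text.Split(new[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Add returns false when the word is already in the set.
            if (seen.Add(word))
            {
                kept.Add(word);
            }
        }

        // Join places a single space between the kept words.
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDuplicateWords("yellow bird, blue bird, yellow sun"));
        // yellow bird blue sun
    }
}
```

Here HashSet.Add does the check and the insert in one call, which avoids the separate ContainsKey and Add steps.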
Stopwords. I used this code, and also a variant that removes stop words, to implement a full-text-search feature in a Windows Forms program. For this kind of feature, a dedicated full-text search database can also be useful.
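The stop word variant mentioned above is not shown on this page, so the following is only a guess at its general shape. It assumes a tiny, made-up stop word list and uses HashSet for both lookup tables. The names StopWordSketch and RemoveStopWordsAndDuplicates are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordSketch
{
    // A tiny, hypothetical stop word list; a real one would be much longer.
    static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "a", "an", "and", "of", "the"
        };

    public static string RemoveStopWordsAndDuplicates(string text)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        string[] words = text.Split(new[] { ' ', ',', ';', '.' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // Skip stop words first, then skip words already kept.
            if (!StopWords.Contains(word) && seen.Add(word))
            {
                kept.Add(word);
            }
        }

        return string.Join(" ", kept);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveStopWordsAndDuplicates("The bird and the blue bird of the sun"));
        // bird blue sun
    }
}
```

Both checks are hash lookups, so the extra stop word filter adds little cost per word.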
A summary. We combined Dictionary with StringBuilder to develop a method that removes duplicate English words efficiently. The code does a lookup on each word as it encounters it.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Jun 13, 2021.
© 2007-2024 Sam Allen.