Stopword notes. Removing duplicate words is similar to removing stop words, the common words that carry little meaning. In both cases, a lookup table such as Dictionary can be checked for each word in a loop.
Input and output. Consider a string like "yellow bird blue bird." We want our algorithm to detect that the word "bird" is repeated, and to remove the second occurrence while keeping the first.
yellow bird blue bird
yellow bird blue
Example code. We use a Dictionary because its lookups take constant time on average. We process the words in a loop, and each word must be checked against all the words already encountered.
Note: Using two Lists instead would require a linear search for each word, so the total time would grow quadratically with the number of words. On large data sets this could make your program slow.
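For comparison, here is a rough sketch of the two-List approach the note warns against. It assumes one List holds the lowercased words seen so far and the other holds the words kept. The class and method names (ListDedupe, RemoveDuplicateWordsWithList) are hypothetical. Because List.Contains scans the whole list, each word costs time proportional to the number of words seen so far. The Dictionary version avoids that scan.

```csharp
using System;
using System.Collections.Generic;

static class ListDedupe
{
    // Hypothetical List-based variant, shown only for comparison.
    // seen.Contains is a linear scan, so total work grows quadratically.
    public static string RemoveDuplicateWordsWithList(string v)
    {
        var seen = new List<string>();   // lowercased words already encountered
        var result = new List<string>(); // words kept, in original order
        string[] words = v.Split(new char[] { ' ', ',', ';', '.' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string lower = word.ToLowerInvariant();
            if (!seen.Contains(lower)) // O(seen.Count) per call
            {
                seen.Add(lower);
                result.Add(word);
            }
        }
        return string.Join(" ", result);
    }

    static void Main()
    {
        // prints "yellow bird blue sun"
        Console.WriteLine(RemoveDuplicateWordsWithList("yellow bird, blue bird, yellow sun"));
    }
}
```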
using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main()
    {
        string s = "yellow bird, blue bird, yellow sun";
        Console.WriteLine(s);
        Console.WriteLine(RemoveDuplicateWords(s));
    }

    public static string RemoveDuplicateWords(string v)
    {
        // Keep track of words found in this Dictionary.
        var d = new Dictionary<string, bool>();
        // Build up the result in this StringBuilder.
        StringBuilder b = new StringBuilder();
        // Split the input on spaces and punctuation, dropping empty entries.
        string[] a = v.Split(new char[] { ' ', ',', ';', '.' }, StringSplitOptions.RemoveEmptyEntries);
        // Loop over each word.
        foreach (string current in a)
        {
            // Lowercase each word so "Bird" and "bird" count as the same word.
            string lower = current.ToLowerInvariant();
            // If we haven't already encountered the word, append it to the result.
            if (!d.ContainsKey(lower))
            {
                b.Append(current).Append(' ');
                d.Add(lower, true);
            }
        }
        // Remove the trailing space and return the string.
        return b.ToString().Trim();
    }
}

yellow bird, blue bird, yellow sun
yellow bird blue sun
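The Dictionary's bool values are never read, so a HashSet<string> could do the same job. HashSet.Add returns false when the item is already present, which combines the check and the insert. This sketch assumes that a case-insensitive comparer (StringComparer.OrdinalIgnoreCase) is an acceptable replacement for lowercasing each word. The two approaches may differ slightly for some non-English text. The names SetDedupe and RemoveDuplicateWordsSet are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SetDedupe
{
    // Hypothetical HashSet-based variant of RemoveDuplicateWords.
    public static string RemoveDuplicateWordsSet(string v)
    {
        // The comparer makes "Bird" and "bird" count as the same word,
        // so no separate lowercase copy is needed.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var b = new StringBuilder();
        string[] words = v.Split(new char[] { ' ', ',', ';', '.' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Add returns false when the word is already in the set.
            if (seen.Add(word))
            {
                // Separate words with a single space, with no trailing space.
                if (b.Length > 0)
                {
                    b.Append(' ');
                }
                b.Append(word);
            }
        }
        return b.ToString();
    }

    static void Main()
    {
        // prints "Yellow bird blue sun"
        Console.WriteLine(RemoveDuplicateWordsSet("Yellow bird, blue Bird, yellow sun"));
    }
}
```

As in the original method, the first spelling of each word is the one that is kept.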
Stopwords. I used this code, along with a variant that also removes stop words, to implement a full-text search feature in a Windows Forms program. A dedicated full-text search database can also be useful for this kind of feature.
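The stop-word variant is not shown on this page. Below is a minimal sketch of how it might look. It is not the author's actual code. It assumes a tiny, hypothetical stop-word list (a real list would be far longer) and the same splitting characters as the method above. The names StopWordFilter, StopWords, and RemoveStopWordsAndDuplicates are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class StopWordFilter
{
    // Hypothetical, deliberately tiny stop-word list for illustration.
    static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "and", "the", "of", "to", "in" },
        StringComparer.OrdinalIgnoreCase);

    // Drops stop words and repeated words in a single pass.
    public static string RemoveStopWordsAndDuplicates(string v)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();
        string[] words = v.Split(new char[] { ' ', ',', ';', '.' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Skip stop words; otherwise keep only the first occurrence.
            if (!StopWords.Contains(word) && seen.Add(word))
            {
                kept.Add(word);
            }
        }
        return string.Join(" ", kept);
    }

    static void Main()
    {
        // prints "yellow bird blue sun"
        Console.WriteLine(RemoveStopWordsAndDuplicates("The yellow bird and the blue bird, in the sun"));
    }
}
```

Both checks are hash lookups, so filtering stop words adds only a constant amount of work per word.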
A summary. We combined Dictionary with StringBuilder to develop a method that efficiently removes duplicate English words. It does one lookup for each word as it encounters it, so the running time grows roughly linearly with the number of words.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Jun 13, 2021.