Find anagrams and use Dictionary. An anagram is a word that when rearranged forms another word. We need some code to compute anagrams, and also store them on the disk. This will help us learn C#, do homework, or solve word puzzles in our spare time. Most importantly, it might be educational.
First, we should remember that every word can be alphabetized, and two words that have different arrangements of the same letters will form the same key when they are both alphabetized. We can use this relationship to use the alphabetized words as keys in a Dictionary object.
To find anagrams, we need to alphabetize the input words. An alphabetized string always contains the same frequencies of letters as the original word, which makes it ideal for finding anagrams. We can then store all the words that can be made from the key as the value in a Dictionary object.
| This input word | Alphabetized form |
| cat | act |
| tca | act |
I have written a separate article about how to alphabetize strings. I encourage you to read more about the method, but I will put it here for completeness. What follows is the method that alphabetizes a C# string.
/// <summary>
/// Alphabetize the word received as a parameter. This will create
/// a key from the word that is always the same for words with
/// equal letter frequencies.
/// </summary>
/// <param name="wordIn">The word you need to alphabetize.</param>
/// <returns>The alphabetized string.</returns>
static string Alpha(string wordIn)
{
char[] arr = wordIn.ToCharArray();
Array.Sort(arr);
return new string(arr);
}
This next section does the hard part and looks for anagrams, which it then stores in a convenient file. The file can be read into memory and used to instantly find anagrams. The following code uses alphabetization to find and store all anagrams for each word in the word list.
/// <summary>
/// Write a file containing all the key/value anagram pairs.
/// </summary>
public void WriteDictionary()
{
List<string>[] wordLists = new List<string>[_maxLength];
for (int i = 0; i < _maxLength; i++)
{
wordLists[i] = new List<string>();
}
Dictionary<string, List<string>> alphaDictionary =
new Dictionary<string, List<string>>();
using (StreamReader reader = new StreamReader(_inputFile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
line = line.Trim(); // Trim string.
if (line.Length >= _maxLength)
{
continue;
}
string alphaString = Alpha(line);
if (alphaDictionary.ContainsKey(alphaString) == true)
{
// This section adds the word if it isn't already stored as anagram.
bool found = false;
foreach (string lineThere in alphaDictionary[alphaString])
{
if (lineThere == line)
{
found = true;
break;
}
}
if (found == false)
{
alphaDictionary[alphaString].Add(line);
}
}
else
{
// We have a new match, so add it.
alphaDictionary.Add(alphaString, new List<string>());
alphaDictionary[alphaString].Add(line);
wordLists[alphaString.Length].Add(alphaString);
}
}
}
//Write the anagrams we found to a file.
using (StreamWriter writer = new StreamWriter(_fileName))
{
List<string> alphaList = new List<string>(alphaDictionary.Keys);
alphaList.Sort();
for (int i = 0; i < alphaList.Count; i++) // Use List Count.
{
string alpha = alphaList[i];
List<string> actualWords = alphaDictionary[alpha];
foreach (string realWord in actualWords)
{
writer.WriteLine(alpha + ";" + realWord);
}
}
}
}
Now, after we have run WriteDictionary above, we can read in the Dictionary very quickly when we want our anagram program to run. Let me show that method now. Note how I use the Dictionary and TryGetValue method to populate the data structure. This is reasonably efficient and optimized, but not super fast.
/// <summary>
/// Read in the dictionary file to build the anagram data structure.
/// </summary>
public void ReadDictionary()
{
_dict = new Dictionary<string, List<string>>();
using (StreamReader reader = new StreamReader(_fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(';');
List<string> localList;
if (_dict.TryGetValue(parts[0], out localList))
{
localList.Add(parts[1]);
}
else
{
localList = new List<string>();
localList.Add(parts[1]);
_dict.Add(parts[0], localList);
}
}
}
}
At this point, we have written the anagram file to the disk, and then we can run the program read it back in. Now, we can find anagrams really quickly. The next function is called FindAnagramsFor, and you simply call it and it displays all the anagrams for your word.
/// <summary>
/// Display all the anagrams for the parameter word.
/// </summary>
/// <param name="wordIn">The word you want to find anagrams for.</param>
public void FindAnagramsFor(string wordIn)
{
string alpha = Alpha(wordIn);
List<string> anagramList;
if (_dict.TryGetValue(alpha, out anagramList))
{
Console.WriteLine(wordIn + ":");
foreach (string anagram in anagramList)
{
Console.WriteLine(" " + anagram);
}
}
}
Here is some surrounding code for the above anagram code. We store the anagram logic in a class called Anagrams. This adheres to object-oriented programming and allows very easy reuse of the anagram code. The following example also shows all the anagrams for senators.
class Program
{
static void Main(string[] args)
{
Anagrams anagrams = new Anagrams();
anagrams.WriteDictionary();
anagrams.ReadDictionary();
anagrams.FindAnagramsFor("senators"); // Lots of anagrams for this!
}
}
class Anagrams
{
const int _maxLength = 10;
const string _fileName = "anagrams.txt";
const string _inputFile = "enable1.txt";
Dictionary<string, List<string>> _dict;
// [functions shown above are placed here]
}
In the game Scrabble, a player can use a blank as a substitute for any letter. This can be dealt with in different ways in an anagram program--either with a tree and recursion, or with approximate string matching. I have used Levenshtein distances to find similar strings, and provide the source for you.
I have shown how to make an anagram dictionary and store it. You can take any combination of letters and find all words they make. Finally, there are more complex approaches to letter puzzles or algorithms. Read about my Windows open-source anagram finder, which is far faster and more useful.
The anagram algorithms shown here in this article are available in a much more usable form for copying at my new source code download site.