C# Tree and Nodes Example: Directed Acyclic Word Graph

Develop a tree or directed acyclic graph. Each node contains references to child nodes.
Tree, directed acyclic graph. A directed acyclic graph contains nodes that do not form cycles. It can be used to store strings from a word list—each letter is one node.
For search performance, an optimized tree is often the best solution. It has an initialization cost. And it requires some extra code to create. The C# version here is not optimal.
Example code. In the C# language a tree can be implemented with classes (nodes) that point to further nodes. In this simple implementation, we use a Dictionary.

AddWord: This method takes a string argument and adds it to the tree by creating (or using existing) nodes (one per character).

ContainsWord: This method meanwhile is similar to AddWord, but does not add anything. It just does lookups. It returns a bool.

Add: This method will add a new DictionaryNode to the Dictionary if one does not exist. It one already exists, it will not add another.

Get: Retrieves a node from the tree from the current node—it uses a Dictionary lookup (TryGetValue).

TryGetValue
C# program that uses tree class using System; using System.Collections.Generic; class DictionaryNodeRoot { DictionaryNode _root = new DictionaryNode(); public void AddWord(string value) { // Add nodes from string chars. DictionaryNode current = this._root; for (int i = 0; i < value.Length; i++) { current = current.Add(value[i]); } // Set state. current.SetWord(value); } public bool ContainsWord(string value) { // Get existing nodes from string chars. DictionaryNode current = this._root; for (int i = 0; i < value.Length; i++) { current = current.Get(value[i]); if (current == null) { return false; } } // Return state. return current != null && current.GetWord() != null; } } class DictionaryNode { string _word; Dictionary<char, DictionaryNode> _dict; // Slow. public DictionaryNode Add(char value) { // Add individual node as child. // ... Handle null field. if (this._dict == null) { this._dict = new Dictionary<char, DictionaryNode>(); } // Look up and return if possible. DictionaryNode result; if (this._dict.TryGetValue(value, out result)) { return result; } // Store. result = new DictionaryNode(); this._dict[value] = result; return result; } public DictionaryNode Get(char value) { // Get individual child node. if (this._dict == null) { return null; } DictionaryNode result; if (this._dict.TryGetValue(value, out result)) { return result; } return null; } public void SetWord(string word) { this._word = word; } public string GetWord() { return this._word; } } class Program { static void Main() { // Create a tree. DictionaryNodeRoot tree = new DictionaryNodeRoot(); // Add three words to the tree. tree.AddWord("dot"); tree.AddWord("net"); tree.AddWord("perls"); // Search for strings in the tree. Console.WriteLine(tree.ContainsWord("perls")); Console.WriteLine(tree.ContainsWord("nothing")); } } Output True False
Notes, above program. DictionaryNodeRoot serves as the root of the tree. It provides helper methods—AddWord and ContainsNode. We use this class to add strings to the tree.
Notes, AddWord. We add a string char-by-char with a for-loop in AddWord. The Add method from DictionaryNode.cs returns the newly-added or accessed node. We call Add repeatedly.CharFor

Info: This logic ends up building the tree. Nodes that already exist are not added again. They are instead shared.

Finally: The SetWord method is used. This signals to DictionaryType that a complete word has been reached at this node.

Bool
Notes, DictionaryNode. On each node, we use a Dictionary instance. The Dictionary provides references to child nodes. The string field tells us whether a complete word here exists.

Key: The Dictionary field uses a char key type. This is not essential. With this tree design, any value type could be used.

Notes, Add and Get. Add and Get are similar. Then they do a lookup in the Dictionary for the value specified. They return the DictionaryNode found, or null.Null
Discussion. In software many applications require collections of strings. An obvious example is a spell-checker. All those words are stored somehow in memory.Spell-Checker
Notes, advantages. The tree accommodates vast data sets. This tree could handle any number of words within the limits of memory without a big performance drop.

Simple: The design goal for this tree was simplicity. There are many ways to optimize the tree—see the performance section.

Warning: Optimizations make the tree's design less obvious. They might not work for every requirement.

Performance. Using the Dictionary as a field for child nodes is a slow implementation. An array, on each node, would be faster. Each element could instead be indexed in this array.Array

Tip: Further optimizations are possible. The entire data structure can be collapsed into 1 or 2 int arrays.

Tip 2: These arrays essentially are a flattened form of the Dictionaries. The arrays reference one another.

Int ArrayFlatten ArrayOptimization

And: A tree can be reduced so that the prefixes and suffixes are shared. The implementation here shares only prefixes.

Note, optimization. Premature optimization is the root of something (I think). I have designed this same tree many times. And in none of those times has the performance been that critical.Benchmark
Performance test. Here we compare the performance of an optimized version of our tree against a Dictionary generic. The tree only supports 26 lowercase letters.

Result: The tree is almost twice as fast on a simple benchmark. This is in-line with other benchmark results.

Also: The tree is not done with optimization. An array of "initial nodes" can be used to optimize the first letter's look up.

C# program that benchmarks tree with nodes array using System; class DictionaryNodeRoot { DictionaryNode _root = new DictionaryNode(); public void AddWord(string value) { DictionaryNode current = this._root; for (int i = 0; i < value.Length; i++) { current = current.Add(value[i]); } current.SetWord(value); } public bool ContainsWord(string value) { DictionaryNode current = this._root; for (int i = 0; i < value.Length; i++) { current = current.Get(value[i]); if (current == null) { return false; } } return current != null && current.GetWord() != null; } } class DictionaryNode { string _word; DictionaryNode[] _dict; // Use array for performance boost. public DictionaryNode Add(char value) { if (this._dict == null) { this._dict = new DictionaryNode[26]; } // Look up and return if possible. int key = value % 26; if (this._dict[key] != null) { return this._dict[key]; } // Store. var result = new DictionaryNode(); this._dict[key] = result; return result; } public DictionaryNode Get(char value) { // Get individual child node. if (this._dict == null) { return null; } int key = value % 26; return this._dict[key]; } public void SetWord(string word) { this._word = word; } public string GetWord() { return this._word; } } class Program { static void Main() { DictionaryNodeRoot tree = new DictionaryNodeRoot(); tree.AddWord("cat"); tree.AddWord("car"); tree.AddWord("cap"); var t1 = System.Diagnostics.Stopwatch.StartNew(); for (int i = 0; i < 10000000; i++) { if (!tree.ContainsWord("car")) { return; } } // ... Elapsed time. Console.WriteLine($"Time: {t1.ElapsedMilliseconds}"); } } C# program that uses Dictionary using System; using System.Collections.Generic; class Program { static void Main() { // Use Dictionary. var dictionary = new Dictionary<string, bool>(); dictionary["cat"] = true; dictionary["car"] = true; dictionary["cap"] = true; var t1 = System.Diagnostics.Stopwatch.StartNew(); for (int i = 0; i < 10000000; i++) { if (!dictionary.ContainsKey("car")) { return; } } // ... Elapsed time. Console.WriteLine($"Time: {t1.ElapsedMilliseconds}"); } } Output Time: 124 DictionaryNodeRoot.ContainsWord (with array for nodes) Time: 202 Dictionary.ContainsKey
Notes, prefix search. With a tree, we can search just for prefixes—the entire key does not need to be known. This can avoid many substrings. It is critical for games like Scrabble.
Some research. Directed acyclic graphs are important. They can be used to find optimal paths. This can vastly improve programs—a DAG could replace most primitive maze algorithms.Maze

Quote: For example, it is possible to find shortest paths and longest paths from a given starting vertex in DAGs in linear time by processing the vertices in a topological order....

Directed Acyclic Graph: Wikipedia
A summary. This tree uses Dictionary fields for child nodes. It works with strings. Its design can be adapted to other value types. The nodes can be optimized with other representations.
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.
HomeSearch
Home
Dot Net Perls