Home
C#
Levenshtein Distance
Updated Dec 15, 2022
Dot Net Perls
Levenshtein. In 1965 Vladmir Levenshtein created a distance algorithm. This tells us the number of edits needed to turn one string into another.
Algorithm notes. With Levenshtein distance, we measure similarity with fuzzy logic. This returns the number of character edits that must occur to get from string A to string B.
An example. This code uses a two-dimensional array instead of a jagged array because the space required will only have one width and one height.
Tip The two-dimensional array requires fewer allocations upon the managed heap and may be faster in this context.
2D Array
Detail This Compute method doesn't need to store state or instance data, which means you can declare it as static.
static
Detail You can verify the algorithm's correctness using a computer science textbook.
using System; using System.Collections.Generic; class Program { static int Compute(string s, string t) { int n = s.Length; int m = t.Length; int[,] d = new int[n + 1, m + 1]; // Verify arguments. if (n == 0) { return m; } if (m == 0) { return n; } // Initialize arrays. for (int i = 0; i <= n; d[i, 0] = i++) { } for (int j = 0; j <= m; d[0, j] = j++) { } // Begin looping. for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) { // Compute cost. int cost = (t[j - 1] == s[i - 1]) ? 0 : 1; d[i, j] = Math.Min( Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost); } } // Return cost. return d[n, m]; } public static void Main() { List<string[]> l = new List<string[]> { new string[]{"ant", "aunt"}, new string[]{"Sam", "Samantha"} }; foreach (string[] a in l) { int cost = Compute(a[0], a[1]); Console.WriteLine("{0} -> {1} = {2}", a[0], a[1], cost); } } }
ant -> aunt = 1 Sam -> Samantha = 5
Notes, edit distance. Here is a table showing the edit distance of some word pairs. It is important to verify the correctness of all computer code (particularly from websites).
Words: ant, aunt Levenshtein distance: 1 Note: Only 1 edit is needed. The "u" must be added at index 2. Words: Samantha, Sam Levenshtein distance: 5 Note: The final 5 letters must be removed.
A summary. The difference between two strings is not represented as true or false, but as the number of steps needed to get from one to the other.
A final note. With the Levenshtein distance algorithm, we implement approximate string matching. We implemented this string distance algorithm in the C# language.
Dot Net Perls is a collection of pages with code examples, which are updated to stay current. Programming is an art, and it can be learned from examples.
Donate to this site to help offset the costs of running the server. Sites like this will cease to exist if there is no financial support for them.
Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.
No updates found for this page.
Home
Changes
© 2007-2025 Sam Allen