In 1965 Vladmir Levenshtein created a distance algorithm. This tells us the number of edits needed to turn one string
into another.
With Levenshtein distance, we measure similarity with fuzzy logic. This returns the number of character edits that must occur to get from string
A to string
B.
This code uses a two-dimensional array instead of a jagged array because the space required will only have one width and one height.
static
.using System; using System.Collections.Generic; class Program { static int Compute(string s, string t) { int n = s.Length; int m = t.Length; int[,] d = new int[n + 1, m + 1]; // Verify arguments. if (n == 0) { return m; } if (m == 0) { return n; } // Initialize arrays. for (int i = 0; i <= n; d[i, 0] = i++) { } for (int j = 0; j <= m; d[0, j] = j++) { } // Begin looping. for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) { // Compute cost. int cost = (t[j - 1] == s[i - 1]) ? 0 : 1; d[i, j] = Math.Min( Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost); } } // Return cost. return d[n, m]; } public static void Main() { List<string[]> l = new List<string[]> { new string[]{"ant", "aunt"}, new string[]{"Sam", "Samantha"} }; foreach (string[] a in l) { int cost = Compute(a[0], a[1]); Console.WriteLine("{0} -> {1} = {2}", a[0], a[1], cost); } } }ant -> aunt = 1 Sam -> Samantha = 5
Here is a table showing the edit distance of some word pairs. It is important to verify the correctness of all computer code (particularly from websites).
Words: ant, aunt Levenshtein distance: 1 Note: Only 1 edit is needed. The "u" must be added at index 2. Words: Samantha, Sam Levenshtein distance: 5 Note: The final 5 letters must be removed.
The difference between two strings is not represented as true or false, but as the number of steps needed to get from one to the other.
With the Levenshtein distance algorithm, we implement approximate string
matching. We implemented this string
distance algorithm in the C# language.