HomeSearch

VB.NET Levenshtein Distance Algorithm

This VB.NET article implements Levenshtein distance, which tells us the number of String edits.
Levenshtein. In 1965 Vladmir Levenshtein introduced a distance algorithm. This tells us the number of changes or edits one string must go through to become another string. In VB.NET we compute Levenshtein distance with a Function.

Levenshtein: Returns the number of character edits that must occur to get from String A to String B.

Note: In Levenshtein distance, edits include removals, inserts and replacements.

Example. The internals of the Levenshtein distance Function are complex. Internally it allocates a two-dimensional array. This 2D array has dimensions equal to the lengths of the two argument strings.

For: A nested For-loop is executed. In the inner block of the For-loops, the cost of the change of two elements is computed.

Note: The cost is one for a different character and zero for the same character.

For Each, For

Then: Array elements are assigned, considering adjacent array elements and the cost. The ending array element is returned as the distance.

VB.NET program that computes Levenshtein distance Module Module1 Sub Main() Console.WriteLine(LevenshteinDistance("aunt", "ant")) Console.WriteLine(LevenshteinDistance("Sam", "Samantha")) Console.WriteLine(LevenshteinDistance("flomax", "volmax")) End Sub ''' <summary> ''' Compute LevenshteinDistance. ''' </summary> Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer Dim n As Integer = s.Length Dim m As Integer = t.Length Dim d(n + 1, m + 1) As Integer If n = 0 Then Return m End If If m = 0 Then Return n End If Dim i As Integer Dim j As Integer For i = 0 To n d(i, 0) = i Next For j = 0 To m d(0, j) = j Next For i = 1 To n For j = 1 To m Dim cost As Integer If t(j - 1) = s(i - 1) Then cost = 0 Else cost = 1 End If d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost) Next Next Return d(n, m) End Function End Module Output 1 5 3
Next, we review the output of this Function. The number returned by Levenshtein for the arguments "aunt" and "ant" is 1. This means only one edit is required to get from aunt to ant—the letter U is removed.
For the second result, we have five edits. The name "Sam" must be edited five times to get "Samantha"—five letters are added. And to get to "volmax" from "flomax" three edits must occur.

Note: Swapping two characters, as with the "ol" and "lo" in "volmax" and "flomax" is considered two separate edits, not one.

Summary. Aspects of the Levenshtein distance algorithm, such as its design and concepts, are not covered here. But the implementation here is correct on the example Strings tested. It is not fast.

Tip: Certain optimizations, and even caches, could be used to enhance its performance, such as memoization.

And: A data structure such as a Dictionary could store the two strings and also a cached distance. This would improve certain programs.

Dictionary
© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.
Home
Dot Net Perls