Archive for December 6th, 2011

Levenshtein Distance

In all of my days writing PHP I have never before run accross the Levenshtein Distance. I don’t suppose that this is strictly a PHP idea. Nevertheless, I was intrigued to find out that there is a specific function in the library that will calculate the Levenshtein Distance between two strings. It looks like this:

levenshtein(string1,string2,insert,replace,delete)

The required input is the two strings, and then optionally you can give weight to certain distance perameters. So just what is this function measuring? Quite simply how many changes would have to be made between two strings in order to make them equal.

I have been pondering as to why we should ever want to quantify such a thing. But a little cogitation would show that this distance would work very well for spell-checkers in trying to determine what a person meant to write when using a particular spelling. For example, if you wrote “recieve” the Levenshtein Distance to “receive” would be in the neighborhood of two with only two replaces. A person writing “recieve” would probably not be trying to write “rectify”, which would be a distance of three. But that would be a whole lot more likely that “transmogrify”, which would be a much greater distance.

While I don’t think this distance says much about entomology, I can see how it could be a useful tool in scripts dealing in word processing.

A program dealing with the Levenshtein Distance between certain names might also prove entertaining. I will ponder that one.

No Comments