[Haskell-cafe] google-like "do you mean?" feature
Andy Smith
andy.haskell at zambezi.org.uk
Thu Apr 16 02:55:44 EDT 2009
2009/4/16 Michael Mossey <mpm at alumni.caltech.edu>:
> I was thinking that it might be useful to have a Google-like "do you mean
> this?" feature. If the field name is //customer=, then the parser might
> recognize a huge list of variants like //ustomer=, //customor=, etc... that
> is, recognize them well enough to continue parsing and give a decent error
> message in context.
>
> Any ideas how to go about this?
To measure how similar two strings are, you can use a metric like
Levenshtein distance, Damerau-Levenshtein distance, or Jaro-Winkler
distance:
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
http://en.wikipedia.org/wiki/Jaro-Winkler_distance
The first two basically count the number of mistakes that a user would
have to make to get from the correct string to the one you read from
the file. There's an 'edit-distance' package in Hackage that
implements the first two:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/edit-distance
When you find an unrecognised field name in the file, you could
calculate the edit distance to each correct field name, and if there's
one within a certain threshold, assume that's what the user meant (if
there's more than one close match, maybe it's better to report an
error than risk choosing the wrong one).
I imagine this brute-force approach would be fast enough, but if not
you could look at the techniques used by spell checkers to suggest
corrections. Maybe even use a spell checking library, if such a thing
exists (either pure Haskell or a binding to a library like aspell,
although I couldn't see either from a quick search in Hackage).
Andy
More information about the Haskell-Cafe
mailing list