[Haskell-cafe] Optimizing a title matcher

Lyle Kopnicky lists at qseep.net
Tue Sep 26 22:24:05 EDT 2006


Robert Dockins wrote:
> Humm... well, double nested loops seems like the wrong approach.
It may be. I had hoped it would be fast enough that way.
>   Also, if you 
> are using GHC, it's hashtable implementation has farily well-known 
> performance problems. If all you care about is exact matching, then the 
> operation is essentially a finite map intersection (if I've understood what 
> you are trying to do).
>   
No, I don't just care about exact matching. I care about very fuzzy 
matching. It's just that the code I've implemented so far only does 
exact matching on the title strings. It's a first step.
> This is just a guess, but I suspect you will probably get much better 
> performance (and better-looking code!) by just using Data.Map.intersection
>  
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Map.html#v%3Aintersection
>   
Thanks, but that won't help with the fuzzy matching.
> Alternately, there is the ternary trie implementation from Edison 
> (http://www.eecs.tufts.edu/~rdocki01) that may also work for you.
>   
That may be useful.
> If you need to do prefix matching, then a trie is the way to go.  You can 
> probably code up a nice prefix-intersection operation using tries that should 
> go pretty fast.
>
> If you have some other metric other than prefix in mind for partial matches, 
> then things probably get a lot more complicated.  You're probably looking at 
> calculating minimum distances in some feature-space, which calls for pretty 
> sophisticated algorithms if you need good performance.
>   
Yes, that's the kind of thing I'm looking at doing. Looking at edit 
distance.

Thanks,
Lyle


More information about the Haskell-Cafe mailing list