[Haskell-cafe] can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Edward Kmett ekmett at gmail.com
Fri Jul 17 21:11:56 EDT 2009


Haskell hash tables are a notorious performance pig, mostly due to the fact
that when we deal with big arrays, if the mutable array changes at all the
garbage collector will have to retraverse the entire thing during the next
collection. Guess the most common scenario for imperative hash tables that
are even lightly tweaked from time to time... ;)
As for other non-IO hash tables, I've seen a couple of unboxed hash tables
using STUArrays (which can side step this issue for unboxable data), IIRC
one may have even been used for a language shootout problem. I even wrote (a
rather poorly performing) Witold Litwin-style sorted linear hash table for
STM a couple of years back (it should still be on hackage under 'thash').

Data.HashTable could be easily reimplemented in ST s, but it would still
suffer the same GC problems as the current hash table, which no one likes.
-Ed

On Fri, Jul 17, 2009 at 6:24 PM, Thomas Hartman <tphyahoo at gmail.com> wrote:

> The code below is, I think, n log n, a few seconds on a million + element
> list.
>
> I wonder if it's possible to get this down to O(N) by using a
> hashtable implemementation, or other better data structure.
>
> Further, is there a hashtable implementation for haskell that doesn't
> live in IO? Maybe in ST or something?
>
> import Data.HashTable
> import qualified Data.Set as S
> import Data.List (foldl')
>
> testdata = [1,4,8,9,20,11,20,14,2,15] ++ [1..(10^6)]
> wantedsum = 29
>
> -- set data structure
> -- findsums locates pairs of integers in a list that add up to a
> wanted sum.
> findsums :: [Int] -> Int -> S.Set (Int,Int)
> findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs
>  where f (candidates,successes) next = if  S.member (wanted-next)
> candidates
>                                          then (candidates, S.insert
> (next,wanted-next) successes)
>                                          else (S.insert next
> candidates,successes)
>
> -- hashtable data structure
>
>
>
> -- result: t
> -- fromList
> [(15,14),(16,13),(17,12),(18,11),(19,10),(20,9),(21,8),(22,7),(23,6),(24,5),(25,4),(26,3),(27,2),(28,1)]
> -- probably O(n log n) complexity since using tree based Data.Set (a
> few seconds on million+ element list)
> t = findsums testdata wantedsum
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090717/3f205f28/attachment.html


More information about the Haskell-Cafe mailing list