[Haskell-cafe] Re: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Sun Jul 19 04:26:14 EDT 2009

Thomas Hartman wrote:
> The code below is, I think, n log n, a few seconds on a million + element list.
> 
> I wonder if it's possible to get this down to O(N) by using a
> hashtable implemementation, or other better data structure.
>
> -- findsums locates pairs of integers in a list
> that add up to a wanted sum.
>
> findsums :: [Int] -> Int -> S.Set (Int,Int)
> findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs
>   where f (candidates,successes) next =
>      if  S.member (wanted-next) candidates
>        then (candidates, S.insert (next,wanted-next) successes)
>        else (S.insert next candidates,successes)

Remember that hash tables are actually O(key length) instead of O(1), so
I don't think you can break the  log n  for really large lists this
way since the key length increases as well (unless most elements are
equal anyway).

In any case,

I have this conjecture
On which I will lecture:
A Set of some things
Is sorting for kings.

;-)

    findsums goal = uncurry sweep . (id &&& reverse) . sort
        where
        sweep []     _      = []
        sweep _      []     = []
        sweep (x:xs) (y:ys) = if x > y then [] else
            case compare (x+y) goal of
                LT -> sweep xs (y:ys)
                EQ -> (x,y) : sweep xs ys
                GT -> sweep (x:xs) ys

This algorithm needs a proof of correctness, though. And it's longer
that the Data.Set version, too.

Regards,
apfelmus

--
http://apfelmus.nfshost.com