[Haskell-cafe] Re: can there be (hash-table using) O(n) version of this (I think currently) n log n algo?

Wed Dec 9 18:06:31 EST 2009

Never underestimate teh power of the Int{Set,Map}:

{-# LANGUAGE BangPatterns #-}
import Data.Set(Set)
import Data.IntSet(IntSet)
import qualified Data.Set as S
import qualified Data.IntSet as IS
import Control.Parallel.Strategies(rnf)
import Data.Monoid(Monoid(..))
import Data.List

findsumsIS :: [Int] -> Int -> IntSet
findsumsIS xs wanted = snd . foldl' f mempty $ xs
  where f (!candidates,!successes) next =
          let x = wanted - next
          in case x `IS.member` candidates of
              True -> (candidates, IS.insert next successes)
              False -> (IS.insert next candidates,successes)

-- (i had to add bangs in f since it was blowing the stack)
findsums :: [Int] -> Int -> Set (Int,Int)
findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs
  where f (!candidates,!successes) next =
          if  S.member (wanted-next) candidates
            then (candidates, S.insert (next,wanted-next) successes)
            else (S.insert next candidates,successes)

{-
  Note that the list has 10 million elements,
  (time is roughly 0.4s with 1 million with IntSet).
-}

{-
main = do
  let s = findsums (take 10000000 (cycle [1..999])) 500
  print (rnf s `seq` ())

[m at monire ~]$ time ./FindSums
()

real	0m8.793s
user	0m8.762s
sys	0m0.022s
-}

{-
main = do
  let s = findsumsIS (take 10000000 (cycle [1..999])) 500
  print (rnf s `seq` ())

[m at monire ~]$ time ./FindSumsIS
()

real	0m4.488s
user	0m4.459s
sys	0m0.023s
-}

Matt

> On Sunday 19 July 2009 09:26:14 Heinrich Apfelmus wrote:
>> Thomas Hartman wrote:
>> > The code below is, I think, n log n, a few seconds on a million +
>> > element
>> > list.
>> >
>> > I wonder if it's possible to get this down to O(N) by using a
>> > hashtable implemementation, or other better data structure.
>> >
>> > -- findsums locates pairs of integers in a list
>> > that add up to a wanted sum.
>> >
>> > findsums :: [Int] -> Int -> S.Set (Int,Int)
>> > findsums xs wanted = snd . foldl' f (S.empty,S.empty) $ xs
>> >   where f (candidates,successes) next =
>> >      if  S.member (wanted-next) candidates
>> >        then (candidates, S.insert (next,wanted-next) successes)
>> >        else (S.insert next candidates,successes)
>>
>> Remember that hash tables are actually O(key length) instead of O(1), so
>> I don't think you can break the  log n  for really large lists this
>> way since the key length increases as well (unless most elements are
>> equal anyway).