[Haskell-cafe] Re: Suspected stupid Haskell Question

Chad Scherrer chad.scherrer at gmail.com
Wed Oct 17 23:38:51 EDT 2007


Hmm, is insertWith' new? If I remember right, I think the stack overflows
were happening because Map.insertWith isn't strict enough. Otherwise I think
the code is the same. But I would expect intTable to be faster, since it
uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be
easy enough to add one).

Chad

On 10/17/07, Thomas Hartman <thomas.hartman at db.com> wrote:
>
>
> Since I'm interested in the stack overflow issue, and getting acquainted
> with quickcheck, I thought I would take this opportunity to compare your
> ordTable with some code Yitzchak Gale posted earlier, against Ham's original
> problem.
>
> As far as I can tell, they're the same. They work on lists up to 100000
> element lists of strings, but on 10^6 size lists I lose patience waiting for
> them to finish.
>
> Is there a more scientific way of figuring out if one version is better
> than the other by using, say profiling tools?
>
> Or by reasoning about the code?
>
> t.
>
> ****************************************
>
> import Data.List
> import qualified Data.Map as M
> import Control.Arrow
> import Test.QuickCheck
> import Test.GenTestData
> import System.Random
>
> {-
> Is there a library function to take a list of Strings and return a list of
> ints showing how many times each String occurs in the list.
>
> So for example:
>
> ["egg", "egg", "cheese"] would return [2,1]
> -}
>
> testYitzGale n = do
>   l <- rgenBndStrRow (10,10) (10^n,10^n)  -- 100000 strings, strings are
> 10 chars long, works. craps out on 10^6.
>   m <- return $ freqFold l
>   putStrLn $ "map items: " ++ ( show $ M.size m )
>
> testCScherer n = do
>   l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale
> code.
>   m <- return $ ordTable l
>   putStrLn $ "items: " ++ ( show $ length m )
>
>
> -- slow for big lists
> --freqArr = Prelude.map ( last &&& length ) . group . sort
>
> -- yitz gale code. same as chad scherer code? it's simpler to understand,
> but is it as fast?
> freqFold :: [[Char]] -> M.Map [Char] Int
> freqFold = foldl' g M.empty
>   where g accum x = M.insertWith' (+) x 1 accum
> -- c scherer code. insists on ord. far as I can tell, same speed as yitz.
> ordTable :: (Ord a) => [a] -> [(a,Int)]
> ordTable xs = M.assocs $! foldl' f M.empty xs
>     where f m x = let  m' = M.insertWith (+) x 1 m
>                        Just v = M.lookup x m'
>                   in v `seq` m'
>
>
> l = ["egg","egg","cheese"]
>
> -- other quickcheck stuff
> --prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == (
> freqArr $ reverse l )
> --prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold
> l))
> --test1 = quickCheck prop_unchanged_by_reverse
> --test2 = quickCheck prop_freqArr_eq_freqFold
>
> --------------- generate test data:
> genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL
> (minStrLen, maxStrLen) (minCols,maxCols) )
>
> gen gen = do
>   sg <- newStdGen
>   return $ generate 10000 sg gen
>
> -- generator for a list with length between min and max
> genBndList :: Arbitrary a => (Int, Int) -> Gen [a]
> genBndList (min,max) = do
>   len <- choose (min,max)
>   vector len
>
>
> -- lists of lists
> --genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]
> genBndLoL (min1,max1) (min2,max2) = do
>   len1 <- choose (min1,max1)
>   len2 <- choose (min2,max2)
>   vec2 len1 len2
>
> --vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]
> vec2 n m = sequence [ vector m | i <- [1..n] ]
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20071017/c1012336/attachment-0001.htm


More information about the Haskell-Cafe mailing list