[Haskell-cafe] Re: Suspected stupid Haskell Question

Thu Oct 18 10:59:01 EDT 2007

>  But I would expect intTable to be faster,

But if I understand correctly, intTable can only deal with integer keys, 
whereas BH's original question would have wanted string keys, and I can't 
see a way to convert string to int and back.

t.

"Chad Scherrer" <chad.scherrer at gmail.com> 
10/17/2007 11:38 PM

To
Thomas Hartman/ext/dbcom at DBAmericas
cc
haskell-cafe at haskell.org, haskell-cafe-bounces at haskell.org
Subject
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

Hmm, is insertWith' new? If I remember right, I think the stack overflows 
were happening because Map.insertWith isn't strict enough. Otherwise I 
think the code is the same. But I would expect intTable to be faster, 
since it uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 
(though it may be easy enough to add one).

Chad

On 10/17/07, Thomas Hartman < thomas.hartman at db.com> wrote:

Since I'm interested in the stack overflow issue, and getting acquainted 
with quickcheck, I thought I would take this opportunity to compare your 
ordTable with some code Yitzchak Gale posted earlier, against Ham's 
original problem. 

As far as I can tell, they're the same. They work on lists up to 100000 
element lists of strings, but on 10^6 size lists I lose patience waiting 
for them to finish. 

Is there a more scientific way of figuring out if one version is better 
than the other by using, say profiling tools? 

Or by reasoning about the code? 

t. 

**************************************** 

import Data.List 
import qualified Data.Map as M 
import Control.Arrow 
import Test.QuickCheck 
import Test.GenTestData 
import System.Random 

{- 
Is there a library function to take a list of Strings and return a list of 

ints showing how many times each String occurs in the list. 

So for example: 

["egg", "egg", "cheese"] would return [2,1] 
-} 

testYitzGale n = do 
  l <- rgenBndStrRow (10,10) (10^n,10^n)  -- 100000 strings, strings are 
10 chars long, works. craps out on 10^6. 
  m <- return $ freqFold l 
  putStrLn $ "map items: " ++ ( show $ M.size m ) 

testCScherer n = do 
  l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale 
code. 
  m <- return $ ordTable l 
  putStrLn $ "items: " ++ ( show $ length m ) 

-- slow for big lists 
--freqArr = Prelude.map ( last &&& length ) . group . sort 

-- yitz gale code. same as chad scherer code? it's simpler to understand, 
but is it as fast? 
freqFold :: [[Char]] -> M.Map [Char] Int 
freqFold = foldl' g M.empty 
  where g accum x = M.insertWith' (+) x 1 accum 
-- c scherer code. insists on ord. far as I can tell, same speed as yitz. 
ordTable :: (Ord a) => [a] -> [(a,Int)] 
ordTable xs = M.assocs $! foldl' f M.empty xs 
    where f m x = let  m' = M.insertWith (+) x 1 m 
                       Just v = M.lookup x m' 
                  in v `seq` m' 

l = ["egg","egg","cheese"] 

-- other quickcheck stuff 
--prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( 
freqArr $ reverse l ) 
--prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold 
l)) 
--test1 = quickCheck prop_unchanged_by_reverse 
--test2 = quickCheck prop_freqArr_eq_freqFold 

--------------- generate test data: 
genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL 
(minStrLen, maxStrLen) (minCols,maxCols) ) 

gen gen = do 
  sg <- newStdGen 
  return $ generate 10000 sg gen 

-- generator for a list with length between min and max 
genBndList :: Arbitrary a => (Int, Int) -> Gen [a] 
genBndList (min,max) = do 
  len <- choose (min,max) 
  vector len 

-- lists of lists 
--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]] 
genBndLoL (min1,max1) (min2,max2) = do 
  len1 <- choose (min1,max1) 
  len2 <- choose (min2,max2) 
  vec2 len1 len2 

--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]] 
vec2 n m = sequence [ vector m | i <- [1..n] ] 

---

This e-mail may contain confidential and/or privileged information. If you 
are not the intended recipient (or have received this e-mail in error) 
please notify the sender immediately and destroy this e-mail. Any 
unauthorized copying, disclosure or distribution of the material in this 
e-mail is strictly forbidden.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20071018/71a51836/attachment-0001.htm