[Haskell-cafe] Why is this so inefficient?

Don Stewart dons at galois.com
Tue Feb 5 13:15:22 EST 2008


If the  strings are relatively short, there can be a bottleneck
in the current Ord instance for Bytestrings. When lots of them
are in a map, the ffi calls to memcmp dominate. 

I've a fix for this (do it all in Haskell for small strings), and
can push it if someone complains some more.

jefferson.r.heard:
> I thought this was fairly straightforward, but where the marked line
> finishes in 0.31 seconds on my machine, the actual transpose takes
> more than 5 minutes.  I know it must be possible to read data in
> haskell faster than this.  I'm trying to read a 100MB comma delimited
> file.  I've tried both CSV modules, and these take even longer to read
> the file.  Is there some general best-practice for reading and parsing
> large amounts of data that I'm not aware of?
> 
> I have tried, by the way, a couple of things, including putting a bang
> (!) before row in transposeRow and using foldr instead of foldl', but
> that didn't change anything other than force me to increase the stack
> size to 100M on the command line.
> 
> I'm running in the profiler now, and I'll update this, but I thought I
> would check and see if my head was on remotely straight to begin with.
> 
> -- Jeff
> 
> ---
> module ColumnMajorCSV where
> 
> import qualified Data.ByteString.Char8 as S
> import qualified Data.Map as M
> import qualified Data.List as L
> 
> transposeRow cols row = zipWith (:) row cols
> 
> transposeCSV :: [[S.ByteString]] -> M.Map String [S.ByteString]
> transposeCSV (header:rows) = M.fromList (zip (map S.unpack header) spreadsheet)
>     where spreadsheet = L.foldl' transposeRow emptySheet rows
>           emptySheet = take (length header) $ repeat []
> 
> dataFromFile :: String -> IO (M.Map String [S.ByteString])
> dataFromFile filename = do
>     f <- S.readFile filename
>     print . length . map (S.split ',' $!) . S.lines $ f
>  -- finishes in 0.31 seconds
>     return . transposeCSV . map (S.split ',' $!) . S.lines $ f  --
> this takes 5 minutes and 6 seconds
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe


More information about the Haskell-Cafe mailing list