[Haskell-cafe] Why is this so inefficient?
dons at galois.com
Tue Feb 5 13:15:22 EST 2008
If the strings are relatively short, there can be a bottleneck
in the current Ord instance for ByteStrings. When lots of them
are in a map, the FFI calls to memcmp dominate.
I've got a fix for this (do it all in Haskell for small strings), and
can push it if someone complains some more.
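A minimal sketch of the "do it all in Haskell for small strings" idea. It assumes a hypothetical cutoff of 8 bytes and is not the actual patch. `smallThreshold`, `compareShort`, `ShortKey` and `exampleMap` are illustrative names. Short strings are compared byte by byte in Haskell. Longer ones fall back to the library's own `compare`, which is the path that goes through memcmp.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.Map as M

-- Hypothetical cutoff: strings up to this many bytes are compared in Haskell.
smallThreshold :: Int
smallThreshold = 8

-- Lexicographic byte comparison with the same ordering as the standard
-- Ord instance: unsigned bytes, and a proper prefix sorts first.
compareShort :: S.ByteString -> S.ByteString -> Ordering
compareShort a b
  | la <= smallThreshold && lb <= smallThreshold = go 0
  | otherwise = compare a b -- long strings: use the library instance
  where
    la = S.length a
    lb = S.length b
    go i
      | i >= la || i >= lb = compare la lb
      | otherwise = case compare (S.index a i) (S.index b i) of
          EQ -> go (i + 1)
          r  -> r

-- Wrapper so that Data.Map orders its keys with compareShort.
newtype ShortKey = ShortKey S.ByteString
  deriving (Eq, Show)

instance Ord ShortKey where
  compare (ShortKey a) (ShortKey b) = compareShort a b

-- Example: a map keyed by short ByteStrings.
exampleMap :: M.Map ShortKey Int
exampleMap =
  M.fromList [(ShortKey (S.pack k), v) | (k, v) <- [("b", 2), ("a", 1), ("ab", 3)]]
```

Whether this helps in practice depends on how short the keys really are. The cutoff would need measuring rather than guessing.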
> I thought this was fairly straightforward, but while the marked line
> finishes in 0.31 seconds on my machine, the actual transpose takes
> more than 5 minutes. I know it must be possible to read data in
> Haskell faster than this. I'm trying to read a 100MB comma-delimited
> file. I've tried both CSV modules, and these take even longer to read
> the file. Is there some general best practice for reading and parsing
> large amounts of data that I'm not aware of?
> I have tried, by the way, a couple of things, including putting a bang
> (!) before row in transposeRow and using foldr instead of foldl', but
> that didn't change anything other than force me to increase the stack
> size to 100M on the command line.
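One likely reason the bang on `row` and the switch between folds made no difference: `foldl'` only evaluates the accumulator to weak head normal form. For the result of `zipWith`, that is just its first cons cell, so the columns still pile up as unevaluated thunks until the very end. That would also explain the large stack requirement. The sketch below shows a stricter step and assumes it is acceptable to force each whole row as it is consumed. `forceElems`, `transposeRowStrict` and `transposeRows` are hypothetical names. Note also that prepending each field leaves every column in reverse row order. The final `map reverse` undoes that.

```haskell
import Data.List (foldl')

-- Force the spine of a list and every element to weak head normal form.
forceElems :: [a] -> [a]
forceElems xs = foldr seq () xs `seq` xs

-- Like the original transposeRow, but each new column cell (and the field
-- it holds) is evaluated immediately instead of being left as a thunk.
transposeRowStrict :: [[a]] -> [a] -> [[a]]
transposeRowStrict cols row =
  forceElems (zipWith (\x col -> x `seq` (x : col)) row cols)

-- Transpose rows into `width` columns, restoring the original row order.
transposeRows :: Int -> [[a]] -> [[a]]
transposeRows width rows =
  map reverse (foldl' transposeRowStrict (replicate width []) rows)
```

In `transposeCSV`, this would mean using `transposeRows (length header) rows` in place of `L.foldl' transposeRow emptySheet rows`.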
> I'm running it under the profiler now, and I'll update this, but I thought I
> would check and see if my head was on remotely straight to begin with.
> -- Jeff
> module ColumnMajorCSV where
> import qualified Data.ByteString.Char8 as S
> import qualified Data.Map as M
> import qualified Data.List as L
> transposeRow cols row = zipWith (:) row cols
> transposeCSV :: [[S.ByteString]] -> M.Map String [S.ByteString]
> transposeCSV (header:rows) = M.fromList (zip (map S.unpack header) spreadsheet)
>   where spreadsheet = L.foldl' transposeRow emptySheet rows
>         emptySheet = take (length header) $ repeat []
> dataFromFile :: String -> IO (M.Map String [S.ByteString])
> dataFromFile filename = do
>   f <- S.readFile filename
>   print . length . map (S.split ',' $!) . S.lines $ f
>   -- finishes in 0.31 seconds
>   return . transposeCSV . map (S.split ',' $!) . S.lines $ f
>   -- this takes 5 minutes and 6 seconds
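The 0.31-second figure does not include any splitting. `length` only walks the list produced by `map`, so `S.split` is never called on any line, and the timing covers reading the file and breaking it into lines. The sketch below gives a fairer baseline that forces every split. `fieldCount` is a hypothetical helper, not part of the original code.

```haskell
import qualified Data.ByteString.Char8 as S
import Data.List (foldl')

-- Total number of comma-separated fields in the input. Unlike
-- `length . map (S.split ',')`, taking the length of each split result
-- forces S.split to run on every line; foldl' keeps the running total strict.
fieldCount :: S.ByteString -> Int
fieldCount = foldl' (\acc line -> acc + length (S.split ',' line)) 0 . S.lines
```

Timing something like `S.readFile "data.csv" >>= print . fieldCount` (file name hypothetical) separates the cost of parsing from the cost of building the transposed map.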
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org