[Haskell-cafe] Re: optimization help
jeff p
mutjida at gmail.com
Sat Oct 14 18:01:50 EDT 2006
Hello,
> Yet, I'm a bit astonished. I thought that when compiling with -O2,
> cosmetic changes should become negligible. Perhaps the strict foldl' has
> an effect?
>
Perhaps... but I doubt that is the main reason. At the moment I have
no idea why there is such a discrepancy between the heap usages...
A big part of why the solutions you crafted work so efficiently is
that they take advantage of the fact that the rows will be written out
exactly as they are read in. I wanted to see if a more general code
could maintain the same efficiency. Here is some code to read in a
file, write out a file, and do selections-- the idea is that CSV files
are internally represented and manipulated as [[ByteString]].
readCSV file = do
v <- B.readFile file
return $ map (B.split ',') $ B.lines v
writeCSV file tbl = do
h <- openFile file WriteMode
let writeRow = mapM_ (B.hPut h) . (++ [nl]) . intersperse comma
mapM_ writeRow tbl
hClose h
where
comma = B.singleton ','
nl = B.singleton '\n'
select targs test (cols : rows) = map narrow (cols : passTest rows)
where
myFilter = map snd . filter fst
passTest = myFilter . map (\row -> (runReader test (zip cols
[0..], row), row))
narrow = myFilter . zip (map (`elem` targs) cols)
col x = do
(cols,row) <- ask
let Just i = lookup (B.pack x) cols
return $ row!!i
This code runs reasonably fast-- around 13 seconds to read in a 120MB
file (~750000 rows), select half the columns of around 22000 rows
randomly distributed throughout the input table, and write a new CSV
file. It takes around 90 seconds to just remove some columns from
every row in the table and write a new file. So the slow part of the
program is probably the writeCSV function. Do you think these times
can be improved upon?
-Jeff
More information about the Haskell-Cafe
mailing list