[Haskell-cafe] another Newbie performance question
Luke Palmer
lrpalmer at gmail.com
Sat May 17 17:07:40 EDT 2008
On Sat, May 17, 2008 at 5:22 PM, Philip Müller
<mail at philip.in-aachen.net> wrote:
> If someone here finds the time to look at my code and give me some hints,
> that would really be nice.
A little experimentation reveals that your main bottleneck is readCSVLine:

    readCSVLine = read . (\x -> "[" ++ x ++ "]")

(I just replaced it with (:[]) and the program sped up enormously.)
So I rewrote it by hand instead of going through read:

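    readCSVLine = unfoldr builder
        where
        builder [] = Nothing
        builder xs = Just $ readField xs

        readField []       = ([],[])
        readField (',':xs) = ([],xs)
        readField ('"':xs) =
            let (l,'"':r)    = span (/= '"') xs
                (field,rest) = readField r
            in (l ++ field, rest)
        -- any other character belongs to the current field
        readField (x:xs) =
            let (field,rest) = readField xs
            in (x:field, rest)
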
That decreased the runtime from 17 seconds to 4.2 seconds. There is
probably an even better implementation, but it's likely that this is
no longer the bottleneck.
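
For anyone who wants to compare the two approaches in isolation, here is a minimal self-contained sketch. The names readCSVLineRead, readCSVLineManual and agree are made up for this example, and the [String] result type is an assumption, since the original program's type is not shown in this thread. The manual version also uses drop 1 rather than a pattern match on the closing quote, so an unterminated quote does not crash; that is a small deviation, not something measured here.

```haskell
import Data.List (unfoldr)

-- The read-based approach: wrap the line in brackets and let 'read' parse it
-- as a Haskell list literal. The [String] result type is an assumption.
readCSVLineRead :: String -> [String]
readCSVLineRead = read . (\x -> "[" ++ x ++ "]")

-- The hand-written approach: peel off one field at a time with unfoldr.
readCSVLineManual :: String -> [String]
readCSVLineManual = unfoldr builder
  where
    builder [] = Nothing
    builder xs = Just (readField xs)

    -- Each clause returns (field, unconsumed remainder of the line).
    readField []         = ([], [])
    readField (',' : xs) = ([], xs)
    readField ('"' : xs) =
      let (l, rest0)    = span (/= '"') xs   -- text up to the closing quote
          r             = drop 1 rest0       -- skip the closing quote, if any
          (field, rest) = readField r
      in  (l ++ field, rest)
    readField (x : xs) =
      let (field, rest) = readField xs
      in  (x : field, rest)

-- True when both parsers produce the same fields for a line.
agree :: String -> Bool
agree line = readCSVLineManual line == readCSVLineRead line
```

On a fully quoted line such as "\"a\",\"b,c\",\"d\"" both functions should give the same three fields. The manual version additionally accepts unquoted fields (for example a,"b,c",d), which the read-based version would reject.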
The other thing is that the whole table is stored in memory because of
your call to "length csv" in doInteraction. This may have been the
intent. But if not, you can cut another second off the runtime by
"streaming" the file, using a function that lazily inserts the line in
the second-to-last position:

    insertLine line csv = let (l,r) = splitLast csv in l ++ [readCSVLine line] ++ r
        where
        splitLast [x]    = ([],[x])
        splitLast (x:xs) = let (l,r) = splitLast xs in (x:l,r)

(Note that I got rid of the "pos" parameter)
Presumably in a real application you are scanning until you see
something and then inserting near that, which is a lazy streamlike
operation.
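
To see why this kind of insertion streams, here is a small sketch of the same idea on a plain list. insertBeforeLast is a hypothetical, polymorphic stand-in for insertLine (it does not call readCSVLine), and the clause for the empty list is an addition so that the function is total.

```haskell
-- Insert a new element just before the last element of a list.
-- The result is produced lazily: the front of the output is available
-- before the end of the input has been seen.
insertBeforeLast :: a -> [a] -> [a]
insertBeforeLast new rows = let (l, r) = splitLast rows in l ++ [new] ++ r
  where
    splitLast []       = ([], [])      -- added case: empty input
    splitLast [x]      = ([], [x])     -- the last element stays at the end
    splitLast (x : xs) = let (l', r') = splitLast xs in (x : l', r')

-- Works on a finite table ...
finiteDemo :: [Int]
finiteDemo = insertBeforeLast 0 [1, 2, 3]

-- ... and the prefix can be consumed even when the input never ends,
-- which would be impossible if the whole list had to be held in memory.
streamingDemo :: [Int]
streamingDemo = take 3 (insertBeforeLast 0 [1 ..])
```

finiteDemo evaluates to [1,2,0,3], and streamingDemo to [1,2,3]. Calling length on the list first, as doInteraction does, would force the whole spine and lose this property.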
There are probably a few other tricks you could do, but I think I
identified the main factors.
Luke