[Haskell-cafe] another Newbie performance question

Sat May 17 17:07:40 EDT 2008

On Sat, May 17, 2008 at 5:22 PM, Philip Müller
<mail at philip.in-aachen.net> wrote:
> If someone here finds the time to look at my code and give me some hints,
> that would really be nice.

A little experimentation reveals that your main bottleneck is readCSVLine:

  readCSVLine = read . (\x -> "[" ++ x ++ "]")

(I just replaced it with (:[]) and it sped up enormously)

So I rewrote it by hand instead of going through read.

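readCSVLine       = unfoldr builder
    where
    builder [] = Nothing
    builder xs = Just $ readField xs

    readField []       = ([],[])
    readField (',':xs) = ([],xs)
    readField ('"':xs) =
        let (l,'"':r) = span (/= '"') xs
            (field,rest) = readField r
        -- The archived message is cut off here.  The "in" clause and the
        -- fall-through case below are a plausible reconstruction.
        in (l ++ field, rest)
    readField (x:xs)   =
        let (field,rest) = readField xs
        in (x:field, rest)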

That decreased the runtime from 17 seconds to 4.2 seconds.  There is
probably an even better implementation, but this is likely no longer
the bottleneck.
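
As a rough idea of what a "better implementation" might look like, here
is a sketch that consumes unquoted text in chunks with break instead of
one character at a time.  The name readCSVLineChunked is made up for
illustration, and I have not measured it, so treat any speed difference
as an open question.  It assumes quotes are never escaped inside a
field, and it takes the rest of the line if a quote is never closed.

```haskell
import Data.List (unfoldr)

-- Split one CSV line into fields.  Commas inside double quotes are kept
-- as part of the field; escaped quotes ("") are not handled here.
readCSVLineChunked :: String -> [String]
readCSVLineChunked = unfoldr builder
  where
    builder [] = Nothing
    builder xs = Just (readField xs)

    -- Returns (field, remaining input after the field's separator).
    readField :: String -> (String, String)
    readField []         = ([], [])
    readField (',' : xs) = ([], xs)
    readField ('"' : xs) =
      case break (== '"') xs of
        (l, _ : r) -> let (field, rest) = readField r in (l ++ field, rest)
        (l, [])    -> (l, [])  -- unterminated quote: take the rest as-is
    readField xs =
      -- xs starts with an ordinary character, so the chunk l is non-empty.
      let (l, r)        = break (\c -> c == ',' || c == '"') xs
          (field, rest) = readField r
      in (l ++ field, rest)
```

Like the unfoldr version above, this stops as soon as the input is
empty, so a trailing comma does not produce a final empty field.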

The other thing is that the whole table is stored in memory because of
your call to "length csv" in doInteraction.  This may have been the
intent.  But if not, you can cut another 1 second off the runtime by
"streaming" the file using a function that lazily inserts the line in
the second-to-last position.

insertLine line csv =
    let (l,r) = splitLast csv
    in l ++ [readCSVLine line] ++ r
    where
    splitLast [x]    = ([],[x])
    splitLast (x:xs) = let (l,r) = splitLast xs in (x:l,r)

(Note that I got rid of the "pos" parameter)
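
To see that this really streams, here is a self-contained sketch of the
same idea on plain lists.  The name insertBeforeLast and the extra case
for the empty list are my additions for illustration, not part of your
program.

```haskell
-- Insert y just before the last element of xs.  The pattern bindings
-- are lazy, so output is produced while xs is still being consumed.
insertBeforeLast :: a -> [a] -> [a]
insertBeforeLast y xs = l ++ [y] ++ r
  where
    (l, r) = splitLast xs

    -- Split a list into (everything but the last element, [last]).
    splitLast :: [b] -> ([b], [b])
    splitLast []       = ([], [])   -- assumption: empty input yields just [y]
    splitLast [x]      = ([], [x])
    splitLast (x : ys) = let (l', r') = splitLast ys in (x : l', r')
```

Because nothing forces the whole input, something like
take 3 (insertBeforeLast 0 [1..]) terminates even though the list is
infinite; splitLast only looks one element ahead.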

Presumably in a real application you are scanning until you see
something and then inserting near that, which is a lazy streamlike
operation.
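
A minimal sketch of that kind of scan, assuming you want the new row
placed directly before the first row matching some predicate (and at the
end if nothing matches).  The name insertBefore and that fallback are
hypothetical; your real application may want something different.

```haskell
-- Insert y directly before the first element satisfying p.
-- If no element satisfies p, y ends up at the end of the list.
insertBefore :: (a -> Bool) -> a -> [a] -> [a]
insertBefore p y xs = before ++ [y] ++ after
  where
    -- break is lazy: 'before' is produced element by element as xs is read.
    (before, after) = break p xs
```

Only the rows up to the match are examined before output starts, so the
rest of the file can still be read lazily.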

There are probably a few other tricks you could do, but I think I
identified the main factors.

Luke