[Haskell-cafe] help with Haskell performance

Tue Nov 10 20:20:44 EST 2009

--- On Sat, 11/7/09, Don Stewart <dons at galois.com> wrote:
> General notes:
>
>  * unpack is almost always wrong.
>  * list indexing with !! is almost always wrong.
>  * words/lines are often wrong for parsing large files (they build large list structures).
>  * toList/fromList probably aren't the best strategy
>  * sortBy (comparing snd)
>  * use insertWith'
> Specifically, avoid constructing intermediate lists when you can process the
> entire file in a single pass. Use O(1) bytestring substring operations like
> take and drop.
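Don's notes can be illustrated with a minimal sketch. It assumes whitespace-separated words and a strict ByteString read all at once, and the names countWords and rank are hypothetical. The sketch walks the input once, slicing words out with break and dropWhile instead of unpack or words, accumulates counts in a strict map, and sorts the result with sortBy (comparing snd). insertWith' was later deprecated in the containers package; the insertWith from Data.Map.Strict, which evaluates the combined value before storing it, is the usual replacement.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch: count whitespace-separated words in one pass over a strict
-- ByteString, then sort the (word, count) pairs by count.
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.Char (isSpace)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical helper: a single left-to-right walk over the input.
-- dropWhile and break scan for word boundaries and return slices that
-- share the original buffer, so no String is ever built with unpack.
countWords :: B.ByteString -> M.Map B.ByteString Int
countWords = go M.empty
  where
    -- The bang pattern forces the accumulated map at each step, so no
    -- chain of pending insertions builds up.
    go !acc s
      | B.null s' = acc
      | otherwise = go (M.insertWith (+) w 1 acc) rest
      where
        s' = B.dropWhile isSpace s
        (w, rest) = B.break isSpace s'

-- Hypothetical helper: ascending by count; ties keep the map's key
-- order because sortBy is stable.
rank :: B.ByteString -> [(B.ByteString, Int)]
rank = sortBy (comparing snd) . M.toList . countWords

main :: IO ()
main = do
  input <- B.getContents
  -- Write each pair as "word count" without going through show.
  mapM_ (\(w, n) -> B.putStrLn (B.unwords [w, B.pack (show n)])) (rank input)
```

Because each key is a slice of the input, the map keeps the whole input buffer alive; calling copy from Data.ByteString on newly inserted keys would release it, at the cost of one copy per distinct word.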

Thanks, all, for the valuable feedback. Switching from Regex.Posix to Regex.PCRE alone reduced the running time to about 6 secs, and a few other optimizations suggested on this thread brought it down to about 5 secs ;) 

I then profiled the code out of curiosity to see where the bulk of the time was being spent, and sure enough the culprit turned out to be "unpack". My question, therefore: given a list L1 of type [(ByteString, Int)], how do I print it so as to eliminate the "Chunk ... Empty" markers associated with a ByteString? The suggestions posted here are along the lines of "mapM_ print L1", but that's far from ideal, especially since the output is meant to be read by non-technical users.
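One possible answer, sketched under stated assumptions: print is putStrLn . show, so in this setup it is show that exposes the Chunk/Empty structure. Writing the bytes directly avoids both show and unpack. The sketch below assumes a tab-separated "key<TAB>count" line per pair is acceptable output; render and row are hypothetical names, and it uses Data.ByteString.Builder, which ships with newer bytestring releases than the one discussed in this thread.

```haskell
-- Sketch: render (key, count) pairs as plain text by writing raw bytes,
-- so no "Chunk"/"Empty" constructors from show appear in the output.
import qualified Data.ByteString.Lazy.Char8 as L
import Data.ByteString.Builder (Builder, char7, intDec, lazyByteString, toLazyByteString)

-- Hypothetical helper: one output row is the key's bytes, a tab,
-- the count in decimal, and a newline.
row :: (L.ByteString, Int) -> Builder
row (k, n) = lazyByteString k <> char7 '\t' <> intDec n <> char7 '\n'

-- Hypothetical helper: concatenate all rows into one lazy ByteString.
render :: [(L.ByteString, Int)] -> L.ByteString
render = toLazyByteString . foldMap row

main :: IO ()
main = L.putStr (render [(L.pack "apple", 3), (L.pack "banana", 1)])
```

With only Data.ByteString.Lazy.Char8 available, a per-pair action such as `L.putStr k >> putStrLn ('\t' : show n)` gives the same plain-text layout, since show is then applied only to the Int.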

Thanks.
