[Haskell-cafe] How can I improve the pipes's performance with a huge file?

Tom Ellis tom-lists-haskell-cafe-2013 at jaguarpaw.co.uk
Fri Nov 14 17:31:59 UTC 2014


On Fri, Nov 14, 2014 at 05:47:16PM +0100, Wojtek Narczyński wrote:
> On 14.11.2014 10:43, zhangjun.julian wrote:
> >emptyMap = DM.empty::(DM.Map (String,String) Int)
> 
> Laziness makes your data swell.
> 
> 1) Try using ByteString or Text instead of String.
> 2) Try the UNPACK pragma, AFAIR it requires -O2.
>     data Key = Key {-# UNPACK #-} !ByteString   {-# UNPACK #-} !ByteString
>     https://hackage.haskell.org/package/ghc-datasize - this package
> will help you to determine the actual data size

This is certainly true, but there is a distinction to be drawn between
"swollen data" that is a few times bigger than it could be, and a space leak. 

Zhangjun Julian's biggest problem is definitely the latter.  There's no
reason that compiling a dictionary counting occurrences and printing it out
should consume 9GB.  Once the space leak is fixed, your suggestions will help
reduce memory usage further.
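
To make the distinction concrete, here is a minimal sketch of a counting map
that should not build up thunks.  It assumes the leak comes from lazily
accumulated counts, which is a common cause but not something I have
confirmed against the original program.  The names Key and countPairs and the
sample input are hypothetical; Data.Map.Strict and foldl' are the only parts
that matter.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import           Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical key type, following the UNPACK suggestion quoted above.
-- The strict fields keep unevaluated thunks out of the keys.
data Key = Key {-# UNPACK #-} !B.ByteString {-# UNPACK #-} !B.ByteString
  deriving (Eq, Ord, Show)

-- Count occurrences of each pair.  foldl' forces the map at every step,
-- and Data.Map.Strict.insertWith forces each updated count, so no chain
-- of (+ 1) thunks accumulates behind the values.
countPairs :: [(B.ByteString, B.ByteString)] -> M.Map Key Int
countPairs = foldl' step M.empty
  where
    step m (a, b) = M.insertWith (+) (Key a b) 1 m

main :: IO ()
main = do
  -- Made-up sample data standing in for the parsed file contents.
  let pairs =
        [ (B.pack "a", B.pack "b")
        , (B.pack "a", B.pack "b")
        , (B.pack "c", B.pack "d")
        ]
  mapM_ print (M.toList (countPairs pairs))
```

With the lazy Data.Map API and a lazy fold, each count can remain an
unevaluated sum until the map is printed, which is the kind of behaviour that
turns "a few times bigger" into gigabytes.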

Tom

