[Haskell-cafe] zip-archive performance/memory usage
david at drp.id.au
Tue Aug 10 07:30:53 EDT 2010
I was interested to see if I could determine what was happening with this.
After some playing around, I noticed the code was running significantly
faster if I *didn't* compile it, but ran it with 'runghc' instead (running
under ghci was also fast).
Here are the running times I found. The 'Zip.hs' program comes with the
zip-archive package. The runtime of the compiled version didn't seem to be
affected by optimisations. Regardless, I'm quite surprised running
interpreted was significantly faster than compiled.
> time runghc ./Zip.hs -l ~/jdk1.6.0_05-src.zip
1.48s user 0.17s system 97% cpu 1.680 total
> time ./dist/build/Zip/Zip -l ~/jdk1.6.0_05-src.zip
89.00s user 1.06s system 98% cpu 1:31.84 total
The file 'jdk1.6.0_05-src.zip' was just an 18MB zip file I had lying
around. I'm using GHC 6.12.1.
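
For anyone who wants to reproduce the comparison without the full Zip.hs,
here is a minimal sketch of what the '-l' listing roughly amounts to. It is
not the actual Zip.hs source; it only assumes the zip-archive functions
'toArchive' and 'filesInArchive', and the helper name 'listing' is my own.

```haskell
import qualified Data.ByteString.Lazy as B
import Codec.Archive.Zip
import System.Environment (getArgs)

-- Paths of all entries in the archive (hypothetical helper name).
listing :: Archive -> [FilePath]
listing = filesInArchive

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Read the whole file lazily and parse it as a zip archive.
      bytes <- B.readFile path
      mapM_ putStrLn (listing (toArchive bytes))
    _ -> putStrLn "usage: ListZip FILE.zip"
```

Running it once with runghc and once as a compiled binary under 'time'
should show whether the difference above comes from the library or from
something else in Zip.hs.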
On Tue, Aug 10, 2010 at 12:10 PM, Jason Dagit <dagit at codersbase.com> wrote:
> On Mon, Aug 9, 2010 at 4:29 PM, Pieter Laeremans <pieter at laeremans.org> wrote:
>> I'm trying some haskell scripting. I'm writing a script to print some
>> from a zip archive. The zip-archive library does look nice but the
>> performance of zip-archive/lazy bytestring
>> doesn't seem to scale.
>> Executing :
>> eRelativePath $ head $ zEntries archive
>> on an archive of around 12 MB with around 20 files yields
>> Stack space overflow: current size 8388608 bytes.
> So it's a stack overflow at about 8 megs. I don't have a strong sense of
> what is normal, but that seems like a small stack to me. Oh, actually I
> just checked, and that is the default stack size :)
> I looked at Zip.hs (included as an example). The closest I see to your
> example is some code for listing the files in the archive. Perhaps you
> should try the supplied program on your archive and see if it too has a
> stack overflow.
> The line the author uses to list files is:
> List -> mapM_ putStrLn $ filesInArchive archive
> But, you're taking the head of the entries, so I don't see how you'd be
> holding on to too much data. I just don't see anything wrong with your
> program. Did you remember to compile with optimizations? Perhaps try the
> author's way of listing entries and see if performance changes?
>> The script in question can be found at :
>> I'm using the latest version of haskell platform. Are these libraries not
>> production ready,
>> or am I doing something terribly wrong ?
> Not production ready would be my assumption. I think an iteratee style
> might be more appropriate for these sorts of nested streams of potentially
> large size anyway. I'm skeptical of anything that depends on lazy
> bytestrings or lazy IO. In this case, the performance would appear to
> depend on lazy bytestrings.
> You might want to experiment with increasing the stack size. Something
> like this:
> ./ZipList +RTS -K100M -RTS foo.zip
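
The 'ZipList' program in that command isn't shown in the thread. A
hypothetical version, built around the expression from the original post,
might look like the sketch below. The name 'firstEntryPath' is my own, and I
have swapped 'head' for 'listToMaybe' so that an empty archive doesn't crash.

```haskell
import qualified Data.ByteString.Lazy as B
import Codec.Archive.Zip
import Data.Maybe (fromMaybe, listToMaybe)
import System.Environment (getArgs)

-- Relative path of the first entry, or Nothing for an empty archive.
-- Same idea as: eRelativePath $ head $ zEntries archive
firstEntryPath :: Archive -> Maybe FilePath
firstEntryPath = fmap eRelativePath . listToMaybe . zEntries

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      archive <- fmap toArchive (B.readFile path)
      putStrLn (fromMaybe "(empty archive)" (firstEntryPath archive))
    _ -> putStrLn "usage: ZipList FILE.zip"
```

With the 6.12.1 I'm using, the +RTS -K100M flag is accepted as is. On GHC
7.0 and later the binary would also need to be linked with -rtsopts.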