[Haskell-cafe] [16/16] SBM: Discussion and Conclusion
Don Stewart
dons at galois.com
Sat Dec 22 21:02:53 EST 2007
firefly:
> General
> -------
> Bytestrings are faster than haskell strings. This is good and expected.
> But their memory use, ouch! There seems to be a grave memory bug in 6.9.
Bug report -- since the bytestring library hasn't changed between 6.8.2
and head, that sounds like a bug report.
> Lazy bytestrings are slower than strict bytestrings -- this was completely
> unexpected for me.
Which programs should I look at? Is there a specific program or two
where lazy bytestring performance is entirely out of whack (against
6.8.2?)? If so, please forward it to me and Duncan.
>
> I/O
> ---
> The I/O patterns seen in the various programs are:
> 1) strict bytestrings = read increasingly large chunks (they double every
> time). I don't know if the already-read data have to be copied around or
> if the memory allocator can handle extending existing blocks (if they are
> at the "front" of the heap, say) like realloc() can.
if you're reading from stdin, and we don't know the size, bytestring
getContents uses realloc.
> Even if the blocks do get copied around, that's not where the performance
> limiter is (speed-wise). It does seem to be a bit unwise in terms of
> memory pressure.
>
> Does the gc kick in every time the heap has to expand? Does it do a full
> collection then? If no other allocations have happened since the last
> collection than the allocation that caused the heap to expand, then perhaps
> skip the collection? (or some criteria similar to that)
>
> 2) lazy bytestrings = read blocks of 32K-8 bytes, as expected. The C
> benchmarks show that there's no penalty for reading 32K-8 vs. 32K.
>
> 3) native strings = read blocks 8K.
>
> The C benchmarks show that it barely matters if the block size is 4K, 32K, or
> 32K-8 bytes. In any case, the small differences due to block size are
> completely in the noise. Reading very large blocks, though, as the strict
> bytestrings do, actually has a cost (38-70% slower, depending on the CPU and
> kernel and RAM and busspeeds, etc -- see email 10 in the series). It is still
> peanets compared for almost all the benchmarks.
Useful information, thanks.
> Backend
> -------
> The backend turns out to be rather bad. It is generally very easy to improve
> the generated code by hand without doing anything in terms of sophisticated
> analysis. One can rewrite inner loops using heroic code (load four characters
> at a time together into a 32-bit register or even use MMX to handle eight
> characters in parallel) but it isn't really necessary to do that to gain a
> good improvement and start being competitive with C.
>
> The backend is probably what is costly on some of the lazy bytestring and
> native string code because there are too many tests to see if the buffer has
> been exhausted (I think). Some simple hoisting and common subexpression
> elimination would go far here.
There may be things we can do at the library level too. We changed the
lazy bytestring representation in the last release cycle to remove a
number of tests. If you've a specific example would help though.
> Thoroughness
> ------------
> It really is necessary to benchmark more than one ghc/library combination,
> otherwise the bad memory bugs wouldn't stand out so clearly or I would have
> believed, as Don Stewart did, that he had fixed the memory allocation bug.
I fixed a particular issue with the library. But please make a bug
report if there are other issues wrt. ghc head in particularly.
> Action Items
> ------------
> Suggested short-term actions:
> o Can the allocator expand blocks? If not, see if it can be implemented.
> o Find out why lazy bytestrings "by hand" are that slow - is it due to
> lack of hoisting in the backend?
Example please!
> o Improve backend, mainly by not hiding information from register allocator.
> Alternatively, by using an assembly-to-assembly transformator that
> improves the register allocation.
Some nice examples that could help motivate the native gen hackers would
be useful.
> o Fix -sstderr.
> o perhaps even let the RTS print out /proc/self/maps or /proc/self/status
> or "peak RSS" returned by getrusage() in the ru_maxrss field?
> o Find memory leaks in ghc 6.9 code.
Bug report time.
> o ghc should allow round-tripping of the C-- code it generates.
> o would be awfully nice if the backend didn't sometimes throw the recognizable
> labels away.
> o can the core/stg code be made easier to read?
Can you ensure these points are summarised and directed to the ghc bug
tracker,
http://hackage.haskell.org/trac/ghc/newticket?type=bug
to this work isn't lost on -cafe at .
-- Don
More information about the Haskell-Cafe
mailing list