[Haskell-cafe] Performance Tuning & darcs (a real shootout?)
dagit at eecs.oregonstate.edu
Mon Jan 23 01:38:02 EST 2006
This will be a long email so I've tried to give it sections.
= Background =
I've been using Haskell a while now (close to a year) and I'm still
very much a newbie (but I've used other FP langs). I'm also very
interested in darcs which, as many of you know, is written in Haskell.
I've found a case which darcs does not handle efficiently and I'm
trying to get the performance in that case to a satisfactory level.
If you create a new repository, create a single large file (say
300mb) and record that file darcs will take a very long time and uses
a lot of memory (usually at least 900mb for the 300mb case).
Before I begin telling you what I've tried so far, let me give a
brief history of previous efforts by others to optimize darcs. As
far as I know, the main person to work on optimizing darcs in the
past is Ian Lynagh. He was able to convert darcs from stricter
algorithms to lazy ones and this made a huge improvement in
performance. In theory darcs should be able to lazily read from a
file, compute the patch that needs to be written, write that patch
and then lazily read the patch while it is applied to the
repository. This mostly works as expected, but I've discovered that
somewhere near the end of the application the memory needed spikes
After almost two weeks of poking at darcs doing various benchmarks
and profiles I've realized that optimizing Haskell programs is no
easy task. I've been following the advice of numerous people from
the haskell irc channel and learned a lot about darcs in the
process. I've also been using this nifty library that Ian created
for this purpose to get a measure for the non-mmap memory usage:
Potentially useful information about darcs;
1) Uses a slightly modified version of FastPackedStrings.
2) Can use mmap or not to read files (compile time option).
=Experiments and Findings=
I have a summary of some of my experimentation with darcs here:
Basically what I have found is that the read of the original file
does not cause a spike in memory usage, nor does writing the patch.
This would seem to imply that it's during application of the patch
that the memory spikes. Modifying darcs to read the patch file and
print just the first line of the patch causes some interesting
results. The memory usage according to Ian's memory tool stays very
low, at about 150kb max, but requesting the first line of the patch
appears to make darcs read the entire patch! Darcs will literally
grind away for, say, 30 minutes to just print the first line.
On a side note, I've tried turing off mmap and running some of the
above experiments. Ian's tool reports the same memory usage, and top
still reports large amounts of memory used. Does ghc use mmap to
allocate memory instead of malloc? Even if it does this shouldn't be
a problem for Ian's tool as long as it maps it anonymously.
So far I've been tracking this performance problem by reading the
output of ghc --show-iface and --ddump-simpl for strictness
information, using the ghc profiler (although that makes already bad
performance much worse), Ian's memory tool, and a lot of experiments
and guess work with program modifications. Is there a better way?
Are there tools or techniques that can help me understand why the
memory consumption peaks when applying a patch? Is it foolish to
think that lazy evaluation is the right approach? I've been thinking
that perhaps darcs should be modified to use bounded buffers so that
we have tight control over the amount of memory consumed during a
run. This solution would require a lot of reworking of the existing
code, and sounds frightful from a maintenance point of view.
I'm also wondering if using mmap is a bad idea. Given the way files
are currently mmap'd I think we are limiting darcs to handling files
which are small enough to fit in the address space at once (eg., <
4GB on i386). Additionally, it would seem that using mmap does not
really reduce memory stress.
I'm looking for advice or help in optimizing darcs in this case. I
guess this could be viewed as a challenge for people that felt like
the micro benchmarks of the shootout were unfair to Haskell. Can we
demonstrate that Haskell provides good performance in the real-world
when working with large files? Ideally, darcs could easily work with
a patch that is 10GB in size using only a few megs of ram if need be
and doing so in about the time it takes read the file once or twice
and gzip it.
If anyone wants to look at the darcs code themselves the unstable
version is available via:
darcs get --partial http://abridgegame.org/repos/darcs-unstable
Just to recap, I'm looking for 1) advice, 2) tips, 3) design ideas,
4) tools, 5) libraries and just about anything else :)
More information about the Haskell-Cafe