Plan of Attack for Parallel Builds

Wed Mar 30 10:37:03 CEST 2011

Hi all,

I'm very much looking forward to a future where cabal install exercises all
my cores with some heavy-duty Haskell work ;-) Thanks, Frank, for taking this
up.

I personally like progress reports on the individual builds very much. I
agree that they are not super important, but nevertheless I think that a
progress report significantly improves the user experience. I also have a
simple, ad-hoc scheme that should result in an OK progress report for most
cases.

Gather patterns of the form

  "["<integer>" of "<integer>"]"

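in the program output and interpret the resulting sequence such that the
second-to-last "measurement" is a conservative estimate of the real
progress (GHC prints such a marker when it starts compiling a module, so the
module named in the last one may still be in progress); i.e.,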

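import Control.Monad (mzero)

-- Estimate from the second-to-last "[i of n]" marker seen so far.
progress :: [(Int, Int)] -> Maybe Double
progress xs = case reverse xs of
  (_ : (i, n) : _) -> return (fromIntegral i / fromIntegral n)
  _                -> mzero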
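A minimal sketch of the gathering step, assuming the markers appear at most once per line of output and that the index may be padded with spaces after the '['. The names `marker` and `markers` are hypothetical helpers, not existing Cabal functions:

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Hypothetical helper: find the first "[<i> of <n>]" marker in one line
-- of build output, e.g. "[3 of 12] Compiling Foo". Spaces right after the
-- '[' are tolerated in case the index is padded.
marker :: String -> Maybe (Int, Int)
marker s = case dropWhile (/= '[') s of
  [] -> Nothing
  -- If this '[' does not start a marker, keep scanning after it.
  (_ : rest) -> maybe (marker rest) Just (parse rest)
  where
    parse str = do
      (i, r1) <- number (dropWhile isSpace str)
      r2 <- stripPrefix " of " r1
      (n, r3) <- number r2
      case r3 of
        (']' : _) | n > 0 -> Just (i, n)  -- n > 0 keeps i / n well defined
        _ -> Nothing
    number str = case span isDigit str of
      ([], _) -> Nothing
      (ds, r) -> Just (read ds, r)

-- All markers in the output, in the order in which they appeared; this is
-- the list shape that `progress` expects.
markers :: String -> [(Int, Int)]
markers = mapMaybe marker . lines
```

For the captured output of a build, `progress (markers output)` then gives the conservative estimate described above.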

Probably some more filtering of this sequence is required to cater for
repeated calls to GHC. I guess that, as long as progress never goes from
100% to something lower, the user will be happy with the progress estimate.
Moreover, the chance that such a pattern appears in the output without
indicating real progress is reasonably low.
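One possible filter for repeated GHC calls, sketched under the assumption that a new invocation can be recognised by the total changing or the index failing to increase: keep only the markers of the latest invocation. `latestRun` is a hypothetical name:

```haskell
-- Hypothetical filter: keep only the markers of the most recent GHC
-- invocation. Assumption: a new invocation has started whenever the total
-- changes or the index fails to increase.
latestRun :: [(Int, Int)] -> [(Int, Int)]
latestRun = reverse . takeRun . reverse
  where
    -- Walk backwards from the newest marker while the markers still
    -- belong to the same invocation.
    takeRun (newer : older : rest)
      | sameRun older newer = newer : takeRun (older : rest)
      | otherwise = [newer]
    takeRun ms = ms
    sameRun (i, n) (i', n') = n == n' && i' > i
```

The estimate would then be `progress (latestRun xs)`. To keep the displayed value from ever dropping, the successive estimates can additionally be clamped with a running maximum such as `scanl1 max`.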

best regards,
Simon

2011/3/30 Johan Tibell <johan.tibell at gmail.com>

> Hi Frank,
>
> Thanks for reaching out and gathering input.
>
> On Tue, Mar 29, 2011 at 9:54 PM, Frank Murphy <anirishduck at gmail.com> wrote:
> > - Parallelize executeInstallPlan. When given a target load average as a
> >  flag it will determine whether it should spawn a worker (if below the
> >  target load average) or wait. If waiting, it will listen to all worker
> >  status channels and print out their current build status and the load
> >  average. Once a worker exits, it will again check the load average and
> >  spawn a new thread if necessary.
>
> I think the most important setting is the number of worker threads
> (e.g. -jX). Load average sounds like a cool idea but I don't know how
> well it'll work in practice. Gentoo's Portage uses it so you might
> snoop around there for more info.
>
> > - Rewrite install.*Package and their callees to use the CHP (Communicating
> >  Haskell Processes) monad where possible. Use channels to communicate
> >  build status back to the main thread.
>
> CHP might be a bit overkill, an MVar and a Chan or two should be
> enough. At least start simple.
>
> > - It might be necessary to parse the output of external builds in some
> >  way so that meaningful status can be communicated back to the user.
>
> I'm not sure this is worth it, or even possible in the general case. See
> below.
>
> > - Add a default parallel build log path template. Allow the user to
> >  specify one on the command line to override the default.
>
> I'm not quite sure what you mean here. Do you mean that we'd write
> "cabal install" logs to e.g. .cabal/logs or something along those
> lines?
>
> > - On single-threaded (sequential) builds, revert to the old output style.
>
> Sounds good. One possible policy would be: if you run "cabal build",
> you get the old output format (and a single-threaded build). If you
> run "cabal install", you get the new output format, regardless of whether
> the build runs in parallel or not.
>
> What do people think? Is it worth displaying all the build output for
> "cabal install" in the single-threaded case? Does the user care to see
> it? Perhaps it's good for debugging to let single-threaded "cabal
> install" show the old output (i.e. if a parallel build fails, run the
> single-threaded one to get more output).
>
> >  On multi-threaded builds, display the current status of all running
> >  builds, load averages and nothing else. Possible output:
> >
> > Resolving dependencies...
> > Building derive-2.3.0.2...                             [17 of 58]
> > Building regex-base-0.93.1...                          [1 of 4]
> > Building dyre-0.8.6...                                 [5 of 7]
> > Configuring xdg-basedir-0.2...                         [in progress]
> >
> >                                   Dependencies Built:  [0 of 9]
> >                                         Load Average:  [3.4/4.0]
> >                                               Running 4 Jobs.
>
> Cabal allows packages to use any build system they want (e.g. make),
> which means that we can't know the progress of a single build in the
> general case. Today, Cabal simply shows the stdout of the build
> process, whatever it is. This means that we cannot show progress of
> individual packages. I suggest something like this (taken from Gentoo's
> Portage):
>
> Building (1 of 9) derive-2.3.0.2...
> Building (2 of 9) regex-base-0.93.1...
> Building (3 of 9) dyre-0.8.6...
> Building (4 of 9) aeson-0.3.2.1...
> Building (5 of 9) binary-0.5.0.2...
> Installing derive-2.3.0.2
> Installing regex-base-0.93.1
> Building (6 of 9) text-0.11.0.6...
> Installing dyre-0.8.6
> Jobs: 3 of 9 complete, 3 running               Load avg: 3.44, 1.46, 0.69
>
> We could perhaps make a special case for the Simple build type and
> parse the GHC output and show progress on individual builds. I don't
> think it's worth it, at least not initially.
>
> > A possible error message might look like:
> >
> > derive-2.3.0.2 failed during the building phase.
> > Log stored in /home/frank/cabal/logs/build/derive-2.3.0.2.log
>
> For build failures I think we should output the content of the log
> file to stdout (as one chunk, using a lock to avoid interleaving).
> This will make it quicker for users to get to the build failure. For
> successful builds I don't think we need to output more than in the
> example above.
>
> Cheers,
> Johan
>
> _______________________________________________
> cabal-devel mailing list
> cabal-devel at haskell.org
> http://www.haskell.org/mailman/listinfo/cabal-devel
>
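Regarding Johan's suggestion that an MVar and a Chan or two should be enough: a minimal sketch of a -jX-style worker pool, assuming the packages are independent of each other (a real install plan has to respect dependency order) and that each build returns a success flag plus its log text. `runParallel` and `Status` are hypothetical names, not Cabal's API. A QSem caps the number of concurrent builds, workers report to the main thread over a Chan, and the main thread prints Portage-style lines like the ones Johan proposed.

```haskell
import Control.Concurrent
  (forkIO, newChan, newQSem, readChan, signalQSem, waitQSem, writeChan)
import Control.Exception (SomeException, bracket_, try)
import Control.Monad (forM_)

-- Messages sent from a worker thread to the main thread.
data Status
  = Started String                                  -- package id
  | Finished String (Either SomeException (Bool, String))
    -- package id, and either an exception or (success flag, build log)

-- Hypothetical sketch: build every package with at most `jobs` builds
-- running at once (the -jX setting). Dependency ordering is ignored.
runParallel :: Int -> (String -> IO (Bool, String)) -> [String]
            -> IO [(String, Bool)]
runParallel jobs build pkgs = do
  sem <- newQSem (max 1 jobs)
  status <- newChan
  forM_ pkgs $ \pkg ->
    forkIO $ bracket_ (waitQSem sem) (signalQSem sem) $ do
      writeChan status (Started pkg)
      outcome <- try (build pkg)  -- a crashing build counts as a failure
      writeChan status (Finished pkg outcome)
  -- Only the main thread prints, so output from different builds never
  -- interleaves. Each worker sends Started before Finished on the same
  -- FIFO channel, so counting Finished messages is enough to terminate.
  let total = length pkgs
      loop started done results
        | done == total = return (reverse results)
        | otherwise = do
            msg <- readChan status
            case msg of
              Started pkg -> do
                putStrLn ("Building (" ++ show (started + 1) ++ " of "
                          ++ show total ++ ") " ++ pkg ++ "...")
                loop (started + 1) done results
              Finished pkg (Right (True, _)) -> do
                putStrLn ("Installing " ++ pkg)
                loop started (done + 1) ((pkg, True) : results)
              Finished pkg failure -> do
                putStrLn (pkg ++ " failed during the building phase.")
                -- Print the whole log (or the exception) as one chunk.
                putStr (either ((++ "\n") . show) snd failure)
                loop started (done + 1) ((pkg, False) : results)
  loop (0 :: Int) (0 :: Int) []
```

Because only the main thread writes to stdout, the log of a failed build comes out as one uninterrupted chunk without an explicit lock; an `MVar ()` taken with `withMVar` would only be needed if the workers printed directly.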