Restricted command line lengths

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Thu Aug 17 21:20:50 EDT 2006


On Fri, 2006-08-18 at 01:50 +0300, Esa Ilari Vuokko wrote:
> Hi,
> 
> Ticket #19 [1], and maybe some big projects in general, need to be able
> to invoke ar/ld so that we don't run into issues with command line
> length.  (And in theory, the compiler's as well, but that'd require the
> dreaded deps analysis.)  I think this is a major feature we should
> resolve before GHC 6.6 goes out.  It significantly affects some
> libraries, at least gtk2hs, and I can see real size benefits on some of
> my own libraries as well.
> 
> I posted a patch to fix this in some cases for Windows [2], but upon
> talking about it with Duncan on IRC, there are a few extra points:
>  * Some unixy shells apparently have restrictions as well, so this
>    shouldn't be just Windows.
>  * In my old implementation, I just split the command lines by file
>    count, not according to the actual space the paths take.  Is this a
>    major point?  It can affect the number of tool invocations quite
>    radically.
>  * The algorithm should really be parametrisable because it is, I
>    believe, in all cases heuristic.
> 
> As for how to proceed on implementing this, I'm a bit unsure.
>  * Because of unicode conversion issues, I am not entirely sure we can
>    accurately know the length of the command line, at least on Windows.
>    And assuming the bad case of 4 bytes per character costs us a lot in
>    the common case.
>  * I think the common way to work around this is to use xargs with some
>    constant number of params per invocation (much like my current
>    algorithm).
>  * I think there is a difference in complexity, because with ar we
>    really want to do append, but with ld we can do a tree-style build.
>    I haven't benchmarked this.
>  * This might be useful to generalise into library function(s).

I've sent a patch to the list in another email.

What it does is take an xargs approach. It adds an xargs-style function
and uses it for both the ar and ld cases. It calculates the length of
the command line string and passes as many arguments as will fit within
a given maximum size.
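
Roughly, the chunking works like the following Haskell sketch. The names
(xargsChunks, runChunked), the naive byte counting and the
stop-on-first-failure behaviour are mine for illustration, not
necessarily what the patch does:

    import System.Process (rawSystem)
    import System.Exit (ExitCode(..))

    -- Split the argument list into chunks whose total length (counting
    -- one separating space per argument) stays under maxLen.
    xargsChunks :: Int -> [String] -> [[String]]
    xargsChunks maxLen = go 0 []
      where
        go _   acc []     = [reverse acc | not (null acc)]
        go len acc (a:as)
          | null acc                     = go (length a) [a] as  -- always take at least one
          | len + 1 + length a <= maxLen = go (len + 1 + length a) (a:acc) as
          | otherwise                    = reverse acc : go 0 [] (a:as)

    -- Invoke the command once per chunk, stopping at the first failure.
    runChunked :: Int -> FilePath -> [String] -> [String] -> IO ExitCode
    runChunked maxLen cmd fixedArgs files =
        loop (xargsChunks (maxLen - overhead) files)
      where
        overhead = length cmd + sum (map ((+ 1) . length) fixedArgs)
        loop []     = return ExitSuccess
        loop (c:cs) = do
          ec <- rawSystem cmd (fixedArgs ++ c)
          case ec of
            ExitSuccess -> loop cs
            failure     -> return failure

For the ar case that would be used in append mode, something like
runChunked 32768 "ar" ["q", "libHSfoo.a"] objs (archive name purely
illustrative), so each extra invocation just adds more members to the
same archive.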

Doing a minimum number of invocations of ar is quite important, as GNU
ar at least is shockingly bad at linking thousands of little .o files
into an archive .a file. At the moment, building GHC's libHSbase.a takes
several invocations of ar via xargs and uses >500MB of memory. I did
actually submit a patch to binutils to bring this down to a mere 100MB,
but that's only in the very latest binutils versions and it's still
quite a lot. So I've put in an arbitrary 32k limit for unix systems, but
I think many are actually larger than this, e.g. 128k, so it might be
good to find the real limit either statically or dynamically.
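
Finding it dynamically could be as simple as asking getconf; this is
just a hypothetical sketch of that idea (readProcess and the fallback
value are my choices), not something the patch currently does:

    import System.Process (readProcess)
    import Control.Exception (try, SomeException)

    -- The conservative default mentioned above.
    defaultMaxCmdLine :: Int
    defaultMaxCmdLine = 32 * 1024

    -- Ask getconf for ARG_MAX (which also covers the environment, so it
    -- over-estimates a little) and fall back to the default on failure.
    maxCommandLineLength :: IO Int
    maxCommandLineLength = do
      r <- try (readProcess "getconf" ["ARG_MAX"] "")
             :: IO (Either SomeException String)
      return $ case r of
        Right out | [(n, _)] <- reads out -> n
        _                                 -> defaultMaxCmdLine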

For ld, since it doesn't do append, I've made it output to a temp file
and link in the previously accumulated .o file if it exists. Then it
renames the temp file to the final target.
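
The loop looks roughly like the sketch below; the helper name, the use
of `ld -r`, and the temp-file naming are illustrative rather than the
patch's exact code (and real code would check ld's exit code). The
chunks would come from something like xargsChunks above:

    import System.Process (rawSystem)
    import System.Directory (renameFile, removeFile)
    import Control.Monad (foldM_, when)

    -- Partially link the object files chunk by chunk with `ld -r`,
    -- folding the previously accumulated .o into each step, and only
    -- rename onto the real target at the very end.  Assumes at least
    -- one chunk; error handling omitted for brevity.
    linkInChunks :: FilePath -> [[FilePath]] -> IO ()
    linkInChunks target chunks = do
        foldM_ linkOne False chunks
        renameFile tmp target
      where
        tmp  = target ++ ".tmp"
        prev = target ++ ".tmp.prev"

        linkOne havePrev chunk = do
          -- move the previous accumulated .o aside so ld can read it
          -- while writing the new accumulated .o
          when havePrev (renameFile tmp prev)
          let extra = if havePrev then [prev] else []
          _ <- rawSystem "ld" (["-r", "-o", tmp] ++ extra ++ chunk)
          when havePrev (removeFile prev)
          return True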

I've tested it on Linux, just with a very small maximum command line
size, and checked that it does invoke ar/ld multiple times and that the
resulting binaries work.

> I could use some advice or experiences from other build systems.  If
> nobody else steps up, I'll probably implement some choice (but I'm only
> prepared to test it on Windows).

It'd be great if you could test this patch on Windows.

Duncan


