Completely reproducible Haskell builds

Greg Steuck greg at nest.cx
Thu May 26 18:15:16 CEST 2011


I am trying to get the ghc-7.0.3 build procedure down to a byte-identical
rebuild on Linux-amd64.

I solved one source of variability: ar embedding timestamps into .a
files (HOWTO at the end)

Now I am looking to eliminate variations in ELF64 executables. To make
things easy, I am going to demonstrate with the unlit binary.

I start at a point where make leaves off.

% cp utils/unlit/dist/build/tmp/unlit{,.1}
% "/usr/bin/ghc" -o utils/unlit/dist/build/tmp/unlit -O -H64m \
    -package-conf libraries/bootstrapping.conf -i -iutils/unlit/. \
    -iutils/unlit/dist/build -iutils/unlit/dist/build/autogen \
    -Iutils/unlit/dist/build -Iutils/unlit/dist/build/autogen \
    -no-user-package-conf -rtsopts -odir utils/unlit/dist/build \
    -hidir utils/unlit/dist/build -stubdir utils/unlit/dist/build \
    -hisuf hi -osuf o -hcsuf hc -no-auto-link-packages -no-hs-main \
    utils/unlit/dist/build/unlit.o
% sha1sum utils/unlit/dist/build/tmp/unlit{,.1}
6f679d9dd9a9ea84a68be99369c9f1dc72ba41f0  utils/unlit/dist/build/tmp/unlit
beed059e09c9429c3b74ea613d5be30c6c17ac3c  utils/unlit/dist/build/tmp/unlit.1
% ls -l utils/unlit/dist/build/tmp/unlit{,.1}
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit.1
% for i in utils/unlit/dist/build/tmp/unlit{,.1}; do readelf -a "$i" > "$i.elf"; done
% diff utils/unlit/dist/build/tmp/unlit{,.1}.elf
250c250
<     27: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ghc27965_0.c
---
>     27: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS ghc20499_0.c

Looks like there is a temporary file name baked into the ELF file.

Indeed, running with -v reveals:

*** C Compiler:
/usr/bin/gcc -c /tmp/ghc28016_0/ghc28016_0.c -o
/tmp/ghc28016_0/ghc28016_0.o -I/usr/lib/ghc-7.0.3/include
-fno-stack-protector
*** Linker:
/usr/bin/gcc -v -o utils/unlit/dist/build/tmp/unlit
-fno-stack-protector utils/unlit/dist/build/unlit.o
/tmp/ghc28016_0/ghc28016_0.o

Digging a bit into the sources reveals mkExtraCObj in
DriverPipeline.hs which calls newTempName.

The best option for dealing with this seems to be using gcc's ability to
accept input from a pipe. I know I could make this work on a POSIX
system. Yet I suspect getting it to work on Windows would be overly
onerous.
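
A minimal sketch of the pipe approach, assuming GNU gcc and a POSIX
shell. The helper name compile_from_stdin and the sample C text are
hypothetical and not part of GHC:

```shell
# Hypothetical helper: compile C text read from stdin into the object
# file named by $1, without writing a temporary .c file.
compile_from_stdin() {
    # "-x c" names the language explicitly, because the input "-"
    # (stdin) has no file extension for gcc to infer it from.
    gcc -x c -c - -o "$1"
}

# Usage (illustrative):
#   printf 'int extra_symbol = 0;\n' | compile_from_stdin extra.o
```

Whether the resulting object is free of per-invocation names would
still need checking with readelf as above.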

The next best idea is to make GHC use repeatable temporary .c & .o file
names for each invocation. There is already a unique temporary
directory where all the temporary files are created. This suggests
I do not need to worry about adversarial races. So GHC just needs to
avoid racing with itself. I see a couple of options:

  1) newTempName should create a new subdirectory for each call and
     then return a fixed name inside of it (so /tmp/ghc28016_0/ghc28016_0.c
     above would become /tmp/ghc28016_0/0/dummy.c)
  2) mkExtraCObj could compute some hash function of its xs
     argument (C program text) and then create a file named, e.g.
     /tmp/ghc28016_0/38eb8d8eb0abe9c828ba60983e2a97f7a069ec41.c

Which of these two looks better? Other ideas?
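
To make option 2 concrete, here is a rough shell sketch of the naming
scheme only, assuming sha1sum from coreutils. The function name
content_named_c_file is hypothetical; a real patch would do the
equivalent inside mkExtraCObj in Haskell:

```shell
# Hypothetical helper: print a .c file name inside directory $1 that is
# derived from the SHA-1 of the C program text $2. Identical text always
# maps to the same name, so the name no longer depends on the process id.
content_named_c_file() {
    dir=$1
    # sha1sum prints "<hash>  -" for stdin; keep only the hash field.
    hash=$(printf '%s' "$2" | sha1sum | cut -d' ' -f1)
    printf '%s/%s.c\n' "$dir" "$hash"
}

# Usage (illustrative):
#   content_named_c_file /tmp/ghc28016_0 'int extra_symbol = 0;'
```

Note that the unique directory component (/tmp/ghc28016_0) still varies
between runs; only the file name, which is what readelf showed above,
becomes repeatable.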

Would people be open to accepting a patch along these lines if I were
to write one?

The steps to make ar not include the timestamps were

1) Add "AR_OPTS = qD" to build.mk. This takes care of most .a files.
2) Set AR_FLAGS=qD in the environment and dummy out ranlib (create a
   no-op script called ranlib on your PATH ahead of the real ranlib).
   This takes care of the libffi build.
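
Step 2 can be sketched roughly as follows, assuming a POSIX shell. The
shim directory is a hypothetical location; any directory placed first
on PATH would do:

```shell
# Hypothetical location for the shim directory.
shim_dir="${TMPDIR:-/tmp}/noop-ranlib"
mkdir -p "$shim_dir"

# A ranlib that does nothing and reports success, so it leaves the
# archive produced by ar untouched.
printf '#!/bin/sh\nexit 0\n' > "$shim_dir/ranlib"
chmod +x "$shim_dir/ranlib"

# Put the shim ahead of the real ranlib and pass qD to ar through the
# environment for the libffi build.
PATH="$shim_dir:$PATH"
AR_FLAGS=qD
export PATH AR_FLAGS
```

Run this in the same shell session that later starts make, so the
build inherits both variables.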

Thanks
Greg

-- 
nest.cx is Gmail hosted, use PGP for anything private. Key:
http://tinyurl.com/ho8qg
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0
