Completely reproducible Haskell builds
Greg Steuck
greg at nest.cx
Thu May 26 18:15:16 CEST 2011
I am trying to get the ghc-7.0.3 build procedure down to a byte-identical
rebuild on Linux-amd64.
I solved one source of variability: ar embedding timestamps into .a
files (HOWTO at the end).
Now I am looking to eliminate variations in ELF64 executables. To make
things easy, I am going to demonstrate with the unlit binary.
I start at a point where make leaves off.
% cp utils/unlit/dist/build/tmp/unlit{,.1}
% "/usr/bin/ghc" -o utils/unlit/dist/build/tmp/unlit -O -H64m
-package-conf libraries/bootstrapping.conf -i -iutils/unlit/.
-iutils/unlit/dist/build -iutils/unlit/dist/build/autogen
-Iutils/unlit/dist/build -Iutils/unlit/dist/build/autogen
-no-user-package-conf -rtsopts -odir utils/unlit/dist/build -hidir
utils/unlit/dist/build -stubdir utils/unlit/dist/build -hisuf hi -osuf
o -hcsuf hc -no-auto-link-packages -no-hs-main
utils/unlit/dist/build/unlit.o
% sha1sum utils/unlit/dist/build/tmp/unlit{,.1}
6f679d9dd9a9ea84a68be99369c9f1dc72ba41f0 utils/unlit/dist/build/tmp/unlit
beed059e09c9429c3b74ea613d5be30c6c17ac3c utils/unlit/dist/build/tmp/unlit.1
% ls -l utils/unlit/dist/build/tmp/unlit{,.1}
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit
-rwxr-x--- 1 gnezdo eng 15112 May 25 21:53 utils/unlit/dist/build/tmp/unlit.1
% for i in utils/unlit/dist/build/tmp/unlit{,.1}; do readelf -a $i >
$i.elf; done
% diff utils/unlit/dist/build/tmp/unlit{,.1}.elf
250c250
< 27: 0000000000000000 0 FILE LOCAL DEFAULT ABS ghc27965_0.c
---
> 27: 0000000000000000 0 FILE LOCAL DEFAULT ABS ghc20499_0.c
Looks like there is a temporary file name baked into the ELF file.
Indeed, running with -v reveals:
*** C Compiler:
/usr/bin/gcc -c /tmp/ghc28016_0/ghc28016_0.c -o
/tmp/ghc28016_0/ghc28016_0.o -I/usr/lib/ghc-7.0.3/include
-fno-stack-protector
*** Linker:
/usr/bin/gcc -v -o utils/unlit/dist/build/tmp/unlit
-fno-stack-protector utils/unlit/dist/build/unlit.o
/tmp/ghc28016_0/ghc28016_0.o
Digging a bit into the sources reveals mkExtraCObj in
DriverPipeline.hs which calls newTempName.
The best option for dealing with this seems to be using gcc's ability to
accept input from a pipe. I know I could make this work on a POSIX
system, yet I suspect getting it to work on Windows would be overly
onerous.
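A minimal sketch of the pipe approach on a POSIX system, assuming gcc is on the PATH. The helper name compile_from_stdin and the one-line C program are made up for illustration and are not taken from GHC. Because the source arrives on stdin, no /tmp/ghcNNNNN_0.c name exists for the compiler to record, so two runs over the same text should produce the same object.

```shell
# Hypothetical helper, not GHC code: compile C text read from stdin.
compile_from_stdin() {
    # "-x c" names the language explicitly, because "-" has no extension.
    gcc -x c -c - -o "$1"
}

if command -v gcc >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    src='int ghc_dummy = 0;'   # stand-in for the generated C program text
    printf '%s\n' "$src" | compile_from_stdin "$workdir/a.o"
    printf '%s\n' "$src" | compile_from_stdin "$workdir/b.o"
    # cmp exits 0 only when the two objects are byte-identical.
    cmp "$workdir/a.o" "$workdir/b.o" && echo "objects are identical"
fi
```

Running readelf -a on the resulting object is one way to check which source name, if any, ends up in the symbol table.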
The next best idea is to make GHC use repeatable temporary .c & .o file
names for each invocation. There is already a unique temporary
directory where all the temporary files are created. This suggests
I do not need to worry about adversarial races, so GHC just needs to
avoid racing with itself. I see a couple of options:
1) newTempName should create a new subdirectory for each call and
then return a fixed name inside of it (so /tmp/ghc28016_0/ghc28016_0.c
above would become /tmp/ghc28016_0/0/dummy.c)
2) mkExtraCObj could compute some hash function of its xs
argument (C program text) and then create a file named, e.g.
/tmp/ghc28016_0/38eb8d8eb0abe9c828ba60983e2a97f7a069ec41.c
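The two naming schemes can be sketched in shell, purely to illustrate the idea; the real change would live in GHC's Haskell sources. The function names new_temp_name and hashed_temp_name, the file name dummy, and the sample C text are all hypothetical. Note that the readelf diff above shows only the base name of the .c file, so it is the base name that has to be repeatable.

```shell
# Option 1 (hypothetical helper): claim a fresh numbered subdirectory per
# call and return a fixed file name inside it.
new_temp_name() {
    # $1 = per-process temp directory, $2 = file extension
    [ -d "$1" ] && [ -w "$1" ] || return 1
    i=0
    # mkdir without -p fails if the directory already exists, so the first
    # successful mkdir claims the lowest unused index.
    while ! mkdir "$1/$i" 2>/dev/null; do
        i=$((i + 1))
    done
    printf '%s/%s/dummy.%s\n' "$1" "$i" "$2"
}

# Option 2 (hypothetical helper): derive the name from a hash of the text.
hashed_temp_name() {
    # $1 = per-process temp directory, $2 = C program text
    hash=$(printf '%s' "$2" | sha1sum | cut -d' ' -f1)
    printf '%s/%s.c\n' "$1" "$hash"
}

tmp=$(mktemp -d)                      # plays the role of /tmp/ghc28016_0
new_temp_name "$tmp" c                # path ends in /0/dummy.c
new_temp_name "$tmp" c                # path ends in /1/dummy.c
hashed_temp_name "$tmp" 'int x;'      # same text always gives the same name
```

With option 1 the name depends on the order of calls within one invocation; with option 2 it depends only on the program text.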
Which of these two looks better? Other ideas?
Would people be open to accepting a patch along these lines if I were
to write one?
The steps to make ar not include the timestamps were:
1) Add "AR_OPTS = qD" to build.mk. This takes care of most .a files.
2) Set AR_FLAGS=qD in the environment and dummy out ranlib (create a
no-op script called ranlib on your PATH ahead of the real ranlib). This
takes care of the libffi build.
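A rough sketch of both steps outside the GHC tree, assuming GNU ar from binutils (the D modifier asks for deterministic archives). The scratch directory, the member file, and the location of the no-op ranlib are illustrative choices, not part of the GHC build.

```shell
work=$(mktemp -d)

if command -v ar >/dev/null 2>&1; then
    printf 'hello\n' > "$work/member.o"    # any file can serve as a member
    ar qD "$work/one.a" "$work/member.o"
    touch -t 200101010000 "$work/member.o" # give the member a different mtime
    ar qD "$work/two.a" "$work/member.o"
    # With D the member timestamp is not recorded, so the archives should match.
    cmp "$work/one.a" "$work/two.a" && echo "archives are identical"
fi

# No-op ranlib placed ahead of the real one on PATH.
mkdir -p "$work/bin"
printf '#!/bin/sh\nexit 0\n' > "$work/bin/ranlib"
chmod +x "$work/bin/ranlib"
PATH="$work/bin:$PATH"
export PATH
export AR_FLAGS=qD
```

The PATH and AR_FLAGS settings only affect builds started from the same shell session.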
Thanks
Greg
--
nest.cx is Gmail hosted, use PGP for anything private. Key:
http://tinyurl.com/ho8qg
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3 4D50 0B15 42BD 8DF5 A1B0
More information about the Glasgow-haskell-users
mailing list