Anyone else failing to validate on 'linker_unload'?
Austin Seipp
aseipp at pobox.com
Mon Sep 2 06:46:02 CEST 2013
I think I see the problem, but maybe I'm just tired and shooting in the dark.
The only place checkUnload actually calls free in a loop is in
CheckUnload.c (I say 'in a loop' because the fact that you're
touching blocks inside already-freed blocks makes me
suspicious.) The relevant code is:
---------------------------------------------------------------------------
  // Look through the unloadable objects, and any object that is still
  // marked as unreferenced can be physically unloaded, because we
  // have no references to it.
  prev = NULL;
  for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
      if (oc->referenced == 0) {
          if (prev == NULL) {
              unloaded_objects = oc->next;
          } else {
              prev->next = oc->next;
          }
          IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
                                      oc->fileName));
          freeObjectCode(oc);
      } else {
          IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
                                      oc->fileName));
      }
  }
---------------------------------------------------------------------------
Note that we iterate over oc->next in order to check every unloadable
object. If the object can be unloaded, we call freeObjectCode:
---------------------------------------------------------------------------
void freeObjectCode (ObjectCode *oc)
{
    ....
    stgFree(oc->fileName);
    stgFree(oc->archiveMemberName);
    stgFree(oc);
}
---------------------------------------------------------------------------
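So it would seem we free the object we point to during each traversal,
and then the loop's update step (prev = oc, oc = oc->next) reads
oc->next out of the block we just freed. That's a use-after-free, which
would fit valgrind's 'Invalid read of size 8' at an address inside a
free'd block of size 120 in checkUnload.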
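For what it's worth, here is a minimal sketch of one way the traversal
could avoid touching a freed node: read the next pointer before calling
free, and only advance prev past nodes that survive. This is not the
actual RTS code and not necessarily the fix that should go in. The
cut-down ObjectCode struct, the plain malloc/free in place of the RTS
allocator, and the names pushObject and sweepUnloaded are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the RTS ObjectCode struct. */
typedef struct ObjectCode_ {
    char *fileName;
    int referenced;
    struct ObjectCode_ *next;
} ObjectCode;

static ObjectCode *unloaded_objects = NULL;

/* Stand-in for freeObjectCode: releases the node and what it owns. */
static void freeObjectCode(ObjectCode *oc)
{
    free(oc->fileName);
    free(oc);
}

/* Illustration-only helper: push a node on the front of the list. */
static void pushObject(const char *name, int referenced)
{
    ObjectCode *oc = malloc(sizeof *oc);
    assert(oc != NULL);
    oc->fileName = malloc(strlen(name) + 1);
    assert(oc->fileName != NULL);
    strcpy(oc->fileName, name);
    oc->referenced = referenced;
    oc->next = unloaded_objects;
    unloaded_objects = oc;
}

/* Unlink and free every unreferenced node; returns how many were freed. */
static int sweepUnloaded(void)
{
    ObjectCode *prev = NULL, *oc, *next;
    int freed = 0;

    for (oc = unloaded_objects; oc; oc = next) {
        next = oc->next;            /* read the link BEFORE any free */
        if (oc->referenced == 0) {
            if (prev == NULL) {
                unloaded_objects = next;
            } else {
                prev->next = next;
            }
            freeObjectCode(oc);     /* prev is deliberately left unchanged */
            freed++;
        } else {
            prev = oc;              /* only surviving nodes become prev */
        }
    }
    return freed;
}
```

The two differences from the loop quoted above are that oc is never
dereferenced after freeObjectCode(oc), and that prev never ends up
pointing at a node that has just been freed.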
Ryan, can you do one final thing? When you run that program, be sure
to specify `+RTS -Dl` (must be linked with -debug.) This will enable
all the debug output where the linker is concerned. There will be a
few hundred lines just for initialization (based on my machine.) If my
theory is correct, you'll probably see stuff like 'Unloading object
file ...' right as the invalid read/segfault occurs.
On Sun, Sep 1, 2013 at 11:28 PM, Ryan Newton <rrnewton at gmail.com> wrote:
> Ah, yes I see. Well, giving it the proper arguments when running via
> valgrind puts me back to an "Invalid read" segfault. I confirmed that the
> linker_unload executable itself is 64 bit:
>
> $ file linker_unload
> linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
>
> ==72103== Command: ./linker_unload
> /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a
> /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a
> /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
> ==72103==
> ==72103== Invalid read of size 8
> ==72103==    at 0x479F9F: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
> ==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
> ==72103==    by 0x479F9E: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
> ==72103==
>
>
>
>
> On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>
>> Oops, should have said this: if you check out the Makefile for
>> testsuite/tests/rts - at the very bottom - you'll see the
>> linker_unload target. When run, the executable needs some arguments so
>> it knows what to try and load:
>>
>> ---
>> ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
>> ---
>>
>> So you also need to provide the right arguments. Sorry about that!
>>
>> On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>> > Hi Austin,
>> >
>> > Should have said -- this is 64-bit RHEL 6 (my academic department's
>> > standardized configuration).
>> >
>> > $ uname -a
>> > Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Weirdly it seems to have a different behavior when run by "make" and by
>> > hand. When I run the make command you provided it segfaults with error
>> > code 2:
>> >
>> > cd . && $MAKE -s --no-print-directory linker_unload </dev/null >linker_unload.run.stdout 2>linker_unload.run.stderr
>> > Wrong exit code (expected 0 , actual 2 )
>> > Stdout:
>> > Stderr:
>> > make[1]: *** [linker_unload] Segmentation fault (core dumped)
>> > *** unexpected failure for linker_unload(normal)
>> > Unexpected results from:
>> > TEST="linker_unload"
>> >
>> > But then when I run it by hand with "./linker_unload" or "valgrind
>> > ./linker_unload" I get an unknown symbol error with exit code 1:
>> >
>> > ==70613==
>> > linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
>> > linker_unload: resolveObjs failed
>> > ==70613==
>> > ==70613== HEAP SUMMARY:
>> >
>> >
>> > -Ryan
>> >
>> >
>> > On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp <aseipp at pobox.com> wrote:
>> >>
>> >> I have also not seen this test fail on amd64/Linux since Simon
>> >> committed it. From the valgrind output, it looks like your machine is
>> >> 32bit, correct Ryan? Edward told me yesterday on IRC he saw this fail
>> >> on 64bit Linux, so I'm a little confused.
>> >>
>> >> Can you please try this?
>> >>
>> >> $ cd testsuite/tests/rts
>> >> $ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
>> >> $ valgrind ./linker_unload
>> >>
>> >> This will link you with a debug copy of the RTS, so Valgrind/GDB can
>> >> relate errors back to the relevant source code. Perhaps this will help
>> >> shed light on your problem.
>> >>
>> >>
>> >> On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang <ezyang at mit.edu> wrote:
>> >> > However, as far as I can tell, it is not 100% reproducible.
>> >> > In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63,
>> >> > this test does not fail.
>> >> >
>> >> > Edward
>> >> >
>> >> > Excerpts from Edward Z. Yang's message of Fri Aug 30 21:55:29 -0700 2013:
>> >> >> Yes, this one is failing for me too. Probably related to the
>> >> >> recent object unload patch for
>> >> >> http://ghc.haskell.org/trac/ghc/ticket/8039
>> >> >>
>> >> >> Excerpts from Ryan Newton's message of Fri Aug 30 21:51:24 -0700 2013:
>> >> >> > That test builds an executable named 'linker_unload' which segfaults
>> >> >> > for me. Valgrind says this:
>> >> >> >
>> >> >> >
>> >> >> > ==42800== Invalid read of size 8
>> >> >> > ==42800==    at 0x66945F: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==  Address 0x5bfdd20 is 80 bytes inside a block of size 120 free'd
>> >> >> > ==42800==    at 0x4C273F0: free (vg_replace_malloc.c:446)
>> >> >> > ==42800==    by 0x66945E: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> > ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>> >> >> >
>> >> >> > This went the same across a couple different independent
>> >> >> > checkouts.
>> >> >> >
>> >> >> > -Ryan
>> >> >
>> >> > _______________________________________________
>> >> > ghc-devs mailing list
>> >> > ghc-devs at haskell.org
>> >> > http://www.haskell.org/mailman/listinfo/ghc-devs
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Austin - PGP: 4096R/0x91384671
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Regards,
>> Austin - PGP: 4096R/0x91384671
>
>
--
Regards,
Austin - PGP: 4096R/0x91384671