Anyone else failing to validate on 'linker_unload'?
Simon Marlow
marlowsd at gmail.com
Wed Sep 4 12:08:52 CEST 2013
That's the bug. Fix coming!
Simon
On 02/09/13 05:46, Austin Seipp wrote:
> I (think) I see the problem, but maybe I'm just tired and shooting in the dark.
>
> The only place checkUnload really calls free iteratively is in
> CheckUnload.c (I say 'iteratively' because the fact that you're
> touching/freeing blocks inside already-freed blocks makes me
> suspicious). The relevant code is:
>
> ---------------------------------------------------------------------------
> // Look through the unloadable objects, and any object that is still
> // marked as unreferenced can be physically unloaded, because we
> // have no references to it.
> prev = NULL;
> for (oc = unloaded_objects; oc; prev = oc, oc = oc->next) {
>     if (oc->referenced == 0) {
>         if (prev == NULL) {
>             unloaded_objects = oc->next;
>         } else {
>             prev->next = oc->next;
>         }
>         IF_DEBUG(linker, debugBelch("Unloading object file %s\n",
>                                     oc->fileName));
>         freeObjectCode(oc);
>     } else {
>         IF_DEBUG(linker, debugBelch("Object file still in use: %s\n",
>                                     oc->fileName));
>     }
> }
> ---------------------------------------------------------------------------
>
> Note that we iterate over oc->next in order to check every unloadable
> object. If the object can be unloaded, we call freeObjectCode:
>
> ---------------------------------------------------------------------------
> void freeObjectCode (ObjectCode *oc)
> {
>     ....
>     stgFree(oc->fileName);
>     stgFree(oc->archiveMemberName);
>     stgFree(oc);
> }
> ---------------------------------------------------------------------------
>
> So it would seem we free the object we point to during each traversal,
> and the loop header then reads oc->next from the freed object in order
> to advance. This is probably bad and could lead to very weird behavior.
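
For illustration, here is a minimal sketch of how a loop of this shape can avoid touching freed memory: read oc->next before calling freeObjectCode, and only move prev forward past objects that are kept. It assumes a cut-down stand-in for ObjectCode and uses plain malloc/free; sweepUnloaded and pushObject are hypothetical names made up for the example. It is not the RTS code and not necessarily the fix that was committed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for the RTS ObjectCode (assumption: only these fields). */
typedef struct ObjectCode_ {
    char *fileName;
    int referenced;
    struct ObjectCode_ *next;
} ObjectCode;

static ObjectCode *unloaded_objects = NULL;

/* Stand-in for the RTS freeObjectCode: releases the node and its name. */
static void freeObjectCode(ObjectCode *oc)
{
    free(oc->fileName);
    free(oc);
}

/* Hypothetical helper: push a new object onto the front of the list. */
static void pushObject(const char *name, int referenced)
{
    ObjectCode *oc = malloc(sizeof *oc);
    assert(oc != NULL);
    oc->fileName = malloc(strlen(name) + 1);
    assert(oc->fileName != NULL);
    strcpy(oc->fileName, name);
    oc->referenced = referenced;
    oc->next = unloaded_objects;
    unloaded_objects = oc;
}

/* Free every unreferenced object; returns how many were freed. */
static int sweepUnloaded(void)
{
    ObjectCode *oc, *next, *prev = NULL;
    int freed = 0;

    for (oc = unloaded_objects; oc != NULL; oc = next) {
        next = oc->next;              /* read before oc can be freed */
        if (oc->referenced == 0) {
            if (prev == NULL) {
                unloaded_objects = next;
            } else {
                prev->next = next;
            }
            freeObjectCode(oc);
            freed++;
            /* prev is left alone: oc no longer exists */
        } else {
            prev = oc;                /* only kept objects become prev */
        }
    }
    return freed;
}
```

The two differences from the quoted loop are that the successor is captured before the free, and that prev never points at an object that has just been freed.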
>
> Ryan, can you do one final thing? When you run that program, be sure
> to specify `+RTS -Dl` (must be linked with -debug.) This will enable
> all the debug output where the linker is concerned. There will be a
> few hundred lines just for initialization (based on my machine.) If my
> theory is correct, you'll probably see stuff like 'Unloading object
> file ...' right as the invalid read/segfault occurs.
>
>
> On Sun, Sep 1, 2013 at 11:28 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>> Ah, yes I see. Well, giving it the proper arguments when running via
>> valgrind puts me back to an "Invalid read" segfault. I confirmed that the
>> linker_unload executable itself is 64 bit:
>>
>> $ file linker_unload
>> linker_unload: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
>> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
>>
>> ==72103== Command: ./linker_unload
>> /home/beehive/ryan_scratch/ghc-working/libraries/base/dist-install/build/libHSbase-4.7.0.0.a
>> /home/beehive/ryan_scratch/ghc-working/libraries/ghc-prim/dist-install/build/libHSghc-prim-0.3.1.0.a
>> /home/beehive/ryan_scratch/ghc-working/libraries/integer-gmp/dist-install/build/libHSinteger-gmp-0.5.1.0.a
>> ==72103==
>> ==72103== Invalid read of size 8
>> ==72103==    at 0x479F9F: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==  Address 0xf45ed70 is 80 bytes inside a block of size 120 free'd
>> ==72103==    at 0x4A063F0: free (vg_replace_malloc.c:446)
>> ==72103==    by 0x479F9E: checkUnload (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x4689DA: GarbageCollect (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x4621F0: scheduleDoGC (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x462314: performGC_ (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==    by 0x403341: main (in /home/beehive/ryan_scratch/ghc-working/testsuite/tests/rts/linker_unload)
>> ==72103==
>>
>>
>>
>>
>> On Sun, Sep 1, 2013 at 11:01 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>>
>>> Oops, should have said this: if you check out the Makefile for
>>> testsuite/tests/rts - at the very bottom - you'll see the
>>> linker_unload target. When run, the executable needs some arguments so
>>> it knows what to try to load:
>>>
>>> ---
>>> ./linker_unload $(BASE) $(GHC_PRIM) $(INTEGER_GMP)
>>> ---
>>>
>>> So you also need to provide the right arguments. Sorry about that!
>>>
>>> On Sun, Sep 1, 2013 at 9:54 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>> Hi Austin,
>>>>
>>>> Should have said -- this is 64-bit RHEL 6 (my academic department's
>>>> standardized configuration).
>>>>
>>>> $ uname -a
>>>> Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> Weirdly, it seems to behave differently when run by "make" and by
>>>> hand. When I run the make command you provided, it segfaults with
>>>> error code 2:
>>>>
>>>> cd . && $MAKE -s --no-print-directory linker_unload </dev/null >linker_unload.run.stdout 2>linker_unload.run.stderr
>>>> Wrong exit code (expected 0 , actual 2 )
>>>> Stdout:
>>>> Stderr:
>>>> make[1]: *** [linker_unload] Segmentation fault (core dumped)
>>>> *** unexpected failure for linker_unload(normal)
>>>> Unexpected results from:
>>>> TEST="linker_unload"
>>>>
>>>> But then when I run it by hand with "./linker_unload" or "valgrind
>>>> ./linker_unload" I get an unknown symbol error with exit code 1:
>>>>
>>>> ==70613==
>>>> linker_unload: Test.o: unknown symbol `base_GHCziNum_zdfNumInt_closure'
>>>> linker_unload: resolveObjs failed
>>>> ==70613==
>>>> ==70613== HEAP SUMMARY:
>>>>
>>>>
>>>> -Ryan
>>>>
>>>>
>>>> On Sun, Sep 1, 2013 at 10:46 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>>>>
>>>>> I have also not seen this test fail on amd64/Linux since Simon
>>>>> committed it. From the valgrind output, it looks like your machine is
>>>>> 32bit, correct Ryan? Edward told me yesterday on IRC he saw this fail
>>>>> on 64bit Linux, so I'm a little confused.
>>>>>
>>>>> Can you please try this?
>>>>>
>>>>> $ cd testsuite/tests/rts
>>>>> $ make TEST="linker_unload" EXTRA_HC_OPTS="-debug"
>>>>> $ valgrind ./linker_unload
>>>>>
>>>>> This will link you with a debug copy of the RTS, so Valgrind/GDB can
>>>>> relate errors back to the relevant source code. Perhaps this will help
>>>>> shed light on your problem.
>>>>>
>>>>>
>>>>> On Sun, Sep 1, 2013 at 9:39 PM, Edward Z. Yang <ezyang at mit.edu> wrote:
>>>>>> However, as far as I can tell, it is not 100% reproducible.
>>>>>> In a recent validate of 5f98d44d8617756971cf47c040f2556de4e98f63,
>>>>>> this test does not fail.
>>>>>>
>>>>>> Edward
>>>>>>
>>>>>> Excerpts from Edward Z. Yang's message of Fri Aug 30 21:55:29 -0700 2013:
>>>>>>> Yes, this one is failing for me too. Probably related to the
>>>>>>> recent object unload patch for
>>>>>>> http://ghc.haskell.org/trac/ghc/ticket/8039
>>>>>>>
>>>>>>> Excerpts from Ryan Newton's message of Fri Aug 30 21:51:24 -0700 2013:
>>>>>>>> That test builds an executable named 'linker_unload' which
>>>>>>>> segfaults for me. Valgrind says this:
>>>>>>>>
>>>>>>>>
>>>>>>>> ==42800== Invalid read of size 8
>>>>>>>> ==42800==    at 0x66945F: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==  Address 0x5bfdd20 is 80 bytes inside a block of size 120 free'd
>>>>>>>> ==42800==    at 0x4C273F0: free (vg_replace_malloc.c:446)
>>>>>>>> ==42800==    by 0x66945E: checkUnload (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x657F7A: GarbageCollect (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x651790: scheduleDoGC (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x6518B4: performGC_ (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>> ==42800==    by 0x403BB1: main (in /home/beehive/ryan_scratch/validate14/testsuite/tests/rts/linker_unload)
>>>>>>>>
>>>>>>>> This went the same across a couple different independent
>>>>>>>> checkouts.
>>>>>>>>
>>>>>>>> -Ryan
>>>>>>
>>>>>> _______________________________________________
>>>>>> ghc-devs mailing list
>>>>>> ghc-devs at haskell.org
>>>>>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Austin - PGP: 4096R/0x91384671
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Austin - PGP: 4096R/0x91384671
>>
>>
>
>
>