behaviour of ghci on .c modules that are part of a library

Axel Simon Axel.Simon at in.tum.de
Fri Jul 16 07:36:12 EDT 2010


Dear Haskell maintainers,

I've progressed a little and found that the problem comes down to
accessing global variables that are declared in dynamic libraries. In
a nutshell, this doesn't work, as the addresses of these global
variables are all wrong when ghci is executing the code. So I think I hit:

http://hackage.haskell.org/trac/ghc/ticket/781

I was able to work around this problem by compiling the C modules with  
-fPIC. This bug is pretty bad, I'd say. I've added myself to its CC  
list.
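
To illustrate the failure mode, here is a minimal sketch. It is not the
Gtk2Hs or GLib code: thread_functions, real_table, wrong_table,
counting_lock and checked_lock are all hypothetical names made up for
this example. It assumes that reading a global through a wrong address
yields zeroed memory, which would explain why the C code sees a lock
function of 0 while gdb shows a properly initialised table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a global table of thread function pointers. */
typedef struct {
    void (*mutex_lock)(void *mutex);
    void (*mutex_unlock)(void *mutex);
} thread_functions;

/* Counts how often the lock function was actually reached. */
static int lock_calls = 0;

/* Hypothetical lock/unlock implementations; they do no real locking. */
static void counting_lock(void *mutex) { (void)mutex; lock_calls++; }
static void counting_unlock(void *mutex) { (void)mutex; }

/* The table as the library initialised it (what gdb would display). */
static thread_functions real_table = { counting_lock, counting_unlock };

/* Assumption: a wrongly resolved address points at zeroed memory.
   Static storage is zero-initialised, so both pointers are NULL. */
static thread_functions wrong_table;

/* Call mutex_lock through the table only if the pointer is non-NULL.
   Returns 1 if the lock function was called, 0 if it was NULL. */
static int checked_lock(const thread_functions *table, void *mutex) {
    assert(table != NULL);
    if (table->mutex_lock == NULL) {
        fprintf(stderr, "mutex_lock is NULL in table at %p\n",
                (const void *)table);
        return 0;
    }
    table->mutex_lock(mutex);
    return 1;
}
```

With this sketch, checked_lock(&real_table, NULL) calls the lock
function and returns 1, whereas checked_lock(&wrong_table, NULL) reports
the table's address and returns 0 instead of jumping to address 0.
Printing the address the C code uses and comparing it with the one gdb
reports for the same symbol is one way to confirm such a mismatch.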

Cheers,
Axel

On 14.07.2010, at 16:51, Axel Simon wrote:

> Hi all,
>
> I'm trying to debug a segfault relating to the memory management in
> Gtk2Hs. Rather than make you read the ticket
> http://hackage.haskell.org/trac/gtk2hs/ticket/1183, I'll describe the
> problem:
>
> - compiler 6.12.1 or 6.12.3
> - darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in  
> gtk/Graphics/UI/Gtk/General/hsthread.c
> - platform Ubuntu Linux, x86-64
> - to reproduce: cd gtk2hs/gtk/demo/hello and run ghci World.hs and  
> type 'main'
>
> A window with the "Hello World" button appears. After a few seconds,  
> the GC runs and the finaliser of the GtkButton is run since the  
> Haskell program no longer holds a reference to that object (only the  
> GtkWindow in C land has).
>
> Thus, the GC calls a C function gtk2hs_g_object_unref_from_mainloop  
> which is supposed to enqueue the object into a global data structure  
> from which objects are later taken and g_object_unref is called on  
> them.
>
> This global data structure is protected by a mutex, which is  
> acquired using g_static_mutex_lock:
>
> void gtk2hs_g_object_unref_from_mainloop(gpointer object) {
>
>  int mutex_locked = 0;
>  if (threads_initialised) {
> #ifdef DEBUG
>      printf("acquiring lock to add a %s object at %lx\n",
>             g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
>      printf("value of lock function is %lx\n",
>             (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
> #endif
>    g_rand_new();
> #if defined( WIN32 )
>    EnterCriticalSection(&gtk2hs_finalizer_mutex);
> #else
>    g_static_mutex_lock(&gtk2hs_finalizer_mutex);
> #endif
>    mutex_locked = 1;
>  }
> [..]
>
> The program prints:
>
> acquiring lock to add a GtkButton object at 22d8020
> value of lock function is 0
> zsh: segmentation fault  ghci World
>
> Now the debugging weirdness starts. Whatever I do, I cannot get gdb  
> to find the symbol gtk2hs_g_object_unref_from_mainloop.
>
> Since the function above is contained in a C file that comes with
> our Haskell library, I tried to add "cc-options: -g" and
> "cc-options: -ggdb -O0", but maybe symbols are stripped somewhere.
> So I added the bogus function call to "g_rand_new()", which is not
> called anywhere else, and gdb stops as follows:
>
> acquiring lock to add a GtkButton object at 2105020
> value of lock function is 0
> [Switching to Thread 0x7ffff41ff710 (LWP 15735)]
>
> Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
>
> This all seems reasonable, but:
>
> (gdb) bt
> #0  0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
> #1  0x00000000419b3792 in ?? ()
> #2  0x00007ffff678f078 in ?? ()
>
> i.e. the calling context is broken. I'm very, very sure that the
> caller is indeed the above-mentioned function, since g_rand_new
> isn't called anywhere else in my Haskell program (and otherwise the
> calling context would be sane).
> I'm also passing the address of gtk2hs_g_object_unref_from_mainloop  
> as FinalizerPtr to all my ForeignPtrs, so there is no inlining going  
> on.
>
> Back to the culprit, the call to g_static_mutex_lock. This is a  
> macro that expands to
>
> *g_thread_functions_for_glib_use.mutex_lock
>
> where g_thread_functions_for_glib_use is a global variable that
> contains a lot of function pointers. At the breakpoint, it contains this:
>
> (gdb) print g_thread_functions_for_glib_use
> $33 = {mutex_new = 0x7ffff0cd9820 <g_mutex_new_posix_impl>,
>  mutex_lock = 0x7ffff6c8b3c0 <__pthread_mutex_lock>,
>  mutex_trylock = 0x7ffff0cd97b0 <g_mutex_trylock_posix_impl>,
>  mutex_unlock = 0x7ffff6c8ca00 <__pthread_mutex_unlock>,
>  mutex_free = 0x7ffff0cd9740 <g_mutex_free_posix_impl>,
> [..]
>
> So the call to g_mutex_lock should call the function  
> __pthread_mutex_lock but it calls NULL.
>
> I hoped that writing this email would give me a bit more insight  
> into the problem, but for now I suspect that something overwrites  
> either the stack or the code of the function.
>
> On the same platform, the compiled version prints:
>
> acquiring lock to add a GtkButton object at 1b05820
> value of lock function is 7f7adcabd3c0
> within mutex: adding finalizer to a GtkButton object!
>
> On Mac OS or i386, using ghci or ghc, version 6.10.4, it works as  
> well.
> Now for the fun bit: on i386 using ghci version 6.12.1 it works too.
>
> So it's an x86-64 and ghc 6.12.1 bug. According to Christian Maeder  
> who submitted the ticket, the problem persists in 6.12.3.
>
> Any hints and help appreciated,
> Cheers,
> Axel
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
