[Haskell-cafe] Segfault in the libc malloc using FFI (occurs only in x86_64)

Anthony Cowley acowley at seas.upenn.edu
Sat Aug 20 20:34:06 CEST 2011


I reported a segfault in FFI calls a while back that should be fixed
in GHC 7.2. It turned out to be a stack alignment issue, which may or may
not be related to what you are seeing.

Here is the bug report:
<http://hackage.haskell.org/trac/ghc/ticket/5250>

Anthony

On Sat, Aug 20, 2011 at 12:51 PM, Vincent Gerard <vincent at xenbox.fr> wrote:
> Hi cafe,
>
> I have been struggling with this issue for the past few days. I have
> investigated it at the Haskell, C, and even assembly level...
> Perhaps I'm missing something big?
>
> I hope someone familiar with the FFI can help me with this Segfault.
>
> My environment is: Linux Debian 3.0.0-1, SMP, GHC 7.0.4, x86_64, eglibc 2.13-10
> (This issue also occurs for other people with different environments, but
> only on the x86_64 architecture.)
>
> The bug is in the hsmagick library (FFI bindings to
> GraphicsMagick), for which I am the maintainer.
>
> --
> So here is a reproducer (works fine on 32-bit, segfaults on 64-bit).
>
> You need hsmagick and the GraphicsMagick development libraries installed.
>
>  import Graphics.Transform.Magick.Images
>
>  main =  do
>    initializeMagick
>    c <- readImage "image.jpg"
>    putStrLn "End"
> --
> When this program is compiled with the standard options, a Segfault
> occurs at runtime.
>
> If I compile this small program with the -threaded option, an assertion
> failure is reported at runtime:
>
>  main: malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr)
>  (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct
>  malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >=
>  (unsigned long)((((__builtin_offsetof (struct malloc_chunk,
>  fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t)))
>  - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end &
>  pagemask) == 0)' failed.
>
> This assertion failure is cryptic to me... but maybe it can help
> someone here.
>
> However, since I have C experience, investigating a Segfault is
> familiar territory, so let's install debug symbols for everything and
> dig in with gdb.
>
> Here is the backtrace:
>
>  Program received signal SIGSEGV, Segmentation fault.
>  0x00007ffff510292c in malloc_consolidate (av=0x7ffff540fe60) at
>  malloc.c:5161 (this is inside the eglibc source)
>  5161              unlink(p, bck, fwd);
> (gdb) bt
>  #0  0x00007ffff510292c in malloc_consolidate (av=0x7ffff540fe60) at
>  malloc.c:5161
>  #1  0x00007ffff5104d64 in _int_malloc (av=0x7ffff540fe60,bytes=8520)
>  at malloc.c:4373
>  #2  0x00007ffff5107420 in __libc_malloc (bytes=8520) at malloc.c:3660
>  #3  0x00007ffff67bf311 in CloneImageInfo (image_info=0x0) at
>  magick/image.c:1012
>  #4  0x0000000000439dfa in
>  hsmagickzm0zi5_GraphicsziTransformziMagickziFFIHelpers_mkNewImageInfo1_info
>  ()
>
> Yay, the Segfault is deep inside libc, in a regular malloc of 8520 bytes.
> (I dug into the GraphicsMagick source starting from magick/image.c:1012.)
>
> This malloc is reached through a call to CloneImageInfo(NULL) from
> Haskell, which is correct: that is GraphicsMagick's way of
> allocating an empty ImageInfo data structure.
> (http://www.graphicsmagick.org/api/image.html#cloneimageinfo)
>
> In a C program, I tried executing (after initializing GraphicsMagick)
>   image_info = CloneImageInfo((ImageInfo *) NULL);
> and of course... no Segfault.
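
A Haskell-side counterpart of this C check can be sketched without GraphicsMagick: the following program imports libc's malloc and free directly through the FFI and requests the same 8520 bytes seen in the backtrace. It is a hypothetical control experiment, not hsmagick code, and the names c_malloc, c_free, and allocCycles are made up for illustration. If it runs cleanly on an affected x86_64 setup, plain FFI calls into malloc are probably not enough on their own to trigger the crash.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Hypothetical control experiment (not part of hsmagick): allocate and
-- free 8520-byte blocks through plain FFI imports of libc's malloc/free.
module Main (main, allocCycles) where

import Control.Monad (foldM)
import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Ptr (Ptr, nullPtr)

-- Default (safe) ccall imports, like the CloneImageInfo import in hsmagick.
foreign import ccall "stdlib.h malloc"
  c_malloc :: CSize -> IO (Ptr Word8)

foreign import ccall "stdlib.h free"
  c_free :: Ptr Word8 -> IO ()

-- Run n malloc/free cycles and return how many allocations succeeded.
allocCycles :: Int -> IO Int
allocCycles n = foldM step 0 [1 .. n]
  where
    step :: Int -> Int -> IO Int
    step ok _ = do
      p <- c_malloc 8520            -- same request size as in the backtrace
      if p == nullPtr
        then return ok              -- allocation failed; nothing to free
        else c_free p >> return (ok + 1)

main :: IO ()
main = do
  ok <- allocCycles 100000
  putStrLn ("successful allocations: " ++ show ok)
  putStrLn "End"
```

Compiling it both with and without -threaded mirrors the two failure modes described above.
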
>
> Let's look at the FFI call...
>
> This FFI call is made by mkNewImageInfo, as shown in the trace.
>
> The Haskell FFI layer has done its job here, calling
> the C function with the right argument (NULL).
>
> Here is all the code related to mkNewImageInfo (the last Haskell frame
> we see in the backtrace):
>
> ---
>  mkNewImageInfo :: IO (ForeignPtr HImageInfo)
>  mkNewImageInfo = mkFinalizedImageInfo =<< mkNewImageInfo_
>
>  mkFinalizedImageInfo :: Ptr HImageInfo -> IO (ForeignPtr HImageInfo)
>  mkFinalizedImageInfo = newForeignPtr imageInfoFinalizer
>
>  mkNewImageInfo_ :: IO (Ptr HImageInfo)
>  mkNewImageInfo_ = clone_image_info nullPtr -- CALL before the Segfault
>
>  destroyImageInfo :: Ptr HImageInfo -> IO ()
>  destroyImageInfo = destroy_image_info
>
>  foreign import ccall "static magick/api.h &DestroyImageInfo"
>    imageInfoFinalizer :: FunPtr (Ptr HImageInfo -> IO ())
>
>  foreign import ccall "static magick/api.h CloneImageInfo"
>    clone_image_info :: Ptr HImageInfo -> IO (Ptr HImageInfo)
> --
>
> To me, this code looks correct (and it works perfectly on 32-bit).
> Furthermore, since the call is made with a nullPtr argument, no existing
> data structure is passed in...
> I also tried this code without ForeignPtr finalizers (as in hsmagick <= 0.4),
> and the Segfault occurs as well...
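
To separate the binding pattern from GraphicsMagick itself, the same structure (a raw pointer from a C allocator wrapped with newForeignPtr, using a C finalizer imported with `&`) can be reproduced with libc's malloc and &free. This is a minimal sketch under that substitution; HBlob, mkNewBlob_, mkNewBlob, and roundTrip are hypothetical names standing in for HImageInfo, mkNewImageInfo_, and mkNewImageInfo.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Hypothetical stand-in for the mkNewImageInfo pattern (not hsmagick code):
-- libc malloc plays the role of CloneImageInfo(NULL), and &free plays the
-- role of &DestroyImageInfo, so this builds without GraphicsMagick.
module Main (main, mkNewBlob, roundTrip) where

import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Ptr (FunPtr, Ptr, nullPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Opaque placeholder type, analogous to HImageInfo.
data HBlob

foreign import ccall "stdlib.h malloc"
  c_malloc :: CSize -> IO (Ptr HBlob)

-- Address of free, used as a C finalizer the way &DestroyImageInfo is.
foreign import ccall "stdlib.h &free"
  blobFinalizer :: FunPtr (Ptr HBlob -> IO ())

mkNewBlob_ :: IO (Ptr HBlob)
mkNewBlob_ = do
  p <- c_malloc 8520
  if p == nullPtr
    then ioError (userError "malloc returned NULL")
    else return p

-- Same shape as: mkNewImageInfo = mkFinalizedImageInfo =<< mkNewImageInfo_
mkNewBlob :: IO (ForeignPtr HBlob)
mkNewBlob = newForeignPtr blobFinalizer =<< mkNewBlob_

-- Write a byte into a fresh block, read it back, then free the block eagerly.
roundTrip :: Word8 -> IO Word8
roundTrip x = do
  fp <- mkNewBlob
  v <- withForeignPtr fp $ \p -> do
    pokeByteOff p 0 x
    peekByteOff p 0
  finalizeForeignPtr fp  -- runs free() now rather than at some later GC
  return v

main :: IO ()
main = roundTrip 42 >>= print
```

Running it should print 42. finalizeForeignPtr is used only so that free() runs at a predictable point instead of whenever the garbage collector gets to the finalizer.
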
>
> I even had a look at the generated assembly, which also looks right...
>
> It seems there is a genuine segmentation fault, i.e. an illegal access to
> a memory page... but I could not find its root cause.
>
> One more piece of the puzzle: when the program runs inside
> valgrind, there is no Segfault and it works as expected.
>
> So if anyone has an idea how a Haskell FFI call could lead to a Segfault
> in libc's malloc only on 64-bit architectures, it would make my day,
> perhaps my week :)
>
> Thanks !
>
> Vincent Gerard
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>


