[xmonad] Frequent xmonad crashes (SIGBUS)

Tristan Ravitch travitch at cs.wisc.edu
Tue Feb 26 19:26:59 CET 2013


On Tue, Feb 26, 2013 at 12:21:40PM -0500, Brandon Allbery wrote:
> On Mon, Feb 25, 2013 at 5:10 PM, Zev Weiss <zev at bewilderbeest.net> wrote:
> 
> > For the record, in case anyone else happens to encounter this -- it
> > was pointed out to me by a helpful individual off-list that this is
> > actually a known problem when running binaries mmaped out of AFS,
> > where my xmonad binary happens to reside.  I've changed my xsession
> > script to run it out of a local filesystem instead and am no longer
> > seeing this behavior.
> >
> 
> Can you give me any more information about this?  Simply running
> executables out of AFS does not have any known issues; if it did, Carnegie
> Mellon University (my previous employer) would have run headlong into it
> long since, and it would have been fixed by now.
> 

This is a problem I have been annoyed by for a few years now and I've had
limited success in tracking it down.  The problem doesn't affect all
binaries - seemingly just Haskell binaries.  It also gets worse with
larger Haskell binaries.

The problem seems to be related to the state of the AFS cache somehow.
Just after a reboot with a cold cache, I have to run ghc (some of my GHC
installs are on AFS) 5+ times in a row to get it to do anything besides
die with a SIGBUS.  The same goes for pandoc.  After the binary starts up
properly the first time, it seems to be in cache and doesn't act up until
it gets kicked out of cache.
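
A minimal sketch of that retry workaround, written in Python.  The
function name run_until_no_sigbus and the cap of 10 attempts are
assumptions made for illustration.  The only fact it relies on is that,
on POSIX systems, Python's subprocess module reports a child killed by
signal N as return code -N.

```python
import signal
import subprocess


def run_until_no_sigbus(cmd, max_attempts=10):
    """Run cmd repeatedly until it is not killed by SIGBUS.

    Returns (returncode, attempts).  max_attempts is an arbitrary cap
    chosen for this sketch, not a value taken from the thread.
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        # On POSIX, a negative return code -N means "killed by signal N".
        if returncode != -signal.SIGBUS:
            return returncode, attempt
    # Every attempt died with SIGBUS; report the last status.
    return returncode, max_attempts
```

For example, run_until_no_sigbus(["ghc", "--version"]) keeps relaunching
ghc while it dies with SIGBUS and returns as soon as it exits any other
way.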

Here is an old cafe thread where I tried to track this down - not many
other people reported the problem, but those who did seemed resigned to
it:

  https://groups.google.com/forum/?fromgroups=#!searchin/haskell-cafe/tristan$20afs/haskell-cafe/6qv-Mw8t9kA/XL5x_yE2fX8J

That post highlights a separate but seemingly related problem.  There, GHC
fails when it hits some Template Haskell (TH) code and has to load a few
libraries off of disk during compilation.  I don't know exactly what the
GHCi linker does there, but it is prepping that code for execution and
explodes if the libraries it is loading are not in cache.  In those cases,
I have to keep running 'cabal install', and GHC keeps making forward
progress, loading a few more libraries successfully each time.  Eventually
they are all in cache and it works.

My guess is that the problem is some bad interaction between whatever the
GHC RTS does for file IO and AFS, but it is hard to figure out where to
start looking.  I have never gotten a useful backtrace in any of these
crashes.  Most applications don't have any problems, so I imagine it has
to be GHC somehow.  That said, I've seen some similar crashes in
non-Haskell code if a program is using shared libraries that live on AFS.
If some application eats all of your memory and caches start getting
evicted, sometimes those applications with AFS-based shared libraries
explode in a similar way.  
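
The workaround quoted at the top of this message (running the binary
from a local filesystem instead of AFS) could be sketched roughly as
follows.  The function name stage_locally, the cache directory, and the
freshness check (same size, copy not older than the source) are all
assumptions made for illustration, not something described in the
thread.

```python
import os
import shutil


def stage_locally(src, cache_dir):
    """Copy src into cache_dir unless an up-to-date copy is already there.

    Returns the path of the local copy.
    """
    os.makedirs(cache_dir, exist_ok=True)
    dst = os.path.join(cache_dir, os.path.basename(src))
    src_stat = os.stat(src)
    try:
        dst_stat = os.stat(dst)
        # Assumed freshness rule: same size and not older than the source.
        fresh = (dst_stat.st_size == src_stat.st_size
                 and dst_stat.st_mtime >= src_stat.st_mtime)
    except FileNotFoundError:
        fresh = False
    if not fresh:
        # copy2 also copies the mode bits and mtime, so the local copy
        # stays executable and the freshness rule above keeps working.
        shutil.copy2(src, dst)
    return dst
```

A session script could then call stage_locally on the AFS path of the
binary and start the returned local path, for example with
os.execv(local, [local]).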

Any insight would definitely be appreciated, since this annoys me a few
times a day.