[jhc] JHC's internal format conflicting with ARM instruction set?

Simon Broadhead sbroadhead at gmail.com
Fri Jan 7 05:57:35 CET 2011


I've spent a day and a half playing with JHC and setting it up to
cross-compile to ARM and link with an iPhone application. I'm using the
Apple iPhone SDK 4.2 toolchain with debug flags enabled. I exported a
simple function from Haskell and everything worked fine, but when I
tried a function which requires lazy evaluation, unpredictable
behaviour including crashing began to occur. When running in the
simulator (i.e., as x86 code) everything is fine, but on the actual
ARM device, it crashes as soon as a lazy thunk has to be evaluated --
this happens even if I just generate C code and include it in my Xcode
project with the appropriate compiler flags.

After a few hours of debugging, I believe the problem comes from a
property of the ARM instruction set that conflicts with the way
pointers are stored internally by JHC. The problem seems to arise
because the instruction that is being generated by GCC for calling
function pointers (BLX -- Branch with Link and Exchange instruction
set) reads the LSB of the function pointer and, depending on whether it
is 0 or 1, switches between two different instruction sets: ARM or Thumb.
GCC's code generation is smart enough to determine if a function is
compiled using ARM or Thumb and so the address returned by '&function'
can have either a 0 or 1 in its lowest bit. When the BLX instruction
is executed, the bottom two bits are ignored and the correct location is
jumped to, but if the lowest bit is incorrect, the instruction set
changes and the instruction it finds there will be interpreted as
garbage. Since JHC uses the low-order bits of pointers, including
function pointers, to store flags, and then masks those bits off itself
before the function pointer is called (so the LSB is always 0), a Thumb
function gets entered in ARM mode and undefined behaviour occurs. (A
minor side effect of this behaviour is that the
TO_SPTR_C macro, which adds flags to a pointer through pointer
arithmetic, doesn't work and has to be changed to use the | operator
in order for it to generate a valid pointer.)
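
Here is a minimal sketch of the conflict as I understand it. It uses
plain integers in place of real function pointers, so it can be compiled
and run on any machine. All names below (TAG_MASK, LAZY_FLAG,
tag_with_add and so on) are hypothetical and are not JHC's actual
definitions; the two-bit tag layout and the address 0x8001 are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tag layout: the two low bits of a pointer carry flags. */
#define TAG_MASK  ((uintptr_t)0x3)
#define LAZY_FLAG ((uintptr_t)0x1)

/* Tagging by addition: only correct when the low bits start out as 0. */
static uintptr_t tag_with_add(uintptr_t p) { return p + LAZY_FLAG; }

/* Tagging by OR: sets the flag without carrying into higher bits. */
static uintptr_t tag_with_or(uintptr_t p) { return p | LAZY_FLAG; }

/* Strip the tag bits before the call; this also clears bit 0. */
static uintptr_t strip_tag(uintptr_t p) { return p & ~TAG_MASK; }

/* Bit 0 of a BLX target selects Thumb (1) or ARM (0). */
static int selects_thumb(uintptr_t target) { return (int)(target & 1u); }

/* Walk one made-up Thumb function address through tag and untag. */
static void check_thumb_round_trip(void)
{
    uintptr_t thumb_fn = 0x8001; /* made-up address with the Thumb bit set */

    /* Addition carries into bit 1 and yields a different address. */
    assert(tag_with_add(thumb_fn) == 0x8002);

    /* OR leaves the value usable, but the flag and Thumb bit now overlap. */
    assert(tag_with_or(thumb_fn) == 0x8001);

    /* Stripping the tag drops the Thumb bit, so BLX would pick ARM mode. */
    assert(selects_thumb(thumb_fn) == 1);
    assert(selects_thumb(strip_tag(tag_with_or(thumb_fn))) == 0);
}
```

Calling check_thumb_round_trip() from a main function passes every
assertion, which is the point: the stripped value no longer asks for
Thumb mode, even though the function it refers to was compiled as Thumb.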

I have yet to find any combination of compiler flags that will allow
this to work on ARM, if any exist. Since the behaviour is defined at
the instruction level, any compiler-level fixes would probably have to
generate a significant amount of extra code (and overhead) to keep it
working in any case.

Simon Broadhead
