Allocation & Marshalling Question (again)

Wolfgang Thaller wolfgang.thaller at gmx.net
Thu May 29 08:20:59 EDT 2003


Hello Adrian,

> It would be nice to just partially apply a foreign function of type..
>   f :: CString -> CString -> IO CInt
> .. to a Haskell string, as in..
>   g :: CString -> IO CInt
>   g = f <+ withCString "Hello"

OK... This is the special case of _constant_ values...
It might be safe to write the following:

newtype GarbageCollectedCString = GarbageCollectedCString (ForeignPtr 
CChar)

toCString :: String -> GarbageCollectedCString

After creation, the C String should never be modified (treat it as a 
const char* in C, not a char*). it would be allocated at some point in 
the future by the garbage collector.

The with* and alloca* functions, however, return 'Ptr's, not 
'ForeignPtr's. Ptrs are just address values, it's not possible to 
garbage-collect them or to attach finalizers (that's what ForeignPtrs 
are for, there's a slight performance hit).

Also, as soon as you plan to modify the block pointed to by a 
ForeignPtr, you have to be in the IO monad or lose referential 
transparency.

>
> Of couse there's nothing to stop you partially applying the function
> that results from the explicit lambdas code above, but only at the
> expense of re-marshalling the same Haskell string every time the
> underlying foreign function is called.
>
> Another example, this foreign import seems quite legal (not necessarily
> safe, but legal:-)
>   foreign import ccall unsafe "cfunc" cfunc :: CString -> Word32
> but its awkward to use it in conjunction with withCString.
>
>   withCString "Hello" cfunc
> gives a type error (of course), you need this instead..
>   (unsafePerformIO $ withCString "Hello" (\p -> return $! cfunc p))

OK, that particular unsafePerformIO looks safe to me. However, if you 
have

foreign import ccall unsafe "cfunc" cfunc :: CString -> Word32 -> Word32

(unsafePerformIO $ withCString "Hello" (\p -> return $! cfunc p)) 42

... you're in terrible danger;

Before applying the argument '42', the unsafePerformIO is evaluated; 
and before the argument '42' is applied, the unsafePerformIO will have 
to have _finished_ evaluating. That means withCString will be finished, 
which, in turn, means that withCString will already have deallocated 
the buffer for the string. Bad. Very bad.
Perhaps it won't segfault, because the memory for the buffer won't have 
been reused and overwritten yet, but it's still bad.

>
>> What could the pure function you're passing to allocaBytes do with
>> the pointer?
>
> [My understanding of Haskell was that all functions are pure, even if
> their type is (Ptr a -> IO b). What's impure is the action itself 
> (which
> always has type IO <something>).]

Sorry. Sloppy terminology on my part.
My point was that allocaBytes (in it's current form) relies on getting 
it's hands onto the action. The action

allocaBytes 42 f

will first allocate 42 bytes, then execute the action (f ptr) and then 
free the 42 bytes again. If (f ptr) is not of type IO a, but of type a 
-> b (where b might be IO c or c -> d or ...), what can possibly happen 
between before the block is freed again?

>> When would the storage be freed again? The original allocaBytes frees
>> it after the IO action has finished, but before returing itself. How
>> would that apply to the pure version?
>
> Without understanding the technical details of various Haskell
> implementations I can't say for sure, though I would imagine the
> storage would be freed when the corresponding pointer became
> garbage. This assumes the storage space is (haskell) heap allocated
> (as I gather it is in ghc from Simon M's reply). If it was stack
> allocated then I guess there would be problems.

It's a Ptr, not a ForeignPtr, so the Garbage Collector cannot keep 
track of whether it's still live or not.
GHC's implementation of allocaBytes does some black magic for superior 
performance, and I think that's what SimonM was referring to. But the 
use of the Haskell heap is just an implementation detail here, not a 
feature...

 From the library source code (libraries/base/Foreign/Marshal/Alloc.hs):

#ifdef __GLASGOW_HASKELL__
-- ... some completely unintelligible black magic ...
#else
allocaBytes      :: Int -> (Ptr a -> IO b) -> IO b
allocaBytes size  = bracket (mallocBytes size) free
#endif

Look at the "generic" non-glasgow-haskell implementation: it just 
mallocs the block and then frees it, no chance for GC (malloc & free 
are really implemented using the C stdlib).

> Now it might well be that with current implementations it really
> is unsafe to use withCString with any function that is not of
> type (CString -> IO b). What I want to know is..
> 	* is that so? (maybe)

If you want to use Ptrs (and CString is basically Ptr CChar), yes, 
because the block has to be explicitly freed by the *action* 
withCString.
If you use ForeignPtrs instead of Ptrs, you can have that, or something 
like the toCString function whose signature I presented above. But you 
can have that only if you "promise" not to modify the data block.
For a function like allocaBytes, you will definitely want to modify the 
allocated block, so it's allocation and use has to be in the IO monad 
in order to be safe.

> 	* should that be so? (no IMHO)

Yes, it should be. If I have a Ptr, I want to be able to do pointer 
arithmetic with it, and I want it to be represented as efficiently as 
possible. That means I have to stay in the IO monad.
If I have a ForeignPtr, I don't want pointer arithmetic, and I want it 
to be garbage collected.

Now the question we should be asking is, why isn't there a version of 
CString based on ForeignPtrs?

Cheers,

Wolfgang




More information about the FFI mailing list