[Haskell-cafe] Storables and Ptrs

Mon Dec 6 19:22:34 CET 2010

On Mon, Dec 6, 2010 at 12:03 PM, Tyler Pirtle <teeler at gmail.com> wrote:
> On Sun, Dec 5, 2010 at 9:46 PM, Antoine Latter <aslatter at gmail.com> wrote:
>> On Sun, Dec 5, 2010 at 10:45 PM, Tyler Pirtle <teeler at gmail.com> wrote:
>>> Hi cafe,
>>>
>>> I'm just getting into Foreign.Storable and friends and I'm confused
>>> about the class Storable. For GHC, there are instances of Storable for
>>> all kinds of basic types (Bool, Int, etc.) - but I can't find the
>>> actual declaration of those instances.
>>>
>>> I'm confused that it seems that all Storable instances operate on a
>>> Ptr, yet none of these types allow access to an underlying Ptr. I
>>> noticed that it's possible via Foreign.Marshal.Utils to call 'new' and
>>> get a datatype wrapped by a Ptr, but this isn't memory managed - I'd
>>> have to explicitly free it? Is that my only choice?
>>
>> The Storable class defines how to copy a particular Haskell type to or
>> from a raw memory buffer - specifically represented by the Ptr type.
>> It is most commonly used when interacting with non-Haskell (or
>> 'Foreign') code, which is why a lot of the tools look like they
>> require manual memory management (because foreign-owned resources must
>> often be managed separately anyway).
>>
>> Not all of the means of creating a Ptr type require manual memory
>> management - the 'alloca' family of Haskell functions allocate a
>> buffer and then free it automatically when outside the scope of the
>> passed-in callback (although 'continuation' or 'action' would be the
>> more Haskell-y way to refer to the idea):
>>
>> alloca :: Storable a => (Ptr a -> IO b) -> IO b
>>
>> This can be used to call into C code expecting pointer input or output
>> types to great effect:
>>
>> wrapperAroundForeignCode :: InputType -> IO OutputType
>> wrapperAroundForeignCode input =
>>  alloca $ \inPtr ->
>>  alloca $ \outPtr -> do
>>    poke inPtr input
>>    c_call inPtr outPtr
>>    peek outPtr
>>
>> The functions 'peek' and 'poke' are from the Storable class, and I
>> used the 'alloca' function to allocate temporary storage for the
>> pointers I pass into C-land.
>>
>> Is there a particular problem you're trying to solve? We might be able
>> to offer more specific advice. The Storable and Foreign operations may
>> not even be the best solution to what you're trying to do.
>>
>
>
> Hey Antoine,
>
> Thanks for the clarity, it's very helpful. There is in fact a particular
> problem I'm trying to solve - persisting data structures. I'm a huge
> fan of Data.Vector.Storable.MMap, and I'm interested in other things
> like it - but I realize that the whole thing is built up/on/around
> Storables, and building vectors with Storables (read == peek, write ==
> poke, etc.), because I'm trying to write the raw structures themselves
> to disk (via mmap).
>
> I am aware of Data.Binary, but I feel that this kind of serialization
> for the application I'm building would be too cumbersome considering the
> number of objects I'm dealing with (on the order of hundreds-of-millions
> to billions), especially considering that the application I'm building
> has some very nice pure-ish semantics (an append-only list). I'd
> like the application to be able to simply load a file and interact with
> that memory - not have to load the file and then deserialize everything.
>
> If you have any suggestions here, or if anyone has any general feelings
> about the design or implementation of Data.Vector.Storable.MMap I'd be
> very interested in hearing them. Or about any ideas involving persisting
> native data structures in an append-only fashion, too. ;)
>

If you took the approach of Data.Vector.Storable.MMap, every time you
read an element out of the array you would be un-marshalling the
object from a pointer into a Haskell type - in effect, making a copy.
There are probably ways to avoid this copy for ByteStrings, but that's
about it.

So depending on your data and usage patterns, that might be a great
approach. Just remember that operations involving Storable make copies
of your data.
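
To illustrate the copying, here is a minimal sketch using only 'alloca',
'peek' and 'poke' from base. The name 'peekIsACopy' is made up for this
example. It reads a value, overwrites the buffer, and reads again; the
first result is an ordinary Haskell value and is not affected by the
later write:

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)

-- Hypothetical helper, for illustration only.
-- Returns the value read before and after the buffer is overwritten.
peekIsACopy :: IO (Int, Int)
peekIsACopy =
  alloca $ \ptr -> do
    poke ptr (1 :: Int)   -- write 1 into the temporary buffer
    before <- peek ptr    -- copy the buffer contents into a Haskell Int
    poke ptr 2            -- overwrite the buffer
    after <- peek ptr     -- copy again; 'before' is unchanged
    return (before, after)
```

The same would presumably hold for elements read out of a Storable-backed
vector: each read hands you a fresh Haskell value rather than a view into
the mapped memory.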

For large and complex types you would have a trade-off - each read
might be more expensive than otherwise, but depending on your usage
patterns you could save a lot on how much you keep in memory at a
given time.

If you're lucky, you might be able to write your Storable instance
such that you can take advantage of Haskell's laziness, so that common
operations only need to unmarshal part of the object. But this might
be overkill.
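
As a rough sketch of what a hand-written instance could look like: the
'Entry' type, its field layout, and the helpers 'peekKey' and 'roundTrip'
below are all invented for illustration, not taken from any package. Note
that this does not rely on laziness; instead it reads a single field
explicitly with 'peekByteOff', which is a simpler way to avoid
unmarshalling the whole object:

```haskell
import Data.Int (Int32, Int64)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable (..))

-- Hypothetical record, assumed layout: key at offset 0, value at offset 8.
data Entry = Entry
  { entryKey   :: Int64
  , entryValue :: Int32
  } deriving (Eq, Show)

instance Storable Entry where
  sizeOf _    = 16  -- 8 bytes key + 4 bytes value + 4 bytes padding
  alignment _ = 8
  peek ptr = do
    k <- peekByteOff ptr 0
    v <- peekByteOff ptr 8
    return (Entry k v)
  poke ptr (Entry k v) = do
    pokeByteOff ptr 0 k
    pokeByteOff ptr 8 v

-- Read only the key, without unmarshalling the rest of the record.
peekKey :: Ptr Entry -> IO Int64
peekKey ptr = peekByteOff ptr 0

-- 'with' allocates a temporary buffer, pokes the value into it, and
-- frees the buffer once the action returns.
roundTrip :: Entry -> IO Entry
roundTrip e = with e peek
```

'with' from Foreign.Marshal.Utils is also one answer to the earlier
question about getting a Ptr for an existing value: for any Storable
type it gives you a temporary Ptr without manual freeing.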

Maybe you could email the maintainer of the vector package, to see if
they have used the 'MMap' backed vector, or have any feedback from
anyone that has. Most of what I've written is speculation - I've never
tried that sort of thing.

Take care,
Antoine

> Thanks,
>
>
> Tyler
>
>> Take care,
>> Antoine
>>
>>
>>>
>>> Is there a way that given just simply an Int I could obtain a Ptr from
>>> it, and then invoke the storable functions on it? Or for that matter,
>>> if I go and create some new data type, is there some generic
>>> underlying thing (ghc-only or otherwise) that would let me have a Ptr
>>> of it?
>>>
>>> Thanks,
>>>
>>>
>>> Tyler
>>>
>>
>