[Haskell-cafe] Re: Abstraction leak

Donald Bruce Stewart dons at cse.unsw.edu.au
Wed Jul 4 20:56:34 EDT 2007


drtomc:
> On 7/4/07, Donald Bruce Stewart <dons at cse.unsw.edu.au> wrote:
> >Can we do a cheap bytestring binding to libxml, to avoid any initial
> >String processing?
> 
> For my part, it's not too big an issue. A version of HaXml or at least
> Parsec built on top of ByteString would be a good start. I know there
> was a SoC for the latter, though I have not looked to see where it
> ended up.
> 
> Actually, if you were looking for a good bit of abstraction to build,
> how's this? It would be *really* nice to do all my IO with mmap so my
> program isn't hit by the buffer duplication problem[*].  The kind of
> API I have in mind is something like:
> 
> data Mapping -- abstract
> 
> mmap :: Handle {- or Fd, perhaps -} -> Offset -> Length -> IO Mapping
> 
> read :: Mapping -> Offset -> Length -> IO ByteString
> 
> write :: Mapping -> Offset -> ByteString -> IO ()
> 
> munmap :: Mapping -> IO () -- maybe just use a finalizer
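
To make the shape of that proposal concrete, here is a minimal sketch of
the same interface over a plain heap buffer rather than a real mapping.
Everything in it is an assumption for illustration: newMapping stands in
for mmap, the names readMapping and writeMapping avoid clashing with the
Prelude, and Offset and Length are simplified to Int. Note that
readMapping copies, which is exactly the cost discussed further down.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes, fillBytes)
import Foreign.Ptr (plusPtr)

-- Hypothetical stand-in for a mapped region: a heap buffer and its size.
data Mapping = Mapping (ForeignPtr Word8) Int

-- Stand-in for mmap: allocate a zero-filled buffer of the given length.
-- The ForeignPtr releases the buffer once it becomes unreachable.
newMapping :: Int -> IO Mapping
newMapping len = do
  fp <- mallocForeignPtrBytes len
  withForeignPtr fp $ \p -> fillBytes p 0 len
  return (Mapping fp len)

-- Copy the requested range out into a fresh ByteString.
readMapping :: Mapping -> Int -> Int -> IO B.ByteString
readMapping (Mapping fp size) off len
  | off < 0 || len < 0 || off + len > size =
      ioError (userError "readMapping: out of range")
  | otherwise =
      withForeignPtr fp $ \p -> B.packCStringLen (p `plusPtr` off, len)

-- Copy the bytes of a ByteString into the buffer at the given offset.
writeMapping :: Mapping -> Int -> B.ByteString -> IO ()
writeMapping (Mapping fp size) off bs
  | off < 0 || off + B.length bs > size =
      ioError (userError "writeMapping: out of range")
  | otherwise =
      withForeignPtr fp $ \dst ->
        BU.unsafeUseAsCStringLen bs $ \(src, n) ->
          copyBytes (dst `plusPtr` off) src n
```

In GHCi, newMapping 8 followed by writeMapping m 2 (B.pack [1,2,3]) and
readMapping m 0 8 gives back the eight bytes with the written range
filled in.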

Oh, we should really restore the mmapFile interface in Data.ByteString.
Currently it's commented out to help out Windows people.

And the current implementation does indeed use finalisers to handle the
unmapping.

> This API has the problem that read in particular still has to do
> copying. If you think about the binary XML stuff I mentioned before,
> you'll see that it would be really nice if I could mmap in a record
> and parse it without having to do any copying, or at least to defer
> any copying with a copy-on-write scheme. Doing a simple implementation
> of read that just put a ByteString wrapper around the mmapped memory
> would be nice and efficient, but would suffer from the problem that if
> something changed that bit of the underlying file, things would break.
> Maybe it's just not possible to finesse this one.

Yep. The current impl is:

    mmapFile :: FilePath -> IO ByteString
    mmapFile f = mmap f >>= \(fp,l) -> return $! PS fp 0 l

    mmap :: FilePath -> IO (ForeignPtr Word8, Int)
    mmap f = do
         ...
         p  <- mmap l fd
         fp <- newForeignPtr p unmap -- attach unmap finaliser
         return (fp, l)

Maybe I should just stick this in the unix package.
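
A small sketch of the caveat with a zero-copy wrapper, under stated
assumptions: a malloc'd buffer stands in for the mapped memory, free
stands in for the unmap finaliser, and aliasingDemo is a hypothetical
name. It wraps the buffer with unsafePackCStringFinalizer from
Data.ByteString.Unsafe, takes a real copy, then changes the underlying
byte the way an external write to the file might.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (poke, pokeByteOff)

-- Returns the first byte as seen through the zero-copy wrapper and
-- through the copy, after the underlying buffer has been modified.
aliasingDemo :: IO (Word8, Word8)
aliasingDemo = do
  -- Stand-in for the mapped region (assumption: not a real mmap).
  buf <- mallocBytes 4 :: IO (Ptr Word8)
  mapM_ (\i -> pokeByteOff buf i (65 :: Word8)) [0 .. 3]
  -- Zero-copy wrapper; `free buf` plays the role of the unmap finaliser.
  view <- BU.unsafePackCStringFinalizer buf 4 (free buf)
  -- A genuine copy of the same four bytes, taken before the change.
  snapshot <- B.packCStringLen (castPtr buf, 4)
  -- Simulate something else changing the underlying memory.
  poke buf (66 :: Word8)
  seen <- evaluate (B.head view)
  kept <- evaluate (B.head snapshot)
  return (seen, kept)
```

The wrapper reports the new byte (66) while the copy still holds the old
one (65), so a supposedly immutable ByteString has changed under the
program's feet. That is the price of skipping the copy.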

-- Don

