getting a Binary module into the standard libs
Simon Marlow
simonmar@microsoft.com
Thu, 14 Nov 2002 10:54:12 -0000
> > 3) I think we can all agree that we should buffer BinIOs. There are
> > a few questions, given this:
>
> > a) Should multiple threads be allowed to write the same BinHandle
> > simultaneously? If not, is an error thrown or is the behaviour just
> > left "unspecified"?
> > b) Should multiple threads be allowed to read from the same
> > BinHandle simultaneously? If not, ...
> > c) Should one thread be allowed to write and another to read from
> > the same BH simultaneously? If not, ...
>
> I believe GHC has a reader-writer lock on Handles so the answer is
> that one thread blocks if another is already using it in a conflicting
> way.
>
> Basically, I suggest doing whatever normal file Handles do.
This is a tricky one. Doing whatever normal Handles do is the "right"
way to approach this, but I fear it might be expensive.
Handles have a single file pointer (if they have a file pointer at all),
a buffer, and some other state. The Handle itself is protected by a
lock, so that only one thread can access the state at a time.
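As a rough illustration of that locking discipline (not GHC's actual Handle implementation), here is a minimal sketch that keeps a position and a buffer behind an MVar. The names HState, LockedHandle, putByte, tellPos, contents and demo are all made up for this example.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newMVar, newEmptyMVar, modifyMVar_, readMVar, putMVar, takeMVar)
import Data.Word (Word8)

-- Hypothetical stand-in for a Handle's mutable state:
-- a file position plus the bytes written so far (newest first).
data HState = HState { hsPos :: !Integer, hsBuf :: [Word8] }

-- The MVar acts as the lock: only one thread can hold the state at a time.
newtype LockedHandle = LockedHandle (MVar HState)

newLockedHandle :: IO LockedHandle
newLockedHandle = LockedHandle <$> newMVar (HState 0 [])

-- Append one byte; position and buffer are updated as a single step.
putByte :: LockedHandle -> Word8 -> IO ()
putByte (LockedHandle mv) w = modifyMVar_ mv $ \st ->
  return st { hsPos = hsPos st + 1, hsBuf = w : hsBuf st }

tellPos :: LockedHandle -> IO Integer
tellPos (LockedHandle mv) = hsPos <$> readMVar mv

contents :: LockedHandle -> IO [Word8]
contents (LockedHandle mv) = reverse . hsBuf <$> readMVar mv

-- Fork n writers that each append k bytes, wait for all of them,
-- then report the final position.
demo :: Int -> Int -> IO Integer
demo n k = do
  h <- newLockedHandle
  dones <- mapM (\i -> do
             done <- newEmptyMVar
             _ <- forkIO $ do
               mapM_ (\_ -> putByte h (fromIntegral i)) [1 .. k]
               putMVar done ()
             return done) [1 .. n]
  mapM_ takeMVar dones
  tellPos h
```

Because every update goes through the MVar, no write is lost, but the order in which the threads' bytes interleave is still unspecified. The cost is one lock acquisition per operation, which is the expense in question here.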
Currently, a BinIO handle caches the file pointer for speed, and doesn't
protect this with a lock. BinIO handles might also need a cache. The
"right" thing to do is to push this inside the Handle - use the Handle's
buffer as the cache. Provide something like
    hOpenBin  :: FilePath -> OpenMode -> IO Handle
    hPutBits  :: Handle -> Int -> Word8 -> IO ()
    hGetBits  :: Handle -> Int -> IO Word8
    hSeekBits :: Handle -> Integer -> IO ()
I don't know whether this would be acceptably fast or not. (I'll try to
do some perf measurements on BinIO vs. BinMem later today; that should
give us a rough idea.)
What about BinMem? Currently a BinMem is basically a flat array and a
pointer. It has no lock; if you write or read from two threads
simultaneously you can get race conditions. However, even with a lock,
reading from two threads simultaneously isn't likely to be a good idea
because of the shared file pointer. This is why I suggested having
dupBin:
    dupBin :: BinHandle -> IO BinHandle
which essentially gives you another file pointer to work with, so that
two threads can safely read the same BinHandle at different points.
(writing is still problematic - use BinIO if you want multithreaded
writing).
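To make the idea concrete, here is a minimal sketch of dupBin for a simplified, read-only BinMem. The record layout and the helpers newBinMem, getByte and seekBin are assumptions made for this example, not the real library's definitions; only the name dupBin comes from the proposal above.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical simplified BinMem: a flat byte array plus a position.
data BinMem = BinMem
  { bmBuf  :: ForeignPtr Word8  -- storage, shared between duplicates
  , bmSize :: Int               -- number of valid bytes
  , bmPos  :: IORef Int         -- this handle's own "file pointer"
  }

-- Build a BinMem holding the given bytes, positioned at the start.
newBinMem :: [Word8] -> IO BinMem
newBinMem bytes = do
  let n = length bytes
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp $ \p ->
    mapM_ (\(i, b) -> pokeByteOff p i b) (zip [0 ..] bytes)
  pos <- newIORef 0
  return (BinMem fp n pos)

-- Read the byte at the current position and advance by one.
getByte :: BinMem -> IO Word8
getByte (BinMem fp n posRef) = do
  i <- readIORef posRef
  if i >= n
    then ioError (userError "getByte: end of buffer")
    else do
      b <- withForeignPtr fp $ \p -> peekByteOff p i
      writeIORef posRef (i + 1)
      return b

seekBin :: BinMem -> Int -> IO ()
seekBin bh i = writeIORef (bmPos bh) i

-- Share the storage, but give the copy a fresh position that starts
-- where the original currently is.
dupBin :: BinMem -> IO BinMem
dupBin bh = do
  i <- readIORef (bmPos bh)
  pos <- newIORef i
  return bh { bmPos = pos }
```

Each thread would call dupBin once and then read through its own copy. In this sketch a single BinMem value is still not safe to share between threads, since its position is a plain IORef.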
dupBin can be implemented for Handles, and hence BinIO too. It's fairly
straightforward and seems useful anyway.
Summary:
- reading/writing the same BinHandle from two threads isn't useful
unless the threads can have their own file pointers. ==> need
dupBin
- caching of the data in a BinIO should be done in the Handle,
unless that's too expensive.
(Hal: for now, just continue with what you had planned; if we decide to
make some of these changes we can refactor later.)
Cheers,
Simon