[Haskell-cafe] How to make asynchronous I/O composable and safe?

Tue Jan 17 10:42:33 CET 2012

This is an interesting problem, I think I might incorporate parts of it 
into the next revision of my Concurrent Haskell tutorial.

It sounds like you're getting overwhelmed by several different problems, 
and dealing with them separately would probably help.  e.g. you want 
some infrastructure to run two threads and send an exception to one 
whenever the other one dies.  You also want to be able to avoid a thread 
being interrupted while performing an operation that should be atomic, 
like sending a message - this is slightly tricky, because there's a 
tradeoff between keeping the thread responsive and not interrupting an 
operation.  The biggest hammer is maskUninterruptible, which can be used 
if all else fails.

Whether Network.TLS supports simultaneous read and write I don't know, 
but you can examine the code or talk to the maintainer.  If it doesn't, 
adding a layer of locking is straightforward, and doesn't increase 
overall complexity (it's localised).

Cheers,
	Simon

On 14/01/2012 05:24, Joey Adams wrote:
> I'm not happy with asynchronous I/O in Haskell.  It's hard to reason
> about, and doesn't compose well.  At least in my code.
>
> I'm currently trying to build a networking layer for my application
> using Network.TLS.  Here is a rather minimalist API:
>
>     newtype Connection = Connection (TLSCtx Handle)
>
>     connectClient :: Handle         -- ^ Connection handle, as returned
> by 'connectTo'
>                   ->  X509           -- ^ TLS certificate (i.e. public key)
>                   ->  IO Connection
>
>     connectServer :: Handle         -- ^ Connection handle, as returned
> by 'accept'
>                   ->  X509           -- ^ TLS certificate (i.e. public key)
>                   ->  TLS.PrivateKey -- ^ TLS private key
>                   ->  IO Connection
>
>     close :: Connection ->  IO ()
>
>     sendMessage :: Connection ->  Message ->  IO ()
>
>     recvMessage :: Connection ->  ByteString ->  IO (Message, ByteString)
>
> The module provides little more than connection initialization and
> message serialization.  I don't try to use locks or STM to multiplex
> the connection or, in the case of recvMessage, hide connection state.
> I just be sure to only use sendMessage in one thread at a time, only
> use recvMessage in one thread at a time, and marshal the "extra bytes"
> parameter of recvMessage from call to call (with the help of StateT).
>
> I wrote a simple "chat server" to test it.  The client turned out okay:
>
>     main :: IO ()
>     main = do
>         cert<- getCertificate
>         handle<- connectTo "localhost" (PortNumber 1337)
>         conn<- connectClient handle cert
>         _<- forkIO $ forever $ do
>             s<- getLine
>             sendMessage conn $ TestMessage s
>         forever $ flip runStateT B.empty $ do
>             msg<- StateT $ recvMessage conn
>             case msg of
>                 TestMessage s ->
>                     liftIO $ putStrLn s
>                 _ ->
>                     liftIO $ hPrintf stderr
>                         "Warning: unrecognized message from server: %s\n"
>                         (messageTypeName msg)
>
> The only glaring problem is that, if the user presses Ctrl+D, the
> forked (sending) thread dies, but the main (receiving) thread lingers.
>   I'd have to add exception handlers to ensure that when one thread
> dies, the other thread dies too.
>
> However, the server is an abomination (see attachment).
>
> Unfortunately, it's not as simple as "spawn one thread per client".
> We need at least two threads, one to listen for messages from the
> client, and another to send messages to the client.  GHC won't let us
> simultaneously, in the same thread, wait for input from a connection
> and wait for an STM transaction to succeed.
>
> Another source of complexity is: what if we throw an exception at a
> thread while it is in the middle of sending a packet?  Then we can't
> shut down the connection properly (i.e. Network.TLS.bye), because the
> receiver might think the close_notify packet is part of the
> interrupted packet.
>
> Having a thread for each client is good, as it:
>
>   * Lets us think about each client separately.  No need to turn our
> code inside out or write one big loop that juggles all the clients.
>
>   * Isolates exceptions.  If sendMessage or recvMessage throws an
> exception, it doesn't bring the whole server down.
>
> On the other hand, having multiple threads interact with a single
> client is hard to think about:
>
>   * We have to synchronize the threads (e.g. when one dies, kill the other one)
>
>   * Multiple places where an exception can arise
>
>   * Can multiple threads interact with the connection handle simultaneously?
>
> So why don't I make my connection API handle some of this?  Well, I
> tried.  There are so many ways to do it, and I couldn't find a way
> that simplified usage much.  The approach used by Handle and by
> Network.TLS is to use MVars and IORefs to ensure that, if two threads
> access the same connection, the connection doesn't become totally
> corrupt.  If I do the same, then I'll have *three* layers of locking
> under the hood.
>
> Worse, the locking done by Handle and Network.TLS doesn't guarantee
> much.  I don't know if it's safe to have one thread sending and
> another thread receiving.  Especially in the case of Network.TLS,
> where 'recvData' automatically handshakes in some cases, which sends
> packets.  Since I don't know how much thread safety to expect, I can't
> write networking code and know for sure that it is safe.
>
> I'm certainly not protected from interleaved data if multiple threads
> send on the same handle.  For example:
>
>      import Control.Concurrent
>      import System.IO
>
>      main :: IO ()
>      main = do
>          hSetBuffering stdout NoBuffering
>          _<- forkIO $ putStrLn "One sentence."
>          putStrLn "Another sentence."
>
> produces:
>
>      AnOonteh esre nsteenntceen.c
>      e.
>
> That is, I can't rely on putStrLn being "atomic".  To produce
> intelligible output (without changing the buffering mode), I have to
> "lock" the output each time I write something.  putStrLn doesn't do it
> for me.
>
> === Summary ===
>
> In Haskell, sound logic and a great type system lead to elegant,
> composable code in a variety of domains, such as:
>
>   * Expression evaluation
>   * Parsing
>   * Concurrent programming (thanks to STM)
>
> Asynchronous I/O is tricky.  However, Haskell currently does little to
> alleviate the complexity (at least for me).
>
> How can we structure network protocol APIs so that they stack well
> (e.g. only lock once, rather than locking each layer's connection
> state)?  How can we deal with I/O errors without having to think about
> them at every turn?
>
> For now, how can I structure my application's communication API so
> it's less messy to use?
>
> Thanks,
> - Joey
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe