[Haskell-cafe] Mutable data structures and asynchronous exceptions

Thu Sep 28 19:55:48 UTC 2017

If you are in ST, you cannot modify anything externally visible without
using unsafe functions. If an exception occurs at any point, your
changes are left in some broken state, but nothing references that state
any more, so it is simply garbage collected and nothing bad happens. If
you need externally visible changes, you have to use IO, but then you
also have the full arsenal of exception handling functions at your
disposal. If you write code which is polymorphic and can work in either
IO or ST, it cannot have any visible side effects, and thus you can
ignore any exceptions in it (because you could runST it in completely
pure code).
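
A minimal sketch of that first point, using only base. The names
sumNonNegative and trySum are made up for illustration, and a
synchronous `error` call stands in for the asynchronous exception, since
the effect on the ST state is the same: the STRef is left half-updated,
but the caller only ever sees the exception.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Sums a list in an STRef and aborts on the first negative element,
-- leaving the accumulator in an intermediate state.
sumNonNegative :: [Int] -> Int
sumNonNegative xs = runST $ do
  acc <- newSTRef 0
  mapM_ (step acc) xs
  readSTRef acc
  where
    step :: STRef s Int -> Int -> ST s ()
    step acc x
      | x < 0     = error "negative element"
      | otherwise = modifySTRef' acc (+ x)

-- Forces the pure result in IO. On failure the caller gets the
-- exception; the half-updated STRef is unreachable and gets collected.
trySum :: [Int] -> IO (Either ErrorCall Int)
trySum = try . evaluate . sumNonNegative
```

There is no way to ask for the accumulator after a failure, which is
why no cleanup is needed.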

If you think of your array list, let's look at possible signatures for
adding an element:

addPure :: ArrayList a -> a -> ArrayList a

Clearly this just copies the whole array every time; there is nothing
mutable here.

addST :: ArrayList s a -> a -> ST s ()

This one is mutable, but you can never get out of ST with this
ArrayList. While you are in ST, it doesn't matter if an async exception
interrupts you, because you will throw away the result of the ST action
anyway (and thus your broken ArrayList).

addST' :: ArrayList a -> a -> ST s (ArrayList a)

This one has to copy the whole array, because you could implement
addPure with it and runST.

addIO :: ArrayList a -> a -> IO ()

This can modify the list, but it can (and has to) also handle
exceptions. This is the only one which Java provides.
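
A rough sketch of what such an addIO could look like, assuming a
made-up ArrayList built from IORef and IOArray (from the array package
that ships with GHC). The record fields, newArrayList and toList are
hypothetical helpers, and masking the whole operation with mask_ is just
one possible choice, not the only way to do it.

```haskell
import Control.Exception (mask_)
import Control.Monad (forM_)
import Data.Array.IO (IOArray, getBounds, newArray_, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical growable array: the current buffer plus an element count.
data ArrayList a = ArrayList
  { alBuffer :: IORef (IOArray Int a)
  , alCount  :: IORef Int
  }

-- Starts with room for four elements.
newArrayList :: IO (ArrayList a)
newArrayList = do
  buf <- newArray_ (0, 3)
  ArrayList <$> newIORef buf <*> newIORef 0

-- Async exceptions are deferred for the whole operation, so the buffer
-- pointer, the written element and the count change together or not at
-- all from the point of view of an interrupting thread.
addIO :: ArrayList a -> a -> IO ()
addIO (ArrayList bufRef cntRef) x = mask_ $ do
  buf <- readIORef bufRef
  n <- readIORef cntRef
  (_, hi) <- getBounds buf
  buf' <-
    if n <= hi
      then pure buf
      else do
        -- Buffer is full: allocate a buffer twice the size and copy.
        new <- newArray_ (0, 2 * hi + 1)
        forM_ [0 .. hi] $ \i -> readArray buf i >>= writeArray new i
        writeIORef bufRef new
        pure new
  writeArray buf' n x
  writeIORef cntRef (n + 1)

-- Reads back the first count elements.
toList :: ArrayList a -> IO [a]
toList (ArrayList bufRef cntRef) = do
  buf <- readIORef bufRef
  n <- readIORef cntRef
  mapM (readArray buf) [0 .. n - 1]
```

Nothing inside the masked block blocks, so the delay for a pending
exception stays short. A narrower mask around just the last two writes
would also keep the count and the buffer contents consistent.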

Regarding the monad-polymorphic functions (you are talking about
PrimMonad, right?): they cannot handle exceptions, because they might be
used in an ST context. But as stated earlier, if you compose them into
another action in some PrimMonad, it will be exception safe, because you
can just apply runST to it, constraining the PrimMonad to ST and getting
a pure value out of it (and such a value never needs to handle
exceptions).
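
To illustrate with base only: the sketch below does not use PrimMonad
from the primitive package. As a stand-in, the operation is written
against ST s and lifted into IO with stToIO. The names bump, countPure
and countIO are invented for this example, and wrapping the IO call in
mask_ is my assumption about where the masking would go.

```haskell
import Control.Exception (mask_)
import Control.Monad (replicateM_)
import Control.Monad.ST (ST, runST, stToIO)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Hypothetical mutable operation, written once and unaware of exceptions.
bump :: STRef s Int -> ST s ()
bump ref = modifySTRef' ref (+ 1)

-- Pure use: the STRef never escapes runST, so a caller sees either the
-- final Int or an exception, never a half-updated reference.
countPure :: Int -> Int
countPure n = runST $ do
  ref <- newSTRef 0
  replicateM_ n (bump ref)
  readSTRef ref

-- IO use: the same operation lifted with stToIO. Here the reference
-- could be shared, so the IO caller does the masking.
countIO :: Int -> IO Int
countIO n = do
  ref <- stToIO (newSTRef 0)
  replicateM_ n (mask_ (stToIO (bump ref)))
  stToIO (readSTRef ref)
```

The polymorphic part stays free of exception handling; only the IO
wrapper knows about masking.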

Of course, all this changes as soon as you use unsafeThaw in ST without
proving that you have the ONLY reference to that buffer/array/...

On 09/28/2017 05:51 PM, Станислав Черничкин wrote:
> Thank you for the reply. I think I should clarify what exactly I'd like
> to discuss.
> 
> The “data structures” I'm talking about are in general single-threaded
> mutable containers (like the hashtables mentioned, or like ArrayList in
> Java). Such structures are not thread safe, yet it would be nice to have
> async exception safety. I used the word “atomicity” in the sense
> described here https://en.wikipedia.org/wiki/Atomicity_(database_systems) :
> an operation either occurs, or it fails and the data structure remains in
> its previous state. In many cases such behavior can be achieved without
> complex exception clean-up routines.
> 
> Let me give an example. Consider something like ArrayList from Java (a
> vector which can grow as elements are added). I want to implement an
> 'add' action. The contract is straightforward – the action may either
> add an element to the structure (possibly reallocating the underlying
> memory buffer, writing the element at the last position, and
> incrementing the element counter), or it may throw an OutOfMemory
> exception. But in the latter case the structure should stay
> “undamaged”. This could be implemented as follows:
> 
>     if count_equals_capacity then
>         allocate_new_buffer   (let's suppose it is garbage-collected)
>         copy_elements
>         update_buffer_pointer
>         update_capacity_variable
>     write_new_element_to_buffer
>     update_count_variable
> 
> 
> This code does not contain any explicit exception handling, but it
> satisfies the contract. The only place where an exception can occur is
> allocate_new_buffer. In this case the action will be interrupted before
> any state modifications. All other operations are basically memory
> writes and completely safe (assuming the code is correct and will not
> segfault).
> 
> Things become complicated in the presence of async exceptions. Suppose
> an async exception is raised between write_new_element_to_buffer and
> update_count_variable. At first glance nothing wrong has happened, but
> if the buffer holds references, it will now contain a reference to some
> object, preventing it from being GC-d, and this reference will be beyond
> the buffer's count value, because the exception occurred before the
> count variable was updated, so the programmer will be completely unaware
> of it. But this can still be fixed by masking exceptions in critical
> blocks. And we can definitely implement all of this in the IO monad.
> 
> The question is how to write “monad polymorphic” code, i.e. code which
> can run both in IO and ST. Mutable data structures benefit from being
> “monad polymorphic”. Most Haskell mutable containers (vectors,
> hashtables, impure-containers) are built on PrimMonad, allowing
> them to run both in IO and ST. But they seem to just ignore the fact
> that an async exception may corrupt state. Some of them (e.g.
> https://hackage.haskell.org/package/impure-containers-0.4.0/docs/src/Data-ArrayList-Generic.html#ArrayList
> ) even seem to ignore that unsafeGrow may throw OutOfMemory (though
> attempting to recover from OutOfMemory may be a bad idea in itself).
> 
> 
> 2017-09-28 15:45 GMT+03:00 Michael Snoyman <michael at snoyman.com>:
> 
>     > Since an exception can arise at any point, it is not possible to
>     > guarantee atomicity of an operation, hence a mutable data
>     > structure may remain in an incorrect state in case of interruption.
> 
>     Even if async exceptions didn't exist, we couldn't guarantee
>     atomicity in general without specifically atomic functions (like
>     atomicModifyIORef or STM), since another thread may access the data
>     concurrently and create a data race.
> 
>     If you're only talking about single-threaded cases—of which ST is
>     _basically_ a subset[1]—I don't think you're really worried about
>     _atomicity_, but about exception safety. Exception safety goes
>     beyond async exceptions, since almost all IO actions can throw some
>     form of synchronous exception. For those cases, you can use one of
>     the many exception-cleanup functions, like finally, onException,
>     bracket, or bracketOnError.
> 
>     It's true that those functions don't work inside ST, but I'd argue
>     you don't need them to. The expected behavior of code that receives
>     an async exception is to (1) clean up after itself and (2) rethrow
>     the exception. But as ST blocks are supposed to be free of
>     externally-visible side effects, worrying about putting its
>     variables back into some safe state is unnecessary[2].
> 
>     To summarize:
> 
>     * If you need true atomicity, you're in IO and dealing with multiple
>     threads. I'd recommend sticking with STM unless you have a strong
>     reason to do otherwise.
>     * If you are single threaded and in IO, you can get away with
>     non-STM stuff more easily, and need to make sure you're using
>     exception-aware functions.
>     * If you're inside ST, make sure any resources you acquire are
>     cleaned up correctly, but otherwise you needn't worry about exceptions.
> 
>     Also, you may be interested in reading the documentation for
>     safe-exceptions[3], which talks more about async exception safety.
> 
>     [1] I say basically since you'd have to pull out unsafe functions to
>     fork a thread that has access to an STRef or similar, though it
>     could be done.
>     [2] If you're doing something like binding to a C library inside ST,
>     you may have some memory cleanup to perform, but the STRefs and
>     other data structures should never be visible again.
>     [3] https://haskell-lang.org/library/safe-exceptions
> 
>     On Thu, Sep 28, 2017 at 2:00 PM, Станислав Черничкин
>     <schernichkin at gmail.com> wrote:
> 
>         It's quite hard to implement mutable data structures in the
>         presence of asynchronous exceptions. Since an exception can
>         arise at any point, it is not possible to guarantee atomicity
>         of an operation, hence a mutable data structure may remain in
>         an incorrect state in case of interruption. One can certainly
>         use maskAsyncExceptions# and friends to protect critical
>         regions, but the masking functions live in IO, while mutable
>         data structures, on the other hand, tend to be state-polymorphic
>         (to allow their usage in ST).
> 
>         This leads to conflicting requirements:
>         - One should not care about asynchronous exceptions inside ST
>         (it is not possible to catch an exception in ST, hence not
>         possible to use something in an invalid state). Moreover, it is
>         not even possible to write “exception-safe” code, because the
>         masking functions are not available.
>         - One should provide accurate masking when using the same data
>         structures in IO.
> 
>         So I want to discuss several questions on this topic.
> 
>         1. Impact. Are async exceptions really common? Would it not be
>         easier to say: “ok, things can go bad if you combine async
>         exceptions with mutable data structures, just don't do it”?
> 
>         2. Documentation. Should library authors explicitly mention
>         async exception safety? For example
>         https://hackage.haskell.org/package/hashtables – is it async
>         exception safe when used in IO? Or even worse
>         https://hackage.haskell.org/package/ghc-prim-0.5.1.0/docs/GHC-Prim.html#v:resizeMutableByteArray-35-
>         – what will happen in case of an async exception? This function
>         is state-polymorphic; will it implicitly mask exceptions if used
>         from IO?
> 
>         3. Best practices. How should we deal with the problem? Is
>         creating separate versions of the code for ST and IO the only
>         way? Perhaps it is possible to add “mask” to something like
>         https://hackage.haskell.org/package/primitive-0.6.2.0/docs/Control-Monad-Primitive.html#t:PrimMonad
>         and emit mask in the IO instance and a NOOP in the ST version?
>         Or maybe somebody knows better patterns for async exception
>         safe code?
> 
>         -- 
>         Sincerely, Stanislav Chernichkin.
> 
>         _______________________________________________
>         Haskell-Cafe mailing list
>         To (un)subscribe, modify options or view archives go to:
>         http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>         Only members subscribed via the mailman list are allowed to post.
> 
> 
> 
> 
> 
> -- 
> Sincerely, Stanislav Chernichkin.
> 
> 
> 
