[Haskell-cafe] Mutable data structures and asynchronous exceptions
Станислав Черничкин
schernichkin at gmail.com
Thu Sep 28 15:51:47 UTC 2017
Thank you for the reply. I think I should clarify what exactly I'd like to
discuss.
The “data structures” I'm talking about are, in general, single-threaded
mutable containers (like the hashtables mentioned earlier, or like ArrayList
in Java). Such structures are not thread-safe, yet it would be nice to have
async exception safety. I used the word “atomicity” in the sense described
here: https://en.wikipedia.org/wiki/Atomicity_(database_systems) : an
operation either completes or fails, and on failure the data structure
remains in its previous state. In many cases such behavior can be achieved
without complex exception clean-up routines.
Let me give an example. Consider something like ArrayList from Java (a
vector which can grow as elements are added). I want to implement an 'add'
action. The contract is straightforward: the action may either add the
element to the structure (possibly reallocating the underlying memory buffer,
writing the element at the last position and incrementing the element
counter), or it may throw an OutOfMemory exception. In the latter case the
structure should stay “undamaged”. This could be implemented as follows:
    if count_equals_capacity then
        allocate_new_buffer    (let's suppose it is garbage-collected)
        copy_elements
        update_buffer_pointer
        update_capacity_variable
    write_new_element_to_buffer
    update_count_variable
This code does not contain any explicit exception handling, but it
satisfies the contract. The only place where an exception can occur is
allocate_new_buffer. In that case the action is interrupted before any
state modification. All other operations are basically memory writes and
completely safe (assuming the code is correct and will not segfault).
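
For illustration, the steps above might look like this in IO. This is only a
minimal sketch, assuming a boxed IOArray from the array package and slots
wrapped in Maybe; ArrayList, newArrayList, add and toList are hypothetical
names, not taken from any existing library.

```haskell
import Control.Monad (forM_, when)
import Data.Array.IO (IOArray, newArray, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Maybe (catMaybes)

-- Hypothetical growable array; every name here is illustrative only.
data ArrayList a = ArrayList
  { alBuffer   :: IORef (IOArray Int (Maybe a))
  , alCapacity :: IORef Int
  , alCount    :: IORef Int
  }

newArrayList :: Int -> IO (ArrayList a)
newArrayList cap0 = do
  let cap = max 1 cap0
  buf <- newArray (0, cap - 1) Nothing
  ArrayList <$> newIORef buf <*> newIORef cap <*> newIORef 0

add :: ArrayList a -> a -> IO ()
add al x = do
  count <- readIORef (alCount al)
  cap   <- readIORef (alCapacity al)
  when (count == cap) $ do
    let cap' = cap * 2
    old <- readIORef (alBuffer al)
    -- allocate_new_buffer: the only step expected to fail synchronously
    new <- newArray (0, cap' - 1) Nothing
    -- copy_elements: writes go to the new buffer only
    forM_ [0 .. count - 1] $ \i -> readArray old i >>= writeArray new i
    writeIORef (alBuffer al) new        -- update_buffer_pointer
    writeIORef (alCapacity al) cap'     -- update_capacity_variable
  buf <- readIORef (alBuffer al)
  writeArray buf count (Just x)         -- write_new_element_to_buffer
  writeIORef (alCount al) (count + 1)   -- update_count_variable

-- Read back the first 'count' elements in insertion order.
toList :: ArrayList a -> IO [a]
toList al = do
  count <- readIORef (alCount al)
  buf   <- readIORef (alBuffer al)
  catMaybes <$> mapM (readArray buf) [0 .. count - 1]
```

Note that nothing here protects against async exceptions; the sketch relies
only on the ordering of the writes.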
Things become complicated in the presence of async exceptions. Suppose an
async exception is raised between write_new_element_to_buffer and
update_count_variable. At first glance nothing wrong has happened, but if the
buffer holds references, it will now contain a reference to some object,
preventing it from being GC'd, and this reference will lie beyond the
buffer's count value, because the exception occurred before the count
variable was updated, so the programmer will be completely unaware of it.
But this can still be fixed by masking exceptions in critical blocks. And we
can definitely implement all of this in the IO monad.
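
A minimal sketch of such masking in IO, assuming mask_ from Control.Exception
and a fixed-capacity buffer to keep it short. Stack, newStack, push and pop
are hypothetical names chosen for this example.

```haskell
import Control.Exception (mask_)
import Data.Array.IO (IOArray, newArray, readArray, writeArray)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical fixed-capacity stack; the caller must not exceed the capacity
-- (writeArray raises a synchronous error on an out-of-bounds index).
data Stack a = Stack
  { slots :: IOArray Int (Maybe a)
  , count :: IORef Int
  }

newStack :: Int -> IO (Stack a)
newStack cap = Stack <$> newArray (0, cap - 1) Nothing <*> newIORef 0

-- The slot write and the counter update run with async exceptions masked,
-- so an async exception cannot be delivered between the two writes.
push :: Stack a -> a -> IO ()
push s x = mask_ $ do
  n <- readIORef (count s)
  writeArray (slots s) n (Just x)
  writeIORef (count s) (n + 1)

-- Clearing the slot and decrementing the counter are masked for the same
-- reason: no stale reference should be left beyond 'count'.
pop :: Stack a -> IO (Maybe a)
pop s = mask_ $ do
  n <- readIORef (count s)
  if n == 0
    then pure Nothing
    else do
      x <- readArray (slots s) (n - 1)
      writeArray (slots s) (n - 1) Nothing
      writeIORef (count s) (n - 1)
      pure x
```

None of the masked operations block, so plain mask_ should be enough here;
the masked regions are also short, which keeps the delay in delivering an
async exception small.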
The question is how to write “monad polymorphic” code, i.e. code which can
run both in IO and ST. Mutable data structures benefit from being “monad
polymorphic”. Most Haskell mutable containers (vectors, hashtables,
impure-containers) are built on the PrimState monad, allowing them to run
both in IO and ST. But they seem to simply ignore the fact that an async
exception may corrupt their state. Some of them (e.g.
https://hackage.haskell.org/package/impure-containers-0.4.0/docs/src/Data-ArrayList-Generic.html#ArrayList
) even seem to ignore that unsafeGrow may throw OutOfMemory (though
attempting to recover from OutOfMemory may be a bad idea in itself).
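
One possible shape for this, following the idea from my first message of
having a mask in IO and a no-op in ST, is sketched below. MonadCritical,
World, liftST and critical are hypothetical names invented for this example;
they are not part of primitive or any other package, and the structure is
deliberately trivial (a list plus a size counter that must stay in sync).

```haskell
{-# LANGUAGE TypeFamilies #-}

import Control.Exception (mask_)
import Control.Monad.ST (RealWorld, ST, runST, stToIO)
import Data.STRef (STRef, modifySTRef', newSTRef, readSTRef)

-- Hypothetical class: a monad that can run ST actions and mark a critical
-- region. In IO the region is masked; in ST it is a no-op.
class Monad m => MonadCritical m where
  type World m
  liftST   :: ST (World m) a -> m a
  critical :: m a -> m a

instance MonadCritical IO where
  type World IO = RealWorld
  liftST   = stToIO
  critical = mask_

instance MonadCritical (ST s) where
  type World (ST s) = s
  liftST   = id
  critical = id

-- Two pieces of state that must be updated together.
data Stack s a = Stack { items :: STRef s [a], size :: STRef s Int }

newStack :: MonadCritical m => m (Stack (World m) a)
newStack = liftST (Stack <$> newSTRef [] <*> newSTRef 0)

push :: MonadCritical m => Stack (World m) a -> a -> m ()
push st x = critical $ liftST $ do
  modifySTRef' (items st) (x :)
  modifySTRef' (size st) (+ 1)

sizeOf :: MonadCritical m => Stack (World m) a -> m Int
sizeOf = liftST . readSTRef . size

-- The same code used from ST and from IO.
demoST :: Int
demoST = runST (do s <- newStack; mapM_ (push s) "abc"; sizeOf s)

demoIO :: IO Int
demoIO = do
  s <- newStack
  mapM_ (push s) "abc"
  sizeOf s
```

The no-op in the ST instance relies on the state of a runST block being
unreachable once an exception escapes it; that assumption would not hold for
ST code run via stToIO.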
2017-09-28 15:45 GMT+03:00 Michael Snoyman <michael at snoyman.com>:
> > Since exception can arise at any point, it is not possible to guarantee
> atomicity of operation, hence mutable data structure may remain in
> incorrect state in case of interruption.
>
> Even if async exceptions didn't exist, we couldn't guarantee atomicity in
> general without specifically atomic functions (like atomicModifyIORef or
> STM), since another thread may access the data concurrently and create a
> data race.
>
> If you're only talking about single-threaded cases—of which ST is
> _basically_ a subset[1]—I don't think you're really worried about
> _atomicity_, but about exception safety. Exception safety goes beyond async
> exceptions, since almost all IO actions can throw some form of synchronous
> exception. For those cases, you can use one of the many exception-cleanup
> functions, like finally, onException, bracket, or bracketOnError.
>
> It's true that those functions don't work inside ST, but I'd argue you
> don't need them to. The expected behavior of code that receives an async
> exception is to (1) clean up after itself and (2) rethrow the exception.
> But as ST blocks are supposed to be free of externally-visible side
> effects, worrying about putting its variables back into some safe state is
> unnecessary[2].
>
> To summarize:
>
> * If you need true atomicity, you're in IO and dealing with multiple
> threads. I'd recommend sticking with STM unless you have a strong reason to
> do otherwise.
> * If you are single threaded and in IO, you can get away with non-STM
> stuff more easily, and need to make sure you're using exception-aware
> functions.
> * If you're inside ST, make sure any resources you acquire are cleaned up
> correctly, but otherwise you needn't worry about exceptions.
>
> Also, you may be interested in reading the documentation for
> safe-exceptions[3], which talks more about async exception safety.
>
> [1] I say basically since you'd have to pull out unsafe functions to fork
> a thread that has access to an STRef or similar, though it could be done.
> [2] If you're doing something like binding to a C library inside ST, you
> may have some memory cleanup to perform, but the STRefs and other data
> structures should never be visible again.
> [3] https://haskell-lang.org/library/safe-exceptions
>
> On Thu, Sep 28, 2017 at 2:00 PM, Станислав Черничкин <
> schernichkin at gmail.com> wrote:
>
>> It's quite hard to implement mutable data structures in the presence of
>> asynchronous exceptions. Since an exception can arise at any point, it is
>> not possible to guarantee atomicity of an operation, hence a mutable data
>> structure may remain in an incorrect state if interrupted. One can
>> certainly use maskAsyncExceptions# and friends to protect critical
>> regions, but masking functions live in IO, while mutable data structures
>> tend to be state-polymorphic (to allow their use in ST).
>>
>> This leads to conflicting requirements:
>> - One should not care about asynchronous exceptions inside ST (it is not
>> possible to catch an exception in ST, hence not possible to use something
>> in an invalid state). Moreover, it is not even possible to write
>> “exception-safe” code, because masking functions are not available.
>> - One should provide accurate masking when using the same data structures
>> in IO.
>>
>> So I want to discuss several questions on this topic.
>>
>> 1. Impact. Are async exceptions really common? Would it not be easier to
>> say: “ok, things can go bad if you combine async exceptions with mutable
>> data structures, just don't do it”?
>>
>> 2. Documentation. Should library authors explicitly mention async
>> exception safety? For example, is
>> https://hackage.haskell.org/package/hashtables
>> async-exception safe when used in IO? Or, even worse,
>> https://hackage.haskell.org/package/ghc-prim-0.5.1.0/docs/GHC-Prim.html#v:resizeMutableByteArray-35-
>> : what will happen in case of an async exception? This function is
>> state-polymorphic; will it implicitly mask exceptions if used from IO?
>>
>> 3. Best practices. How should we deal with the problem? Is creating
>> separate versions of the code for ST and IO the only way? Perhaps it is
>> possible to add “mask” to something like
>> https://hackage.haskell.org/package/primitive-0.6.2.0/docs/Control-Monad-Primitive.html#t:PrimMonad
>> so that it emits a mask in the IO instance and a no-op in the ST version?
>> Or maybe somebody knows better patterns for async-exception-safe code?
>>
>> --
>> Sincerely, Stanislav Chernichkin.
>
--
Sincerely, Stanislav Chernichkin.