Correct parsers for bounded integral values

Mon Jul 21 15:56:46 UTC 2025

Thanks for the encouragement Rodrigo!  I'll follow the process and
hope to open a ticket soon.

Viktor Dukhovni (2025-Jul-21, excerpt):
> It is also fair to point out that once an Int or other bounded integral
> type is read, arithmetic with that type (addition, subtraction and
> multiplication) silently overflows.  And so silent overflow in `read`
> is not inconsistent with the type's semantics.

I see parsing as a boundary between an outside world (throwing text at
me) and an inside world, where I have programmed some algorithm.  As
the programmer, it is my responsibility to ensure that the types are
chosen so that the algorithm works correctly, ideally on any accepted
input, i.e., I have to guarantee that no inadvertent overflow happens
in this inside world.  However, calculating away on misinterpreted
input will lead to invalid results.

Viktor Dukhovni (2025-Jul-21, excerpt):
> That said, if various middleware libraries hide overflows, because under
> the covers they're using `read`, that could be a problem, so we do want
> the ecosystem at large to make sensible choices about when silent
> overflow may or may not be appropriate.  Perhaps that means having
> both wrapping and overflow-checked implementations available, and
> clear docs with each about its behaviour and the corresponding
> alternative.

I did not realise this clearly enough before, but have elaborated a
bit on Haskell-cafe [1].  We do have unbounded `read :: String ->
Integer` and silently overflowing `fromInteger :: Integer -> Word8`,
which can be combined if overflow is desired.  This follows the idea
to be explicit about dangerous things.  In addition, we have `read ::
String -> Word8` and company, which I'd like to fix.
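
To make the distinction concrete, here is a minimal sketch.  The names
`readWrapping` and `readChecked` are made up for illustration, they are
not part of base or of [2], and I use `readMaybe` instead of `read`
only to avoid exceptions on malformed input.

```haskell
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Explicit wrapping (hypothetical name): parse without bounds, then
-- truncate on purpose via `fromInteger`.
readWrapping :: String -> Maybe Word8
readWrapping s = fromInteger <$> (readMaybe s :: Maybe Integer)

-- Checked alternative (hypothetical name): parse without bounds, then
-- reject anything outside the range of the target type.
readChecked :: String -> Maybe Word8
readChecked s = do
  n <- readMaybe s :: Maybe Integer
  if toInteger (minBound :: Word8) <= n && n <= toInteger (maxBound :: Word8)
    then Just (fromInteger n)
    else Nothing
```

With these, `readWrapping "300"` gives `Just 44`, while
`readChecked "300"` gives `Nothing`.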

> A few quick observations about [2]:

Thank you =)

>     - It disallows explicit leading "+" (just like "read", but perhaps
>       that should be tolerated).

Yes, it probably should not be that strict.  For my own projects I
assumed it easier to make it more forgiving later, than the other way
round.  There really should be consensus on whether or not leading `+`
or `0` should be allowed.  But these are fixes to make towards the
end, I guess.

>     - It disallows multiple leading zeros, perhaps these should be
>       tolerated.
>
>     - It disallows "-0", perhaps these should be tolerated, as well
>       as "-0000", "-000001", ...  (With lazy ByteStrings, which might
>       never terminate, there is a generous, but sensible limit on
>       the number of leading zeros allowed).

I ruled this out because I wanted a simple guarantee for termination.
Your idea of “generous, but sensible” sounds compelling: the leading
`0`s can be consumed in constant space, since we need not keep them.
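
A rough sketch of what I have in mind, on plain `String` for brevity.
The name `skipZeros` and the way the limit is passed are assumptions
for illustration, not what [2] currently does.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper: skip up to `limit` redundant leading zeros
-- without retaining them.  A zero is only redundant if another digit
-- follows, so the last digit of "000" is kept.  Returns Nothing once
-- more than `limit` zeros would have to be skipped.
skipZeros :: Int -> String -> Maybe String
skipZeros limit = go 0
  where
    go :: Int -> String -> Maybe String
    go n ('0' : rest@(c : _))
      | isDigit c = if n < limit then go (n + 1) rest else Nothing
    go _ s = Just s
```

Only the counter is carried along, so the space needed does not depend
on the number of zeros, and the limit bounds the work done on input
that consists of zeros only.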

>     - One way to avoid difficulties with handling negative minBound is
>       to parse signed values via the corresponding unsigned type, which
>       can accommodate `-minBound` as a positive value, and then negate
>       the final result.  This makes possible sharing the low-level
>       digit-by-digit code between the positive and negative cases.

How do you mean?  I am not sure I understand “accommodate `-minBound`
as a positive value” correctly.  My initial approach, to use

    char '-' >> negate <$> parseUnsigned (negate minBound)

fails, exactly because the negation of the lower bound may not be
(read: is usually not) within the upper bound, and thus wraps around,
e.g., `negate (minBound :: Int8)` incorrectly yields `-128`, because
`128` exceeds the upper bound of `127`.
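
For the record, here is the wrap-around, together with one possible
reading of the suggestion, which I may well have misunderstood: collect
the magnitude in `Word8`, where `128` does fit, and convert only at the
end.  `parseMagnitude` and `parseInt8` are hypothetical names, and the
sketch works on plain `String` rather than on a parser monad.

```haskell
import Control.Monad (foldM)
import Data.Char (digitToInt, isDigit)
import Data.Int (Int8)
import Data.Word (Word8)

-- The problem: 128 is not representable in Int8, so negation wraps.
negationWraps :: Bool
negationWraps = negate (minBound :: Int8) == minBound

-- Hypothetical helper: parse decimal digits into a Word8 that must not
-- exceed `bound`, checking before each step so nothing overflows.
parseMagnitude :: Word8 -> String -> Maybe Word8
parseMagnitude _ [] = Nothing
parseMagnitude bound ds = foldM step 0 ds
  where
    step acc c
      | not (isDigit c) = Nothing
      | acc > (bound - d) `div` 10 = Nothing  -- acc * 10 + d > bound
      | otherwise = Just (acc * 10 + d)
      where
        d = fromIntegral (digitToInt c)

-- Signed parser sharing the digit loop.  For the magnitude 128 both
-- `fromIntegral` and `negate` wrap, and the result is still -128.
parseInt8 :: String -> Maybe Int8
parseInt8 ('-' : ds) = negate . fromIntegral <$> parseMagnitude 128 ds
parseInt8 ds = fromIntegral <$> parseMagnitude 127 ds
```

Under this reading, `parseInt8 "-128"` gives `Just (-128)`, while
`parseInt8 "-129"` and `parseInt8 "128"` both give `Nothing`.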

Viktor Dukhovni (2025-Jul-21, excerpt):
> If parsing of Integer and Natural is also in scope […]

No, not at all.  I have no reservations against `read` for the
unbounded types.  That should be left alone.

Cheers
Stefan

[1]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137162.html
[2]: https://github.com/s5k6/robust-int

--
Stefan Klinger, Ph.D. -- computer scientist              o/X
http://stefan-klinger.de                                 /\/
https://github.com/s5k6                                    \
I prefer receiving plain text messages, not exceeding 32kB.