Correct parsers for bounded integral values

Mon Jul 21 18:54:06 UTC 2025

For base, introducing a new function
`readBoundedNum :: (Bounded a, Num a) => String -> a` or similar seems
very reasonable to me. Changing `read` to throw an exception or similar
after decades, less so.
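
To make the idea concrete, here is a minimal sketch of what such an
overflow-checked reader could look like. The name `readBoundedMaybe` is
hypothetical and not part of base. I assume an `Integral` constraint
rather than `Num`, because the range check goes through `Integer`, and I
return `Maybe` instead of throwing.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Int (Int8)
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Hypothetical helper, not part of base: parse via Integer, then
-- reject anything outside [minBound, maxBound] of the target type.
readBoundedMaybe :: forall a. (Bounded a, Integral a) => String -> Maybe a
readBoundedMaybe s = do
  n <- readMaybe s :: Maybe Integer
  if n >= toInteger (minBound :: a) && n <= toInteger (maxBound :: a)
    then Just (fromInteger n)   -- in range, so fromInteger cannot wrap
    else Nothing
```

With this, `readBoundedMaybe "298" :: Maybe Word8` is `Nothing`, while
`read "298" :: Word8` keeps its current wrapping behaviour.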

On 20/07/2025 22:08, Viktor Dukhovni wrote:
> On Sun, Jul 20, 2025 at 09:12:20PM +0200, Stefan Klinger wrote:
>
>> I'd like to bring to your attention a discussion that I have started
>> over at Haskell-cafe [1].  I was complaining about the silent overflow
>> of parsers for bounded integers:
>>
>>      > read "298" :: Word8
>>      42
> FWIW, there haven't AFAIK been any complaints about ByteString's readInt,
> readWord, readInteger, readNatural and various sized variants having
> overflow checks.  But these have always been more like `reads` than
> `read`, returning `Maybe (a, ByteString)`, so perhaps somewhat more
> oriented towards detecting unexpected excess input, as well as for
> some time now range overflow.  So there's some precedent for overflow
> checking, but...
>
> It is also fair to point out that once an Int or other bounded integral
> type is read, arithmetic with that type (addition, subtraction and
> multiplication) silently overflows.  And so silent overflow in `read`
> is not inconsistent with the type's semantics.
>
> If converting strings to numbers is in support of string-oriented
> network protocols (e.g. the SIZE ESMTP extension), then one really
> should make an effort to avoid silent overflow, but in that context the
> various ByteString read methods are already available.
>
> That said, if various middleware libraries hide overflows, because under
> the covers they're using `read`, that could be a problem, so we do want
> the ecosystem at large to make sensible choices about when silent
> overflow may or may not be appropriate.  Perhaps that means having
> both wrapping and overflow-checked implementations available, and
> clear docs with each about its behaviour and the corresponding
> alternative.
>
>> I find this unsatisfying, and I have demonstrated a solution [2] that
>> seems correct and performant.
> A few quick observations about [2]:
>
>      - It disallows explicit leading "+" (just like "read", but perhaps
>        that should be tolerated).
>
>      - It disallows multiple leading zeros, perhaps these should be
>        tolerated.
>
>      - It disallows "-0", perhaps this should be tolerated, as well
>        as "-0000", "-000001", ...  (With lazy ByteStrings, which might
>        never terminate, there is a generous, but sensible limit on
>        the number of leading zeros allowed).
>
>      - One way to avoid difficulties with handling negative minBound is
>        to parse signed values via the corresponding unsigned type, which
>        can accommodate `-minBound` as a positive value, and then negate
>        the final result.  This makes it possible to share the low-level
>        digit-by-digit code between the positive and negative cases.
>
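
The last point can be sketched as follows, for `Int` parsed through
`Word`. This is only an illustration under my own assumptions, not the
code from [2] and not what ByteString does internally: `parseMagnitude`
and `parseInt` are hypothetical names, and I chose to tolerate a leading
"+", leading zeros and "-0".

```haskell
import Data.Char (isDigit, ord)

-- Hypothetical helper: accumulate ASCII decimal digits into a Word,
-- failing as soon as the value would exceed `limit`.
parseMagnitude :: Word -> String -> Maybe Word
parseMagnitude limit ds
  | null ds || not (all isDigit ds) = Nothing
  | otherwise = go 0 ds
  where
    (q, r) = limit `quotRem` 10
    go :: Word -> String -> Maybe Word
    go acc [] = Just acc
    go acc (c : cs)
      -- acc * 10 + d <= limit  iff  acc < q || (acc == q && d <= r)
      | acc > q || (acc == q && d > r) = Nothing
      | otherwise = go (acc * 10 + d) cs
      where
        d = fromIntegral (ord c - ord '0')

-- Hypothetical parser: both signs share parseMagnitude; only the
-- limit and the final conversion differ.
parseInt :: String -> Maybe Int
parseInt ('-' : ds) = toNegative <$> parseMagnitude (posLimit + 1) ds
parseInt ('+' : ds) = fromIntegral <$> parseMagnitude posLimit ds
parseInt ds         = fromIntegral <$> parseMagnitude posLimit ds

-- Largest positive magnitude; -minBound is one more than this.
posLimit :: Word
posLimit = fromIntegral (maxBound :: Int)

-- Negate in Word (wrapping), then reinterpret as Int; this also maps
-- the magnitude of minBound to minBound.
toNegative :: Word -> Int
toNegative w = fromIntegral (negate w)
```

The negative case never needs a positive `Int` equal to `-minBound`,
which is the difficulty the unsigned detour avoids.
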
> If parsing of Integer and Natural is also in scope, I would expect that
> it avoids doing multi-precision arithmetic for each digit, instead
> parsing groups of digits into ~Word sized blocks and merging the blocks
> hierarchically with only a logarithmic number of MP multiplies.
>
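
For the Integer case, the block-wise approach might look roughly like
the sketch below. All names (`parseDecimal`, `toChunks`, `merge`) are
hypothetical, and the block size of 9 digits is my own conservative
assumption so that a block fits in a `Word` even on 32-bit platforms.
Blocks are parsed with machine arithmetic and then merged pairwise, with
the base squared after each round, so the number of merge rounds is
logarithmic in the number of blocks.

```haskell
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- Assumed block size: 10^9 < 2^32, so 9 digits always fit in a Word.
chunkLen :: Int
chunkLen = 9

-- Parse a non-empty string of ASCII decimal digits (no sign handling).
parseDecimal :: String -> Maybe Integer
parseDecimal ds
  | null ds || not (all isDigit ds) = Nothing
  | otherwise = Just (merge (10 ^ chunkLen) (map chunkValue (toChunks ds)))

-- Split most-significant-first; only the first chunk may be short.
toChunks :: String -> [String]
toChunks ds = case length ds `rem` chunkLen of
  0 -> go ds
  k -> let (h, t) = splitAt k ds in h : go t
  where
    go [] = []
    go xs = let (h, t) = splitAt chunkLen xs in h : go t

-- One block, computed with machine-word arithmetic only.
chunkValue :: String -> Integer
chunkValue = toInteger . foldl' step (0 :: Word)
  where
    step acc c = acc * 10 + fromIntegral (ord c - ord '0')

-- Merge adjacent blocks pairwise; each round halves the list and
-- squares the base.
merge :: Integer -> [Integer] -> Integer
merge _ [] = 0
merge _ [x] = x
merge base xs = merge (base * base) (pairUp padded)
  where
    -- Pad with a leading zero block so every block has a partner.
    padded = if odd (length xs) then 0 : xs else xs
    pairUp (hi : lo : rest) = hi * base + lo : pairUp rest
    pairUp _ = []
```

A sign and a range check for bounded types could be layered on top of
this in the same way as for the fixed-width case.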