[Haskell-cafe] Correct parsers for bounded integral values

Stefan Klinger haskell at stefan-klinger.de
Mon Jul 21 15:42:48 UTC 2025


Hello, and thanks for the discussion!

TL;DR: Jeff, my approach would not violate your expectations.

But many thanks for the consideration, it gave me new insight =)

Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
> GHC's treatment of integer-style numeric literals is part of the
> Haskell spec--it's not just a GHC implementation
> detail. Specifically, it treats them as unbounded precision
> Integer's, subsequently converted to other types via `fromInteger`.

>     ghci> x = 123456
>     ghci> x :: Int
>     123456
>     ghci> x :: Word8
>     64
>     ghci> :type x
>     x :: Num a => a

I think I am slowly beginning to see what you mean.  You're entirely
correct; the docs [1] say

    An integer literal represents the application of the function
    fromInteger to the appropriate value of type Integer, so such
    literals have type (Num a) => a.

Hence, the GHC parser would have to use (the equivalent of)

    read :: String -> Integer

to parse the literal; otherwise we would not get an `Integer`
value.  This is one particular `read` function.  I imagine that in
your example

    x = 123456

is equivalent to

    x = fromInteger (read "123456" :: Integer)

and the following `x :: Int` and `x :: Word8` choose the appropriate
`fromInteger` function.
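To make that equivalence concrete, here is a minimal sketch that can be loaded into GHCi.  The names `x` and `x'` are only for illustration, and `read` merely stands in for "obtain the `Integer` value"; GHC's lexer does not literally call `read`.  The explicit signatures keep both bindings polymorphic, as they are at the GHCi prompt.

```haskell
import Data.Word (Word8)

-- The literal as written in source: its type is Num a => a.
x :: Num a => a
x = 123456

-- A hand-written model of the same binding: fromInteger applied to an
-- Integer.  The read here always runs at type Integer; only the
-- fromInteger instance varies with the type chosen at the use site.
x' :: Num a => a
x' = fromInteger (read "123456" :: Integer)
```

In GHCi, `x :: Word8` and `x' :: Word8` both give `64`, while `x :: Int` and `x' :: Int` both give `123456`.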

They do *not* choose the `read` function.  It is still the one that
returns an unbounded `Integer`, and I do not want to get rid of, or
even touch, that unbounded `read` function.  It is good.

But there is a *different* read function

    read :: String -> Word8

which is not used in the scenario above, and it is the buggy one
(as are all its bounded cousins).
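A small sketch of the distinction: the type annotation selects which `Read` instance, and hence which `read`, actually runs.  The binding names are illustrative; the wrapping behaviour of the `Word8` instance is the one demonstrated in this thread.

```haskell
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Read Integer: unbounded, the value comes back as written.
viaInteger :: Integer
viaInteger = read "298"

-- Read Word8: a separate instance.  It wraps out-of-range input
-- (298 mod 256 == 42) instead of failing.
viaWord8 :: Word8
viaWord8 = read "298"

-- Even readMaybe does not report the overflow, because the Word8
-- instance itself reports success.
viaWord8Maybe :: Maybe Word8
viaWord8Maybe = readMaybe "298"
```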

I did not realise this before as explicitly as I do now, but the
example

    > read "298" :: Word8
    42

chooses a different `read` function than would be used by the GHC
parser for

    > x = 298
    > x :: Word8
    42

which just happens to produce `42` as well, because `fromInteger`
*also* wraps around:

    > fromInteger (toInteger 298) :: Word8
    42
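The wrapping comes from `fromInteger` alone: for a fixed-width unsigned type such as `Word8`, it reduces its argument modulo 2^8.  A quick check, assuming nothing beyond `base`:

```haskell
import Data.Word (Word8)

-- fromInteger for Word8 keeps only the low 8 bits, i.e. reduces mod 256.
wrapped :: Word8
wrapped = fromInteger 298

-- The same arithmetic done by hand on Integer.
byHand :: Integer
byHand = 298 `mod` 2 ^ (8 :: Int)

-- More values: 256 wraps to 0, and -1 wraps around to 255.
samples :: [Word8]
samples = map fromInteger [256, 298, -1]
```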

The `x` above is not of type `Word8`; it is a `Num a => a`, as one
can still retrieve the original value from it:

    > x :: Integer
    298
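A sketch of the same point in a source file.  The explicit signature on `x` matters there: outside GHCi, where the monomorphism restriction is on by default, an unannotated `x = 298` would be given a single monomorphic type rather than `Num a => a`.  The names `asInteger`, `asWord8` and `lost` are just illustrative.

```haskell
import Data.Word (Word8)

-- Polymorphic: no conversion happens until a use site picks a type.
x :: Num a => a
x = 298

-- Each use picks its own fromInteger, so the original value survives.
asInteger :: Integer
asInteger = x

asWord8 :: Word8
asWord8 = x

-- Once a value really is a Word8, the information is gone:
-- widening it back to Integer yields 42, not 298.
lost :: Integer
lost = toInteger asWord8
```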

To summarize: The GHC parser uses `read :: String -> Integer` to parse
literals of type `Integer`.  It would be completely unaffected by my
suggested modification of the *other* `read` functions, which only
concern the bounded integral types.
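One way the bounded instances could behave, sketched under the assumption that range checking goes through the unbounded `Integer` reader.  The name `readBoundedMaybe` is hypothetical and not part of `base`, and a real patch to `base` need not look like this; the sketch only illustrates that `read :: String -> Integer` itself stays untouched.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Int (Int8)
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Hypothetical helper: parse with the unchanged Read Integer instance,
-- then accept the result only if it fits the target type's bounds.
readBoundedMaybe :: forall a. (Integral a, Bounded a) => String -> Maybe a
readBoundedMaybe s = do
  n <- readMaybe s :: Maybe Integer
  if toInteger (minBound :: a) <= n && n <= toInteger (maxBound :: a)
    then Just (fromInteger n)
    else Nothing

-- Illustrative uses at two bounded types.
word8Examples :: [Maybe Word8]
word8Examples = map readBoundedMaybe ["255", "298", "-1"]

int8Examples :: [Maybe Int8]
int8Examples = map readBoundedMaybe ["-128", "127", "128"]
```

Here `word8Examples` is `[Just 255, Nothing, Nothing]` and `int8Examples` is `[Just (-128), Just 127, Nothing]`: out-of-range input is rejected rather than wrapped.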

Hmmmmm.  Does this address your concerns?


About the more general approach you requested

Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
> This was really just meant to show my thought process, which was to
> realize that parsing a number with a limit implied by a datatype was
> just a specific case of a more general operation, and once you have
> that more general operation you can implement
> number-limited-by-a-datatype in terms of that.

Yes, it is probably worth the effort to factor out the common idea
that I've implemented specifically for Parsec and Attoparsec; ideally
it would be usable with most parser libraries.  I'll reconsider the
hints about `scan`, thanks for that.
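Following Jeff's suggestion, the general operation might be "read digits up to a limit given as a value", with the datatype-specific parser defined on top of it.  The sketch below works on plain `String`s so that it depends only on `base`; the names `natUpTo` and `natBounded` are hypothetical, it handles non-negative values only, and porting it to Parsec, Attoparsec, or a `scan`-style primitive is left open.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Char (digitToInt, isDigit)
import Data.Word (Word16, Word8)

-- Hypothetical general operation: accept a non-empty run of decimal
-- digits only if its value does not exceed `limit`.  Accumulation stops
-- as soon as the running value exceeds the limit, so the Integer never
-- grows much beyond it, however long the input is.
natUpTo :: Integer -> String -> Maybe Integer
natUpTo limit s
  | null s || not (all isDigit s) = Nothing
  | otherwise = go 0 s
  where
    go acc [] = Just acc
    go acc (c : cs)
      | acc' > limit = Nothing
      | otherwise = go acc' cs
      where
        acc' = acc * 10 + toInteger (digitToInt c)

-- The datatype-specific parser is a thin wrapper: the limit is taken
-- from maxBound of the target type.
natBounded :: forall a. (Integral a, Bounded a) => String -> Maybe a
natBounded s = fromInteger <$> natUpTo (toInteger (maxBound :: a)) s

-- Illustrative uses: the same input can fit one type but not another.
examples :: (Maybe Word8, Maybe Word8, Maybe Word16)
examples = (natBounded "255", natBounded "256", natBounded "298")
```

Keeping the limit as an ordinary `Integer` argument means the same core could serve limits that do not come from a type at all.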

Actually, my own implementations were meant rather as proofs of
concept; they do have their own questionable peculiarities, like
strictly forbidding a leading `+`, etc.
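Whether to accept a sign is a policy decision separate from the range check.  As an illustration in yet another parser library, here is a hypothetical `ReadP` parser (`Text.ParserCombinators.ReadP` ships with `base`) that accepts an optional leading `+` or `-`, reads the digits as an unbounded `Integer`, and fails if the value does not fit the target type.  It sketches one possible policy and says nothing about what the Parsec or Attoparsec versions do; `boundedP` and `parseBounded` are made-up names.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Char (isDigit)
import Data.Int (Int8)
import Data.Word (Word8)
import Text.ParserCombinators.ReadP

-- Hypothetical parser: optional sign, then digits read as an unbounded
-- Integer, then a range check against the target type's bounds.
boundedP :: forall a. (Integral a, Bounded a) => ReadP a
boundedP = do
  sign <- option id ((id <$ char '+') +++ (negate <$ char '-'))
  ds <- munch1 isDigit
  let n = sign (read ds :: Integer)
  if toInteger (minBound :: a) <= n && n <= toInteger (maxBound :: a)
    then pure (fromInteger n)
    else pfail

-- Run the parser on the whole input; Nothing on failure or leftovers.
parseBounded :: (Integral a, Bounded a) => String -> Maybe a
parseBounded s = case [v | (v, "") <- readP_to_S (boundedP <* eof) s] of
  [v] -> Just v
  _ -> Nothing

-- Illustrative uses at an unsigned and a signed type.
examples :: ([Maybe Word8], [Maybe Int8])
examples =
  ( map parseBounded ["+42", "-1", "298", "12a"]
  , map parseBounded ["-128", "+127", "128"]
  )
```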

I'll try to open a ticket on GHC's issue tracker, hoping to consolidate
the discussion there.  And I do expect quite some work ahead of me…

Cheers =)
Stefan



[1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:fromInteger


-- 
Stefan Klinger, Ph.D. -- computer scientist              o/X
http://stefan-klinger.de                                 /\/
https://github.com/s5k6                                    \
I prefer receiving plain text messages, not exceeding 32kB.

