[Haskell-cafe] Correct parsers for bounded integral values
Stefan Klinger
haskell at stefan-klinger.de
Mon Jul 21 15:42:48 UTC 2025
Hallo, and thanks for the discussion!
TL;DR: Jeff, my approach would not violate your expectations.
But many thanks for the consideration, it gave me new insight =)
Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
> GHC's treatment of integer-style numeric literals is part of the
> Haskell spec--it's not just a GHC implementation
> detail. Specifically, it treats them as unbounded precision
> Integer's, subsequently converted to other types via `fromInteger`.
> ghci> x = 123456
> ghci> x :: Int
> 123456
> ghci> x :: Word8
> 64
> ghci> :type x
> x :: Num a => a
I think I'm slowly getting what you mean. You're entirely correct,
the docs [1] say
An integer literal represents the application of the function
fromInteger to the appropriate value of type Integer, so such
literals have type (Num a) => a.
Hence, the GHC parser would have to use
read :: String -> Integer
to parse the literal, otherwise we would not be getting an `Integer`
value. This is one particular `read` function. I imagine in your
example
x = 123456
is equivalent to
x = fromInteger (read "123456" :: Integer)
and the following `x :: Int` and `x :: Word8` choose the appropriate
`fromInteger` function.
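
To make that reading concrete, here is a minimal sketch using only `base`. The name `lit` is made up for illustration, and the sketch does not claim that GHC's lexer literally calls `read`; it only shows a value that behaves the way the literal does.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for the literal 123456: parse once as an
-- unbounded Integer, then let each use site pick its `fromInteger`.
lit :: Num a => a
lit = fromInteger (read "123456" :: Integer)

main :: IO ()
main = do
  print (lit :: Integer)  -- 123456, the unbounded value
  print (lit :: Int)      -- 123456
  print (lit :: Word8)    -- 64, because fromInteger wraps modulo 256
```

The type annotation at the use site only selects the `Num` instance; the `read` inside `lit` is fixed to `Integer` throughout.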
They do *not* choose the `read` function. It is still the one that
returns an unbounded `Integer`, and I do not want to get rid of, or
even touch that unbounded `read` function. It is good.
But there is a *different* read function
read :: String -> Word8
which is not used in the scenario above, and this is the buggy one
(and all its bounded cousins).
I did not realise this before as explicitly as I do now, but the
example
> read "298" :: Word8
42
chooses a different `read` function than would be used by the GHC
parser for
> x = 298
> x :: Word8
42
which just happens to produce `42` as well, because `fromInteger`
*also* wraps around:
> fromInteger (toInteger 298) :: Word8
42
The `x` above is not of type `Word8`; it has type `Num a => a`, because
one can still retrieve the original value from it:
> x :: Integer
298
To summarize: The GHC parser uses `read :: String -> Integer` to parse
integer literals. It would be completely unaffected by my suggested
modification of the *other* `read` functions, which only concern the
bounded integral types.
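
As an illustration of what a non-wrapping read for bounded types could look like, here is a minimal sketch built on `readMaybe` from `base`. The helper `readBounded` is hypothetical and is not the implementation discussed in this thread. It also inherits the lexical syntax of `read` (surrounding whitespace, parentheses, and so on), and it builds the full `Integer` before checking the range.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Hypothetical helper, not part of base: parse as Integer first,
-- then accept the value only if it fits the target type's range.
readBounded :: forall a. (Integral a, Bounded a) => String -> Maybe a
readBounded s = do
  n <- readMaybe s :: Maybe Integer
  if n >= toInteger (minBound :: a) && n <= toInteger (maxBound :: a)
    then Just (fromInteger n)
    else Nothing

main :: IO ()
main = do
  print (readBounded "255" :: Maybe Word8)  -- Just 255
  print (readBounded "298" :: Maybe Word8)  -- Nothing, instead of 42
  print (readMaybe "298" :: Maybe Integer)  -- Just 298, Integer is untouched
```

The last line shows that the `Integer` path is left exactly as it is.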
Hmmmmm. Does this address your concerns?
About the more general approach you requested:
Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
> This was really just meant to show my thought process, which was to
> realize that parsing a number with a limit implied by a datatype was
> just a specific case of a more general operation, and once you have
> that more general operation you can implement
> number-limited-by-a-datatype in terms of that.
Yes, it is probably a worthwhile effort to factor out the common idea
that I've implemented specifically for Parsec and Attoparsec. Ideally
it would be usable with most parser libraries. I'll reconsider the
hints at `scan`, thanks for that.
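
A library-independent sketch of that factoring, working on plain `String` input: the general operation reads decimal digits up to a given limit, and the datatype-limited parser is defined in terms of it. Both `digitsUpTo` and `boundedNat` are hypothetical names, the sketch handles only non-negative decimal numbers, and it is not the Parsec or Attoparsec code mentioned above.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Char (digitToInt, isDigit)
import Data.Word (Word8)

-- Hypothetical general operation: consume leading decimal digits and
-- fail as soon as the accumulated value exceeds `limit`.
-- Returns the value together with the unconsumed input.
digitsUpTo :: Integer -> String -> Maybe (Integer, String)
digitsUpTo limit (c:cs) | isDigit c = go (toInteger (digitToInt c)) cs
  where
    go acc rest
      | acc > limit = Nothing
      | otherwise = case rest of
          (d:ds) | isDigit d -> go (acc * 10 + toInteger (digitToInt d)) ds
          _                  -> Just (acc, rest)
digitsUpTo _ _ = Nothing  -- empty input or no leading digit

-- The limit implied by a datatype is then one specific case.
boundedNat :: forall a. (Integral a, Bounded a) => String -> Maybe (a, String)
boundedNat s = do
  (n, rest) <- digitsUpTo (toInteger (maxBound :: a)) s
  Just (fromInteger n, rest)

main :: IO ()
main = do
  print (boundedNat "255 rest" :: Maybe (Word8, String))  -- Just (255," rest")
  print (boundedNat "298" :: Maybe (Word8, String))       -- Nothing
  print (boundedNat "+5" :: Maybe (Word8, String))        -- Nothing
```

Because the check happens after every digit, the accumulator never grows much beyond the limit, however long the input is.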
Actually, my own implementations were rather meant as proofs of
concept; they do have their own questionable peculiarities, like
strictly forbidding a leading `+`, etc.
I'll try to open a ticket on GHC's tracker, hoping to consolidate
discussion there. And I do expect quite some work ahead of me…
Cheers =)
Stefan
[1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:fromInteger
--
Stefan Klinger, Ph.D. -- computer scientist o/X
http://stefan-klinger.de /\/
https://github.com/s5k6 \
I prefer receiving plain text messages, not exceeding 32kB.