[Haskell-cafe] Correct parsers for bounded integral values

Sun Jul 20 16:23:24 UTC 2025

On Sun, Jul 20, 2025 at 07:08:54AM -0800, Daniil Iaitskov wrote:

> one divMod and double cmp per digit can be replaced with left shift and
> single cmp per digit which is faster

But, sadly, at first blush not correct as written.  And the multiply is
not needed, instead, as in the ByteString code, one can take the fast
path when the old value is at most than 1/10th of the upper bound, fail
when strictly larger, and take a bit more care when exactly equal.

> CPU is optimized to work with word size values. Assuming word size is 64bit
> then majority of fixed types (Word8 - Word64) fits a register.
> 
>     {-# SPECIALIZE parse @Int #-}
> 
> parse :: forall a. Num a => Parsec Word64 -> Word64 -> a
> 
>    parse nextDigit s old =
>       nextDigit >>= \case
>          Nothing -> pure $ fromIntegral s
>          Just d -> do
>           let s' = 10 * s + d
>               new = s' `shiftL` (finiteBitSize s' - finiteBitSize (undefined @a))
>           if old > new then fail "overflow"
>           else parse nextDigit s' new

It looks like it might not always detect overflow.  Counter-example:

    30 * 10 + 1 = 301

which is 45 mod 256, which happens be larger than 30.  So the above
would presumably accept "301" returning 45 as a Word8.  Overflow in
addition of two positive numbers will produce an answer smaller than
either, but this is not the case with multiplication by 10.

See:

    https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Data.ByteString.Lazy.ReadInt.html#_readDecimal
    https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Data.ByteString.Lazy.ReadInt.html#_digits

-- 
    Viktor.  🇺🇦 Слава Україні!