[GHC] #12665: Make Read instances faster, and make them fail fast

Fri Oct 7 23:48:37 UTC 2016

#12665: Make Read instances faster, and make them fail fast
-------------------------------------+-------------------------------------
        Reporter:  dfeuer            |                Owner:  dfeuer
            Type:  feature request   |               Status:  new
        Priority:  high              |            Milestone:  8.2.1
       Component:  Core Libraries    |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by dfeuer):

 Replying to [comment:13 int-e]:

 > One can also store the bases in the stack itself, see `valInteger''` in
 the (updated) http://int-e.eu/~bf3/haskell/FromDigits.hs file. This causes
 some extra allocations, so the code becomes slower, but almost
 imperceptibly so.

 I wrote up something essentially equivalent to `Numeric.readDec` based on
 your latest code and the shape of the lexer:

 {{{#!hs
 decNat :: ReadP Natural
 decNat = valNat 10 19 $ \c ->
   let diff = charDiff c '0'
   in if diff < 10
      then Just diff
      else Nothing

 valNat :: Word -> Word -> (Char -> Maybe Word) -> ReadP Natural
 valNat base chunkSize terp = do
   s <- ReadP.look
   valNat' base chunkSize terp s
 {-# INLINE valNat #-}

 data Stack = SNil | SSkip Natural !Stack | SCons Natural !Natural !Stack

 {-# INLINE valNat' #-}
 valNat' :: Word -> Word -> (Char -> Maybe Word) -> [Char] -> ReadP Natural
 valNat' base chunkSize terp cs0@(c0 : _)
   | Just _ <- terp c0 = goChunks SNil 0 0 cs0
   where
     b1 :: Natural
     b1 = fromIntegral base * fromIntegral (base^(chunkSize-1))

     goChunks :: Stack -> Word -> Word -> [Char] -> ReadP Natural
     goChunks !s !l !chunk !ds
       | l == chunkSize = goChunks (step s (fromIntegral chunk)) 0 0 ds
     goChunks !s !l !chunk (c:cs)
       | Just d <- terp c
           = ReadP.get *> goChunks s (l+1) (chunk*base+d) cs
     goChunks !s !l !chunk _ =
        pure $ fromStack (fromIntegral chunk) (fromIntegral (base^l)) s

     step :: Stack -> Natural -> Stack
     step SNil d = SCons b1 d SNil
     step (SSkip b s) d = SCons b d s
     step (SCons b d' s) d = SSkip b (step' b s (d + d'*b))

     step' :: Natural -> Stack -> Natural -> Stack
     step' b SNil d = SCons (b*b) d SNil
     step' _ (SSkip b s) d = SCons b d s
     step' _ (SCons b d' s) d = SSkip b (step' b s (d + d' * b))

     fromStack :: Natural -> Natural -> Stack -> Natural
     fromStack d' _ SNil = d'
     fromStack d' b' (SSkip b s) = fromStack d' b' s
     fromStack d' b' (SCons b d s) = fromStack (d*b'+d') (b*b') s

 valNat' _ _ _ _ = ReadP.pfail

 -- We could write a parser more precisely imitating GHC's current
 -- `Read` instance by using `valNat` and then performing a look-ahead
 -- to check for illegal termination sequences involving `.` or `e`,
 -- but I'm really not convinced that we should.
 }}}

 This seems to be very fast indeed when used for base 10 with a chunk size
 of 19 (on a 64-bit system). Oddly, `Numeric.readDec` is much, much slower
 than `read`, which looks likely to be a `RULES` issue. That makes it a bit
 hard to compare the algorithms fairly. I haven't yet wired your simpler
 version up in quite this fashion, and I'm not actually sure if it can be
 wired up so, but it would be worth checking.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12665#comment:14>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler