[GHC] #12665: Make Read instances faster, and make them fail fast
GHC
ghc-devs at haskell.org
Wed Oct 5 18:55:01 UTC 2016
#12665: Make Read instances faster, and make them fail fast
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner:
Type: feature | Status: new
request |
Priority: high | Milestone: 8.2.1
Component: Core | Version: 8.0.1
Libraries |
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: Runtime
Unknown/Multiple | performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
At present, the `Read` instances for standard types are generally written
''as though'' `Read` were a typical parser for a programming language.
However, it most assuredly is not. Specifically, we currently follow the
pattern of lexing and then parsing. Lexing looks at the string and
attempts to identify the next token, whatever it may be. But for `Read`,
we don't need that. Thanks to the types, we have a very clear sense of
what characters we expect to encounter. I've started to sketch out an
improvement, which reads `Int` and `Word` around seven times as fast (9.5
times as fast with parentheses and negation), and will fail immediately on
something like `read (fix ('a':)) :: Int` rather than going into an
infinite loop.
To begin with, modify the definition of `paren` thus:
{{{#!hs
expectP' :: Char -> ReadPrec ()
expectP' c = lift (expect c)
{-# INLINE expectP' #-}
paren :: ReadPrec a -> ReadPrec a
-- ^ @(paren p)@ parses \"(P0)\"
-- where @p@ parses \"P0\" in precedence context zero
paren p = do expectP' '('
x <- reset p
expectP' ')'
return x
}}}
This allows fast failure when looking for parentheses, so we don't have to
scan to the end of the first token (whatever it may be) before concluding
that it is not `'('`.
Now we can parse `Word` and `Int` very efficiently. I had to specialize
earlier than I wanted to convince GHC that I don't want to convert through
`Integer`. I'm not sure why the `fromIntegral` rule doesn't fire reliably
around here. The code below (temporarily) uses the current definition for
base 16, because that's a bit fussy; I'll rewrite it soon.
{{{#!hs
charDiff :: Char -> Char -> Word
charDiff c1 c2 = fromIntegral (ord c1 - ord c2)
{-# INLINE charDiff #-}
readHexOct :: ReadP Word
readHexOct = do
_ <- ReadP.char '0'
baseId <- lexBaseChar
case baseId of
Oct -> readBaseP 8
Hex -> L.readHexP
{-# INLINE readHexOct #-}
data BaseId = Oct | Hex
lexBaseChar :: ReadP BaseId
lexBaseChar = do { c <- ReadP.get;
case c of
'o' -> pure Oct
'O' -> pure Oct
'x' -> pure Hex
'X' -> pure Hex
_ -> ReadP.pfail }
readWord :: ReadP Word
readWord = readNumber (readHexOct ReadP.<++ readBaseP 10)
readInt :: ReadP Int
readInt = fromIntegral <$> readWord
readBaseP :: Integral a => Word -> ReadP a
readBaseP !base = do
c <- ReadP.get
let diff = charDiff c '0'
if diff < base
then readBaseP' base (fromIntegral diff)
else ReadP.pfail
{-# INLINE readBaseP #-}
readBaseP' :: Integral a => Word -> a -> ReadP a
readBaseP' !base !acc0 = ReadP.look >>= go acc0
where
go !acc (c:cs) | diff < base = ReadP.get *> go (fromIntegral base *
acc + fromIntegral diff) cs
where diff = charDiff c '0'
go !acc _ = pure acc
{-# INLINE readBaseP' #-}
readNumber :: Num a => ReadP.ReadP a -> ReadPrec a
readNumber p = parens $ do
cs <- lift skipSpaces *> look
case cs of
('-': _) -> get *> lift (skipSpaces *> (negate <$> p))
_ -> lift $ skipSpaces *> p
{-# INLINE readNumber #-}
}}}
Other `Word` and `Int`-like types can be built on top of that foundation.
I haven't yet attempted to deal with other instances, but I think there
are probably a lot of opportunities for similar improvements.
WARNING: I know very little about parsing, and less about `ReadPrec`. It's
conceivable that I've made some semantic errors here, although I don't
think I have.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12665>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list