Making rule application less fragile
johan.tibell at gmail.com
Thu Mar 13 14:47:03 EDT 2008
I'm trying (for the first time ever) to use RULES pragmas to achieve
some nice speedups in my bytestring parsing library. The relevant code
in my library's module is:
-- The module imports Control.Applicative which containes 'many' and 'some'.
-- | The parser @satisfy p@ succeeds for any byte for which the
-- supplied function @p@ returns 'True'. Returns the byte that is
-- actually parsed.
satisfy :: (Word8 -> Bool) -> Parser Word8
satisfy p =
Parser $ \s@(S bs pos eof) succ fail ->
case S.uncons bs of
Just (b, bs') -> if p b
then succ b (S bs' (pos + 1) eof)
else fail s
Nothing -> if eof
then fail s
else IPartial $ \x ->
case x of
Just bs' -> retry (S bs' pos eof)
Nothing -> fail (S bs pos True)
where retry s' = unParser (satisfy p) s' succ fail
-- | @byte b@ parses a single byte @b at . Returns the parsed byte
-- (i.e. @b@).
byte :: Word8 -> Parser Word8
byte b = satisfy (== b)
-- Rewrite rules
satisfyMany :: (Word8 -> Bool) -> Parser S.ByteString
satisfyMany p = undefined -- More efficient implementation goes here.
satisfySome :: (Word8 -> Bool) -> Parser S.ByteString
satisfySome p = undefined -- More efficient implementation goes here.
"fmap/pack/many/satisfy" forall p.
fmap S.pack (many (satisfy p)) = satisfyMany p
"fmap/pack/some/satisfy" forall p.
fmap S.pack (some (satisfy p)) = satisfySome p
In another module where I use the library I have this code:
pHeaders :: Parser [(S.ByteString, S.ByteString)]
pHeaders = many header
header = liftA2 (,) fieldName (byte (c2w ':') *> spaces *> contents)
fieldName = liftA2 (S.cons) letter fieldChars
contents = liftA2 (S.append) (fmap S.pack $ some notEOL <* crlf)
(continuation <|> pure S.empty)
continuation = liftA2 (S.cons) ((c2w ' ') <$
some (oneOf (map c2w " \t"))) contents
-- It's important that all three of these definitions are kept on the
-- top level to have RULES fire correctly.
fieldChars = fmap S.pack $ many fieldChar
-- fieldChar = letter <|> digit <|> oneOf (map c2w "-_")
fieldChar = satisfy isFieldChar
isFieldChar b = (isDigit $ chr $ fromIntegral b) ||
(isAlpha $ chr $ fromIntegral b) ||
(b `elem` map c2w "-_")
I want the fieldChars use of 'fmap S.pack $ many fieldChar' to trigger
my rewrite rule "fmap/pack/many/satisfy" which it does in this case.
The trouble is that the rule only triggers when I make at least
fieldChars and fieldChar top-level definition and isFieldChar either a
named local definition in fieldChar or a top-level definition. If I
turn the predicate (isFieldChar) into to an anonymous lambda function
it doesn't trigger, if I make either fieldChars or fieldChars a local
defintion (in a where clause) of pHeaders it doesn't trigger. If I
make fieldChar a local definition in fieldChars it doesn't trigger,
It would be great if there was a way to make this a bit less fragile
and have the rule trigger more often as it is potentially a huge
performance win. I understand it's hard to guarantee that the rule
always triggers but now it triggers in rare cases.
More information about the Glasgow-haskell-users