Making rule application less fragile
Don Stewart
dons at galois.com
Thu Mar 13 18:24:39 EDT 2008
Hey Johan,
The main thing to remember is that anything you wish to match on
in a rule needs to not be inlined in the first pass.
So to match "many" or "satisfy" robustly, you'll need:
{-# NOINLINE [1] many #-}
For example.
johan.tibell:
> Hi,
>
> I'm trying (for the first time ever) to use RULES pragmas to achieve
> some nice speedups in my bytestring parsing library. The relevant code
> in my library's module is:
>
> -- The module imports Control.Applicative which containes 'many' and 'some'.
>
> -- | The parser @satisfy p@ succeeds for any byte for which the
> -- supplied function @p@ returns 'True'. Returns the byte that is
> -- actually parsed.
> satisfy :: (Word8 -> Bool) -> Parser Word8
> satisfy p =
> Parser $ \s@(S bs pos eof) succ fail ->
> case S.uncons bs of
> Just (b, bs') -> if p b
> then succ b (S bs' (pos + 1) eof)
> else fail s
> Nothing -> if eof
> then fail s
> else IPartial $ \x ->
> case x of
> Just bs' -> retry (S bs' pos eof)
> Nothing -> fail (S bs pos True)
> where retry s' = unParser (satisfy p) s' succ fail
>
> -- | @byte b@ parses a single byte @b at . Returns the parsed byte
> -- (i.e. @b@).
> byte :: Word8 -> Parser Word8
> byte b = satisfy (== b)
>
> -- ---------------------------------------------------------------------
> -- Rewrite rules
>
> satisfyMany :: (Word8 -> Bool) -> Parser S.ByteString
> satisfyMany p = undefined -- More efficient implementation goes here.
>
> satisfySome :: (Word8 -> Bool) -> Parser S.ByteString
> satisfySome p = undefined -- More efficient implementation goes here.
>
> {-# RULES
>
> "fmap/pack/many/satisfy" forall p.
> fmap S.pack (many (satisfy p)) = satisfyMany p
>
> "fmap/pack/some/satisfy" forall p.
> fmap S.pack (some (satisfy p)) = satisfySome p
> #-}
>
> In another module where I use the library I have this code:
>
> pHeaders :: Parser [(S.ByteString, S.ByteString)]
> pHeaders = many header
> where
> header = liftA2 (,) fieldName (byte (c2w ':') *> spaces *> contents)
> fieldName = liftA2 (S.cons) letter fieldChars
> contents = liftA2 (S.append) (fmap S.pack $ some notEOL <* crlf)
> (continuation <|> pure S.empty)
> continuation = liftA2 (S.cons) ((c2w ' ') <$
> some (oneOf (map c2w " \t"))) contents
>
> -- It's important that all three of these definitions are kept on the
> -- top level to have RULES fire correctly.
> fieldChars = fmap S.pack $ many fieldChar
>
> -- fieldChar = letter <|> digit <|> oneOf (map c2w "-_")
> fieldChar = satisfy isFieldChar
> where
> isFieldChar b = (isDigit $ chr $ fromIntegral b) ||
> (isAlpha $ chr $ fromIntegral b) ||
> (b `elem` map c2w "-_")
>
> I want the fieldChars use of 'fmap S.pack $ many fieldChar' to trigger
> my rewrite rule "fmap/pack/many/satisfy" which it does in this case.
> The trouble is that the rule only triggers when I make at least
> fieldChars and fieldChar top-level definition and isFieldChar either a
> named local definition in fieldChar or a top-level definition. If I
> turn the predicate (isFieldChar) into to an anonymous lambda function
> it doesn't trigger, if I make either fieldChars or fieldChars a local
> defintion (in a where clause) of pHeaders it doesn't trigger. If I
> make fieldChar a local definition in fieldChars it doesn't trigger,
> etc.
>
> It would be great if there was a way to make this a bit less fragile
> and have the rule trigger more often as it is potentially a huge
> performance win. I understand it's hard to guarantee that the rule
> always triggers but now it triggers in rare cases.
>
> -- Johan
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
More information about the Glasgow-haskell-users
mailing list