Making rule application less fragile

Johan Tibell johan.tibell at gmail.com
Thu Mar 13 14:47:03 EDT 2008


Hi,

I'm trying (for the first time ever) to use RULES pragmas to achieve
some nice speedups in my bytestring parsing library. The relevant code
in my library's module is:

-- The module imports Control.Applicative which containes 'many' and 'some'.

-- | The parser @satisfy p@ succeeds for any byte for which the
-- supplied function @p@ returns 'True'.  Returns the byte that is
-- actually parsed.
satisfy :: (Word8 -> Bool) -> Parser Word8
satisfy p =
    Parser $ \s@(S bs pos eof) succ fail ->
        case S.uncons bs of
          Just (b, bs') -> if p b
                           then succ b (S bs' (pos + 1) eof)
                           else fail s
          Nothing       -> if eof
                           then fail s
                           else IPartial $ \x ->
                               case x of
                                 Just bs' -> retry (S bs' pos eof)
                                 Nothing  -> fail (S bs pos True)
            where retry s' = unParser (satisfy p) s' succ fail

-- | @byte b@ parses a single byte @b at .  Returns the parsed byte
-- (i.e. @b@).
byte :: Word8 -> Parser Word8
byte b = satisfy (== b)

-- ---------------------------------------------------------------------
-- Rewrite rules

satisfyMany :: (Word8 -> Bool) -> Parser S.ByteString
satisfyMany p = undefined  -- More efficient implementation goes here.

satisfySome :: (Word8 -> Bool) -> Parser S.ByteString
satisfySome p = undefined  -- More efficient implementation goes here.

{-# RULES

"fmap/pack/many/satisfy" forall p.
  fmap S.pack (many (satisfy p)) = satisfyMany p

"fmap/pack/some/satisfy" forall p.
  fmap S.pack (some (satisfy p)) = satisfySome p
  #-}

In another module where I use the library I have this code:

pHeaders :: Parser [(S.ByteString, S.ByteString)]
pHeaders = many header
    where
      header = liftA2 (,) fieldName (byte (c2w ':') *> spaces *> contents)
      fieldName = liftA2 (S.cons) letter fieldChars
      contents = liftA2 (S.append) (fmap S.pack $ some notEOL <* crlf)
                 (continuation <|> pure S.empty)
      continuation = liftA2 (S.cons) ((c2w ' ') <$
                                      some (oneOf (map c2w " \t"))) contents

-- It's important that all three of these definitions are kept on the
-- top level to have RULES fire correctly.
fieldChars = fmap S.pack $ many fieldChar

-- fieldChar = letter <|> digit <|> oneOf (map c2w "-_")
fieldChar = satisfy isFieldChar
    where
      isFieldChar b = (isDigit $ chr $ fromIntegral b) ||
                      (isAlpha $ chr $ fromIntegral b) ||
                      (b `elem` map c2w "-_")

I want the fieldChars use of 'fmap S.pack $ many fieldChar' to trigger
my rewrite rule "fmap/pack/many/satisfy" which it does in this case.
The trouble is that the rule only triggers when I make at least
fieldChars and fieldChar top-level definition and isFieldChar either a
named local definition in fieldChar or a top-level definition. If I
turn the predicate (isFieldChar) into to an anonymous lambda function
it doesn't trigger, if I make either fieldChars or fieldChars a local
defintion (in a where clause) of pHeaders it doesn't trigger. If I
make fieldChar a local definition in fieldChars it doesn't trigger,
etc.

It would be great if there was a way to make this a bit less fragile
and have the rule trigger more often as it is potentially a huge
performance win. I understand it's hard to guarantee that the rule
always triggers but now it triggers in rare cases.

-- Johan


More information about the Glasgow-haskell-users mailing list