[Haskell-cafe] attpoarsec and recursive parsing

Olaf Klinke olf at aatal-apotheke.de
Sat Nov 23 17:05:00 UTC 2024


On Fri, 2024-11-22 at 11:05 +0100, PICCA Frederic-Emmanuel wrote:
> Hello, I end up with this solution
> 
> in order to parse this data type.
> 
> If you have some advice in order to simplify or beautify the Parser part do not hesitate.
> 
> Cheers
> 
> Fred
> 
> data MaskLocation = MaskLocation Text
>                   | MaskLocation'Tmpl Text
>                   | MaskLocation'Or MaskLocation MaskLocation
>     deriving (Eq, Generic, Show)
>     deriving anyclass (FromJSON, ToJSON)
> 
> instance FieldEmitter MaskLocation where
>   fieldEmitter (MaskLocation t)      = t
>   fieldEmitter (MaskLocation'Tmpl t) = t
>   fieldEmitter (MaskLocation'Or l r) = fieldEmitter l <> " | " <> fieldEmitter r
> 
> instance FieldParsable MaskLocation where
>   fieldParser = do
>     let loc :: Text -> MaskLocation
>         loc t = if "{scannumber:" `Data.Text.isInfixOf` t
>                 then MaskLocation'Tmpl (strip t)
>                 else MaskLocation (strip t)
> 
>     t <- takeTill (== '|')
>     if t == ""
>       then fail "MaskLocation is Empty"
>       else do mc <- peekChar
>               case mc of
>                 Nothing -> pure $ loc t
>                 Just '|'  -> do
>                   _ <- char '|'  -- extract the '|' char
>                   MaskLocation'Or (loc t) <$> fieldParser
>                 Just c -> fail ("MaskLocation " <> [c])

The peekChar seems suspicious. That is what backtracking parsers
abstract over. Attoparsec always backtracks, in other parser libraries
there is the try combinator. You match on the '|' char using peekChar
and then later discard it by the _ <- char '|' parser fragment. 

Also note that MaskLocation is a tree (with two different types of
leaves), but your fieldParser will only ever produce trees that have a
single leaf in the left branch. These trees are therefore equivalent to
non-empty linked lists. Therefore you might as well use sepBy1:

type P = Data.Attoparsec.Text.Parser

loc :: Text -> Either Text Text
loc t = if ("{scannumber:" `Data.Text.isInfixOf` t)
  then Left  (strip t)
  else Right (strip t)
-- note: either id id ~ fieldEmitter

locationOrTempl :: P (Either Text Text)
locationOrTempl = fmap loc (takeWhile1 (/= '|'))

sep :: P Text
sep = fromString "|" -- IsString instance of P

maskLocations :: P [Either Text Text]
maskLocations = locationOrTempl `sepBy1` sep

Olaf



More information about the Haskell-Cafe mailing list