[Haskell-cafe] Could optimize attoparsec if GHC would collapse `runParser (Parser parser)` to `parser`

Tyson Whitehead twhitehead at gmail.com
Thu May 27 18:31:43 UTC 2021


Hi. I'm hoping someone with some knowledge about GHC optimizations can
weigh in and help out here.

The attoparsec library has a `<|>` combinator for choice. Since
attoparsec does arbitrary backtracking, you would expect that
`recursiveLhs <|> rhs` would stack up a bunch of `rhs` alternatives to
the depth of the recursion.

Looking at the definitions though, would make you believe that so long
as the `rhs` is constant as it usually is (e.g., `pure []` if you are
collecting a list), inlining should result in it not stacking up a
bunch of the `rhs` alternatives (intuitively the most recent one will
always succeed so there is no reason to hold on to more than one).

Unfortunately, it doesn't seem to work. The full details can be found
in this bug report

https://github.com/haskell/attoparsec/issues/186

but I believe it comes down to the fact that GHC is not collapsing a
packing and immediate unpacking of the parser datatype
```
runParser (Parser parser) -> parser
```
which is resulting in the fact that the `rhs` always succeeds (i.e.,
doesn't use the failure continuation) not being exposed.

I am hoping someone knowledgeable in these matters could weigh in.

Thanks!  -Tyson

PS: The parser datatype is a wrapper on CPS style lambda
```
newtype Parser i a = Parser {
      runParser :: forall r.
                   State i -> Pos -> More
                -> Failure i (State i)   r
                -> Success i (State i) a r
                -> IResult i r
    }
```
where Failure and Success are type synonyms for the continuations
```
type Failure i t   r = t -> Pos -> More -> [String] -> String -> IResult i r
type Success i t a r = t -> Pos -> More -> a -> IResult i r
```


More information about the Haskell-Cafe mailing list