[GHC] #11303: Pattern matching against sets of strings sharing a prefix blows up pattern checker
GHC
ghc-devs at haskell.org
Tue Dec 29 10:22:48 UTC 2015
#11303: Pattern matching against sets of strings sharing a prefix blows up pattern
checker
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: gkaracha
Type: bug | Status: new
Priority: highest | Milestone: 8.0.1
Component: Compiler | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #11302 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by gkaracha):
Ahhhh, I see. This has been noticed before. Actually, I have added a note
in `deSugar/Check.hs`
about it: `Note [Literals in PmPat]`. My primary test case for this
problem has been
function `mkTextEncoding'` from `libraries/base/GHC/IO/Encoding.hs` until
now but I guess your
example stresses the same problem even more.
The source of the problem lies in the way we translate literals:
* In the paper, we treat them as guards: literal `l` gets translated to
`x (True <- x == l)`.
This allows the algorithm to work uniformly on everything but is too
expensive, even for
`mkTextEncoding'`. Essentially, the covered set contains `x |> {True ~
(x == l)}` and the
uncovered contains `x |> {False ~ (x == l)}`. Hence, we explode first
(everything is a
guard) and then prune (use the term oracle to check the constraints)
which is really
exponential.
* Hence, I changed the implementation and added literals in the pattern
language. This means
that for literal `l` the covered set contains `l |> {}` and the
uncovered set contains
`x |> {False ~ (x == l)}` like before. Since literals are patterns,
more things can be
checked eagerly, like with constructors where we can see immediately
that e.g. `True` can
not match `False`. Hence, we generate less from the start.
The last thing to do (and I think this will address all performance issues
concerning literals)
is to completely add literals in the pattern language, that is, use a
variant of Sestoft's
negative patterns:
{{{#!hs
data PmPat :: PatTy -> * where
PmCon :: { ... } -> PmPat t
PmVar :: Id -> PmPat t
PmLit :: PmLit -> PmPat t
PmNLit :: Id -> [PmLit] -> PmPat VA -- add this constructor
PmGrd :: { ... } -> PmPat PAT
}}}
The idea is that `PmNLit x lits` represents in a compact way __all
literals `x` that are **not
equal** to any of `lits`__. Hence, we generate much less and there is
significantly less need
for pruning.
There is a slight possibility that this change will affect #322 and also I
expect the size of
the checker to increase 5-10% (LoC) but I have no better solution at the
moment. Any ideas on
this approach?
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11303#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list