[GHC] #11303: Pattern matching against sets of strings sharing a prefix blows up pattern checker

GHC ghc-devs at haskell.org
Tue Dec 29 10:22:48 UTC 2015


#11303: Pattern matching against sets of strings sharing a prefix blows up pattern
checker
-------------------------------------+-------------------------------------
        Reporter:  bgamari           |                Owner:  gkaracha
            Type:  bug               |               Status:  new
        Priority:  highest           |            Milestone:  8.0.1
       Component:  Compiler          |              Version:  7.11
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #11302            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by gkaracha):

 Ahhhh, I see. This has been noticed before. Actually, I have added a note
 in `deSugar/Check.hs`
 about it: `Note [Literals in PmPat]`. My primary test case for this
 problem has been
 function `mkTextEncoding'` from `libraries/base/GHC/IO/Encoding.hs` until
 now but I guess your
 example stresses the same problem even more.

 The source of the problem lies in the way we translate literals:
  * In the paper, we treat them as guards: literal `l` gets translated to
 `x (True <- x == l)`.
    This allows the algorithm to work uniformly on everything but is too
 expensive, even for
    `mkTextEncoding'`. Essentially, the covered set contains `x |> {True ~
 (x == l)}` and the
    uncovered contains `x |> {False ~ (x == l)}`. Hence, we explode first
 (everything is a
    guard) and then prune (use the term oracle to check the constraints)
 which is really
    exponential.

  * Hence, I changed the implementation and added literals in the pattern
 language. This means
    that for literal `l` the covered set contains `l |> {}` and the
 uncovered set contains
    `x |> {False ~ (x == l)}` like before. Since literals are patterns,
 more things can be
    checked eagerly, like with constructors where we can see immediately
 that e.g. `True` can
    not match `False`. Hence, we generate less from the start.

 The last thing to do (and I think this will address all performance issues
 concerning literals)
 is to completely add literals in the pattern language, that is, use a
 variant of Sestoft's
 negative patterns:
 {{{#!hs
 data PmPat :: PatTy -> * where
   PmCon  :: { ... }        -> PmPat t
   PmVar  :: Id             -> PmPat t
   PmLit  :: PmLit          -> PmPat t
   PmNLit :: Id -> [PmLit] -> PmPat VA -- add this constructor
   PmGrd  :: { ... }        -> PmPat PAT
 }}}
 The idea is that `PmNLit x lits` represents in a compact way __all
 literals `x` that are **not
 equal** to any of `lits`__. Hence, we generate much less and there is
 significantly less need
 for pruning.

 There is a slight possibility that this change will affect #322 and also I
 expect the size of
 the checker to increase 5-10% (LoC) but I have no better solution at the
 moment. Any ideas on
 this approach?

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/11303#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list