[GHC] #14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy + Text.RE.TDFA

GHC ghc-devs at haskell.org
Tue Jan 16 09:59:38 UTC 2018


#14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy +
Text.RE.TDFA
-------------------------------------+-------------------------------------
        Reporter:  ntc2              |                Owner:  tdammers
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.4.1
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |  https://github.com/ntc2/ghc-8.2.1
                                     |  -regex-lazy-text-
                                     |  bug/tree/07b7bb32c6e90e8f2d2eada4b59943f37e632d53
      Blocked By:                    |             Blocking:
 Related Tickets:  #13745, #14564    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 > So the string "$wnext2{v reC2}" must appear in some .o file. I'd do
 strings *.o | grep to find it.

 Using `strings -f **/*.o | grep wnext2`, I get:

 {{{
 regex-tdfa-text-1.0.0.3/dist/dist-sandbox-
 3f35acf6/build/Text/Regex/TDFA/Text/Lazy.o: $wnext2{v reyv} (regex-tdfa-
 text-1.0.0.3-CIfFZ6rjdCoJI5EFpqTwBO:Text.Regex.TDFA.Text.Lazy) (fun)
 }}}

 There is no identifier `wnext` or `next` in that module, but it does have
 this:

 {{{
 {-# SPECIALIZE execMatch :: Regex -> Position -> Char -> L.Text ->
 [MatchArray] #-}
 execMatch :: Uncons text => Regex -> Position -> Char -> text ->
 [MatchArray]
 execMatch = Engine.execMatch
 }}}

 `execMatch` imported from `Text.Regex.TDFA.NewDFA.Engine`, in turn,
 defines local functions `next` and `next'`; the alias defined here seems
 to mainly exist in order to add the `SPECIALIZE` pragma for lazy `Text`.

 `execMatch` is monstrously large and quite complex, so I haven't managed
 to figure out entirely how it works, however there is a promising starting
 point for further digging:

 Whether or not `SPECIALIZE` triggers might depend on debugging and/or
 profiling build flags, which would explain why the problem disappears in a
 profiling build. The hypothesis would be that the specialized code does
 something that the non-specialized code doesn't do, and that something
 ends up being *less* efficient rather than *more*. The `execMatch`
 function is polymorphic over, among other things, the `text` type, with a
 typeclass constraint `Uncons text` (documentation
 [http://hackage.haskell.org/package/regex-tdfa-1.2.2/docs/Text-Regex-TDFA-
 NewDFA-Uncons.html here]), so we should expect `uncons` to inline for the
 specialized version, but not the un-specialized one (because in the
 specialized case we can statically resolve it to the non-polymorphic
 `uncons` from `Data.Text.Lazy`), and then, possibly maybe, the inlined
 uncons leads to a space leak.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14519#comment:21>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list