[GHC] #14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy + Text.RE.TDFA

GHC ghc-devs at haskell.org
Mon Jan 22 11:51:54 UTC 2018


#14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy +
Text.RE.TDFA
-------------------------------------+-------------------------------------
        Reporter:  ntc2              |                Owner:  tdammers
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |  https://github.com/ntc2/ghc-8.2.1
                                     |  -regex-lazy-text-
                                     |  bug/tree/07b7bb32c6e90e8f2d2eada4b59943f37e632d53
      Blocked By:                    |             Blocking:
 Related Tickets:  #13745, #14564    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tdammers):

 A smaller reproduction case that depends only on `text`:

 {{{
 module Main
 where

 import qualified Data.Text.Lazy as LText
 import Data.Char
 import System.Environment

 type LText = LText.Text

 matches :: LText -> ([LText], [LText])
 matches str =
   case LText.uncons str of
     Nothing ->
       ([], [])
     Just (c, str') ->
       let (upperMatches, lowerMatches) = matches $ LText.drop 1 str
           upper = isUpper c
           lower = isLower c
           match = LText.take 10 str
           upperMatches' = if upper then match:upperMatches else
 upperMatches
           lowerMatches' = if lower then match:lowerMatches else
 lowerMatches
       in (upperMatches', lowerMatches')

 main = do
   (arg0:args) <- getArgs
   input <- LText.pack <$> readFile arg0
   let (upper, lower) = matches input
   putStrLn $ "Lowercase: " ++ show (take 1 lower)
   putStrLn $ "Uppercase: " ++ show (take 1 upper)
   print $ LText.take 10 input
 }}}

 This example program tries to roughly mimic the usage of lazy `Text`s in
 `regex-tdfa-text` without actually using any code from that. Specifically,
 it uses `drop` (which triggers the offending `RULE`), it snatches off
 characters from the front of the remaining string one by one, it passes
 the tail through a recursive loop, it holds on to chunks of text as it
 traverses the input, thus preventing them from being collected, and it
 eventually forces at least the first one of the accumulated chunks by
 printing them to the console.

 I'm not 100% sure whether preventing collection is absolutely necessary to
 trigger the bug, but in any case, the above example runs slower by about a
 factor 2 when compiled with optimizations, the dump shows that the
 offending `RULE` is being hit, and the ticky profiles hint at the same
 performance issue as well.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14519#comment:43>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list