[GHC] #14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy + Text.RE.TDFA
GHC
ghc-devs at haskell.org
Mon Jan 22 11:51:54 UTC 2018
#14519: Exponential runtime performance regression in GHC 8.2 + Data.Text.Lazy +
Text.RE.TDFA
-------------------------------------+-------------------------------------
Reporter: ntc2 | Owner: tdammers
Type: bug | Status: new
Priority: normal | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime | Test Case:
performance bug | https://github.com/ntc2/ghc-8.2.1
| -regex-lazy-text-
| bug/tree/07b7bb32c6e90e8f2d2eada4b59943f37e632d53
Blocked By: | Blocking:
Related Tickets: #13745, #14564 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by tdammers):
A smaller reproduction case that depends only on `text`:
{{{
module Main
where
import qualified Data.Text.Lazy as LText
import Data.Char
import System.Environment
type LText = LText.Text
matches :: LText -> ([LText], [LText])
matches str =
case LText.uncons str of
Nothing ->
([], [])
Just (c, str') ->
let (upperMatches, lowerMatches) = matches $ LText.drop 1 str
upper = isUpper c
lower = isLower c
match = LText.take 10 str
upperMatches' = if upper then match:upperMatches else
upperMatches
lowerMatches' = if lower then match:lowerMatches else
lowerMatches
in (upperMatches', lowerMatches')
main = do
(arg0:args) <- getArgs
input <- LText.pack <$> readFile arg0
let (upper, lower) = matches input
putStrLn $ "Lowercase: " ++ show (take 1 lower)
putStrLn $ "Uppercase: " ++ show (take 1 upper)
print $ LText.take 10 input
}}}
This example program tries to roughly mimic the usage of lazy `Text`s in
`regex-tdfa-text` without actually using any code from that. Specifically,
it uses `drop` (which triggers the offending `RULE`), it snatches off
characters from the front of the remaining string one by one, it passes
the tail through a recursive loop, it holds on to chunks of text as it
traverses the input, thus preventing them from being collected, and it
eventually forces at least the first one of the accumulated chunks by
printing them to the console.
I'm not 100% sure whether preventing collection is absolutely necessary to
trigger the bug, but in any case, the above example runs slower by about a
factor 2 when compiled with optimizations, the dump shows that the
offending `RULE` is being hit, and the ticky profiles hint at the same
performance issue as well.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14519#comment:43>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list