[GHC] #15646: ghci takes super long time to find the type of large fractional number

Sun Sep 23 09:44:38 UTC 2018

#15646: ghci takes super long time to find the type of large fractional number
-------------------------------------+-------------------------------------
        Reporter:  Johannkokos       |                Owner:
                                     |  JulianLeviston
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  GHCi              |              Version:  8.4.3
      Resolution:                    |             Keywords:  newcomer
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Compile-time      |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by osa1):

 So, to summarize the problem here:

 - Semantics of a floating point literal (which is in our case in `XeY`
 form) is defined as `fromRational (n % d)`.
     - So we need to find and `n` and `d` such that `fromRational (n % d)`
 gives us our float.
 - As per Haskell 2010 chapter 6.4.1 (note that section 2.5 only gives the
 syntax, not the type) `fromRational` has type `Fractional a => Rational ->
 a` so `fromRational (n % d)` has type `Fractional a => a`
     - So `XeY` form always has type `Fractional a => a` !
 - Currently we tokenize an `XeY` we call `readRational__` on the string
 `"XeY"`, which returns this: `(n%1)*(10^^(k-d))`. `n`, `k` and `d` are
 parsed from the input so this is very cheap (in our example `n` is `X`,
 `k` is `Y`, `d` is `0`).
     - However, evaluating `(10^^_) :: Rational` part of the expression
 takes long time and uses a lot of memory. Indeed, as comment:10 says, if
 you have 1000000000 as the exponent then representation of this number
 takes GBs of memory so this is expected.
 - comment:12 asks why rendering `Infinity` takes this long: that's because
 we generate the `Infinity` value during desugaring (after parsing), and
 for this we have to first generate the `Rational` value for the literal
 first (which is what is taking all the time and using all the memory). You
 can observe this if you run ghci with `-fdump-parsed -fdump-ds` and
 evaluate `1e100000 :: Float`.

 Perhaps it makes sense to distinguish these two problems:

 - Type checking is too slow
 - Generating `Infinity :: Float` is too slow

 (a) in comment:10 fixes (1) but not (2).
 (c) in comment:10 fixes (1), and it may also fix (2) if we could somehow
 use this new constructor fields to return `Infinity` without calculating
 `10^^e :: Integer` first (as far as I understand the new consturctor will
 look like: `ExpRational n e -- stands for (n%1)*(10^^e)`).

 However, (c) means a change in a public type, and we'd need to evaluate
 performance changes etc. caused by turning a product type to a sum type
 (more optimizations apply to product types than sum types! at least until
 we improve demand analysis for sum types and use unboxed sums in WW).
 Also, I'm not sure if adding one more constructor to a library type just
 for this is a good idea. We'd need to think about how/whether to expose
 this to users (perhaps we do want to expose it in `Data.Ratio` somehow).
 This means long discussions and bikeshedding etc.

 What to do? I think delaying evaluation `readRational` until it's needed
 by making some code lazier would work. Looking at the lexer, we generate a
 `ITrational` for this literal like this:

 {{{
 tok_float        str = ITrational   $! readFractionalLit str

 readFractionalLit :: String -> FractionalLit
 readFractionalLit str = ((FL $! (SourceText str)) $! is_neg) $!
 readRational str
                         where is_neg = case str of ('-':_) -> True
 }}}

 Notice how we're very deliberately strict here. My guess is that simply
 removing strictness around `readRational` (e.g. the `$!` in
 `readFractionalLit`) may fix this. If it doesn't then some other code down
 the line needs to be made lazier too. However I'm not sure if this causes
 other problems (the fact that the thunk for `readRational str` will keep
 the `str` alive is not a problem because the `str` only holds the string
 for the literal). We'll have to evaluate this change in strictness
 somehow.

 If we follow this route then we'd also need to add a test to make sure
 typing this expression remains fast (strictness properties are hard to
 maintain ...).

 Here's another idea that is like (c) but does not need changes in a public
 type: duplicate the `Ratio` type so that only the version used by GHC has
 the special constructor. The code gen and `Data.Ratio` will still use the
 current type. Now we can use the special constructor in lexing to avoid
 evaluating a huge `Integer`, and only when desugaring we do some kind of
 evaluation (this is where we can perhaps be smart and generate `Infinity`
 efficiently, or in the worst case we end up with the current situation,
 but with fast type checking).

 None of these solutions strike me as particularly satisfying though...

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15646#comment:13>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler