[GHC] #9675: Unreasonable memory usage on large data structures

GHC ghc-devs at haskell.org
Sun Oct 12 20:39:12 UTC 2014


#9675: Unreasonable memory usage on large data structures
-------------------------------------+-------------------------------------
              Reporter:  Polarina    |            Owner:
                  Type:  bug         |           Status:  new
              Priority:  normal      |        Milestone:
             Component:  Compiler    |          Version:  7.8.3
            Resolution:              |         Keywords:
      Operating System:  Linux       |     Architecture:  x86_64 (amd64)
       Type of failure:  Compile-    |       Difficulty:  Unknown
  time performance bug               |       Blocked By:
             Test Case:              |  Related Tickets:
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by nomeata):

 Did some GHC profiling, and the problem is clearly the huge number of
 useless variables in the pattern match in the Core for the record selector
 functions, here for a three-field-record (I’ll spare us the 3000-field-
 record):

 {{{
 Test.field3 :: Test.Foo -> GHC.Types.Int -> GHC.Types.Int
 [GblId[[RecSel]], Arity=1, Caf=NoCafRefs, Str=DmdType]
 Test.field3 =
   \ (ds_dm2 :: Test.Foo) ->
     case ds_dm2 of _ [Occ=Dead] { Test.Foo ds1_dm3 ds2_dm4 ds3_dm5 ->
     ds3_dm5
     }

 }}}

 See the attached heap profile, `IdInfo`, `Var`, `IntMap` and `[]` are the
 biggest chunks.

 The problem is that a data type declaration like
 {{{
 data R = R { field1 :: T1, field2 :: T2 }
 }}}
 generates Haskell code of the form
 {{{
 field3 (R {field3 = x}) = x
 }}}
 which is then turned in to the form above by general code that also
 handles multiple fields, field puns, and further patterns in the field.

 While the submitter’s code above may or may not be reasonable it migth be
 worth improving the situation for the benefot of modules with medium-sized
 records (including GHC’s own DynFlags module).

 I see a two approaches:

 1. Avoid generating actual Core for these selector. Make them implicitly
 availailbe, and only generated them in STG or later. I am not sure if
 there is precedence for such implicit things.

 2. Extend the this code somehow:
 {{{
 AltCon
   = DataAlt DataCon
   | LitAlt  Literal
   | DEFAULT
 }}}
 I have two ideas here. Either special-case the single-field-match, i.e. a
 new constructor
 {{{
   | FieldSelAlt DataCon Int
 }}}
 which binds only one variable, the field indicated by the `Int` variable.
 This would get rid of the quadraticness of the representation. But quite a
 bit of code would have to handle two cases, and there would be no unique
 representation of the same match.

 The other one is to extend `DataAlt` by a bit field:
 {{{
   = DataAlt DataCon [Bool]
 }}}
 (with a more suitable data structure for `[Bool]`, e.g. some kind of
 `BitField`). The idea is that this field specifies what fields of the
 constructor are actually used by the match. The number of variables bound
 by such an alternative would equal to the number of `True`s in the list.

 This would improve every alternative where not all variables are used. It
 might make some code more complicated – I could imagine that some compiler
 phases expect that within a case alternative, there is a locally bound
 name for every field – but maybe it’s not too bad.

 This would still be quadratic (but with a much lower factor – one bit
 instead of whatever `Id` + `IdInfo` + `(:)` + whatnot use). We can even
 get rid of that by making the `BitField` a data type like
 {{{
 data BitField = Empty | Singleton Int | ManyBits ByteString | All
 }}}
 (`Empty` would be useful for `Con {}` patterns. Not sure if `All` is
 useful. Maybe `data BitField = Empty | Singleton Int | All` is actually
 enough, if few-but-more-than-one-field-patterns are rare.)


 At this point I’m hoping for someone experienced (e.g. SPJ, of course) to
 tell us which of these ideas are totally whacky, and which are worth
 pursuing, and if so, what problems to expect. But this is not a promise
 that I’ll implement it :-)

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9675#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list