[GHC] #9675: Unreasonable memory usage on large data structures
GHC
ghc-devs at haskell.org
Sun Oct 12 20:39:12 UTC 2014
#9675: Unreasonable memory usage on large data structures
-------------------------------------+-------------------------------------
Reporter: Polarina | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.3
Resolution: | Keywords:
Operating System: Linux | Architecture: x86_64 (amd64)
Type of failure: Compile- | Difficulty: Unknown
time performance bug | Blocked By:
Test Case: | Related Tickets:
Blocking: |
Differential Revisions: |
-------------------------------------+-------------------------------------
Comment (by nomeata):
Did some GHC profiling, and the problem is clearly the huge number of
useless variables in the pattern match in the Core for the record selector
functions, here for a three-field-record (I’ll spare us the 3000-field-
record):
{{{
Test.field3 :: Test.Foo -> GHC.Types.Int -> GHC.Types.Int
[GblId[[RecSel]], Arity=1, Caf=NoCafRefs, Str=DmdType]
Test.field3 =
\ (ds_dm2 :: Test.Foo) ->
case ds_dm2 of _ [Occ=Dead] { Test.Foo ds1_dm3 ds2_dm4 ds3_dm5 ->
ds3_dm5
}
}}}
See the attached heap profile, `IdInfo`, `Var`, `IntMap` and `[]` are the
biggest chunks.
The problem is that a data type declaration like
{{{
data R = R { field1 :: T1, field2 :: T2 }
}}}
generates Haskell code of the form
{{{
field3 (R {field3 = x}) = x
}}}
which is then turned in to the form above by general code that also
handles multiple fields, field puns, and further patterns in the field.
While the submitter’s code above may or may not be reasonable it migth be
worth improving the situation for the benefot of modules with medium-sized
records (including GHC’s own DynFlags module).
I see a two approaches:
1. Avoid generating actual Core for these selector. Make them implicitly
availailbe, and only generated them in STG or later. I am not sure if
there is precedence for such implicit things.
2. Extend the this code somehow:
{{{
AltCon
= DataAlt DataCon
| LitAlt Literal
| DEFAULT
}}}
I have two ideas here. Either special-case the single-field-match, i.e. a
new constructor
{{{
| FieldSelAlt DataCon Int
}}}
which binds only one variable, the field indicated by the `Int` variable.
This would get rid of the quadraticness of the representation. But quite a
bit of code would have to handle two cases, and there would be no unique
representation of the same match.
The other one is to extend `DataAlt` by a bit field:
{{{
= DataAlt DataCon [Bool]
}}}
(with a more suitable data structure for `[Bool]`, e.g. some kind of
`BitField`). The idea is that this field specifies what fields of the
constructor are actually used by the match. The number of variables bound
by such an alternative would equal to the number of `True`s in the list.
This would improve every alternative where not all variables are used. It
might make some code more complicated – I could imagine that some compiler
phases expect that within a case alternative, there is a locally bound
name for every field – but maybe it’s not too bad.
This would still be quadratic (but with a much lower factor – one bit
instead of whatever `Id` + `IdInfo` + `(:)` + whatnot use). We can even
get rid of that by making the `BitField` a data type like
{{{
data BitField = Empty | Singleton Int | ManyBits ByteString | All
}}}
(`Empty` would be useful for `Con {}` patterns. Not sure if `All` is
useful. Maybe `data BitField = Empty | Singleton Int | All` is actually
enough, if few-but-more-than-one-field-patterns are rare.)
At this point I’m hoping for someone experienced (e.g. SPJ, of course) to
tell us which of these ideas are totally whacky, and which are worth
pursuing, and if so, what problems to expect. But this is not a promise
that I’ll implement it :-)
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9675#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list