[Haskell-cafe] Call for discussion: OverloadedLists extension
wren ng thornton
wren at freegeek.org
Sun Sep 30 05:47:11 CEST 2012
On 9/28/12 2:48 PM, Mario Blažević wrote:
> On 12-09-26 08:07 PM, wren ng thornton wrote:
>> On 9/25/12 1:57 PM, Sjoerd Visscher wrote:
>>>> Maybe we could make a literal [a,b,c] turn into
>>>> unpack [a,b,c]#
>>>> where
>>>> [a,b,c]#
>>>> is a statically-allocated vector?
>>
>> I'm kinda surprised this isn't already being done. Just doing this seems
>> like it'd be a good undertaking, regardless of whether we get overloaded
>> list literals. Just storing the literal as a C-like array and inflating
>> it to a list/array/vector at runtime seems like it should be a big win
>> for code that uses a lot of literals.
>
> Why?
>
> I'm surprised that this is an issue at all. If list literals you
> are talking about are constant, wouldn't GHC apply constant folding and
> construct the list only the first time it's needed?
The problem is: if the list is stored naively in the .data segment (as
apparently it is), then we have to store all the pointer structure as
well as the data. This hugely bloats the disk footprint for programs.
That is, all the reasons why String=[Char] is bad at runtime are also
reasons why this representation is bad at objectcode time. For most
lists, the pointer structure is a considerable portion of the total
memory cost. During runtime this overhead is (or at least may be)
unavoidable due to the dynamic nature of program execution; but there's
no reason to have this overhead in the compiled format of the program
since it's trivial to generate it from a compact representation (e.g.,
storing lists as C-style arrays + lengths).
The only conceivable benefit of storing lists on disk using their heap
representation is to allow treating the .data segment as if it were part
of the heap, i.e., to have zero-cost inflation and to allow GC to ignore
that "part of the heap". However, for lists, I can't imagine this
actually being beneficial in practice. This sort of thing is more
beneficial for large structures of static data (e.g., sets, maps,...).
But then for large static data, we still usually want a non-heap
representation (e.g., cache-oblivious datastructures), since we're
liable to only look at the data rather than to change it. It's only when
we have lots of static "mutable" data that it makes sense to take heap
snapshots.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list