[Haskell-cafe] Call for discussion: OverloadedLists extension

Sun Sep 30 05:47:11 CEST 2012

On 9/28/12 2:48 PM, Mario Blažević wrote:
> On 12-09-26 08:07 PM, wren ng thornton wrote:
>> On 9/25/12 1:57 PM, Sjoerd Visscher wrote:
>>>> Maybe we could make a literal [a,b,c] turn into
>>>>     unpack [a,b,c]#
>>>> where
>>>>     [a,b,c]#
>>>> is a statically-allocated vector?
>>
>> I'm kinda surprised this isn't already being done. Just doing this seems
>> like it'd be a good undertaking, regardless of whether we get overloaded
>> list literals. Just storing the literal as a C-like array and inflating
>> it to a list/array/vector at runtime seems like it should be a big win
>> for code that uses a lot of literals.
>
> Why?
>
>      I'm surprised that this is an issue at all. If list literals you
> are talking about are constant, wouldn't GHC apply constant folding and
> construct the list only the first time it's needed?

The problem is: if the list is stored naively in the .data segment (as 
apparently it is), then we have to store all the pointer structure as 
well as the data. This hugely bloats the disk footprint for programs.

That is, all the reasons why String=[Char] is bad at runtime are also 
reasons why this representation is bad at objectcode time. For most 
lists, the pointer structure is a considerable portion of the total 
memory cost. During runtime this overhead is (or at least may be) 
unavoidable due to the dynamic nature of program execution; but there's 
no reason to have this overhead in the compiled format of the program 
since it's trivial to generate it from a compact representation (e.g., 
storing lists as C-style arrays + lengths).

The only conceivable benefit of storing lists on disk using their heap 
representation is to allow treating the .data segment as if it were part 
of the heap, i.e., to have zero-cost inflation and to allow GC to ignore 
that "part of the heap". However, for lists, I can't imagine this 
actually being beneficial in practice. This sort of thing is more 
beneficial for large structures of static data (e.g., sets, maps,...). 
But then for large static data, we still usually want a non-heap 
representation (e.g., cache-oblivious datastructures), since we're 
liable to only look at the data rather than to change it. It's only when 
we have lots of static "mutable" data that it makes sense to take heap 
snapshots.

-- 
Live well,
~wren