[Haskell-cafe] serialized data structures (Was: Generalized, named, and exportable default declarations)

Tue Apr 13 20:23:06 UTC 2021

On Tue, 13 Apr 2021, Olaf Klinke wrote:

>> What if we would not complicate the language and generalize syntactic 
>> sugar for Text, but instead improve data layout for all Haskell types and 
>> eventually make a custom Text type unnecessary?
>
> So essentially Gibbon's aim is to make String work like Text under the
> hood, without me having to worry about it? +1 for that!

In principle yes.

I think the idea is the following: Linked data structures were invented 
decades ago in order to easily perform in-place updates, e.g. insert and 
remove elements from lists and trees. However, in Haskell we do not allow 
in-place modifications and copying a data structure is not much more 
expensive than traversing it. But if we copy the whole tree anyway why do 
we still need pointers? We have to acknowledge that linked lists come with 
some costs. A linked list with elements spread across memory have 
cache-unfriendly access patterns. In contrast to that a serialized tree is 
compact in memory and very cache friendly. You can still access a subtree 
without copying.

Yet, linked data structures allow sharing of sub-trees or common list 
tails. For this case we still need pointers. But maybe not as many as 
today. Gibbon solves this by using two alternative internal identifiers 
for every data constructor: One for an embedded subtree and one for a 
pointer to a subtree.

This way a list or a String could be hold in one memory chunk or it could 
be effectively a linked list of chunks or a linked list of single 
characters if required.