[Haskell-cafe] serialized data structures (Was: Generalized, named, and exportable default declarations)
Henning Thielemann
lemming at henning-thielemann.de
Tue Apr 13 20:23:06 UTC 2021
On Tue, 13 Apr 2021, Olaf Klinke wrote:
>> What if we would not complicate the language and generalize syntactic
>> sugar for Text, but instead improve data layout for all Haskell types and
>> eventually make a custom Text type unnecessary?
>
> So essentially Gibbon's aim is to make String work like Text under the
> hood, without me having to worry about it? +1 for that!
In principle yes.
I think the idea is the following: Linked data structures were invented
decades ago in order to easily perform in-place updates, e.g. insert and
remove elements from lists and trees. However, in Haskell we do not allow
in-place modifications and copying a data structure is not much more
expensive than traversing it. But if we copy the whole tree anyway why do
we still need pointers? We have to acknowledge that linked lists come with
some costs. A linked list with elements spread across memory have
cache-unfriendly access patterns. In contrast to that a serialized tree is
compact in memory and very cache friendly. You can still access a subtree
without copying.
Yet, linked data structures allow sharing of sub-trees or common list
tails. For this case we still need pointers. But maybe not as many as
today. Gibbon solves this by using two alternative internal identifiers
for every data constructor: One for an embedded subtree and one for a
pointer to a subtree.
This way a list or a String could be hold in one memory chunk or it could
be effectively a linked list of chunks or a linked list of single
characters if required.
More information about the Haskell-Cafe
mailing list