[Haskell-cafe] Fwd: Efficient records with arbitrarily many fields [was: Extensible states]

Alexander V Vershilov alexander.vershilov at gmail.com
Mon Jul 6 08:15:20 UTC 2015


On 6 July 2015 at 10:51, Frank Staals <frank at fstaals.net> wrote:
> Alexander V Vershilov <alexander.vershilov at gmail.com> writes:
>
>> Not sure whether this makes sense at the cafe level, but here is my
>> understanding of the current state. For the problem of "anonymous
>> records", i.e. a datatype whose fields and their types may vary, we
>> have the following approaches:
>>
>> 1. vinyl/HList/etc., i.e. a GADT-based heterogeneous list. This one has
>> O(n) random access and may not be a good fit for the "anonymous record"
>> use case.
>>
>> 2. records: polymorphic data structures with type-level information
>> about field names. Here we need a large number of data structures: one
>> per field count, times two (lazy and strict variants). This one has
>> O(1) random access.
>>
>> 3. fixed-vector-hetero/"wheel above": array-based structures with O(1)
>> access. I call the code above a "wheel" because there have been a number
>> of similar implementations here and there.
>
> Somehow I feel the most promising approach is missing from this list:
> the 'Just do it while compiling!' paper [1] describes how to get O(log n)
> query and update, essentially by using heterogeneous binomial trees. I
> think I also saw a Haskell implementation in a GitHub gist somewhere,
> but I could not find the link at the moment.

Yes, I tend to agree that this approach should be noted; I was simply
unaware of that article. It seems to be a better fit for the functional
paradigm than the array-based ones, and it may have better properties
when there are many updates and no fusion has been implemented.
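
For anyone following along, here is a minimal, self-contained sketch of
the GADT-based heterogeneous list from point (1) above (not vinyl's
actual API, just an illustration). It shows where the O(n) access comes
from: the lookup walks the spine one constructor at a time. The paper's
approach replaces this linear spine with a tree-shaped structure to get
O(log n).

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Proxy   (Proxy (..))
import GHC.TypeLits (Symbol)

-- A field is a value tagged with a type-level name.
newtype Field (name :: Symbol) a = Field a

-- A GADT-based heterogeneous list of fields: the "record".
data Rec fields where
  RNil :: Rec '[]
  (:&) :: f -> Rec fs -> Rec (f ': fs)
infixr 5 :&

-- Lookup by name.  The instances walk the spine one constructor at a
-- time, which is exactly the O(n) random access mentioned above.
-- (Real libraries also compute the result type from the name; here it
-- is fixed by the call site for simplicity.)
class Has (name :: Symbol) a fields where
  get :: Proxy name -> Rec fields -> a

instance {-# OVERLAPPING #-} Has name a (Field name a ': fs) where
  get _ (Field x :& _) = x

instance Has name a fs => Has name a (f ': fs) where
  get p (_ :& xs) = get p xs

-- An "anonymous record" with two fields.
person :: Rec '[Field "name" String, Field "age" Int]
person = Field "Alice" :& Field 42 :& RNil

main :: IO ()
main = do
  putStrLn (get (Proxy :: Proxy "name") person)
  print    (get (Proxy :: Proxy "age")  person :: Int)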

> Having said that, I think the Big-O notation used here is misleading,
> since in 99.999% of all cases we have constant-sized records. Therefore,
> all approaches give O(1) access/update in these cases. If one really
> wants to know which is fastest, benchmarking is the only way to go here.

Benchmarking is always the answer here, since for some use cases memory
overhead and locality may matter, and in those cases the vector-based
approaches will give better results.
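
For completeness, here is a minimal criterion skeleton for such a
comparison. The two "records" below are only stand-ins I made up for
illustration (a native data type with O(1) field access versus a
nested-pair encoding whose last field sits at the end of the spine);
one would plug in the actual representations to be compared.

import Criterion.Main

-- A native record: field access is a single projection.
data Plain = Plain { f1 :: Int, f2 :: Int, f3 :: Int, f4 :: Int }

-- A nested-pair "record": reaching the last field walks the spine.
type Nested = (Int, (Int, (Int, (Int, ()))))

plain :: Plain
plain = Plain 1 2 3 4

nested :: Nested
nested = (1, (2, (3, (4, ()))))

lastNested :: Nested -> Int
lastNested (_, (_, (_, (x, _)))) = x

main :: IO ()
main = defaultMain
  [ bgroup "4th field access"
      [ bench "native record" $ whnf f4 plain
      , bench "nested pairs"  $ whnf lastNested nested
      ]
  ]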

> Regards,
>
> [1] http://www.fing.edu.uy/~mviera/papers/pepm13.pdf
>
> --
>
> - Frank


-- 
Alexander

