[Haskell-cafe] library for set of heterogeneous values?

Anthony Clayden anthony_clayden at clear.net.nz
Thu Aug 11 04:37:57 UTC 2016


I have a (let's call it) database of heterogeneous records.

They're not Haskell records, but anonymous/extensible type-labelled rows. (Could be tuples, could be HLists, could be Lens-like, could be something fancier.)

There's a small number (dozens) of distinct row types, each with a large number (thousands) of rows.  The variety of row-types is not predictable in advance. And indeed a row might 'morph' over time with fields added/removed.

So the obvious answer of putting the lot into a giant HList (each element of the list being a row) isn't going to scale. I could have a type-indexed HList in which each element is a Set of homogeneous rows. But performance still suffers from scanning along the list to find the right type index.

Is there something better? On Hackage there are two packages called HSet, neither of which says much about its suitability:

* `hset` (lower case) [AlekseyUymanov] seems isomorphic to a type-indexed HList.
      i.e. each element must have a unique type (could be a Set type, I guess)

* `HSet` (upper case) [athanclark] "Faux heterogeneous sets" seems a lot meatier
     why the "Faux"?
     built over hashtables in the ST monad.

Has anybody used these? Can anyone give guidance on what they can and can't do?

Bonus questions:

Given a filter specifying a restriction on (some) fields of rows, I want to get a heterogeneous subset:
* all rows with at least those fields, matching those restrictions.
* the restriction might be merely "has field labelled L".
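
To make the filter semantics concrete, here is a minimal sketch. It assumes (purely for illustration) that a row is flattened to a map from field label to value, with both as plain Strings, so it says nothing about how a properly typed row representation would do this. The names `Filter`, `matches` and `select` are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Assumption: labels and values are both Strings; real rows would be typed.
type Label = String
type Value = String
type Row   = Map Label Value

-- One entry per restricted field.
-- Nothing means the field merely has to be present ("has field labelled L").
type Filter = Map Label (Maybe (Value -> Bool))

-- A row matches when it has every field named in the filter and
-- each predicate that is present accepts the row's value at that field.
matches :: Filter -> Row -> Bool
matches flt row = and [ check l p | (l, p) <- Map.toList flt ]
  where
    check l p = case Map.lookup l row of
      Nothing -> False
      Just v  -> maybe True ($ v) p

-- The heterogeneous subset: every row satisfying the filter,
-- whatever other fields it carries.
select :: Filter -> [Row] -> [Row]
select flt = filter (matches flt)
```

This is a linear scan over all rows, so it only pins down what the result should be, not how to get it efficiently.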

Given a candidate row for insertion, I want first to scan for quasi-duplicates:
* any existing row with a subset of the given fields, and the same value at those fields.
* any existing row with a superset of the given fields, and the same value at those fields in common.
* ignore records with only a partial overlap of fields.
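
The three rules above can be stated as a predicate. This is only a sketch under the assumption that a row is a map from String labels to String values; `quasiDuplicate` and `quasiDuplicates` are hypothetical names. It relies on `Map.isSubmapOf` from the containers package, which holds when every key of the first map is in the second with an equal value.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Assumption: labels and values are both Strings; real rows would be typed.
type Row = Map String String

-- An existing row is a quasi-duplicate of the candidate when one of them
-- is contained in the other: same labels, same values at those labels.
-- Rows whose fields only partially overlap satisfy neither test.
quasiDuplicate :: Row -> Row -> Bool
quasiDuplicate candidate existing =
     existing  `Map.isSubmapOf` candidate   -- existing has a subset of the fields
  || candidate `Map.isSubmapOf` existing    -- existing has a superset of the fields

-- Linear scan over the existing rows.
quasiDuplicates :: Row -> [Row] -> [Row]
quasiDuplicates candidate = filter (quasiDuplicate candidate)
```

Note that under this definition an existing row with no fields at all counts as a quasi-duplicate of everything.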

One possible data structure: a "vertical store".
Give each row a Globally Unique Id.
Have a separate set for each possible field,
where each element maps a field value (the key) to the set of GUIds of the records with that value.
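
A rough sketch of such a vertical store, built on Data.Map and Data.Set from containers. The assumptions are loud ones: GUIds are stood in for by Int, labels and values are plain Strings, and all the names (`Store`, `insertRow`, `rowsWith`, `rowsHaving`) are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Assumptions: Int stands in for a GUId; labels and values are Strings.
type RowId = Int
type Label = String
type Value = String

-- One index per field: field value (key) -> ids of the rows with that value.
type FieldIndex = Map Value (Set RowId)

-- The whole store: one FieldIndex per field label.
type Store = Map Label FieldIndex

emptyStore :: Store
emptyStore = Map.empty

-- Record a row by adding its id to the index of each of its fields.
insertRow :: RowId -> [(Label, Value)] -> Store -> Store
insertRow rid fields store = foldr addField store fields
  where
    addField (l, v) =
      Map.insertWith (Map.unionWith Set.union) l
                     (Map.singleton v (Set.singleton rid))

-- Ids of the rows whose field l holds value v.
rowsWith :: Label -> Value -> Store -> Set RowId
rowsWith l v store =
  Map.findWithDefault Set.empty v (Map.findWithDefault Map.empty l store)

-- Ids of the rows that merely have field l, whatever its value.
rowsHaving :: Label -> Store -> Set RowId
rowsHaving l store =
  Set.unions (Map.elems (Map.findWithDefault Map.empty l store))
```

With this layout a restriction on several fields becomes a `Set.intersection` of the per-field id sets, and "has field labelled L" is `rowsHaving`.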

Then I have a different bonus question:
* how to retrieve all field values for a given GUId?


Thanks
AntC