[Haskell-cafe] Fingerprinting Haskell Objects

Alexander Kjeldaas alexander.kjeldaas at gmail.com
Tue Oct 7 21:15:55 UTC 2014


Assuming the Generic representation is a stable interface, I would write a
traversal over it that feeds directly into a BLAKE2b implementation (BLAKE2
is a faster, tweaked successor to the SHA-3 finalist BLAKE).


This gives you a cryptographically strong fingerprint with flexible space
usage (keep as many bytes of the digest as you need, up to BLAKE2b's 64),
high speed (~1 GB/s), and low complexity with few external dependencies.
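
To make the idea concrete, here is a minimal sketch of such a traversal. It
only produces the canonical byte stream that would be fed to the hash; the
hashing step itself is left out so that the example needs nothing beyond base
and bytestring. The names Canonical, GCanonical, canonicalBytes and the Key
record are made up for illustration, and the encoding choices (big-endian
fixed-width integers, length-prefixed ByteStrings, one tag byte per sum
branch) are my assumptions, not something any library prescribes.

```haskell
{-# LANGUAGE DefaultSignatures, DeriveGeneric, FlexibleContexts, TypeOperators #-}

import qualified Data.ByteString         as BS
import           Data.ByteString.Builder (Builder, byteString, int64BE,
                                          toLazyByteString, word8)
import qualified Data.ByteString.Lazy    as BL
import           Data.Int                (Int64)
import           GHC.Generics

-- | Hypothetical class: an encoding we define ourselves, so it cannot
-- change when a serialization library changes its wire format.
class Canonical a where
  canonical :: a -> Builder
  default canonical :: (Generic a, GCanonical (Rep a)) => a -> Builder
  canonical = gcanonical . from

-- | Traversal over the GHC.Generics representation.
class GCanonical f where
  gcanonical :: f p -> Builder

instance GCanonical U1 where
  gcanonical U1 = mempty

-- Products: encode the fields in declaration order.
instance (GCanonical f, GCanonical g) => GCanonical (f :*: g) where
  gcanonical (a :*: b) = gcanonical a <> gcanonical b

-- Sums: one tag byte per branch taken. This relies on the shape of the
-- derived sum tree staying the same (the "stable interface" assumption).
instance (GCanonical f, GCanonical g) => GCanonical (f :+: g) where
  gcanonical (L1 a) = word8 0 <> gcanonical a
  gcanonical (R1 b) = word8 1 <> gcanonical b

-- Metadata (type, constructor and field names) is deliberately ignored.
instance GCanonical f => GCanonical (M1 i c f) where
  gcanonical (M1 x) = gcanonical x

instance Canonical c => GCanonical (K1 i c) where
  gcanonical (K1 x) = canonical x

-- Hand-written leaf encodings; these are the part that must never change.
instance Canonical Int64 where
  canonical = int64BE

instance Canonical BS.ByteString where
  -- Length prefix keeps adjacent fields from running into each other.
  canonical bs = int64BE (fromIntegral (BS.length bs)) <> byteString bs

-- Types with a Generic instance in base pick up the default method.
instance Canonical Bool
instance (Canonical a, Canonical b) => Canonical (a, b)

-- | Example key type (hypothetical).
data Key = Key
  { tenant :: BS.ByteString
  , shard  :: Int64
  , active :: Bool
  } deriving (Show, Generic)

instance Canonical Key

-- | The bytes that would be handed to the hash function.
canonicalBytes :: Canonical a => a -> BS.ByteString
canonicalBytes = BL.toStrict . toLazyByteString . canonical
```

In GHCi, canonicalBytes (Key (BS.pack [1,2,3]) 42 True) yields the 8-byte
length, the three payload bytes, the 8-byte integer and one tag byte for the
Bool. The fingerprint would then be the BLAKE2b digest of those bytes; with
the cryptonite or crypton package that should be something like
hashWith Blake2b_256 from Crypto.Hash, but treat that as an assumption to
check against whichever BLAKE2b binding you actually use.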


Alexander


On Tue, Oct 7, 2014 at 10:30 PM, Ozgun Ataman <ozataman at gmail.com> wrote:

> Hello everybody,
>
> I have a little question I wanted to run by the folks here. I've run into
> it several times over the past few years and would love to lock down a good
> answer.
>
> What's the best way to "fingerprint" a Haskell object into, say, a
> ByteString, so that this fingerprint can be used as the "lookup key" in a
> database (for example) and can be trusted to remain constant over
> time even as the underlying libraries evolve?
>
> Here's a simple example:
>
>    - Say I'm building a manual index on top of a key-value store (redis,
>    dynamodb, etc.)
>
>    - I want my keys to be arbitrary tuples (or similar records) that may
>    contain various fields in them
>
>    - I would like to avoid ad-hoc, hand-written MyTuple -> ByteString and
>    ByteString -> MyTuple conversions. However, Generic derivations,
>    template-haskell, etc. are acceptable
>
>    - Notice how the fingerprint, which is used as a lookup key in the
>    database, has to remain stable. If it changes even by a single bit over
>    time for the same MyTuple, the key-value store will NOT be able to find the
>    index associated with MyTuple at this later time
>
>
> Here are some ideas (and related concepts) I've considered and used over
> the years:
>
>    - Hand-write a "Prism' MyTuple ByteString". This works, but is tedious
>    and error-prone.
>
>    - Use Serialize/Binary and trust that the encode/decode pair will
>    still produce the same results in 5 years (dangerous territory!)
>
>    - Use SafeCopy, which is great for ensuring timeless decoding of the
>    *value* in the index, but can we be sure that the fingerprint (MyTuple ->
>    ByteString) conversion stays the same? What if the SafeCopy authors one day
>    decide to encode tuples differently? They would write the migrations to
>    transparently handle legacy code for *values*, but not for *keys*. Also
>    notice here how migrations help with the ByteString -> MyTuple leg, but do
>    not ensure MyTuple -> ByteString produces the same ByteString over time.
>
>    - Hashable would've been nice, but there is NO guarantee of persistent
>    results, even across multiple runs of the same code
>
> What would be your preferred solution?
>
> Thank you,
> Oz