[Haskell-cafe] Fingerprinting Haskell Objects
Alexander Kjeldaas
alexander.kjeldaas at gmail.com
Tue Oct 7 21:15:55 UTC 2014
Assuming the Generic instance is a stable interface, I would write a
traversal over it and feed the result directly into a BLAKE2b implementation
(BLAKE2 is a faster, tweaked successor of the SHA-3 finalist BLAKE).
This gives you a cryptographically strong fingerprint of flexible size
(truncate the digest to as many bytes as you want, up to BLAKE2b's maximum
of 64), it is fast (~1 GB/s), and it keeps complexity and external
dependencies low.
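
A minimal sketch of such a traversal, using only base and bytestring. The
names Canonical, GCanonical, fingerprintInput and Key are made up for
illustration, and the field encodings (big-endian Int64, length-prefixed
ByteString) are just one possible choice. The hashing step is left out on
purpose: the resulting bytes would be passed to whichever BLAKE2b binding
you pick. Note that the sum-type tags below depend on how GHC lays out the
generic representation, which is exactly the "stable interface" assumption
above.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE TypeOperators     #-}

import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy    as BL
import           Data.Int                (Int64)
import           GHC.Generics

-- Hypothetical class: a byte encoding that we define ourselves, so it does
-- not change when a serialization library changes its wire format.
class Canonical a where
  canonical :: a -> B.Builder
  default canonical :: (Generic a, GCanonical (Rep a)) => a -> B.Builder
  canonical = gcanonical . from

-- Base cases: fixed-width big-endian integers, length-prefixed strings.
instance Canonical Int64 where
  canonical = B.int64BE

instance Canonical Bool where
  canonical b = B.word8 (if b then 1 else 0)

instance Canonical BS.ByteString where
  canonical bs = B.int64BE (fromIntegral (BS.length bs)) <> B.byteString bs

-- Tuples already have Generic instances, so the default method applies.
instance (Canonical a, Canonical b) => Canonical (a, b)

-- Traversal over the generic representation.
class GCanonical f where
  gcanonical :: f p -> B.Builder

instance GCanonical U1 where
  gcanonical U1 = mempty

-- Products: encode the fields in declaration order.
instance (GCanonical f, GCanonical g) => GCanonical (f :*: g) where
  gcanonical (x :*: y) = gcanonical x <> gcanonical y

-- Sums: one tag byte per branch taken (assumes GHC keeps the same nesting).
instance (GCanonical f, GCanonical g) => GCanonical (f :+: g) where
  gcanonical (L1 x) = B.word8 0 <> gcanonical x
  gcanonical (R1 y) = B.word8 1 <> gcanonical y

instance Canonical c => GCanonical (K1 i c) where
  gcanonical (K1 x) = canonical x

-- Metadata (type, constructor and field names) is deliberately ignored.
instance GCanonical f => GCanonical (M1 i m f) where
  gcanonical (M1 x) = gcanonical x

-- The bytes that would be fed to the hash function.
fingerprintInput :: Canonical a => a -> BS.ByteString
fingerprintInput = BL.toStrict . B.toLazyByteString . canonical

-- Hypothetical key type used as an example.
data Key = Key
  { keyUser   :: Int64
  , keyTag    :: BS.ByteString
  , keyActive :: Bool
  } deriving (Generic)

instance Canonical Key
```

Because constructor and field names are skipped, renaming them does not
change the fingerprint, but reordering fields or constructors does. If that
matters, the M1 instance is the place to mix in names or explicit version
tags.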
Alexander
On Tue, Oct 7, 2014 at 10:30 PM, Ozgun Ataman <ozataman at gmail.com> wrote:
> Hello everybody,
>
> I have a little question I wanted to run by the folks here. I've run into
> it several times over the past few years and would love to lock down a good
> answer.
>
> What's the best way to "fingerprint" a Haskell object into, say, a
> ByteString, so that this fingerprint can be used as the "lookup key" in a
> database (for example) and can be trusted to remain constant over
> time even as the underlying libraries evolve?
>
> Here's a simple example:
>
> - Say I'm building a manual index on top of a key-value store (redis,
> dynamodb, etc.)
>
> - I want my keys to be arbitrary tuples (or similar records) that may
> contain various fields in them
>
> - I would like to avoid ad-hoc, hand-written MyTuple -> ByteString and
> ByteString -> MyTuple conversions. However, Generic derivations,
> template-haskell, etc. are acceptable
>
> - Note that the fingerprint, which is used as a lookup key in the
> database, has to remain stable. If it changes even by a single bit over
> time for the same MyTuple, the key-value store will NOT be able to find the
> index associated with MyTuple at this later time
>
>
> Here are some ideas (and related concepts) I've considered and used over
> the years:
>
> - Hand-write a "Prism' MyTuple ByteString". This works, but is tedious
> and error-prone.
>
> - Use Serialize/Binary and trust that the encode/decode pair will
> produce results consistently in 5 years (dangerous territory!)
>
> - Use SafeCopy, which is great for ensuring timeless decoding of the
> *value* in the index, but can we be sure that the fingerprint (MyTuple ->
> ByteString) conversion stays the same? What if the SafeCopy authors one day
> decide to encode tuples differently? They would write the migrations to
> transparently handle legacy data for *values*, but not for *keys*. Also
> notice here how migrations help with the ByteString -> MyTuple leg, but do
> not ensure MyTuple -> ByteString produces the same ByteString over time.
>
> - Hashable would've been nice, but there is NO guarantee of persistent
> results, even across multiple runs of the same code
>
> What would be your preferred solution?
>
> Thank you,
> Oz