[Haskell-cafe] Adding Content-Addressable Storage to GHC
carter.schonwald at gmail.com
Wed Mar 18 18:49:55 UTC 2020
I slightly feel like we should first upgrade the typeable hashing rep from
md5 to sha3-256 or something like that (though i've talked about how
finding a compilable pair of collisions would make for a fun april first
package ), but I do think theres certainly some really cool things if we
think about that sort of direction carefully. I'm not aware of any efforts
in this direction atm
On Wed, Mar 18, 2020 at 2:05 PM Alan & Kim Zimmerman <alan.zimm at gmail.com>
> I am not exploring, but watching with great interest. And may not be able
> to resist jumping in if something comes of it.
> On Wed, 18 Mar 2020 at 11:23, Chris Done <haskell-cafe at chrisdone.com>
>> Hi all,
>> Is there any effort or designs ongoing to add CAS (content-addressable
>> storage) to GHC, as in Unison? <
>> == The idea ==
>> The summary of the idea is simply that top-level declarations can be
>> addressed by a hash of their contents. Recursive definitions are
>> transformed into the worker/wrapper to eliminate the self-referencing issue
>> of hashing.
>> == Why I want this ==
>> There are lots of advantages to this, but the one that excites me the
>> most is that we can move to running tests, especially property tests, at
>> The main downside to running tests at compile-time, as seen done with
>> template-haskell is that you will re-run tests every time the module is
>> recompiled, making your dev cycle slower. However, if your tests are keyed
>> upon CAS hashes, then those hashes are only invalidated when individual
>> declarations actually change. This means the re-running of tests becomes
>> granular at the declaration-level. When a single test completes, either
>> successfully or not, you can cache the result and lookup the result next
>> time, using e.g. the SHA512 of the expression evaluated.
>> Therefore you could change a single function in a library and it would
>> only re-run the tests that are actually affected, rather than running all
>> the tests in the whole module, and rather than the more typical approach
>> which is running ALL tests in a test suite just because one thing changed.
>> If you can couple tests with code then you can avoid the decoupling of
>> code from the tests.
>> == Implementation approaches ==
>> There are various ways to implement this with varying degrees of
>> 1. Use TH: reify declarations, inspect the AST, and produce a SHA512. Use
>> ambient values such as the GHC version, instances in scope, extensions, ghc
>> options, etc. With TH, I'm confident that you can only achieve an imperfect
>> hash because I doubt that all information is available to TH.
>> Names that come from external packages could be treated as CAS'd at the
>> scope of the package's installed hash. Ideally, you could have granularity
>> into other packages. But it's not a necessity if you just want caching for
>> your current development package.
>> 2. Use a source plugin. A source plugin is already capable of accessing
>> all GHC context information, so this might lead to more of a perfect hash.
>> 3. Add it to GHC directly. Exposing a `expressionSHA512 :: Exp ->
>> ByteString` could be one imaginary way to access this information. With
>> such a function you could implement caching of fine-grained tests.
>> A related discussion is the deterministic builds:
>> Anyone else exploring this?
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> Only members subscribed via the mailman list are allowed to post.
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Haskell-Cafe