[Haskell-cafe] Adding Content-Addressable Storage to GHC

Alan & Kim Zimmerman alan.zimm at gmail.com
Wed Mar 18 18:03:16 UTC 2020


I am not exploring this myself, but I am watching with great interest, and I
may not be able to resist jumping in if something comes of it.

Alan

On Wed, 18 Mar 2020 at 11:23, Chris Done <haskell-cafe at chrisdone.com> wrote:

> Hi all,
>
> Are there any efforts or designs ongoing to add CAS (content-addressable
> storage) to GHC, as in Unison? <https://www.unisonweb.org/docs/tour/>
>
> == The idea ==
>
> The summary of the idea is simply that top-level declarations can be
> addressed by a hash of their contents. Recursive definitions are
> transformed into worker/wrapper form to eliminate the self-reference
> problem in hashing (a recursive definition's hash would otherwise depend
> on itself).
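A minimal sketch of this idea, assuming a hypothetical toy expression language rather than GHC's real Core or TH `Exp` types, and using 64-bit FNV-1a as a dependency-free stand-in for SHA512. Bound variables are hashed as de Bruijn indices, references to other declarations are replaced by their hashes, and a self-reference becomes a `SELF` placeholder, which is one way to hash the body as a worker abstracted over its own recursive call. The names `Expr`, `canon` and `hashDecl` are illustrative only, and mutually recursive groups are not handled.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (elemIndex, foldl')
import Data.Word (Word64)
import Numeric (showHex)

-- A toy expression language (hypothetical; not GHC Core or TH's Exp).
data Expr
  = Var String      -- lambda-bound variable
  | Ref String      -- reference to a top-level declaration by name
  | Lam String Expr
  | App Expr Expr
  | Lit Integer

type Hash = Word64

-- FNV-1a (64-bit), a dependency-free stand-in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Serialise a declaration body so the text depends only on its content.
canon :: String -> [(String, Hash)] -> Expr -> String
canon self env = go []
  where
    -- Bound variables become de Bruijn indices, so renaming them is harmless.
    go scope (Var x) =
      maybe (error ("unbound: " ++ x)) (\i -> 'v' : show i) (elemIndex x scope)
    go _ (Ref n)
      -- A self-reference becomes a placeholder: the body is hashed as a
      -- worker abstracted over its own recursive call.
      | n == self = "SELF"
      -- Other declarations are replaced by their already-computed hashes.
      | otherwise =
          maybe (error ("unknown: " ++ n)) (\h -> '#' : showHex h "") (lookup n env)
    go scope (Lam x b) = "(\\." ++ go (x : scope) b ++ ")"
    go scope (App f a) = "(" ++ go scope f ++ " " ++ go scope a ++ ")"
    go _ (Lit n) = show n

-- The declaration's own name is deliberately not part of the hash.
hashDecl :: [(String, Hash)] -> String -> Expr -> Hash
hashDecl env self body = fnv1a (canon self env body)

main :: IO ()
main = do
  -- Alpha-equivalent definitions share a hash, whatever they are called.
  print (hashDecl [] "id" (Lam "x" (Var "x")) == hashDecl [] "identity" (Lam "y" (Var "y")))
  -- Two self-recursive loops under different names also share a hash.
  let loop f = Lam "x" (App (Ref f) (Var "x"))
  print (hashDecl [] "f" (loop "f") == hashDecl [] "g" (loop "g"))
```

Both lines print `True`. Because the name is not hashed, renaming a definition leaves its address unchanged, as in Unison.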
>
> == Why I want this ==
>
> There are lots of advantages to this, but the one that excites me the most
> is that we can move to running tests, especially property tests, at
> compile-time.
>
> The main downside to running tests at compile-time, as seen with
> Template Haskell, is that you will re-run tests every time the module is
> recompiled, making your dev cycle slower. However, if your tests are keyed
> on CAS hashes, then those hashes are only invalidated when individual
> declarations actually change. This means the re-running of tests becomes
> granular at the declaration level. When a single test completes, either
> successfully or not, you can cache the result and look it up next
> time, using e.g. the SHA512 of the expression evaluated as the key.
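A sketch of the caching step, assuming test results live in a simple in-memory table keyed by a content hash (a `Word64` here, standing in for a SHA512 digest). `runCached`, `Outcome` and `Cache` are hypothetical names; a real tool would persist the table between builds.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word64)

type Hash = Word64

data Outcome = Passed | Failed String
  deriving (Eq, Show)

-- Results from earlier runs, keyed by the content hash of the test expression.
-- (In memory here; a real tool would store this on disk between builds.)
type Cache = [(Hash, Outcome)]

-- Run the test only on a cache miss. Passes and failures are both cached,
-- which assumes the test is deterministic (e.g. a fixed property-test seed).
runCached :: IORef Cache -> Hash -> IO Outcome -> IO Outcome
runCached ref key test = do
  cache <- readIORef ref
  case lookup key cache of
    Just outcome -> pure outcome
    Nothing -> do
      outcome <- test
      modifyIORef' ref ((key, outcome) :)
      pure outcome

main :: IO ()
main = do
  cache <- newIORef []
  runs <- newIORef (0 :: Int)
  let prop = modifyIORef' runs (+ 1) >> pure Passed
  _ <- runCached cache 0xabc prop -- miss: runs the test
  _ <- runCached cache 0xabc prop -- same hash: answered from the cache
  _ <- runCached cache 0xdef prop -- declaration changed, new hash: runs again
  readIORef runs >>= print        -- 2
```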
>
> Therefore you could change a single function in a library and it would
> only re-run the tests that are actually affected, rather than running all
> the tests in the whole module, and rather than the more typical approach
> of running ALL tests in a test suite just because one thing changed.
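Only affected tests re-run because each hash covers a declaration's body plus the hashes of everything it references, Merkle-style, so a change propagates up to exactly the tests that depend on it. A sketch under those assumptions, with hypothetical `Decl`, `hashAll` and `staleTests` names, an acyclic dependency graph, and FNV-1a again standing in for SHA512:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)

type Hash = Word64

-- FNV-1a (64-bit), standing in for SHA512 so the sketch needs only base.
fnv1a :: String -> Hash
fnv1a = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211) 14695981039346656037

-- A declaration's source text plus the names it references (hypothetical).
data Decl = Decl {declName :: String, declBody :: String, declDeps :: [String]}

-- Hash each declaration over its body and its dependencies' hashes, so any
-- change below a declaration also changes that declaration's hash. The table
-- is built lazily, which requires the dependency graph to be acyclic.
hashAll :: [Decl] -> [(String, Hash)]
hashAll decls = table
  where
    table = [(declName d, hashOne d) | d <- decls]
    hashOne d = fnv1a (declBody d ++ concatMap (\n -> '|' : depHash n) (declDeps d))
    depHash n = maybe (error ("unknown: " ++ n)) show (lookup n table)

-- Only tests whose hash differs from the previous build need to re-run.
staleTests :: [(String, Hash)] -> [(String, Hash)] -> [String] -> [String]
staleTests old new tests = [t | t <- tests, lookup t old /= lookup t new]

main :: IO ()
main = do
  let before =
        [ Decl "helper" "x + 1" []
        , Decl "other" "x * 2" []
        , Decl "prop_helper" "helper 1 == 2" ["helper"]
        , Decl "prop_other" "other 2 == 4" ["other"]
        ]
      -- Edit only `helper`; every other declaration is unchanged.
      after = Decl "helper" "1 + x" [] : drop 1 before
  print (staleTests (hashAll before) (hashAll after) ["prop_helper", "prop_other"])
  -- ["prop_helper"]
```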
>
> If you can couple tests with the code they exercise, then you avoid the
> usual separation of code from its tests.
>
> == Implementation approaches ==
>
> There are various ways to implement this with varying degrees of
> satisfaction:
>
> 1. Use TH: reify declarations, inspect the AST, and produce a SHA512. Mix in
> ambient values such as the GHC version, instances in scope, extensions, GHC
> options, etc. With TH, I'm confident you can only achieve an imperfect
> hash, because I doubt that all of the necessary information is available
> to TH.
>
> Names that come from external packages could be treated as
> content-addressed at the granularity of the package's installed hash.
> Ideally, you would have finer granularity into other packages, but that's
> not a necessity if you just want caching for your current development
> package.
>
> 2. Use a source plugin. A source plugin already has access to all of GHC's
> context information, so this might lead to a more nearly perfect hash.
>
> 3. Add it to GHC directly. Exposing a function such as
> `expressionSHA512 :: Exp -> ByteString` is one hypothetical way to access
> this information. With such a function you could implement caching of
> fine-grained tests.
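To make approaches 1-3 concrete, here is a sketch of what a cache key might fold together: the compiler version, ambient settings such as extensions and GHC options, the canonical text of the expression, and, for names from other packages, only the package's installed unit id. `Ambient`, `Ref`, `cacheKey` and the unit id string are hypothetical, and how a TH splice or source plugin would actually obtain these values is left open. `compilerName` and `compilerVersion` are real `System.Info` values, though `compilerVersion` reports only the major version (e.g. 8.8), so a real key would want the full version. FNV-1a again stands in for SHA512.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', intercalate)
import Data.Version (showVersion)
import Data.Word (Word64)
import System.Info (compilerName, compilerVersion)

type Hash = Word64

-- FNV-1a (64-bit), standing in for SHA512.
fnv1a :: String -> Hash
fnv1a = foldl' (\h c -> (h `xor` fromIntegral (ord c)) * 1099511628211) 14695981039346656037

-- Settings that can change what a declaration means without changing its text.
-- Kept in source order, since e.g. a later -O0 overrides an earlier -O2.
data Ambient = Ambient
  { extensions :: [String]
  , ghcOptions :: [String]
  }

-- A name used by the expression: a local declaration (already hashed), or a
-- name from another package, addressed only by that package's unit id.
data Ref = Local Hash | External String String

refKey :: Ref -> String
refKey (Local h) = "L" ++ show h
refKey (External unitId name) = "E" ++ unitId ++ ":" ++ name

-- Hypothetical cache key: compiler, ambient settings, canonical expression
-- text and referenced names, joined with NUL separators and hashed.
cacheKey :: Ambient -> String -> [Ref] -> Hash
cacheKey amb canonicalExpr refs = fnv1a (intercalate "\0" parts)
  where
    parts =
      [ compilerName ++ "-" ++ showVersion compilerVersion
      , unwords (extensions amb)
      , unwords (ghcOptions amb)
      , canonicalExpr
      ] ++ map refKey refs

main :: IO ()
main = do
  let amb = Ambient ["OverloadedStrings"] ["-O2"]
      refs = [External "base-4.13.0.0" "Data.Foldable.foldr"]
  -- Changing only a GHC option yields a different key, so cached results
  -- for this expression would not be reused.
  print (cacheKey amb "(\\.v0)" refs == cacheKey (amb {ghcOptions = ["-O0"]}) "(\\.v0)" refs)
```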
>
> A related discussion is the one on deterministic builds:
> https://gitlab.haskell.org/ghc/ghc/wikis/deterministic-builds
>
> Anyone else exploring this?
>
> Cheers,
>
> Chris
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

