panic when compiling SHA

Carter Schonwald carter.schonwald at gmail.com
Mon Jan 13 03:26:59 UTC 2014


agreed,
this level of unrolling would cause problems even in most C compilers! When
I write unrolled simd C code, I use a fixed number of variables that
corresponds to the # of registers that are live on my target arch (yes,
internally the turn into SSA which then does an ANF/CPS style and such),
but by shadowing/resuing a fixed number of  names, i can make it
"syntactically" clear what the lifetimes of my variables is.

But yeah, branch predictors are pretty good on modern hardware, a loop is
worth considering.


On Sun, Jan 12, 2014 at 10:06 PM, Ben Lippmeier <benl at ouroborus.net> wrote:

>
> On 10/01/2014, at 6:17 , Adam Wick <awick at galois.com> wrote:
>
> > That’s the problem with SHA, then. The implementation (and the spec,
> really) is essentially a long combination of the form:
> >
> > let x_n5 = small_computation x_n1 x_n2 x_n3 x_n4
> >      x_n6 = small_computation x_n2 x_n3 x_n4 x_n5
> >      …
> >
> > Which has ~70 entries. The actual number of live variables alive at any
> time should be relatively small, but if slots aren’t getting reused there’s
> going to be some significant blowup. (To be honest, I had figured — and
> thought I had validated — that doing it this way would give the compiler
> the best chance at generating optimal code, but it appears I merely set
> myself up to hit this limitation several years later.)
>
> If this [1] is the current version then I don't think there is any
> performance reason to manually unroll the loops like that. If you write a
> standard tail-recursive loop then the branch predictor in the processor
> should make the correct prediction for all iterations except the last one.
> You'll get one pipeline stall at the end due to a mis-predicted backward
> branch, but it won't matter in terms of absolute percentage of execution
> time. You generally only need to worry about branches if the branch flag
> flips between True and False frequently.
>
> If you care deeply about performance then on some processors it can be
> helpful to unfold this sort of code so that the SHA constants are
> represented as literals in the instruction stream instead of in static data
> memory -- but that ability is very processor specific and you'd need to
> really stare at the emitted assembly code to see if it's worthwhile.
>
> Ben.
>
>
> [1] https://github.com/GaloisInc/SHA/blob/master/Data/Digest/Pure/SHA.hs
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140112/455d070a/attachment.html>


More information about the ghc-devs mailing list