[Haskell-cafe] Data.Set optimisations [was: library for set of heterogeneous values?]

Tue Aug 23 05:52:32 UTC 2016

On Mon, Aug 15, 2016 at 1:15 AM, Anthony Clayden
<anthony_clayden at clear.net.nz> wrote:
> "indirect jumps are particularly expensive on a modern processor
> architecture, because they fox the branch prediction hardware, leading to a
> stall of 10 or more cycles depending on the length of the pipeline."
> [Introduction. The paper goes on to estimate a penalty around 7 to 9 % in
> general GHC code.]

Indeed, this is the point I was about to bring up (but I've been AWOL
the last few weeks).

The cost model here makes sense if you think about a recursive
function over something like lists. To compile down to a tight C-like
loop we'll chose the conditional statement to be "is this nil" and
assume it's always false, so that code falls through straightline to
do whatever loop body, and then does an unconditional jump back up to
the top of the loop. "Almost always" our assumption that the condition
fails will be correct, as it's correct in the recursive case and only
fails in the basis case. (This notion of "almost always" is assuming
we don't have most of our input arguments be the empty list; with
felicity degrading as nil inputs begin to dominate.)

Regardless of how GHC does things, those basics of branch prediction
and primop ordering will remain the same. The question is just about
how we get there, how we go from english descriptions like "the
recursive case" and "the basis case" (or the "common" and "uncommon"
cases) and get to the actual byte code.

In practice GHC does the naive thing of ordering branches to be in
order of declaration in the data type definition. This is good in as
much as it doesn't matter how users textually order their case
analyses, since GHC will reorder them. But it's bad in that the
assumed ordering is not always very smart— especially when the order
of data constructors in a data type's definition is also used for
other magic, like deriving Ord. Really, we'd like to make GHC smarter
here about how it orders things; but that gets complicated very
quickly.

> That seems to me worrying for case/constructor branching code in general. Do
> we pay that cost for every pattern-match on a list or a Maybe, for example?

In principle, yes. In practice, not so much. Because it's not
recursive, you gotta figure you're just going to be wrong half the
time— but you don't know which half. The only stable solutions here
are (a) use dynamic instrumentation to determine which is the hot path
and then recompile, or (b) distinguish the (Nothing | Some a) type
from the (Just a | Everything) type— which we often want anyways for
other things like Ord and Monoid instances, but noone other than me
seems to care very much. In addition to these, there are also various
unstable solutions: (c) assume the user-provided case-analysis order
gives the most likely constructor first (but do we treat Maybe as a
special case? do we over-serialize everything?), (d) add a new pragma
to specify for each case analysis which constructor should be
considered hot (but how much should we actually trust the user to get
that right?), etc...

-- 
Live well,
~wren