Should folds in containers package remain INLINE

Sun May 6 02:31:17 CEST 2012

On 5/4/12 5:01 AM, Simon Peyton-Jones wrote:
>
> | I know one thing I run into a lot, especially with folds but also with
> | other generic functions, is that what I really want is type class
> | specialization. That is, at compile time inline the function enough to
> | make the type class parameters static so that the methods can be inlined
> | back into the generic function; and then at runtime do type-based
> | dispatch to the specialized versions just like if you were looking up
> | the instance record or doing SPECIALIZE lookup. The goal being to remove
> | the indirect calls in inner loops, but in a particularly restricted way
> | since instances are known at compile time and therefore code bloat will
> | be limited (unlike, say, function or record arguments).
>
> Can you give a concrete example.  It's hard to be certain but what you describe sounds like exactly what INLINABLE does.

For a (semi)concrete example, say we have an implementation of the 
forward--backward algorithm. The algorithm is O(n^3) but those are some 
nice tight loops. The algorithm can be parametrized by your choice of 
semiring. So we make a type class for semirings in order to make the 
algorithm generic, but we want the algorithm to be specialized on the 
type class rather than performing indirect calls in the middle of O(n^3).

The (first-order) forward algorithm is defined by the recurrence:

     alpha :: Position -> State -> Probability
     alpha 0 s0 = prior s0
     alpha j sj =
         let i = j-1 in
         sum [ alpha i si * p si sj | si <- states i ]

where prior and p are probability models, states gives us all the 
possible states at that position, the sum and (*) are actually for the 
semiring of choice, and where we actually store everything in a table in 
order to do dynamic programming.

The backward algorithm is defined similarly:

     beta :: Position -> State -> Probability
     beta T sT = coprior sT
     beta i si =
         let j = i+1 in
         sum [ p si sj * beta j sj | sj <- states j ]

where T is some known constant, namely the maximum position.

For efficiency reasons, the actual code is a good deal more complicated 
than the above recurrences, but the basic idea is the same. We don't 
particularly want to specialize on the parameters (prior, p, states,...) 
since those are defined dynamically, but we do want to specialize on the 
semiring since the set of semirings we'll care about is fixed at compile 
time.

Given what Johan Tibell said, I think INLINABLE will in fact do this; 
but his description of the pragma differs from what I recalled of back 
when we came up with it. Hence the question.

-- 
Live well,
~wren