[Haskell-cafe] bytestring vs. uvector

Sun Mar 8 20:47:54 EDT 2009

>> uvector is, if my memory serves me correctly, a fork of the vector library.
>> It uses modern stream fusion, but is under active development and is a
>> little scary. I'm a little unclear on the exact difference between uvector
>> and vector. Both use arrays that are not pinned, so they can't be readily
>> used with foreign code. If you want to use either library, understand that
>> you're embarking on a bracing adventure.
> 
> vector and uvector are roughly based on the same technology; uvector
> is - as far as I remember - a fork of some of the old DPH code which
> uses stream fusion which Don cleaned up and worked on (and it's proven
> pretty useful, and people are still hacking on it.)
> 
> vector however, has the notion of 'recycling arrays' when it does
> array operations. The technique is in fact quite similar to stream
> fusion. Roman L. built this from scratch I think, so it's quite a bit
> more unused and less stable than even uvector is maybe, but I suppose
> you could say it's kind of a superset of uvector. Hopefully though
> it should mature a little, and the plans are to have the technology
> from both of these folded into the Data Parallel Haskell project so we
> get fast array operations+automatic parallelisation.
> 
> For info, see Roman's paper, 'Recycle your arrays!'
> 
> http://www.cse.unsw.edu.au/~rl/publications/recycling.html

Given the close relationship between uvector and vector, it would
be very helpful if both package descriptions on hackage could 
point to a common haskell wiki page, starting out with the text
and link above, plus a link to the stream fusion paper (I hadn't 
been aware that vector incorporates the recycling work, and 
had often wondered about the precise relationship between those
two packages). Apart from saving others from similar confusion,
that would also provide a place to record experience with those two alternatives.

Btw, have any of the Haskell array optimization researchers
considered fixpoints yet? Both fusion and recycling are based 
on rewrite rules of the kind "in . out --> id". Now, given a loop 
like this:

    loop a = if c a then loop (out (action (in a))) else a
    loop a

these rules don't apply. Unrolling the loop a fixed number of
times would enable some rule applications, but still some would
remain in the loop body. But with a little rewriting

    loop a = if c a then loop (out (action (in a))) else out (id (in a))
    loop a

    loop a = if c a then loop (out (action (in a))) else out (id (in a))
    (if c a then loop (out (action (in a))) else out (id (in a)))

we can now push the out into the next iteration of the loop or,
if there is no next iteration, into the loop epilogue

    loop a = if c (out a) then loop (action (in (out a))) else id (in (out a))
    out (if c a then loop (action (in a)) else a)

making the rewrite rule applicable

    loop a = if c (out a) then loop (action a) else id a
    out (if c a then loop (action (in a)) else a)

leading (modulo bugs, omissions, and oversights;-) to a fused/
recycled loop body, with potentially substantial benefit.

Claus