[Haskell-cafe] vector operations

Evan Laforge qdunkan at gmail.com
Mon Jun 11 19:52:55 CEST 2012

On Tue, May 29, 2012 at 12:52 PM, Roman Leshchinskiy <rl at cse.unsw.edu.au> wrote:
> On 29/05/2012, at 19:49, Evan Laforge wrote:
>> Good question.. I copied both to a file and tried ghc-core, but it
>> inlines big chunks of Data.Vector and I can't read it very well, but
>> it looks like the answer is no, it still builds the the list of sums.
>> I guess the next step is to benchmark and see how busy the gc is on
>> each version.
> Vector should definitely fuse this, if it doesn't it's a bug. Please report if it doesn't for you. To verify, just count the number of letrecs in the optimised Core. You'll see one letrec if it has been fused and two if it hasn't.

I see two letrecs in find_before2, but both of them are on findIndex.
I only have one findIndex so I'm not sure what's going on.  The first
one calls the second, but there's an boxed Either argument in there,
which must be coming out of vector internals.

I had to stick NOINLINE on the functions so I could find them in the
core.  I don't think this should affect the optimization of the
contents, though.

The fold_abort version is shorter and simpler, only has one letrec
that takes all unboxed arguments, and I think I can more or less
follow what it's doing.

Of course that doesn't mean it's any faster, I could be just
misreading the core.  I could do a bug report, but maybe someone else
should look at the core first to make sure I'm not just confused?  I
appended the file below, just run ghc-core and search for find_before.

On Tue, May 29, 2012 at 12:54 PM, Duncan Coutts
<duncan.coutts at googlemail.com> wrote:
> Note that foldr allows early abort so that's fine. Also, there's no
> fundamental restriction due to stream fusion. Stream fusion can be
> used for lazy lists afterall and can implement Data.List.foldr just
> fine.

But can foldr do a sum running from left to right?  I thought you had
to be left-biased for that.

And as for early abort with foldr, I can think of how to do so if I'm
generating lazy data with a right-biased constructor like (:), but how
could you do that for, say, a sum?  The obvious version, 'foldr (\x v
-> if v > 10 then v else v + x) 0' will still run the function on
every element.

I suppose if fusion works its magic then early abort with foldl or
scanl should happen.  If the generating loop gets fused with the
consuming loop, and the consuming loop only consumes part of the
input, as it would with findIndex.

import qualified Data.Vector.Unboxed as Unboxed

-- | Find the index of the last value whose running sum is still below the
-- given number.
{-# NOINLINE find_before #-}
find_before :: Int -> Unboxed.Vector Int -> Int
find_before n = fst . fold_abort go (0, 0)
    go (i, total) a
        | total + a <= n = Just (i+1, total+a)
        | otherwise = Nothing

fold_abort :: (Unboxed.Unbox a) =>
    (accum -> a -> Maybe accum) -> accum -> Unboxed.Vector a -> accum
fold_abort f accum vec = go 0 accum
    where go i accum = maybe accum (go (i+1)) $ f accum =<< vec Unboxed.!? i

{-# NOINLINE find_before2 #-}
find_before2 :: Int -> Unboxed.Vector Int -> Int
find_before2 n vec = case Unboxed.findIndex (>n) sums of
        Just i -> max 0 (i-1)
        Nothing -> 0
    where sums = Unboxed.scanl' (+) 0 vec

main :: IO ()
main = do
    print (t0 find_before)
    print (t0 find_before2)

t0 :: (Int -> Unboxed.Vector Int -> Int) -> [Int]
t0 f = [f n (Unboxed.fromList [2, 2, 2, 2]) | n <- [0..6]]

More information about the Haskell-Cafe mailing list