Proposal to add mapAccumL function to vector package

John Ky newhoggy at gmail.com
Thu Nov 8 22:53:41 UTC 2018


After removing the load time of the file from benchmarking, the difference
between mapM and hand rolled version is even more stark 😱

benchmarking medium.csv/mapAccumL               for DVS.Vector Word64
time                 2.371 ms   (2.347 ms .. 2.396 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 2.442 ms   (2.420 ms .. 2.477 ms)
std dev              91.95 μs   (57.33 μs .. 137.7 μs)
variance introduced by outliers: 23% (moderately inflated)

benchmarking medium.csv/mapAccumLViaStrictState for DVS.Vector Word64
time                 604.4 ms   (479.0 ms .. 700.5 ms)
                     0.995 R²   (0.983 R² .. 1.000 R²)
mean                 643.8 ms   (607.3 ms .. 695.2 ms)
std dev              51.95 ms   (13.07 ms .. 69.16 ms)
variance introduced by outliers: 21% (moderately inflated)

benchmarking medium.csv/mapAccumLViaLazyState   for DVS.Vector Word64
time                 62.08 ms   (61.14 ms .. 63.23 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 63.16 ms   (62.43 ms .. 65.35 ms)
std dev              2.013 ms   (637.6 μs .. 3.434 ms)


On Fri, 9 Nov 2018 at 09:31, John Ky <newhoggy at gmail.com> wrote:

> Hi Carter,
>
> I've created a ticket <https://github.com/haskell/vector/issues/228> as
> requested and have reproduced the text here:
>
> I've implemented a hand rolled version, and another two versions based on
> a combination of mapM and the lazy and strict versions of State monad.
>
> haskell-works/hw-prim#38
> <https://github.com/haskell-works/hw-prim/pull/38>
>
> The benchmarks show that the hand rolled versions run two times faster
> than the lazy state monad version and 16 times faster than the strict state
> monad.
>
> I found the slow performance of the strict monad version most surprising.
>
> I'm aware that the version that using mapM might enable fusion, however
> it is a fair bit slower than a hand rolled version that defeats fusion.
>
> I would love to have a fusion-enabled version that runs as fast as the
> hand rolled version. Would that be possible?
>
>
> Cheers,
>
>
> -John
>
> On Fri, 9 Nov 2018 at 01:18, Carter Schonwald <carter.schonwald at gmail.com>
> wrote:
>
>> I thought about it a teeny bit more
>>
>> This should be trivially definable using mapM or equivalent using state t
>>
>> Have you tried doing that simple high level definition?  I think that
>> works with vector fusion quite nicely.
>>
>> ... ohhh. I see.  There’s two vectors of inputs.  I’ll have to think
>> about this more.
>>
>> On Thu, Nov 8, 2018 at 8:14 AM Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> Hey John
>>> :
>>> I’m happy to help you contribute to to vector.
>>>
>>> 1). This might not actually be needed with stream fusion on  ... though
>>> perhaps this sort of shared computation needs to be its own combinators
>>> because of the sharing meaning that the stream fusion on map and foldl
>>> might not work. (Have you tried doing let x = map ... in let y = foldl ...
>>> in something with x and y? Even eg just writing map accum l in terms of
>>> just that? It could very well fuse ... though I don’t think it can with the
>>> current vector fusion framework. Though I think one of the more exotic
>>> fusion frameworks Amos Robinson did a few years ago could handle that
>>> fusion.   Sadly that one requires an ILP solver at compile time.  But
>>> there’s some tricks I think we could do )
>>>
>>> 2) writing it in stream fusion form / as map accum l for streams will
>>> actually be simpler I think
>>>
>>> 3) put a ticket on our GitHub. I’ve a huge backlog (life and stuff got
>>> me a bit slow), but this sounds like a super reasonable feature request
>>>
>>>
>>>
>>> On Thu, Nov 8, 2018 at 6:31 AM John Ky <newhoggy at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'd like to add the mapAccumL function to the vector package.
>>>>
>>>> Specifically the Data.Vector.Storable module, but it would also be
>>>> useful other vector modules.
>>>>
>>>> This is my attempt at an implementation:
>>>>
>>>> {-# LANGUAGE ScopedTypeVariables #-}
>>>>
>>>> mapAccumL :: forall a b c. (Storable b, Storable c)
>>>> => (a -> b -> (a, c))
>>>> -> a
>>>> -> DVS.Vector b
>>>> -> (a, DVS.Vector c)
>>>> mapAccumL f a vb = DVS.createT $ do
>>>> vc <- DVSM.unsafeNew (DVS.length vb)
>>>> a' <- go 0 a vc
>>>> return (a', vc)
>>>> where go :: Int -> a -> DVS.MVector s c -> ST s a
>>>> go i a0 vc = if i < DVS.length vb
>>>> then do
>>>> let (a1, c1) = f a0 (DVS.unsafeIndex vb i)
>>>> DVSM.unsafeWrite vc i c1
>>>> go (i + 1) a1 vc
>>>> else return a0
>>>> {-# INLINE mapAccumL #-}
>>>>
>>>> The implementation should obey the following law:
>>>>
>>>> import qualified Data.List as L
>>>> import qualified Data.Vector.Storable as DVS
>>>>
>>>> (DVS.toList <$> DVS.mapAccumL f a (DVS.fromList bs)) === L.mapAccumL f
>>>> a bs
>>>>
>>>> Cheers,
>>>>
>>>> -John
>>>>
>>>> _______________________________________________
>>>> Libraries mailing list
>>>> Libraries at haskell.org
>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20181109/c7a4a0e7/attachment.html>


More information about the Libraries mailing list