mapM_ for bytestring

Fri Sep 6 01:47:49 CEST 2013

I can see that argument. I did the opposite in unordered-containers,
which have many near identical traversals in the name of speed. I must
admit it's a pain to maintain. Changes typically have to happen in at
least 2 places and the code feels cut-n-pasty.

On Thu, Sep 5, 2013 at 4:36 PM, Edward Kmett <ekmett at gmail.com> wrote:
> I personally find myself on the other side of the divide, especially if
> someone else is willing to do the work to write and test the method for two
> reasons.
>
> 1.) The bytestring API is already fairly noisy in that it really needs to be
> imported qualified already. Finding such a common function missing from an
> API that is so all-inclusive is pretty jarring from a user perspective, so
> it would seem to follow the principle of least surprise to include it if
> possible.
>
> 2.) Bytestring was built from the ground up for speed. A factor of 4-8x
> speed difference even before dipping into inlinePerformIO tricks isn't
> negligible here.
>
> -Edward
>
>
>
> On Thu, Sep 5, 2013 at 7:07 PM, Johan Tibell <johan.tibell at gmail.com> wrote:
>>
>> I think the important issue here is whether we want to add monadic
>> versions of functions in bytestring (where it makes sense)? It wield
>> yield better performance than using `unpack`, but at the cost of lots
>> of code duplication.
>>
>> We have the same issue in containers where you could for example want
>> an `updateM` function, that would let you decide whether to update a
>> value by performing some side effect. You could simulate this using a
>> combination of `lookup` and `insert`, but that requires twice the
>> number of traversals of the data structure.
>>
>> Right now I'm of the mind that the extra traversals (and using unpack
>> in the case of ByteString) is better than the code duplication.
>>
>> On Thu, Sep 5, 2013 at 3:51 PM, Edward Kmett <ekmett at gmail.com> wrote:
>> > The maintainers for bytestring are still listed as Don Stewart and
>> > Duncan
>> > Coutts on the package, and it doesn't seem to fall the list of core
>> > packages
>> > per http://www.haskell.org/haskellwiki/Library_submissions so I suppose
>> > it
>> > would come down to talking one of them into taking the patch.
>> >
>> > It seems odd that a fundamental package like this is omitted from the
>> > Library_submissions page though, as the older
>> > http://trac.haskell.org/haskell-platform/wiki/PackageMaintainers page on
>> > the
>> > trac shows it as maintained by GHC Central.
>> >
>> > -Edward
>> >
>> >
>> >
>> > On Thu, Sep 5, 2013 at 4:38 PM, Artyom Kazak <yom at artyom.me> wrote:
>> >>
>> >> So, I have written several implementations of mapM_:
>> >>     * bsMapM_gen   — generic, works for any monad
>> >>     * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s)
>> >>     * bsMapM_IO    — hand-written version specifically for IO
>> >>
>> >> Generic and hand-written versions don’t differ much. The overhead
>> >> seems to be coming from inlinePerformIO (am I right here? Also, am
>> >> I using inlinePerformIO legitimately?), which is only needed when
>> >> we’re not in the IO monad.
>> >>
>> >>       {-# SPECIALISE ... IO #-}
>> >>       {-# SPECIALISE ... ST #-}
>> >>       bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m ()
>> >>       bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp
>> >>         where
>> >>           mapp (ptr, len) = return $ go 0
>> >>             where
>> >>               go i | i == len  = return ()
>> >>                    | otherwise = let !b = inlinePerformIO $
>> >>                                           peekByteOff ptr i
>> >>                                  in  f b >> go (i+1)
>> >>
>> >> The short version relies on fusion of `unpack` and `mapM_`. Its
>> >> advantage is that even when compiled without optimisations, it’s
>> >> still fast. (Question: would the same happen to other versions,
>> >> when put into Data.ByteString module? I suppose packages like
>> >> bytestring are compiled with optimisations, so it probably would.)
>> >>
>> >>       {-# SPECIALISE ... IO #-}
>> >>       {-# SPECIALISE ... ST #-}
>> >>       bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO ()
>> >>       bsMapM_shortIO f s = mapM_ f (unpack s)
>> >>
>> >> Finally, the IO-specialised version. It’s faster than generic
>> >> version (and, similarly, an ST-specialised version using
>> >> unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma
>> >> involving bsMapM_IO and bsMapM_ST should be present.
>> >> (Question: are there other monads for which unsafeIOToMonad exists?)
>> >>
>> >>       bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO ()
>> >>       bsMapM_IO f s = unsafeUseAsCStringLen s mapp
>> >>         where
>> >>           mapp (ptr, len) = go 0
>> >>             where
>> >>               go i | i == len  = return ()
>> >>                    | otherwise = peekByteOff ptr i >>= f >> go (i+1)
>> >>
>> >> A-and here’s a table comparing performance of all three functions.
>> >> All timings are in milliseconds.
>> >>
>> >>               ghci       ghc       ghc -O     ghc -O2
>> >>           +----------+----------+----------+----------+
>> >>     gen   |   380    |    85    |   4.1    |   4.0    |
>> >>     short |    45    |    46    |  17.2    |  16.5    |
>> >>     IO    |   434    |    92    |   2.4    |   2.4    |
>> >>           +----------+----------+----------+----------+
>> >>
>> >> Here’s the code I used. (Question: have I messed up anything?)
>> >>
>> >>       import qualified Data.ByteString as BS
>> >>       import Data.Random
>> >>       import System.Random
>> >>       import System.IO.Unsafe
>> >>       import Control.Monad
>> >>       import Data.IORef
>> >>       import Criterion.Main
>> >>       import BSMaps
>> >>
>> >>       --a bytestring consisting of 65536 random bytes
>> >>       testCase = BS.pack $ fst $
>> >>                  flip sampleState (mkStdGen 8) $
>> >>                  replicateM (2^16) stdUniform
>> >>
>> >>       --sums elements of a bytestring, using given mapM_
>> >>       sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) ->
>> >>                BS.ByteString -> Word8
>> >>       sumIO f s = unsafePerformIO $ do
>> >>         sm <- newIORef 0
>> >>         f (modifyIORef' sm . (+)) s
>> >>         readIORef sm
>> >>
>> >>       --runs the tests
>> >>       main = defaultMain [
>> >>         bench "IO"    $ whnf (sumIO bsMapM_IO)    testCase,
>> >>         bench "short" $ whnf (sumIO bsMapM_short) testCase,
>> >>         bench "gen"   $ whnf (sumIO bsMapM_gen)   testCase]
>> >>
>> >> Finally, if there isn’t anything wrong, what are my next steps to see
>> >> this included into next version of bytestring?
>> >>
>> >>
>> >> _______________________________________________
>> >> Libraries mailing list
>> >> Libraries at haskell.org
>> >> http://www.haskell.org/mailman/listinfo/libraries
>> >
>> >
>> >
>> > _______________________________________________
>> > Libraries mailing list
>> > Libraries at haskell.org
>> > http://www.haskell.org/mailman/listinfo/libraries
>> >
>
>