mapM_ for bytestring

Edward Kmett ekmett at gmail.com
Fri Sep 6 01:36:26 CEST 2013


I personally find myself on the other side of the divide, especially if
someone else is willing to do the work to write and test the method for two
reasons.

1.) The bytestring API is already fairly noisy in that it really needs to
be imported qualified already. Finding such a common function *missing* from
an API that is so all-inclusive is pretty jarring from a user perspective,
so it would seem to follow the principle of least surprise to include it if
possible.

2.) Bytestring was built from the ground up for speed. A factor of 4-8x
speed difference even before dipping into inlinePerformIO tricks isn't
negligible here.

-Edward



On Thu, Sep 5, 2013 at 7:07 PM, Johan Tibell <johan.tibell at gmail.com> wrote:

> I think the important issue here is whether we want to add monadic
> versions of functions in bytestring (where it makes sense)? It wield
> yield better performance than using `unpack`, but at the cost of lots
> of code duplication.
>
> We have the same issue in containers where you could for example want
> an `updateM` function, that would let you decide whether to update a
> value by performing some side effect. You could simulate this using a
> combination of `lookup` and `insert`, but that requires twice the
> number of traversals of the data structure.
>
> Right now I'm of the mind that the extra traversals (and using unpack
> in the case of ByteString) is better than the code duplication.
>
> On Thu, Sep 5, 2013 at 3:51 PM, Edward Kmett <ekmett at gmail.com> wrote:
> > The maintainers for bytestring are still listed as Don Stewart and Duncan
> > Coutts on the package, and it doesn't seem to fall the list of core
> packages
> > per http://www.haskell.org/haskellwiki/Library_submissions so I suppose
> it
> > would come down to talking one of them into taking the patch.
> >
> > It seems odd that a fundamental package like this is omitted from the
> > Library_submissions page though, as the older
> > http://trac.haskell.org/haskell-platform/wiki/PackageMaintainers page
> on the
> > trac shows it as maintained by GHC Central.
> >
> > -Edward
> >
> >
> >
> > On Thu, Sep 5, 2013 at 4:38 PM, Artyom Kazak <yom at artyom.me> wrote:
> >>
> >> So, I have written several implementations of mapM_:
> >>     * bsMapM_gen   — generic, works for any monad
> >>     * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s)
> >>     * bsMapM_IO    — hand-written version specifically for IO
> >>
> >> Generic and hand-written versions don’t differ much. The overhead
> >> seems to be coming from inlinePerformIO (am I right here? Also, am
> >> I using inlinePerformIO legitimately?), which is only needed when
> >> we’re not in the IO monad.
> >>
> >>       {-# SPECIALISE ... IO #-}
> >>       {-# SPECIALISE ... ST #-}
> >>       bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m ()
> >>       bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp
> >>         where
> >>           mapp (ptr, len) = return $ go 0
> >>             where
> >>               go i | i == len  = return ()
> >>                    | otherwise = let !b = inlinePerformIO $
> >>                                           peekByteOff ptr i
> >>                                  in  f b >> go (i+1)
> >>
> >> The short version relies on fusion of `unpack` and `mapM_`. Its
> >> advantage is that even when compiled without optimisations, it’s
> >> still fast. (Question: would the same happen to other versions,
> >> when put into Data.ByteString module? I suppose packages like
> >> bytestring are compiled with optimisations, so it probably would.)
> >>
> >>       {-# SPECIALISE ... IO #-}
> >>       {-# SPECIALISE ... ST #-}
> >>       bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO ()
> >>       bsMapM_shortIO f s = mapM_ f (unpack s)
> >>
> >> Finally, the IO-specialised version. It’s faster than generic
> >> version (and, similarly, an ST-specialised version using
> >> unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma
> >> involving bsMapM_IO and bsMapM_ST should be present.
> >> (Question: are there other monads for which unsafeIOToMonad exists?)
> >>
> >>       bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO ()
> >>       bsMapM_IO f s = unsafeUseAsCStringLen s mapp
> >>         where
> >>           mapp (ptr, len) = go 0
> >>             where
> >>               go i | i == len  = return ()
> >>                    | otherwise = peekByteOff ptr i >>= f >> go (i+1)
> >>
> >> A-and here’s a table comparing performance of all three functions.
> >> All timings are in milliseconds.
> >>
> >>               ghci       ghc       ghc -O     ghc -O2
> >>           +----------+----------+----------+----------+
> >>     gen   |   380    |    85    |   4.1    |   4.0    |
> >>     short |    45    |    46    |  17.2    |  16.5    |
> >>     IO    |   434    |    92    |   2.4    |   2.4    |
> >>           +----------+----------+----------+----------+
> >>
> >> Here’s the code I used. (Question: have I messed up anything?)
> >>
> >>       import qualified Data.ByteString as BS
> >>       import Data.Random
> >>       import System.Random
> >>       import System.IO.Unsafe
> >>       import Control.Monad
> >>       import Data.IORef
> >>       import Criterion.Main
> >>       import BSMaps
> >>
> >>       --a bytestring consisting of 65536 random bytes
> >>       testCase = BS.pack $ fst $
> >>                  flip sampleState (mkStdGen 8) $
> >>                  replicateM (2^16) stdUniform
> >>
> >>       --sums elements of a bytestring, using given mapM_
> >>       sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) ->
> >>                BS.ByteString -> Word8
> >>       sumIO f s = unsafePerformIO $ do
> >>         sm <- newIORef 0
> >>         f (modifyIORef' sm . (+)) s
> >>         readIORef sm
> >>
> >>       --runs the tests
> >>       main = defaultMain [
> >>         bench "IO"    $ whnf (sumIO bsMapM_IO)    testCase,
> >>         bench "short" $ whnf (sumIO bsMapM_short) testCase,
> >>         bench "gen"   $ whnf (sumIO bsMapM_gen)   testCase]
> >>
> >> Finally, if there isn’t anything wrong, what are my next steps to see
> >> this included into next version of bytestring?
> >>
> >>
> >> _______________________________________________
> >> Libraries mailing list
> >> Libraries at haskell.org
> >> http://www.haskell.org/mailman/listinfo/libraries
> >
> >
> >
> > _______________________________________________
> > Libraries mailing list
> > Libraries at haskell.org
> > http://www.haskell.org/mailman/listinfo/libraries
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130905/58adfdb7/attachment.htm>


More information about the Libraries mailing list