mapM_ for bytestring

Edward Kmett ekmett at gmail.com
Fri Sep 6 16:17:55 CEST 2013


ByteString itself works through a ForeignPtr, rather than a ByteArray# and
it does a lot of 'inlinePerformIO' tricks to avoid the MVar implicit in a
full unsafePerformIO as well as eke out some extra performance benefits
over and above unsafeDupablePerformIO that I forget the details of.

Text, which came later on the other hand does work under ByteArray# and
MutableByteArray# s in a much more principled fashion.

There is, however, a case for keeping some ability to work with
ForeignPtr's in a ByteString style, because there are a lot of tricks that
live on the ByteString stack for doing things like mmap'ing that can't work
with the ByteArray# approach.

-Edward


On Fri, Sep 6, 2013 at 3:11 AM, Simon Peyton-Jones <simonpj at microsoft.com>wrote:

> | The overhead
> | seems to be coming from inlinePerformIO (am I right here? Also, am
>
> Why do you need all this 'unsafe' stuff?  I think because bytestrings work
> internally using side effects. But why can't you use runST?
>
> If for some reason you really need IO, it's presumably not because you are
> really doing I/O.  It's a kind of specialised application.  The overheads
> from unsafePerformIO and friends are partly related (I think) to the bad
> IO-ish things that might happen inside.
>
> So perhaps you (plural) can look carefully at what you are really trying
> to do, and propose new primpops or whatever to support it at low cost.
>
> Don't give *me* the answers!  I'm just pointing out that there is no
> fundamental reason for this to run slower than it would in C, and if it
> does it might be worth digging a bit.
>
> Simon
>
> | -----Original Message-----
> | From: Libraries [mailto:libraries-bounces at haskell.org] On Behalf Of
> | Artyom Kazak
> | Sent: 05 September 2013 21:38
> | To: libraries at haskell.org
> | Subject: Re: mapM_ for bytestring
> |
> | So, I have written several implementations of mapM_:
> |      * bsMapM_gen   — generic, works for any monad
> |      * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s)
> |      * bsMapM_IO    — hand-written version specifically for IO
> |
> | Generic and hand-written versions don’t differ much. The overhead
> | seems to be coming from inlinePerformIO (am I right here? Also, am
> | I using inlinePerformIO legitimately?), which is only needed when
> | we’re not in the IO monad.
> |
> |        {-# SPECIALISE ... IO #-}
> |        {-# SPECIALISE ... ST #-}
> |        bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m ()
> |        bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp
> |          where
> |            mapp (ptr, len) = return $ go 0
> |              where
> |                go i | i == len  = return ()
> |                     | otherwise = let !b = inlinePerformIO $
> |                                            peekByteOff ptr i
> |                                   in  f b >> go (i+1)
> |
> | The short version relies on fusion of `unpack` and `mapM_`. Its
> | advantage is that even when compiled without optimisations, it’s
> | still fast. (Question: would the same happen to other versions,
> | when put into Data.ByteString module? I suppose packages like
> | bytestring are compiled with optimisations, so it probably would.)
> |
> |        {-# SPECIALISE ... IO #-}
> |        {-# SPECIALISE ... ST #-}
> |        bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO ()
> |        bsMapM_shortIO f s = mapM_ f (unpack s)
> |
> | Finally, the IO-specialised version. It’s faster than generic
> | version (and, similarly, an ST-specialised version using
> | unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma
> | involving bsMapM_IO and bsMapM_ST should be present.
> | (Question: are there other monads for which unsafeIOToMonad exists?)
> |
> |        bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO ()
> |        bsMapM_IO f s = unsafeUseAsCStringLen s mapp
> |          where
> |            mapp (ptr, len) = go 0
> |              where
> |                go i | i == len  = return ()
> |                     | otherwise = peekByteOff ptr i >>= f >> go (i+1)
> |
> | A-and here’s a table comparing performance of all three functions.
> | All timings are in milliseconds.
> |
> |                ghci       ghc       ghc -O     ghc -O2
> |            +----------+----------+----------+----------+
> |      gen   |   380    |    85    |   4.1    |   4.0    |
> |      short |    45    |    46    |  17.2    |  16.5    |
> |      IO    |   434    |    92    |   2.4    |   2.4    |
> |            +----------+----------+----------+----------+
> |
> | Here’s the code I used. (Question: have I messed up anything?)
> |
> |        import qualified Data.ByteString as BS
> |        import Data.Random
> |        import System.Random
> |        import System.IO.Unsafe
> |        import Control.Monad
> |        import Data.IORef
> |        import Criterion.Main
> |        import BSMaps
> |
> |        --a bytestring consisting of 65536 random bytes
> |        testCase = BS.pack $ fst $
> |                   flip sampleState (mkStdGen 8) $
> |                   replicateM (2^16) stdUniform
> |
> |        --sums elements of a bytestring, using given mapM_
> |        sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) ->
> |                 BS.ByteString -> Word8
> |        sumIO f s = unsafePerformIO $ do
> |          sm <- newIORef 0
> |          f (modifyIORef' sm . (+)) s
> |          readIORef sm
> |
> |        --runs the tests
> |        main = defaultMain [
> |          bench "IO"    $ whnf (sumIO bsMapM_IO)    testCase,
> |          bench "short" $ whnf (sumIO bsMapM_short) testCase,
> |          bench "gen"   $ whnf (sumIO bsMapM_gen)   testCase]
> |
> | Finally, if there isn’t anything wrong, what are my next steps to see
> | this included into next version of bytestring?
> |
> | _______________________________________________
> | Libraries mailing list
> | Libraries at haskell.org
> | http://www.haskell.org/mailman/listinfo/libraries
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130906/1e1bd137/attachment.htm>


More information about the Libraries mailing list