mapM_ for bytestring
Edward Kmett
ekmett at gmail.com
Fri Sep 6 16:17:55 CEST 2013
ByteString itself works through a ForeignPtr, rather than a ByteArray# and
it does a lot of 'inlinePerformIO' tricks to avoid the MVar implicit in a
full unsafePerformIO as well as eke out some extra performance benefits
over and above unsafeDupablePerformIO that I forget the details of.
Text, which came later on the other hand does work under ByteArray# and
MutableByteArray# s in a much more principled fashion.
There is, however, a case for keeping some ability to work with
ForeignPtr's in a ByteString style, because there are a lot of tricks that
live on the ByteString stack for doing things like mmap'ing that can't work
with the ByteArray# approach.
-Edward
On Fri, Sep 6, 2013 at 3:11 AM, Simon Peyton-Jones <simonpj at microsoft.com>wrote:
> | The overhead
> | seems to be coming from inlinePerformIO (am I right here? Also, am
>
> Why do you need all this 'unsafe' stuff? I think because bytestrings work
> internally using side effects. But why can't you use runST?
>
> If for some reason you really need IO, it's presumably not because you are
> really doing I/O. It's a kind of specialised application. The overheads
> from unsafePerformIO and friends are partly related (I think) to the bad
> IO-ish things that might happen inside.
>
> So perhaps you (plural) can look carefully at what you are really trying
> to do, and propose new primpops or whatever to support it at low cost.
>
> Don't give *me* the answers! I'm just pointing out that there is no
> fundamental reason for this to run slower than it would in C, and if it
> does it might be worth digging a bit.
>
> Simon
>
> | -----Original Message-----
> | From: Libraries [mailto:libraries-bounces at haskell.org] On Behalf Of
> | Artyom Kazak
> | Sent: 05 September 2013 21:38
> | To: libraries at haskell.org
> | Subject: Re: mapM_ for bytestring
> |
> | So, I have written several implementations of mapM_:
> | * bsMapM_gen — generic, works for any monad
> | * bsMapM_short — essentially (\f s -> mapM_ f $ unpack s)
> | * bsMapM_IO — hand-written version specifically for IO
> |
> | Generic and hand-written versions don’t differ much. The overhead
> | seems to be coming from inlinePerformIO (am I right here? Also, am
> | I using inlinePerformIO legitimately?), which is only needed when
> | we’re not in the IO monad.
> |
> | {-# SPECIALISE ... IO #-}
> | {-# SPECIALISE ... ST #-}
> | bsMapM_gen :: Monad m => (Word8 -> m a) -> ByteString -> m ()
> | bsMapM_gen f s = unsafePerformIO $ unsafeUseAsCStringLen s mapp
> | where
> | mapp (ptr, len) = return $ go 0
> | where
> | go i | i == len = return ()
> | | otherwise = let !b = inlinePerformIO $
> | peekByteOff ptr i
> | in f b >> go (i+1)
> |
> | The short version relies on fusion of `unpack` and `mapM_`. Its
> | advantage is that even when compiled without optimisations, it’s
> | still fast. (Question: would the same happen to other versions,
> | when put into Data.ByteString module? I suppose packages like
> | bytestring are compiled with optimisations, so it probably would.)
> |
> | {-# SPECIALISE ... IO #-}
> | {-# SPECIALISE ... ST #-}
> | bsMapM_shortIO :: (Word8 -> IO a) -> ByteString -> IO ()
> | bsMapM_shortIO f s = mapM_ f (unpack s)
> |
> | Finally, the IO-specialised version. It’s faster than generic
> | version (and, similarly, an ST-specialised version using
> | unsafeIOToST would be just as fast), so I assume a SPECIALISE pragma
> | involving bsMapM_IO and bsMapM_ST should be present.
> | (Question: are there other monads for which unsafeIOToMonad exists?)
> |
> | bsMapM_IO :: (Word8 -> IO a) -> ByteString -> IO ()
> | bsMapM_IO f s = unsafeUseAsCStringLen s mapp
> | where
> | mapp (ptr, len) = go 0
> | where
> | go i | i == len = return ()
> | | otherwise = peekByteOff ptr i >>= f >> go (i+1)
> |
> | A-and here’s a table comparing performance of all three functions.
> | All timings are in milliseconds.
> |
> | ghci ghc ghc -O ghc -O2
> | +----------+----------+----------+----------+
> | gen | 380 | 85 | 4.1 | 4.0 |
> | short | 45 | 46 | 17.2 | 16.5 |
> | IO | 434 | 92 | 2.4 | 2.4 |
> | +----------+----------+----------+----------+
> |
> | Here’s the code I used. (Question: have I messed up anything?)
> |
> | import qualified Data.ByteString as BS
> | import Data.Random
> | import System.Random
> | import System.IO.Unsafe
> | import Control.Monad
> | import Data.IORef
> | import Criterion.Main
> | import BSMaps
> |
> | --a bytestring consisting of 65536 random bytes
> | testCase = BS.pack $ fst $
> | flip sampleState (mkStdGen 8) $
> | replicateM (2^16) stdUniform
> |
> | --sums elements of a bytestring, using given mapM_
> | sumIO :: ((Word8 -> IO ()) -> BS.ByteString -> IO ()) ->
> | BS.ByteString -> Word8
> | sumIO f s = unsafePerformIO $ do
> | sm <- newIORef 0
> | f (modifyIORef' sm . (+)) s
> | readIORef sm
> |
> | --runs the tests
> | main = defaultMain [
> | bench "IO" $ whnf (sumIO bsMapM_IO) testCase,
> | bench "short" $ whnf (sumIO bsMapM_short) testCase,
> | bench "gen" $ whnf (sumIO bsMapM_gen) testCase]
> |
> | Finally, if there isn’t anything wrong, what are my next steps to see
> | this included into next version of bytestring?
> |
> | _______________________________________________
> | Libraries mailing list
> | Libraries at haskell.org
> | http://www.haskell.org/mailman/listinfo/libraries
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20130906/1e1bd137/attachment.htm>
More information about the Libraries
mailing list