[Haskell-beginners] How to write faster ByteString/Conduit code
John Ky
newhoggy at gmail.com
Sun Apr 3 13:55:41 UTC 2016
Hi Haskellers,
I just rewrote the code to a state-machine in the hope that I can
eventually collapse several stages in a pipeline into one, but this simple
state-machine version turns out to be about 3 times slower even though it
does the same thing:
newtype Blank = Blank
{ blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
}
escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then
escapeChar else escapedChar)))
Nothing -> Nothing
escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
Nothing -> Nothing
fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar
fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8,
(BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
mbs <- await
case mbs of
Just bs -> do
let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs)
(\(bs, Blank f) -> f bs) (bs, Blank blank)
yield cs
fastBlank' newBlank
Nothing -> return ()
I worry that if I go this approach, just the cost of the state-machine
might mean I only break-even.
Is there any reason why this version should be slower?
Cheers,
-John
On Sun, 3 Apr 2016 at 23:11 John Ky <newhoggy at gmail.com> wrote:
> Hello Haskellers,
>
> I’ve been trying to squeeze as much performance out of my code as possible
> and I’ve come to a point where can’t figure out what more I can do.
>
> Here is some example code:
>
> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> blankEscapedChars = blankEscapedChars' ""
>
> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString
> blankEscapedChars' rs = do
> mbs <- await
> case mbs of
> Just bs -> do
> let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
> let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
> yield ds
> blankEscapedChars' (BS.drop (BS.length ds) cs)
> Nothing -> when (BS.length rs > 0) (yield rs)
> where
> unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString))
> unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
> Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs))
> Just (c, cs) | c /= wBackslash -> Just (c, (False, cs))
> Just (c, cs) -> Just (c, (True, cs))
> Nothing -> Nothing
>
> The above function blankEscapedChars will go find all \ characters and
> convert the following character to a _. For a 1 MB in memory JSON
> ByteString, it benches at about 6.6 ms
>
> In all my code the basic strategy is the same. await for the next byte
> string, then use and unfoldrN to produce a new ByteString for yielding.
>
> Anyone know of a way to go faster?
>
> Cheers,
>
> -John
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160403/ed64048c/attachment-0001.html>
More information about the Beginners
mailing list