[Haskell-beginners] How to write faster ByteString/Conduit code

Sun Apr 3 13:55:41 UTC 2016

Hi Haskellers,

I just rewrote the code as a state machine in the hope that I can
eventually collapse several pipeline stages into one, but this simple
state-machine version turns out to be about 3 times slower even though it
does the same thing:

newtype Blank = Blank
  { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
  }

escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapeChar bs = case BS.uncons bs of
  Just (c, cs)  -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar)))
  Nothing       -> Nothing

escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
escapedChar bs = case BS.uncons bs of
  Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
  Nothing      -> Nothing

fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
fastBlank = fastBlank' escapeChar

fastBlank' :: MonadThrow m
  => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)))
  -> Conduit BS.ByteString m BS.ByteString
fastBlank' blank = do
  mbs <- await
  case mbs of
    Just bs -> do
      let (cs, Just (_, Blank newBlank)) =
            unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank)
      yield cs
      fastBlank' newBlank
    Nothing -> return ()
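
For comparison, here is a minimal sketch of the same blanking rule written as
a pure per-chunk function, with conduit left out. It assumes BS.mapAccumL from
Data.ByteString; the names blankChunk and blankAll are hypothetical and not
part of the code above, and the byte constants are defined locally. The Bool
state plays the role of the escapeChar/escapedChar states. Whether this is
actually faster would need to be benchmarked.

```haskell
import qualified Data.ByteString as BS
import           Data.List       (mapAccumL)
import           Data.Word       (Word8)

-- Byte values for '\\' and '_' (defined here so the sketch is self-contained).
wBackslash, wUnderscore :: Word8
wBackslash  = 92
wUnderscore = 95

-- | Blank one chunk. The Bool is True when the previous byte was a
-- backslash that starts an escape, so the state survives chunk boundaries.
-- (blankChunk is a hypothetical name for this sketch.)
blankChunk :: Bool -> BS.ByteString -> (Bool, BS.ByteString)
blankChunk = BS.mapAccumL step
  where
    -- Previous byte was an escaping backslash: emit '_' and reset.
    step True  _ = (False, wUnderscore)
    -- Otherwise pass the byte through and remember whether it was a backslash.
    step False c = (c == wBackslash, c)

-- | Thread the escape state through a list of chunks, standing in for the
-- await/yield loop of the conduit version.
blankAll :: [BS.ByteString] -> [BS.ByteString]
blankAll = snd . mapAccumL blankChunk False
```

With the two chunks a\ and bc as input, blankAll gives a\ and _c: the
backslash at the end of the first chunk causes the first byte of the second
chunk to be blanked.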

I worry that if I take this approach, the cost of the state machine alone
might mean I only break even.

Is there any reason why this version should be slower?

Cheers,

-John

On Sun, 3 Apr 2016 at 23:11 John Ky <newhoggy at gmail.com> wrote:

> Hello Haskellers,
>
> I’ve been trying to squeeze as much performance out of my code as possible
> and I’ve come to a point where I can’t figure out what more I can do.
>
> Here is some example code:
>
> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> blankEscapedChars = blankEscapedChars' ""
>
> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString
> blankEscapedChars' rs = do
>   mbs <- await
>   case mbs of
>     Just bs -> do
>       let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
>       let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
>       yield ds
>       blankEscapedChars' (BS.drop (BS.length ds) cs)
>     Nothing -> when (BS.length rs > 0) (yield rs)
>   where
>     unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString))
>     unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
>       Just (_, cs) | wasEscaped       -> Just (wUnderscore, (False, cs))
>       Just (c, cs) | c /= wBackslash  -> Just (c, (False, cs))
>       Just (c, cs)                    -> Just (c, (True, cs))
>       Nothing                         -> Nothing
>
> The above function blankEscapedChars finds every \ character and
> replaces the character that follows it with a _. For a 1 MB in-memory JSON
> ByteString, it benchmarks at about 6.6 ms.
>
> In all my code the basic strategy is the same: await the next byte
> string, then use unfoldrN to produce a new ByteString for yielding.
>
> Anyone know of a way to go faster?
>
> Cheers,
>
> -John
> 
>