[Haskell-beginners] How to write faster ByteString/Conduit code

John Ky newhoggy at gmail.com
Mon Apr 4 09:37:27 UTC 2016


It turns out that using a simple enum type to implement a state machine
instead of a function avoids the performance penalty and allows me to
collapse a four stage conduit pipeline into one with 4 x performance
improvement.

blankStrings :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
blankStrings = blankStrings' InJson

blankStrings' :: MonadThrow m => FastState -> Conduit BS.ByteString m
BS.ByteString
blankStrings' lastState = do
  mbs <- await
  case mbs of
    Just bs -> do
      let (!cs, Just (!nextState, _)) = unfoldrN (BS.length bs)
blankByteString (lastState, bs)
      yield cs
      blankStrings' nextState
    Nothing -> return ()
  where
    blankByteString :: (FastState, ByteString) -> Maybe (Word8,
(FastState, ByteString))
    blankByteString (InJson, bs) = case BS.uncons bs of
      Just (!c, !cs) | isLeadingDigit c   -> Just (w1         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs) | isAlphabetic c     -> Just (c          , (InIdent  , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing -> Nothing
    blankByteString (InString, bs) = case BS.uncons bs of
      Just (!c, !cs) | c == wBackslash    -> Just (wSpace     , (Escaped  , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wCloseParen, (InJson   , cs))
      Just (_ , !cs)                      -> Just (wSpace     , (InString , cs))
      Nothing                             -> Nothing
    blankByteString (Escaped, bs) = case BS.uncons bs of
      Just (_, !cs)                       -> Just (wSpace, (InString, cs))
      Nothing                             -> Nothing
    blankByteString (InNumber, bs) = case BS.uncons bs of
      Just (!c, !cs) | isTrailingDigit c  -> Just (w0         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs) | isAlphabetic c     -> Just (c          , (InIdent  , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing                             -> Nothing
    blankByteString (InIdent, bs) = case BS.uncons bs of
      Just (!c, !cs) | isAlphabetic c     -> Just (wUnderscore, (InIdent  , cs))
      Just (!c, !cs) | isLeadingDigit c   -> Just (w1         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing                             -> Nothing

I’m quite please with this, but any further suggestions are still welcome.

Cheers,

-John

On Sun, 3 Apr 2016 at 23:55 John Ky newhoggy at gmail.com
<http://mailto:newhoggy@gmail.com> wrote:

Hi Haskellers,
>
> I just rewrote the code to a state-machine in the hope that I can
> eventually collapse several stages in a pipeline into one, but this simple
> state-machine version turns out to be about 3 times slower even though it
> does the same thing:
>
> newtype Blank = Blank
>   { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
>   }
>
> escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapeChar bs = case BS.uncons bs of
>   Just (c, cs)  -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar)))
>   Nothing       -> Nothing
>
> escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapedChar bs = case BS.uncons bs of
>   Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
>   Nothing      -> Nothing
>
> fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> fastBlank = fastBlank' escapeChar
>
> fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
> fastBlank' blank = do
>   mbs <- await
>   case mbs of
>     Just bs -> do
>       let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank)
>       yield cs
>       fastBlank' newBlank
>     Nothing -> return ()
>
> I worry that if I go this approach, just the cost of the state-machine
> might mean I only break-even.
>
> Is there any reason why this version should be slower?
>
> Cheers,
>
> -John
>>
> On Sun, 3 Apr 2016 at 23:11 John Ky <newhoggy at gmail.com> wrote:
>
>> Hello Haskellers,
>>
>> I’ve been trying to squeeze as much performance out of my code as
>> possible and I’ve come to a point where can’t figure out what more I can do.
>>
>> Here is some example code:
>>
>> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars = blankEscapedChars' ""
>>
>> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars' rs = do
>>   mbs <- await
>>   case mbs of
>>     Just bs -> do
>>       let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
>>       let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
>>       yield ds
>>       blankEscapedChars' (BS.drop (BS.length ds) cs)
>>     Nothing -> when (BS.length rs > 0) (yield rs)
>>   where
>>     unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString))
>>     unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
>>       Just (_, cs) | wasEscaped       -> Just (wUnderscore, (False, cs))
>>       Just (c, cs) | c /= wBackslash  -> Just (c, (False, cs))
>>       Just (c, cs)                    -> Just (c, (True, cs))
>>       Nothing                         -> Nothing
>>
>> The above function blankEscapedChars will go find all \ characters and
>> convert the following character to a _. For a 1 MB in memory JSON
>> ByteString, it benches at about 6.6 ms
>>
>> In all my code the basic strategy is the same. await for the next byte
>> string, then use and unfoldrN to produce a new ByteString for yielding.
>>
>> Anyone know of a way to go faster?
>>
>> Cheers,
>>
>> -John
>>>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160404/2a10b43f/attachment-0001.html>


More information about the Beginners mailing list