[Haskell-beginners] How to write faster ByteString/Conduit code
John Ky
newhoggy at gmail.com
Mon Apr 4 09:37:27 UTC 2016
It turns out that using a simple enum type to implement a state machine
instead of a function avoids the performance penalty and allows me to
collapse a four stage conduit pipeline into one with 4 x performance
improvement.
blankStrings :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
blankStrings = blankStrings' InJson
blankStrings' :: MonadThrow m => FastState -> Conduit BS.ByteString m
BS.ByteString
blankStrings' lastState = do
mbs <- await
case mbs of
Just bs -> do
let (!cs, Just (!nextState, _)) = unfoldrN (BS.length bs)
blankByteString (lastState, bs)
yield cs
blankStrings' nextState
Nothing -> return ()
where
blankByteString :: (FastState, ByteString) -> Maybe (Word8,
(FastState, ByteString))
blankByteString (InJson, bs) = case BS.uncons bs of
Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs))
Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs))
Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs))
Just (!c, !cs) -> Just (c , (InJson , cs))
Nothing -> Nothing
blankByteString (InString, bs) = case BS.uncons bs of
Just (!c, !cs) | c == wBackslash -> Just (wSpace , (Escaped , cs))
Just (!c, !cs) | c == wDoubleQuote -> Just (wCloseParen, (InJson , cs))
Just (_ , !cs) -> Just (wSpace , (InString , cs))
Nothing -> Nothing
blankByteString (Escaped, bs) = case BS.uncons bs of
Just (_, !cs) -> Just (wSpace, (InString, cs))
Nothing -> Nothing
blankByteString (InNumber, bs) = case BS.uncons bs of
Just (!c, !cs) | isTrailingDigit c -> Just (w0 , (InNumber , cs))
Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs))
Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs))
Just (!c, !cs) -> Just (c , (InJson , cs))
Nothing -> Nothing
blankByteString (InIdent, bs) = case BS.uncons bs of
Just (!c, !cs) | isAlphabetic c -> Just (wUnderscore, (InIdent , cs))
Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs))
Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs))
Just (!c, !cs) -> Just (c , (InJson , cs))
Nothing -> Nothing
I’m quite please with this, but any further suggestions are still welcome.
Cheers,
-John
On Sun, 3 Apr 2016 at 23:55 John Ky newhoggy at gmail.com
<http://mailto:newhoggy@gmail.com> wrote:
Hi Haskellers,
>
> I just rewrote the code to a state-machine in the hope that I can
> eventually collapse several stages in a pipeline into one, but this simple
> state-machine version turns out to be about 3 times slower even though it
> does the same thing:
>
> newtype Blank = Blank
> { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> }
>
> escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapeChar bs = case BS.uncons bs of
> Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar)))
> Nothing -> Nothing
>
> escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapedChar bs = case BS.uncons bs of
> Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
> Nothing -> Nothing
>
> fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> fastBlank = fastBlank' escapeChar
>
> fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))) -> Conduit BS.ByteString m BS.ByteString
> fastBlank' blank = do
> mbs <- await
> case mbs of
> Just bs -> do
> let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank)
> yield cs
> fastBlank' newBlank
> Nothing -> return ()
>
> I worry that if I go this approach, just the cost of the state-machine
> might mean I only break-even.
>
> Is there any reason why this version should be slower?
>
> Cheers,
>
> -John
>
>
> On Sun, 3 Apr 2016 at 23:11 John Ky <newhoggy at gmail.com> wrote:
>
>> Hello Haskellers,
>>
>> I’ve been trying to squeeze as much performance out of my code as
>> possible and I’ve come to a point where can’t figure out what more I can do.
>>
>> Here is some example code:
>>
>> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars = blankEscapedChars' ""
>>
>> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars' rs = do
>> mbs <- await
>> case mbs of
>> Just bs -> do
>> let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
>> let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
>> yield ds
>> blankEscapedChars' (BS.drop (BS.length ds) cs)
>> Nothing -> when (BS.length rs > 0) (yield rs)
>> where
>> unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString))
>> unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
>> Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs))
>> Just (c, cs) | c /= wBackslash -> Just (c, (False, cs))
>> Just (c, cs) -> Just (c, (True, cs))
>> Nothing -> Nothing
>>
>> The above function blankEscapedChars will go find all \ characters and
>> convert the following character to a _. For a 1 MB in memory JSON
>> ByteString, it benches at about 6.6 ms
>>
>> In all my code the basic strategy is the same. await for the next byte
>> string, then use and unfoldrN to produce a new ByteString for yielding.
>>
>> Anyone know of a way to go faster?
>>
>> Cheers,
>>
>> -John
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160404/2a10b43f/attachment-0001.html>
More information about the Beginners
mailing list