[Haskell-cafe] Announcing binary-parsers

Mon Oct 10 03:39:33 UTC 2016

> The only reason for scanning ahead is in case you're dealing with lazy
> bytestrings and so need to glue them together in order to use
> bytestring-lexing. Older versions of the library did have support for
> lazy bytestrings, but I removed it because it was bitrotten and
> unused. But if you really need it, I can add new variants of the
> lexers for dealing with the possibility of requesting new data when
> the input runs out.

Yes, please! The only reason I use `takeWhile isDigit` myself is that `takeWhile` takes care of partial input for me. If you can provide a version that makes incremental input easy to handle, I could rely on bytestring-lexing completely. You may be interested in the `scanChunks` combinator in binary-parsers. Let’s work something out; if you need any help, please tell me. Thanks!
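
To make the request concrete, here is a minimal sketch of the "glue the chunks together first" approach for lazy input. It assumes `readDecimal` from `Data.ByteString.Lex.Integral` in bytestring-lexing; `isDigitW8` and `readDecimalLazy` are hypothetical names used only for illustration and are not part of either library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Lex.Integral (readDecimal)
import Data.Word (Word8)

-- Hypothetical helper: True for the ASCII digits '0'..'9'.
-- Word8 subtraction wraps around, so bytes below 48 fail the check too.
isDigitW8 :: Word8 -> Bool
isDigitW8 w = w - 48 <= 9

-- Hypothetical helper: read a decimal number from the front of a lazy
-- ByteString. The digit run may span several chunks, so it is copied
-- into one strict ByteString before being handed to readDecimal.
readDecimalLazy :: Integral a => BL.ByteString -> Maybe (a, BL.ByteString)
readDecimalLazy input =
    let (digits, rest) = BL.span isDigitW8 input
    in case readDecimal (BL.toStrict digits) of
         -- Every byte was pre-checked as a digit, so nothing should be left.
         Just (n, leftover) | B.null leftover -> Just (n, rest)
         -- Empty digit run (or an unexpected leftover): not a number.
         _ -> Nothing
```

With the chunks "12" and "3,4" this yields 123 with ",4" left over. It walks the digits twice (once in `BL.span`, once in `readDecimal`) and cannot ask for more input when the digits run up to the end of the available data, which is exactly what a lexer variant with incremental input support could avoid.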

cheers!~
winter

> On Oct 9, 2016, at 13:56, wren romano <winterkoninkje at gmail.com> wrote:
> 
> On Sun, Oct 2, 2016 at 3:17 AM, 韩冬(基础平台部) <handongwinter at didichuxing.com> wrote:
>> Hi wren!
>> 
>> Yes, I noticed that attoparsec's numeric parsers are slow. I have a benchmark suite that compares attoparsec and binary-parsers on different sample JSON files; it's on GitHub: https://github.com/winterland1989/binary-parsers.
>> 
>> I'm pretty sure bytestring-lexing helped a lot: for example, the average decoding speed improvement is around 20%, but the numeric-only benchmarks (integers and numbers) improved by 30%!
> 
> So still some substantial gains for non-numeric stuff, nice!
> 
>> Parsing is just one part of JSON decoding; a lot of time is spent on unescaping, etc. So the parser's improvement is quite large IMHO.
>> 
>> BTW, can you provide a version of the lexer that doesn't check whether a Word is a digit? In binary-parsers I use something like `takeWhile isDigit` to extract the input ByteString, so there's no need to verify this in the lexer again. Maybe we can get another performance improvement.
> 
> I suppose I could, but then it wouldn't be guaranteed to return
> correct answers. The way things are set up now, the intended workflow
> is that wherever you're expecting a number, you should just hand the
> ByteString over to bytestring-lexing (i.e., not bother
> scanning/pre-lexing via `takeWhile isDigit`) and it'll give back the
> answer together with the remainder of the input. This ensures that you
> don't need to do two passes over the characters. So, for Attoparsec
> itself you'd wrap it up with something like:
> 
>    decimal :: Integral a => Parser a
>    decimal =
>        get >>= \bs ->
>        case readDecimal bs of
>        Nothing -> fail "error message"
>        Just (a, bs') -> put bs' >> return a
> 
> Alas `get` isn't exported[1], but you get the idea. Of course, for
> absolute performance you may want to inline all the combinators to see
> if there's stuff you can get rid of.
> 
> The only reason for scanning ahead is in case you're dealing with lazy
> bytestrings and so need to glue them together in order to use
> bytestring-lexing. Older versions of the library did have support for
> lazy bytestrings, but I removed it because it was bitrotten and
> unused. But if you really need it, I can add new variants of the
> lexers for dealing with the possibility of requesting new data when
> the input runs out.
> 
> 
> [1] <http://hackage.haskell.org/package/attoparsec-0.13.1.0/docs/src/Data-Attoparsec-ByteString-Internal.html#get>
> 
> -- 
> Live well,
> ~wren