[Haskell-cafe] Announcing binary-parsers

winter drkoster at qq.com
Tue Oct 11 10:14:32 UTC 2016


Hi, wren

BTW, I think it would be a good idea to host your code on GitHub, where it is easier to send patches, etc. Could you mirror your bytestring-lexing repo there?

Happy hacking!

winter


> On Oct 10, 2016, at 11:39, winter <drkoster at qq.com> wrote:
> 
>> On Oct 9, 2016, at 13:56, wren romano <winterkoninkje at gmail.com> wrote:
>> 
>> On Sun, Oct 2, 2016 at 3:17 AM, 韩冬(基础平台部) <handongwinter at didichuxing.com> wrote:
>>> Hi wren!
>>> 
>>> Yes, I noticed that attoparsec's numeric parsers are slow. I have a benchmark suite on GitHub (https://github.com/winterland1989/binary-parsers) that compares attoparsec and binary-parsers on different sample JSON files.
>>> 
>>> I'm pretty sure bytestring-lexing helped a lot: the average decoding speed improvement is around 20%, but the numeric-only benchmarks (integers and numbers) improved by 30%!
>> 
>> So still some substantial gains for non-numeric stuff, nice!
>> 
>>> Parsing is just one part of JSON decoding; a lot of time is spent on unescaping, etc. So the parser's improvement is quite large IMHO.
>>> 
>>> BTW, can you provide a version of the lexer that doesn't check whether each byte is a digit? In binary-parsers I use something like `takeWhile isDigit` to extract the input ByteString, so there's no need to verify this again in the lexer. That might give us another performance improvement.
>> 
>> I suppose I could, but then it wouldn't be guaranteed to return
>> correct answers. The way things are set up now, the intended workflow
>> is that wherever you're expecting a number, you should just hand the
>> ByteString over to bytestring-lexing (i.e., not bother
>> scanning/pre-lexing via `takeWhile isDigit`) and it'll give back the
>> answer together with the remainder of the input. This ensures that you
>> don't need to do two passes over the characters. So, for Attoparsec
>> itself you'd wrap it up with something like:
>> 
>>   decimal :: Integral a => Parser a
>>   decimal =
>>       get >>= \bs ->
>>       case readDecimal bs of
>>       Nothing -> fail "error message"
>>       Just (a, bs') -> put bs' >> return a
>> 
>> Alas `get` isn't exported[1], but you get the idea. Of course, for
>> absolute performance you may want to inline all the combinators to see
>> if there's stuff you can get rid of.
>> 
>> The only reason for scanning ahead is in case you're dealing with lazy
>> bytestrings and so need to glue them together in order to use
>> bytestring-lexing. Older versions of the library did have support for
>> lazy bytestrings, but I removed it because it was bitrotten and
>> unused. But if you really need it, I can add new variants of the
>> lexers for dealing with the possibility of requesting new data when
>> the input runs out.
>> 
>> 
>> [1] <http://hackage.haskell.org/package/attoparsec-0.13.1.0/docs/src/Data-Attoparsec-ByteString-Internal.html#get>
>> 
>> -- 
>> Live well,
>> ~wren
> 
> 
> 

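A minimal sketch of the single-pass workflow wren describes above, for anyone who wants something that compiles. Everything here is an assumption made for illustration: `Parser` is a toy type (not attoparsec's), and `readDecimalSketch` is a hypothetical stand-in that only mimics the result shape of bytestring-lexing's `readDecimal` (value plus unconsumed remainder), so that the example needs nothing beyond `base` and `bytestring`.

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.ByteString (ByteString)
import Data.Char (digitToInt, isDigit)

-- Hypothetical stand-in with the same result shape as readDecimal:
-- the parsed value plus the unconsumed remainder, or Nothing if the
-- input does not start with an ASCII digit. Each byte is inspected once.
readDecimalSketch :: ByteString -> Maybe (Int, ByteString)
readDecimalSketch bs0 =
  case B8.uncons bs0 of
    Just (c, rest) | isDigit c -> Just (go (digitToInt c) rest)
    _ -> Nothing
  where
    -- Accumulate digits until the first non-digit or end of input.
    go acc bs = acc `seq` case B8.uncons bs of
      Just (c, rest) | isDigit c -> go (acc * 10 + digitToInt c) rest
      _ -> (acc, bs)

-- Toy parser type (an assumption, not attoparsec's Parser): a function
-- from the remaining input to a result and the new remaining input.
newtype Parser a = Parser { runParser :: ByteString -> Maybe (a, ByteString) }

-- With this representation the get/put pair from the quoted snippet
-- collapses: the lexer already returns exactly what the parser needs.
decimal :: Parser Int
decimal = Parser readDecimalSketch

main :: IO ()
main = do
  print (runParser decimal (B8.pack "123abc"))  -- Just (123,"abc")
  print (runParser decimal (B8.pack "abc"))     -- Nothing
```

There is no `takeWhile isDigit` pre-scan here: the digit check and the accumulation happen in the same loop, which is the point of handing the whole ByteString to the lexer. With bytestring-lexing installed, swapping `readDecimalSketch` for `readDecimal` from `Data.ByteString.Lex.Integral` should give the same shape.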