[Haskell-cafe] parsec2 vs. parsec3... again

Thu Jan 13 20:07:37 CET 2011

On 12/23/2010 06:01 AM, Evan Laforge wrote:

> This is not very encouraging!  Especially strange is how Text
> generates *more* allocation... I'd expect less since it doesn't unpack
> all the Texts.  
Errgh. To check a character against a predicate, the library HAS to
unpack that character. There is no way around it.
> There's an obvious problem where I get the digits as a String and then
> parse that with list functions, but I can't see any way to get parsec
> to return a chunk of Text.  This is roughly how parsec itself parses
> numbers, in Text.Parsec.Token.
>
> Any ideas or experience?
>
If you want performance so desperately, you can try hand-coded parsing.
What I mean is: if the library has to unpack characters anyway to check
them against the isDigit predicate, why not use them to build the
numeric value immediately? This eliminates the intermediate list.
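
To make the idea concrete, here is a minimal sketch of what I mean. It
is not parsec code: parseNat is a hypothetical helper name, and I assume
the input is a strict Data.Text value. Each character is unpacked once
with T.uncons, checked with isDigit, and folded straight into the
accumulator, so no String of digits is ever built.

```haskell
import Data.Char (digitToInt, isDigit)
import qualified Data.Text as T

-- | Hypothetical helper (not part of parsec): parse a non-empty run of
-- leading decimal digits and return the value plus the unconsumed input.
-- Returns Nothing if the input does not start with a digit.
-- Note: Int overflow is not detected in this sketch.
parseNat :: T.Text -> Maybe (Int, T.Text)
parseNat input = case T.uncons input of
  Just (c, rest) | isDigit c -> Just (go (digitToInt c) rest)
  _                          -> Nothing
  where
    -- Force the accumulator on every step so no chain of thunks builds up.
    go acc t = acc `seq` case T.uncons t of
      Just (c, t') | isDigit c -> go (acc * 10 + digitToInt c) t'
      _                        -> (acc, t)
```

The remaining input comes back as a Text, so the caller can carry on
from there without converting anything to a list.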

However, any backtracking parser is slow by nature. If you want the
maximum possible speed, consider a hand-written lexer (this is not too
hard) and possibly Happy to generate the parser.
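
As a rough illustration of how small a hand-written lexer can be, here
is a sketch over String. The Token type and its constructors are made up
for this example; a real grammar would need its own token set, and a
Happy grammar would then consume the resulting token list.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- | Hypothetical token type, invented for this sketch.
data Token
  = TNum Int      -- ^ a run of decimal digits
  | TIdent String -- ^ a letter followed by letters or digits
  | TSym Char     -- ^ any other single non-space character
  deriving (Eq, Show)

-- | Split the input into tokens, skipping whitespace. There is no
-- backtracking: the first character alone decides which branch is taken.
lexer :: String -> [Token]
lexer [] = []
lexer s@(c:cs)
  | isSpace c = lexer cs
  | isDigit c = let (ds, rest) = span isDigit s
                in TNum (read ds) : lexer rest
  | isAlpha c = let (name, rest) = span isAlphaNum s
                in TIdent name : lexer rest
  | otherwise = TSym c : lexer cs
```

For brevity this one still goes through a String for the digits; the
same direct accumulation trick as above would apply here too.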

BTW, how much UTF-16 text is around? I have never found any on the wild web.