[Haskell-cafe] performance question

David Thomas davidleothomas at gmail.com
Fri Feb 15 00:27:05 CET 2013


(I'll be brief because my head is hurting, but please don't interpret that
as an intent to offend)

A few points:

1) Capture groups are all you need to do some meaningful interpretation of
data; these were around long before Perl.

2) Yacc is typically used in conjunction with lex, partly for (a)
efficiency and partly for (b) ease of use (compared to writing out [a-z] as
production rules).

3) I've actually used lex without yacc (well, flex without bison) when
dealing with a language that's regular (and easy enough to express that
way - as opposed to, say, an enormous finite subset of a context-free
language).


2b is mostly irrelevant in Haskell, as Parsec already provides functions
that can easily match the same things a regexp would.
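For instance, a regexp like [a-z]+[0-9]* (with its two capture groups)
comes out as ordinary combinators - a minimal sketch using parsec, with
the names being mine:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- [a-z]+[0-9]*, with each "capture group" bound to a name
    -- instead of a backreference number.
    ident :: Parser (String, String)
    ident = do
        letters <- many1 lower   -- [a-z]+
        digits  <- many digit    -- [0-9]*
        return (letters, digits)

    -- parse ident "" "abc123"  ==  Right ("abc","123")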

2a, if it stands up to testing, is the best argument for ripping things
apart in Haskell using a DFA.  Parsec and cousins are efficient, but it's
hard to beat a single table lookup per character.  The questions are 1) is
the difference enough to matter in many cases, and 2) is there a way to get
this out of parsec without touching regexps?  (It's not impossible that
parsec already recognizes when a language is regular, although I'd be
weakly surprised).
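For a sense of what "a single table lookup per character" means, here is
a toy hand-rolled DFA for [a-z]+[0-9]* (a sketch of the technique only,
not how parsec or any regex library actually implements it):

    import Data.Array
    import Data.Char (ord)

    -- Transition table for [a-z]+[0-9]*: state 0 = start,
    -- 1 = in the letters, 2 = in the digits, 3 = dead.
    table :: Array (Int, Int) Int
    table = listArray ((0, 0), (3, 255))
              [ delta s c | s <- [0 .. 3], c <- [0 .. 255] ]
      where
        delta s c
          | s <= 1 && ord 'a' <= c && c <= ord 'z'             = 1
          | (s == 1 || s == 2) && ord '0' <= c && c <= ord '9' = 2
          | otherwise                                          = 3

    -- One array lookup per character; characters outside Latin-1
    -- are clamped to index 255, which lands in the dead state.
    accepts :: String -> Bool
    accepts = (`elem` [1, 2]) . foldl step 0
      where step s ch = table ! (s, min (ord ch) 255)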

On Thu, Feb 14, 2013 at 3:07 PM, wren ng thornton <wren at freegeek.org> wrote:

> On 2/13/13 11:18 PM, wren ng thornton wrote:
>
>> On 2/13/13 11:32 AM, Nicolas Bock wrote:
>>
>>> Since I have very little experience with Haskell and am not used to
>>> Haskell-think yet, I don't quite understand your statement that
>>> regexes are
>>> seen as foreign to Haskell-think. Could you elaborate? What would a more
>>> "native" solution look like? From what I have learned so far, it seems to
>>> me that Haskell is a lot about clear, concise, and well structured
>>> code. I
>>> find regexes extremely compact and powerful, allowing for very concise
>>> code, which should fit the bill perfectly, or shouldn't it?
>>>
>>
>> Regexes are powerful and concise for recognizing regular languages. They
>> are not, however, very good for *parsing* regular languages; nor can
>> they handle non-regular languages (unless you're relying on the badness
>> of pcre). In other languages people press regexes into service for
>> parsing because the alternative is using an external DSL like lex/yacc,
>> JavaCC, etc., whereas in Haskell we have powerful and concise tools for
>> parsing context-free languages and beyond (e.g., parsec, attoparsec).
>>
>
>
> Just to be clear, the problem isn't that proper regexes are only good for
> regular languages (many files have regular syntax, after all). The problem is
> that regexes are only good for recognition. They're an excellent tool for
> deciding whether a given string is "good" or "bad"; but they're completely
> unsuitable for the task of parsing/interpreting a string into some
> structure or semantic response. If you've ever used tools like yacc or
> JavaCC, one of the crucial things they offer is the ability to add these
> semantic responses. Parser combinator libraries in Haskell are similar,
> since the string processing is integrated into a programming language so we
> can say things like:
>
>     -- 'blah' and 'blargh' stand in for arbitrary sub-parsers,
>     -- and 'guard' aborts the parse when the predicate fails:
>     myParser = do
>         x <- blah
>         guard (p x)
>         y <- blargh
>         return (f x y)
>
> where p and f can be arbitrary Haskell functions. Perl extends
> regular expressions to try to do things like this, but it's extremely
> baroque, hard to get right, and impossible to maintain. (N.B., I was raised
> on Perl and still love it.) And at some point we have to call into question
> the idea of regexes as an embedded DSL when we then turn around and try to
> have Perl be a DSL embedded into the regex language.
>
> One of the big things that makes regexes so nice is that they identify
> crucial combinators like choice and repetition. However, once those
> combinators have been identified, we can just offer them directly as
> functions in the host language. No need for a special DSL or special
> syntax. The big trick is doing this efficiently. Parser combinators were an
> academic curiosity for a long time until Parsec came around and made them
> efficient. And we've come a long way since then, with things like
> attoparsec, PEG parsing, and non-monadic applicative parsers (which can
> perform more optimizations because they can identify the structure of the
> grammar).
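>
> For instance, the same sort of parser in applicative style - a minimal
> sketch with parsec's combinators, names mine (attoparsec reads much
> the same):
>
>     import Text.Parsec
>     import Text.Parsec.String (Parser)
>
>     -- No monadic bind: the grammar's shape is fixed up front,
>     -- so, in principle, a library can inspect and optimize it.
>     pair :: Parser (String, String)
>     pair = (,) <$> many1 letter <*> many1 digit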
>
> The theory of regular expressions is indeed beautiful and elegant.
> However, it's a theory of recognition, not a theory of parsing; and that's
> a crucial distinction. Haskell is about clear, concise, and well-structured
> code; but to be clear, concise, and well-structured we have to choose the
> right tool for the job.
>
>
> --
> Live well,
> ~wren
>