[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

Thomas Hartman tphyahoo at gmail.com
Mon Mar 16 15:24:47 EDT 2009


Thanks, that was extremely helpful.

My bad for reading the documentation so sloppily -- I somehow glossed
over the part saying that backreferences work as one would expect.

To atone for this,
http://patch-tag.com/repo/haskell-learning/browse/regexStuff/pcreReplace.hs

shows working =~ s/../../-like substitution, in a single pcre-based
example, for both a pcre-style regex and a posix-style regex (one that
the pcre engine also accepts). See testPcre and testPosix.

FWIW, I still think that there should be a library subRegex function
for all regex flavors, and not just Posix.

If there are gotchas about how capture references work in different
flavors, I might backpedal on this, but I'm not aware of any.
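For what it's worth, the flavor-independent half of such a subRegex is
small: once an engine hands back the captured substrings, expanding the
replacement template is plain string processing. Below is a minimal
sketch under stated assumptions. expandTemplate is a hypothetical name,
not a function from any regex package; the template conventions (\0 is
the whole match, \1, \2, ... are groups, \\ is a literal backslash) are
my understanding of what Text.Regex's subRegex uses; and expanding a
missing group to the empty string is an assumption about how an engine
would report it.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper, not part of any regex package: expand a
-- sed-style replacement template given the captured substrings.
-- caps !! 0 is the whole match, caps !! 1 the first group, and so on.
-- Assumed conventions: \N inserts group N, \\ inserts one backslash,
-- and a group that is missing expands to "".
expandTemplate :: [String] -> String -> String
expandTemplate caps = go
  where
    go :: String -> String
    -- An escaped backslash becomes a single literal backslash
    go ('\\' : '\\' : rest) = '\\' : go rest
    -- A backslash followed by digits is a group reference
    go ('\\' : rest@(d : _))
      | isDigit d =
          let (digits, rest') = span isDigit rest
              n = read digits :: Int
              captured = if n < length caps then caps !! n else ""
          in captured ++ go rest'
    -- Everything else (including a lone trailing backslash) is copied
    go (c : rest) = c : go rest
    go [] = []
```

For instance, with captures ["f\n ", "f", " "] the Haskell template
string "\\1\\2" (that is, \1\2) expands to "f ", and "\\2\\1" with
captures ["ab", "a", "b"] expands to "ba".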

2009/3/16 ChrisK <haskell at list.mightyreason.com>:
> Thomas Hartman wrote:
>>
>> testPcre =
>>   ( subRegex (mkRegex "(?<!\n)\n(?!\n)") "asdf\n \n\n\nadsf" "" )
>>     == "asdf \n\n\nadsf"
>
> quoting from the man page for regcomp:
>
>> REG_NEWLINE   Compile for newline-sensitive matching.  By default, newline
>>               is a completely ordinary character with no special meaning
>>               in either REs or strings.  With this flag, `[^' bracket
>>               expressions and `.' never match newline, a `^' anchor
>>               matches the null string after any newline in the string in
>>               addition to its normal function, and the `$' anchor matches
>>               the null string before any newline in the string in
>>               addition to its normal function.
>
> This is carried over to Text.Regex with
>
>> mkRegexWithOpts
>>   :: String  -- The regular expression to compile
>>   -> Bool    -- True <=> '^' and '$' match the beginning and end of
>>              -- individual lines respectively, and '.' does not match
>>              -- the newline character.
>>   -> Bool    -- True <=> matching is case-sensitive
>>   -> Regex   -- Returns: the compiled regular expression
>>
>> Makes a regular expression, where the multi-line and case-sensitive
>> options can be changed from the default settings.
>
> Or with regex-posix directly the flag is "compNewline":
> http://hackage.haskell.org/packages/archive/regex-posix/0.94.1/doc/html/Text-Regex-Posix-Wrap.html
>> The defaultCompOpt is (compExtended .|. compNewline).
>
> You want to match a \n that is not next to any other \n.
>
> So you want to turn off REG_NEWLINE.
>
>> import Text.Regex.Compat
>>
>> r :: Regex
>> r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True  -- False is important here
>
>
> The ^ and $ take care of matching a lone newline at the start or end of the
> whole text.  In the middle of the text the pattern is equivalent to
> [^\n]\n[^\n].
>
> When substituting you can use the \1 and \2 captures to restore the matched
> non-newline character if one was present.
>
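A plain-Haskell sketch (no regex library involved) that may be handy as
a reference when checking either substitution: it drops every newline
whose neighbours in the input are not newlines (or are the start or end
of the text) and keeps runs of two or more newlines, which is what the
lookaround pattern (?<!\n)\n(?!\n) with an empty replacement is meant to
do. dropLoneNewlines is a hypothetical name. Because it looks at both
neighbours of each newline without consuming them, it also handles input
like "a\nb\nc"; a pattern that consumes the surrounding characters, such
as (^|[^\n])\n($|[^\n]), may leave the second newline in place there,
since successive matches do not overlap.

```haskell
-- Hypothetical reference implementation, not from any library:
-- drop "lone" newlines, keep runs of two or more newlines intact.
dropLoneNewlines :: String -> String
dropLoneNewlines = go False
  where
    -- prevNL: was the previous character of the input a newline?
    -- (The start of the text counts as "not a newline".)
    go :: Bool -> String -> String
    go _ [] = []
    go prevNL ('\n' : rest)
      | not prevNL && not (startsWithNL rest) = go True rest  -- lone: drop it
      | otherwise = '\n' : go True rest                       -- in a run: keep it
    go _ (c : rest) = c : go False rest

    startsWithNL :: String -> Bool
    startsWithNL ('\n' : _) = True
    startsWithNL _ = False
```

On the input from testPcre, dropLoneNewlines "asdf\n \n\n\nadsf" gives
"asdf \n\n\nadsf", the result testPcre expects.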

