[Haskell-cafe] Re: Has anybody replicated =~ s/../../ or even something more basic for doing replacements with pcre haskell regexen?

ChrisK haskell at list.mightyreason.com
Mon Mar 16 08:50:41 EDT 2009

Thomas Hartman wrote:
> testPcre = ( subRegex (mkRegex "(?<!\n)\n(?!\n)") "asdf\n \n\n\nadsf"
> "" ) == "asdf \n\n\nadsf"

quoting from the man page for regcomp:

> REG_NEWLINE   Compile for newline-sensitive matching.  By default, newline is a completely ordinary character with
>               no special meaning in either REs or strings.  With this flag, `[^' bracket expressions and `.' never
>               match newline, a `^' anchor matches the null string after any newline in the string in addition to
>               its normal function, and the `$' anchor matches the null string before any newline in the string in
>               addition to its normal function.

This is carried over to Text.Regex with

> mkRegexWithOpts
>   :: String  -- The regular expression to compile
>   -> Bool    -- True <=> '^' and '$' match the beginning and end of individual lines respectively, and '.' does not match the newline character.
>   -> Bool    -- True <=> matching is case-sensitive
>   -> Regex   -- Returns: the compiled regular expression
>
> Makes a regular expression, where the multi-line and case-sensitive options can be changed from the default settings.

Or with regex-posix directly the flag is "compNewline":
 > The defaultCompOpt is (compExtended .|. compNewline).

You want to match a \n that is not next to any other \n.

So you want to turn off REG_NEWLINE.

> import Text.Regex  -- the module exported by the regex-compat package
> r :: Regex
> r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True  -- False turns off REG_NEWLINE; this is important here

The ^ and $ take care of matching a lone newline at the start or end of the 
whole text.  In the middle of the text the pattern is equivalent to [^\n]\n[^\n].
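
As a quick check of why the flag matters, here is a minimal sketch that compiles the same pattern with regex-posix directly, once with only compExtended and once with defaultCompOpt (which includes compNewline). The names loneNewline and loneNewlineNL are mine, purely for illustration, and the sketch assumes the regex-posix package is installed.

```haskell
import Text.Regex.Posix

-- Hypothetical names, for illustration only.

-- REG_NEWLINE off: '^' and '$' anchor only at the ends of the whole text.
loneNewline :: Regex
loneNewline =
  makeRegexOpts compExtended defaultExecOpt "(^|[^\n])\n($|[^\n])"

-- REG_NEWLINE on (part of defaultCompOpt): '^' also matches after any
-- newline and '$' also matches before any newline.
loneNewlineNL :: Regex
loneNewlineNL =
  makeRegexOpts defaultCompOpt defaultExecOpt "(^|[^\n])\n($|[^\n])"

main :: IO ()
main = do
  print (matchTest loneNewline   "a\nb")   -- a lone newline
  print (matchTest loneNewline   "\n\n")   -- two adjacent newlines
  print (matchTest loneNewlineNL "\n\n")   -- same text, REG_NEWLINE on
```

With REG_NEWLINE off, the two adjacent newlines should be rejected. With it on, the second newline can match because ^ succeeds right after the first one, which is the unwanted behaviour.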

When substituting you can use the \1 and \2 captures to restore the matched 
non-newline character if one was present.
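
A minimal sketch of that substitution with subRegex, assuming the regex-compat package (module Text.Regex). The helper name dropLoneNewlines is hypothetical. The input is the test string from the original question.

```haskell
import Text.Regex (Regex, mkRegexWithOpts, subRegex)

-- Same pattern as above: multi-line off (False), case-sensitive (True).
r :: Regex
r = mkRegexWithOpts "(^|[^\n])\n($|[^\n])" False True

-- Hypothetical helper: delete each lone newline, putting back the
-- neighbouring characters captured by groups 1 and 2.
dropLoneNewlines :: String -> String
dropLoneNewlines s = subRegex r s "\\1\\2"

main :: IO ()
main = print (dropLoneNewlines "asdf\n \n\n\nadsf" == "asdf \n\n\nadsf")
```

Note that each match consumes the characters on both sides of the newline, so two lone newlines separated by a single character (for example "a\nb\nc") may not both be replaced in one pass.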
