Policy change for regex libraries
Don Stewart
dons at galois.com
Mon Jan 12 17:07:59 EST 2009
Ensuring compatibility with the usage described in "Real World Haskell"
would make me really really really happy :)
haskell:
> Before I go and add or change anything in the Haskell regex-posix library,
> I wanted to get some feedback. regex-posix provides Text.Regex.Posix, and
> is built on the regex-base package.
>
> regex-posix is also used as the backend for Text.Regex (the regex-compat
> package does this). I do not intend to change the behavior of the old
> Text.Regex API.
>
> The main issue is the behavior when returning a list of all matches of a
> Regex against a target text. I no longer think the current behavior is the
> right choice when it comes to zero-length matches.
>
> The current behavior is to return non-overlapping matches with the caveat
> that after the first zero-length match the search is ended. Note that the
> zero-length match may occur at the end position of a previous
> non-zero-length match.
>
> Notably, no one has complained about this policy. But I no longer like it.
> So here are a few of my ideas of what to change it to:
>
> 0) No change, not worth the effort.
>
> 1) Return the zero-length match, skip forward 1 character, and continue
> searching. If the consumer wishes the old policy they can truncate the
> list. This could also be filtered to resemble option 2 below.
>
> 2) Mimic "sed". It seems "sed" has a policy where a zero-length match is
> forbidden to occur at the end position of a non-zero-length match. "sed"
> does not stop with the first zero-length match.
>
> 3) Implement additional execution options, so the user can choose a policy.
> The default policy would remain the current behavior.
>
> 4) Implement additional execution options, so the user can choose a policy.
> The default policy would be the behavior in (1).
>
> 5) Return valid matches starting from all positions, including overlapping
> matches. I really do not like this, and one can already get this information
> by rerunning the search one character after the start of the last match.
>
> As an example, match against "0123" and replace every match with itself
> wrapped in angle brackets. Policies 0, 1, and 2 above lead to (computed
> partly by hand):
>
> regex of "[0123]?"
> 0): "<0><1><2><3><>"
> 1): "<0><1><2><3><>"
> 2): "<0><1><2><3>"
>
> regex of "[012]?"
> 0): "<0><1><2><>3"
> 1): "<0><1><2><>3<>"
> 2): "<0><1><2>3<>"
>
> regex of "[013]?"
> 0): "<0><1><>23"
> 1): "<0><1><>2<3><>"
> 2): "<0><1>2<3>"
>
> regex of "[023]?"
> 0): "<0><>123"
> 1): "<0><>1<2><3><>"
> 2): "<0>1<2><3>"
>
> regex of "[123]?"
> 0): "<>0123"
> 1): "<>0<1><2><3><>"
> 2): "<>0<1><2><3>"
>
> regex of "[03]?"
> 0): "<0><>123"
> 1): "<0><>1<>2<3><>"
> 2): "<0>1<>2<3>"
>
> regex of "[12]?"
> 0): "<>0123"
> 1): "<>0<1><2><>3<>"
> 2): "<>0<1><2>3<>"
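The bracketed strings above can be reproduced from a list of match
positions. A minimal sketch, assuming matches are given as (offset, length)
pairs in increasing, non-overlapping order; the render function is a
hypothetical helper, and the offsets in the usage note are read off by hand
from the "[03]?" rows rather than computed by regex-posix.

```haskell
-- Wrap each match in angle brackets and copy the unmatched text through.
-- Matches are (start offset, length), sorted and non-overlapping.
render :: String -> [(Int, Int)] -> String
render text = go 0
  where
    -- No matches left: emit the remainder of the text.
    go pos [] = drop pos text
    -- Emit the gap before the match, then the bracketed match itself.
    go pos ((s, n) : ms) =
      take (s - pos) (drop pos text)
        ++ "<" ++ take n (drop s text) ++ ">"
        ++ go (s + n) ms
```

With the "[03]?" offsets, render "0123" [(0,1),(1,0)] corresponds to policy
0, render "0123" [(0,1),(1,0),(2,0),(3,1),(4,0)] to policy 1, and
render "0123" [(0,1),(2,0),(3,1)] to policy 2.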
>
> I am leaning toward simply changing it from policy 0 to policy 1.
>
> Are there any objections?
>
> Perhaps I should set a deadline? Now where is that library policy...
>
> --
> Chris