Policy change for regex libraries

Don Stewart dons at galois.com
Mon Jan 12 17:07:59 EST 2009


Ensuring compatibilty with the usage described in "Real World Haskell"
would make me really really really happy :)

haskell:
> Before I go and add or change anything in the Haskell regex-posix library, 
> I wanted to get some feedback.  regex-posix provides Text.Regex.Posix, and 
> is built on the regex-base package.
> 
> regex-posix is also used as the backend for Text.Regex (the regex-compat 
> package does this).  I do not intend to change the behavior of the old 
> Text.Regex API.
> 
> The main issue is the behavior when returning a list of all matches of a 
> Regex against a target text.  I no longer think the current behavior is the 
> right choice when it comes to zero-length matches.
> 
> The current behavior is to return non-overlapping matches with the caveat 
> that after the first zero-length match the search is ended.  Note that the 
> zero-length match may be occur at the end position of a previous 
> non-zero-length match.
> 
> Notably, no one has complained about this policy.  But I no longer like it. 
> So here are a few of my ideas of what to change it to:
> 
> 0) No change, not worth the effort.
> 
> 1) return the zero-length match, skip forward 1 character, and continue 
> searching.  If the consumer wishes the old policy they can truncate the 
> list. This could also be filtered to resemble option 2 below.
> 
> 2) Mimic "sed".  It seems "sed" has a policy where a zero-length match is 
> forbidden to occur at the end position of a non-zero-length match.  "sed" 
> does not stop with the first zero-length match.
> 
> 3) implement additional execution options, so the user can choose a policy. 
> The default policy choice left with the current behavior.
> 
> 4) implement additional execution options, so the user can choose a policy. 
> The default policy choice set to he behavior in (1).
> 
> 5) Return valid matches starting from all positions, including overlapping 
> matches.  This I really do not like and one can run the search starting one 
> character after the start of the last match to get this information.
> 
> Matching "0123" and replacing all matches with themselves wrapped in angle 
> brackets.  The policies of 0, 1, and 2 above lead to (computed partly by 
> hand):
> 
> regex of "[0123]?"
> 0): "<0><1><2><3><>"
> 1): "<0><1><2><3><>"
> 2): "<0><1><2><3>"
> 
> regex of "[012]?"
> 0): "<0><1><2>3<>"
> 1): "<0><1><2>3<>"
> 2): "<0><1><2>3<>"
> 
> regex of "[013]?"
> 0): "<0><1><>23"
> 1): "<0><1><>2<3><>"
> 2): "<0><1>2<3>"
> 
> regex of "[023]?"
> 0): "<0><>123"
> 1): "<0><>1<2><3><>"
> 2): "<0>1<2><3>"
> 
> regex of "[123]?"
> 0): "<>0123"
> 1): "<>0<1><2><3><>"
> 2): "<>0<1><2><3>"
> 
> regex of "[03]?"
> 0): "<0><>123"
> 1): "<0><>1<>2<3><>"
> 2): "<0>1<>2<3>"
> 
> regex of "[03]?"
> 0): "<0><>123"
> 1): "<0><>1<>2<3><>"
> 2): "<0>1<>2<3>"
> 
> regex of "[12]?"
> 0): "<>0123"
> 1): "<>0<1><2><>3<>"
> 2): "<>0<1><2>3<>"
> 
> I am leaning to simply changing it from policy 0 to policy 1.
> 
> Are there any objections?
> 
> Perhaps I should set a deadline? Now where is that library policy...
> 
> -- 
> Chris
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries


More information about the Libraries mailing list