[Haskell-cafe] trivial function application question

Chris Kuklewicz haskell at list.mightyreason.com
Fri Jan 5 12:07:19 EST 2007


Bill Wood wrote:
> It would seem that for a regular expression facility to constitute a
> parser it would have to be able to work on token streams. So my question
> is, does either the the perl6 generalization or any of the Haskell regex
> facilities support regular expressions over any free monoid other than
> finite character sequences?
> 
>  -- Bill Wood
> 

Currently: The regular expressions themselves are finite, but some Haskell
regex-* backend packages can run searches on "infinite" [Char], since they are
lazy.  This is true of regex-parsec and regex-dfa.  In particular they take care
not to consume extra Char's from the input stream, though Parsec insists on
looking at the next character.

The regex-base type class API in Text.Regex.Base.RegexLike does not limit you to
any particular source of the regex or any particular type of thing to be
searched.  There is a limit that comes from using Int as the index of choice.
If you are searching something more that 2 billion units long then you would run
into API problems.  (One could just make a parallel BigRegex type class API and
make instances for it for the backends that can handle it.
).  Or I may expand it to take any (Num).  Hmmm.....

Specifically, the RegerMaker is parameterized over "source" and "regex" and so
is completely generic.  This "source" is what specifies how to build the the
compiled "regex" opaque type.

Specifically, the Extract is parameterized over "source" (but limits the index
to Int).  This "source" is the type being searched.

Specifically, the RegexLike class is parameterized over "regex" and "source",
where "regex" is the supposed to be the opaque compiled type from RegexMaker and
"source" is the type being searched.

Currently the RegexMaker source can be [Char] or ByteString and the
RegexLike/Extract "source" can be [Char] or ByteString.

Adding (Data.Sequence Char) would make sense, and perhaps (Data.Sequence Word8)
as ASCII.  If you write a very generic backend then you may be able to make more
generic instances of the API.  Note that the instances should be obvious because
your generic backend uses a unique opaque "regex" type.

Also not that the API is named Regex in several places but there is no need to
use anything like a Regex syntax.  In fact you could use something other than
RegexMaker to create the "regex" type used for specifying the
searching/matching.  I have not considered it until now, but maybe one could
create an instance of RegexLike based around Parsec's GenParser.

-- 
Chris


More information about the Haskell-Cafe mailing list