[Haskell-cafe] Perl-ish =~ operator

ozone at algorithm.com.au ozone at algorithm.com.au
Tue Feb 24 15:06:09 EST 2004

On 24/02/2004, at 1:30 AM, ozone at algorithm.com.au wrote:

> In my effort to turn Haskell into a language more like Perl 
> (muahaha)[1], I got a bit fed up and implemented something like Perl 
> 5's =~ binding operator (a.k.a. "regex" operator); I thought maybe 
> somebody else here might find it useful.  Perl has the concept of 
> 'contexts': a function does something different depending on what type 
> its caller expects back from the function.  Sounds like a perfect 
> abuse of type classes, to me :).

Bonus round: I've managed to hack up something which simulates Perl's 
s/// operator, and also something to emulate its /e modifier 
(evaluate), so that you can run a function over the resulting 
subexpression matches.  (You don't really _need_ the latter, because 
all you have to do is use the =~ operator in a [String] context and run 
your function over that, but hey, it's cool, and I get to abuse type 
classes even more :).

Now, =~ has to change its behaviour not only depending on its context, 
but also depending on what "operation" you do with it: matching, 
substitution, etc.  We differentiate each of the operations by giving 
them a different type: a simple match operation takes in a String (the 
regex to match), whereas substitution requires two strings: the string 
to match against, and the substitution string, i.e. (String, String).

{-# OPTIONS -fglasgow-exts #-}

--  Need this for "instance Foo [String]" declarations (look, don't even
--  need undecidable or overlapping instances :)

module PLRE where
-- Perl-Like Regular Expressions

import Maybe
import Text.Regex

-- Perl-Like =~ operator, which changes behaviour depending on its 
-- context

class Bind op context where
   (=~) :: String -> op -> context

-- (=~) :: String -> String -> Bool
-- returns whether the regex matched or not
instance Bind String Bool where
   s =~ re = case matchRegex (mkRegex re) s of
     Nothing -> False
     Just x -> True

-- (=~) :: String -> String -> [String]
-- returns a list of subexpression matches
instance Bind String [String] where
   s =~ re = case matchRegex (mkRegex re) s of
     Nothing -> []
     Just x -> x

-- (=~) :: String -> (String, String) -> String
-- substitution: "foo" =~ ("f", "g") = "goo"
instance Bind (String, String) String where
   s =~ (re, sub) = case matchRegexAll (mkRegex re) s of
     Nothing -> []
     Just (before, _, after, _) -> before ++ sub ++ after

-- perl's /e modifier.  We expect a function that takes in an argument 
-- type [String] (and can output any type): in that argument, index 0 of
-- the list is the original string to match against, index 1 (if it
-- exists) is the first subexpression match, index 2 is the second
-- subexpression match, etc.

instance Bind (String, ([String] -> context)) context where
   s =~ (re, fn) = case matchRegex (mkRegex re) s of
     Just matches -> fn (s:matches)
     Nothing -> fn [] -- or maybe this should be an error?

boolContextTest string regEx =
   case string =~ regEx of
     True -> print True
     False -> print False

stringListContextTest string regEx =
   case string =~ regEx of
     (a:x) -> print ("First match: " ++ a)
     _ -> error "No subexpression matches"

For an example of how to use the /e-like operator:

PLRE> "foo" =~ ("^(..)", \l -> map Char.ord (l!!1) ) :: [Int]

i.e. it (vaguely) resembles something like $foo =~ s/^(..)/ord $1/;

One thing which would be really nice is to use implicit parameters for 
the subexpression match instead of passing the list of subexpression 
matches explicitly to the function, so that you could instead write:

PLRE> "foo" =~ ("^(..)", map Char.ord ?_1 ) :: [Int]

So ?_n maps nicely on to Perl's $n match variable (or \n, if you're a 
sed foo).  I couldn't find any way for this to work, though, since 
implicit parameters aren't allowed in an instance declaration.

% Andre Pang : trust.in.love.to.save

More information about the Haskell-Cafe mailing list