[Haskell-cafe] Text.Regex.Base throws exceptions with makeRegexOptsM

Daniel Fischer daniel.is.fischer at googlemail.com
Fri Dec 30 01:24:02 CET 2011


On Thursday 29 December 2011, 23:52:46, Omari Norman wrote:
> Hi folks,
> 
> I'm using Text.Regex.Base with the TDFA and PCRE backends. I want to
> compile regular expressions first and make sure the patterns were
> actually valid, so I used makeRegexOptsM, which indicates a bad regular
> expression by calling fail. That allows you to use makeRegexOptsM with
> Maybe or with (Either String) (assuming that Either String is an
> instance of Monad, which of course is defined in Control.Monad.Error.)
> 
> Doing this with Maybe Regex works like it should--bad pattern gives you
> a Nothing. But if you want to see the error message by using Either
> String, an exception gets thrown with the bad pattern, rather than
> getting a Left String.
> 
> Why is this?

The cause is that a pattern-match failure in a do-block or equivalent 
causes the Monad's 'fail' method to be invoked.

For Maybe, we have

fail _ = Nothing

For Either, there used to be

instance Error e => Monad (Either e) where
    ...
    fail s = Left (strMsg s)

in mtl's Control.Monad.error, and all was fine if one used the regex 
functions with e.g. (Either String) as the Monad.

Recently, however, it was decided to have

instance Monad (Either e) where
    ...
    fail s = error s -- not explicitly, but by Monad's default method

in Control.Monad.Instances. So now, if you have a pattern-match failure 
using (Either String), you don't get a nice 'Left message' but an error.

So why was it decided to have that change?

'fail' doesn't properly belong in the Monad class, it was added for the 
purpose of dealing with pattern-match failures, but most monads can't do 
anything better than abort with an error in such cases.
'fail' is widely considered a wart.

On the other hand, the restriction to Either's first parameter to belong to 
the Error class is artificial, mathematically, (Either e) is a Monad for 
every type e. And (Either e) has use-cases as a Monad for types which 
aren't Error members.

So the general consensus was that it was better to get rid of the arbitrary 
(Error e) restriction.

Now, what can you do to get the equivalent of the old (Either String)?

Use 'ErrorT String Identity'.

It's a bit more cumbersome to get at the result,

foo = runidentity . runErrorT $ bar

but it's clean.

> Seems like an odd bug somewhere.

A change in behaviour that was accepted as the price of fixing what was 
widely considered a mistake.

> I am a Haskell novice, but
> I looked at the code for Text.Regex.Base and for the TDFA and PCRE
> backends and there's nothing in there to suggest this kind of
> behavior--it should work with Either String.

It used to.

> 
> The attached code snippet demonstrates the problem. I'm on GHC 7.0.3
> (though I also got the problem with 6.12.3) and regex-base-0.93.2 and
> regex-tdfa-1.1.8 and regex-pcre-0.94.2. Thanks very much for any tips or
> ideas. --Omari




More information about the Haskell-Cafe mailing list