[web-devel] xml-types IsString instance for Name causes crashes

John Millikin jmillikin at gmail.com
Thu Jun 9 19:38:13 CEST 2011


On Thu, Jun 9, 2011 at 01:27, Yitzchak Gale <gale at sefer.org> wrote:
>>> Since the whole idea of xml-types is for it to be a
>>> unifying standard, I'd like to see it usable in that kind
>>> of environment, too.
>
>> It is usable in such an environment -- simply do not
>> use the IsString instance.
>
> Once you have the Name type rather than just Text -
> which is useful for people needing namespaces -
> the IsString instance is important to keep code from
> becoming really awkward.

I don't think the code is awkward without IsString. Compare:

-------------------------------------------------
bar :: Name
bar = "{urn:foo}bar

bar :: Name
bar = Name "bar" (Just "urn:foo") Nothing

-- optional: utility function

name :: Text -> Text -> Name
name ns local = Name local (Just ns) Nothing

bar :: Name
bar = name "urn:foo" "bar"
-------------------------------------------------

Using the Name constructor directly is more verbose, yes, but not very
much. If you know that you'll always have names of a certain form
(such as "has namespace, no prefix") then you can define a simple,
type-safe utility function. The IsString instance is a purely optional
syntactic sugar.

> Perhaps the real problem here is including Clark
> notation in the IsString instance. Clark notation is
> very nice, but it doesn't really belong in the IsString
> instance. Clark notation could be a function, or a
> quasi-quoter. Perhaps the client library should be
> allowed to decide.

Is there an alternative syntax you'd prefer?

IMO, IsString is only useful if it allows including both a namespace
and local name. To do so, it has to parse the input, which can fail.

> But if it is included in the IsString instance, it
> absolutely cannot raise an asynchronous exception.
> That is a serious bug.

To me, the choice is between raising an exception or removing IsString.

IsString without namespaces is pointless. IsString without input
checking is dangerous. If fromString cannot fail on invalid input,
then it shouldn't be defined.

> The Name type already produces invalid XML. A client library
> that wants to avoid invalid names must already check for them.
> If a typo in Clark notation were another way to create invalid
> XML that wouldn't change anything.

You're right -- it is already possible for Names to be invalid. There
should probably be stricter input checking on names, to ensure they
match the XML spec. Something like this:

----------------------------------------------------------------------

-- Constructor isn't exported
data Name = Name Text (Maybe Text) (Maybe Text)

-- exported
name :: Text -> Maybe Text -> Maybe Text -> Maybe Name
name = -- validates input

-- exported, raises exception on failure
name_ :: Text -> Maybe Text -> Maybe Text -> Name
name_ local ns prefix = case name local ns prefix of
    Just n -> n
    Nothing -> error ("invalid name: error msg here")

instance IsString Name where
  fromString = name_ . Data.Text.pack

----------------------------------------------------------------------

> Right, and it shouldn't. IsString is just a way of giving a different
> string type, like ByteString or Text, to Haskell's string literal syntax.
> There is no parsing to do beyond what the compiler already does.
> Any IsString instance should just take the contents of the string
> literal and incorporate it directly into the string type.

IMO, the ByteString and Text instances for IsString are broken.

ByteString should raise an exception if any (\c -> ord c > 255)

Text should raise an exception if any of the characters are invalid Unicode

I have had real failures in some of my programs as a result of these
overly-liberal instances, which could have been caught *much* sooner
if they'd simply raised an exception instead of silently returning a
corrupt value.

> There is no language I've ever heard of where you can cause a
> program to crash at *runtime* because of the particular characters
> you include in a string literal in your source code. Please don't make
> Haskell the first. That would be very ironic.

Again, this cannot cause crashes. Name's fromString throws exceptions,
which can be caught. If your software needs to be reliable, it should
*already* be catching and handling/logging unexpected exceptions,
since they can be raised from literally any point in the code.

Incorrect string literals can cause runtime crashes or exceptions in
many languages; consider [[ char c = "abc"[3]; ]]

>> if you're very concerned about it, I could add a flag like
>> "NoIsString" which disables that particular instance. You could enable
>> it in your build scripts.
>
> I wouldn't use that flag. I need and use the IsString instance.
> The instance must be defined in the module that defines the type.

Nobody *needs* the IsString instance. It's a trivial utility for use
in small scripts and other simple programs. It saves you from typing a
dozen characters in a utility module somewhere, nothing more.

If you're concerned that someone might accidentally use it, and thus
introduce errors into your program, I can offer the choice of
disabling it. The normal way to construct names will still exist.



More information about the web-devel mailing list