[web-devel] xml-types IsString instance for Name causes crashes

Simon Meier iridcode at gmail.com
Fri Jun 10 16:20:43 CEST 2011

2011/6/9 John Millikin <jmillikin at gmail.com>:
> On Thu, Jun 9, 2011 at 01:27, Yitzchak Gale <gale at sefer.org> wrote:
>>>> Since the whole idea of xml-types is for it to be a
>>>> unifying standard, I'd like to see it usable in that kind
>>>> of environment, too.
>>> It is usable in such an environment -- simply do not
>>> use the IsString instance.
>> Once you have the Name type rather than just Text -
>> which is useful for people needing namespaces -
>> the IsString instance is important to keep code from
>> becoming really awkward.
> I don't think the code is awkward without IsString. Compare:
> -------------------------------------------------
> bar :: Name
> bar = "{urn:foo}bar
> bar :: Name
> bar = Name "bar" (Just "urn:foo") Nothing
> -- optional: utility function
> name :: Text -> Text -> Name
> name ns local = Name local (Just ns) Nothing
> bar :: Name
> bar = name "urn:foo" "bar"
> -------------------------------------------------
> Using the Name constructor directly is more verbose, yes, but not very
> much. If you know that you'll always have names of a certain form
> (such as "has namespace, no prefix") then you can define a simple,
> type-safe utility function. The IsString instance is a purely optional
> syntactic sugar.
>> Perhaps the real problem here is including Clark
>> notation in the IsString instance. Clark notation is
>> very nice, but it doesn't really belong in the IsString
>> instance. Clark notation could be a function, or a
>> quasi-quoter. Perhaps the client library should be
>> allowed to decide.
> Is there an alternative syntax you'd prefer?
> IMO, IsString is only useful if it allows including both a namespace
> and local name. To do so, it has to parse the input, which can fail.
>> But if it is included in the IsString instance, it
>> absolutely cannot raise an asynchronous exception.
>> That is a serious bug.
> To me, the choice is between raising an exception or removing IsString.
> IsString without namespaces is pointless. IsString without input
> checking is dangerous. If fromString cannot fail on invalid input,
> then it shouldn't be defined.
>> The Name type already produces invalid XML. A client library
>> that wants to avoid invalid names must already check for them.
>> If a typo in Clark notation were another way to create invalid
>> XML that wouldn't change anything.
> You're right -- it is already possible for Names to be invalid. There
> should probably be stricter input checking on names, to ensure they
> match the XML spec. Something like this:
> ----------------------------------------------------------------------
> -- Constructor isn't exported
> data Name = Name Text (Maybe Text) (Maybe Text)
> -- exported
> name :: Text -> Maybe Text -> Maybe Text -> Maybe Name
> name = -- validates input
> -- exported, raises exception on failure
> name_ :: Text -> Maybe Text -> Maybe Text -> Name
> name_ local ns prefix = case name local ns prefix of
>    Just n -> n
>    Nothing -> error ("invalid name: error msg here")
> instance IsString Name where
>  fromString = name_ . Data.Text.pack
> ----------------------------------------------------------------------
>> Right, and it shouldn't. IsString is just a way of giving a different
>> string type, like ByteString or Text, to Haskell's string literal syntax.
>> There is no parsing to do beyond what the compiler already does.
>> Any IsString instance should just take the contents of the string
>> literal and incorporate it directly into the string type.
> IMO, the ByteString and Text instances for IsString are broken.
> ByteString should raise an exception if any (\c -> ord c > 255)
> Text should raise an exception if any of the characters are invalid Unicode
> I have had real failures in some of my programs as a result of these
> overly-liberal instances, which could have been caught *much* sooner
> if they'd simply raised an exception instead of silently returning a
> corrupt value.
>> There is no language I've ever heard of where you can cause a
>> program to crash at *runtime* because of the particular characters
>> you include in a string literal in your source code. Please don't make
>> Haskell the first. That would be very ironic.
> Again, this cannot cause crashes. Name's fromString throws exceptions,
> which can be caught. If your software needs to be reliable, it should
> *already* be catching and handling/logging unexpected exceptions,
> since they can be raised from literally any point in the code.
> Incorrect string literals can cause runtime crashes or exceptions in
> many languages; consider [[ char c = "abc"[3]; ]]
>>> if you're very concerned about it, I could add a flag like
>>> "NoIsString" which disables that particular instance. You could enable
>>> it in your build scripts.
>> I wouldn't use that flag. I need and use the IsString instance.
>> The instance must be defined in the module that defines the type.
> Nobody *needs* the IsString instance. It's a trivial utility for use
> in small scripts and other simple programs. It saves you from typing a
> dozen characters in a utility module somewhere, nothing more.
> If you're concerned that someone might accidentally use it, and thus
> introduce errors into your program, I can offer the choice of
> disabling it. The normal way to construct names will still exist.
> _______________________________________________
> web-devel mailing list
> web-devel at haskell.org
> http://www.haskell.org/mailman/listinfo/web-devel

Note that the OverloadedStringLiterals together with the 'IsString'
instances gives you more than just a nice syntactic abbreviation: the
defined string literals are also lifted out of their context and
defined as CAFs. Therefore, you can share the conversion from String
to <your-type> during the whole run of a program. We exploit that in
blaze-html to do escaping and UTF-8 encoding only once. Hence, in some
contexts the IsString instances are necessary.

More information about the web-devel mailing list