Long live String = [Char] (Was: Re: String != [Char])

Thomas Schilling nominolo at googlemail.com
Sat Mar 24 20:29:52 CET 2012


On 24 March 2012 12:53, Henrik Nilsson <nhn at cs.nott.ac.uk> wrote:
> Hi all,
>
> Thomas Schilling wrote:
>
>> I think most here agree that the main advantage of the current
>> definition is only pedagogical.
>
> But that in itself is not a small deal. In fact, it's a pretty
> major advantage.
>
> Moreover, the utter simplicity of String = [Char] is a benefit
> in its own right. Let's not forget that this, in practice,
> across all Haskell applications, works just fine in the vast
> majority of cases.
>
> I get the sense that the proponents of deprecating, and ultimately
> getting rid of, String = [Char] are suggesting that this would lead
> to noticeable performance improvements across the board by virtue
> of preventing programmers from accidentally making a poor choice
> of data structure for representing strings. But I conjecture that
> the performance impact of switching from e.g. String to Text at
> the level of complete applications would be negligible in most
> cases, simply because most Haskell applications are not dominated
> by heavy-duty string processing. And those that are probably
> already use something like Text, and were written by people
> who know a thing or two about appropriate choice of data structures
> anyway.
>
> As to teaching:
>
>> I don't really
>> think that having an abstract type is such a big problem for teaching.
>> You can do string processing by doing (pack . myfunction . unpack)
>
> Here at Nottingham, we're teaching all our 1st-year undergraduates
> Haskell. It works, but it is a challenge, and, alas, far from everyone
> "gets" it. And this is despite the module being taught by one of
> the leading and most experienced Haskell educators (and text book
> author), Graham Hutton.
>
> Without starting an endless discussion about how to best teach
> programming languages in general and Haskell in particular to
> (near) beginners, I dare say that idioms like the one suggested
> above would do nothing to help.
>
> String != [Char] would break no end of code, text books, tutorials,
> lecture slides, would not help with teaching Haskell, all
> for very little if any benefit in the grand scheme of things.

OK, I agree that breaking text books is a big deal.  On the other
hand, the lack of a good Text data type forced text books to teach bad
approaches to dealing with strings.  Haskell should do better.

Johan mentioned both semantic and performance problems with Strings.
One aspect he didn't stress is that Strings are also a horribly
memory-inefficient way of storing strings.  On 64-bit GHC systems a
single ASCII character needs 16 bytes of memory (i.e., an overhead of
16x). A non-ASCII character (ord c > 255) actually requires 32 bytes.
(The difference is due to a de-duplication optimisation in the GHC GC.)  Other
implementations may do better, but an abstract type would still be
better to enable more freedom for implementors.

Correct handling of Unicode strings is a Hard Problem and String =
[Char] is only better if you ignore all the issues (which is certainly
fine in a teaching environment).

I would be happy to have a simplistic String = [Char] coexist with a
Text type if it weren't for the problem that so many things are biased
towards String.  E.g., error takes a String, Show is used everywhere
and produces strings, the pretty printing library uses Strings, Read
parses Strings.
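
To make the bias concrete, here is a minimal sketch of the glue code one
tends to write when working with Data.Text next to these String-based
functions.  The helper names (showText, readText, errorText) are
hypothetical and not part of any library; only T.pack, T.unpack, show,
read and error are real.

```haskell
import qualified Data.Text as T

-- Hypothetical helpers, for illustration only.

-- Show produces a String, so the result has to be packed into a Text.
showText :: Show a => a -> T.Text
showText = T.pack . show

-- Read parses a String, so a Text has to be unpacked first.
readText :: Read a => T.Text -> a
readText = read . T.unpack

-- error takes a String, so a Text message has to be unpacked as well.
errorText :: T.Text -> a
errorText = error . T.unpack
```

Each helper is trivial, but every conversion copies the whole string
between the two representations.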

> On the other hand, a standardised, well thought-out, API for
> high-performance strings and appropriate mechanisms such
> as a measure of overloading to make it easy and palatable to
> use, and that work alongside the present String = [Char], would be a
> good thing.

As I said, while I'm not a huge fan of having two String types
co-exist, I could accept it as a necessary trade-off to keep text
books valid and preserve backwards compatibility.  (There are also
other issues with String.  For example, you can't write an instance
MyClass String in Haskell 2010, and even with GHC extensions it seems
wrong and you often end up writing instances that overlap with MyClass
[a].)  I'm using Data.Text a lot, so I can work around the issue, but
unfortunately you run into a lot of issues where the standard library
forces the use of String, and that, I believe, is wrong.
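
As a sketch of the instance problem mentioned above (the class MyClass
and its method describe are made-up names for illustration): Haskell
2010 rejects the String instance below because String is a synonym for
[Char], so it only compiles with a GHC extension, and it then overlaps
with a general list instance.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class, for illustration only.
class MyClass a where
  describe :: a -> String

instance MyClass Bool where
  describe b = if b then "yes" else "no"

-- Not valid Haskell 2010: the instance head is really [Char].
-- GHC accepts it with FlexibleInstances.
instance MyClass String where
  describe s = "string " ++ show s

-- A general list instance overlaps with the String instance, so it
-- needs an overlap pragma (available in GHC 7.10 and later).
instance {-# OVERLAPPABLE #-} MyClass a => MyClass [a] where
  describe xs = "list of " ++ show (length xs)
```

With an abstract string type the instance would be an ordinary one and
the overlap would not arise.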

If changing the standard library is the bigger issue, however, then
I'm not sure whether this discussion needs to take place on the
haskell-prime list or on the libraries list.

/ Thomas



More information about the Haskell-prime mailing list