String != [Char]

Thomas Schilling nominolo at googlemail.com
Fri Mar 23 21:03:55 CET 2012


OK, so I think we should separate the parts of the proposal a bit.

  - Remove   type String = [Char]

  - Make String an abstract type (it could be named Text to encourage
users to think about whether they are operating on a representation
of text or on a sequence of bytes).

  - Specify operations on such an abstract String/Text type.
Personally, I think the standard shouldn't specify too many operations
over such a type, so as not to limit implementors' freedom too much.

  - Integrate the rest of the standard library with this new abstract
type.  This, I think, is actually the hardest part.

I think most here agree that the main advantage of the current
definition is only pedagogical.  Even then Strings are often built in
a very inefficient way by using ++ instead of ShowS + function
composition (which actually is a builder on its own).  I don't really
think that having an abstract type is such a big problem for teaching.
You can do string processing by doing (pack . myfunction . unpack)
which is fine for this purpose.  Once students are comfortable with
using higher-order functions, you can tell them to use the more
optimised Text-specific combinators.  Builders are also a very nice
application of monoids.
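
To make the two idioms above concrete, here is a minimal sketch.  It
assumes the Text type from the text package (Data.Text) stands in for
the abstract type; the names renderNaive, renderShowS, shout and
shoutText are made up for illustration and are not part of any proposal.

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T  -- assumption: the text package's Text

-- Left-nested (++): every step re-copies the accumulated prefix,
-- so building the output is quadratic in its length.
renderNaive :: [Int] -> String
renderNaive = foldl (\acc n -> acc ++ show n ++ ";") ""

-- ShowS (String -> String) used as a builder: each piece prepends its
-- text to whatever follows, and pieces are glued together with (.),
-- so every character is consed onto the result only once.
renderShowS :: [Int] -> ShowS
renderShowS = foldr (\n rest -> shows n . showChar ';' . rest) id

-- An ordinary list function, written the way a student would.
shout :: String -> String
shout = map toUpper

-- The same function lifted to Text via pack/unpack.
shoutText :: T.Text -> T.Text
shoutText = T.pack . shout . T.unpack

main :: IO ()
main = do
  putStrLn (renderNaive [1, 2, 3])
  putStrLn (renderShowS [1, 2, 3] "")
  putStrLn (T.unpack (shoutText (T.pack "hello")))
```

Both renderers produce the same string; only the cost of building it
differs.  The pack/unpack round trip is not free either, which is why
it is a teaching device rather than something to keep in real code.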

The larger problem for the Prelude would be that you can no longer use
the list functions on String/Text.  This mainly leads to an issue with
naming things (e.g., length for lists and length for strings).
Similarly, file functions like readFile probably shouldn't return Text
but ByteStrings.  But that would mean making ByteString part of the
Prelude as well.  So I'm not too sure on these particular issues.
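
As a rough illustration of both issues, the sketch below uses the text
and bytestring packages as they exist today (an assumption; a standard
might look quite different).  The naming clash is currently handled
with qualified imports, and going from bytes to text is an explicit
decoding step.  The names listLen, textLen and decoded are made up for
the example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Prelude's length on a list ...
listLen :: Int
listLen = length [1, 2, 3 :: Int]

-- ... and the Text counterpart, reachable only under a qualifier
-- because the unqualified name is already taken.
textLen :: Int
textLen = T.length (T.pack "abc")

-- Raw bytes carry no encoding; turning them into Text means choosing
-- one explicitly.  Here the two bytes are the UTF-8 encoding of "hi".
decoded :: T.Text
decoded = TE.decodeUtf8 (BS.pack [104, 105])

main :: IO ()
main = do
  print (listLen, textLen)
  putStrLn (T.unpack decoded)
```

A file function that returns ByteString would leave this decoding step
to the caller, which is the reason ByteString would have to be visible
from the Prelude too.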

On 23 March 2012 19:30, Edward Kmett <ekmett at gmail.com> wrote:
> Like I said, my objection to including Text is a lot less strong than my
> feelings on any notion of deprecating String.
>
> However, I still see a potentially huge downside from a pedagogical
> perspective to pushing Text, especially into a place where it will be front
> and center to new users. String lets the user learn about induction, and
> encourages a "Haskelly" programming style, where you aren't mucking about
> with indices and Builders everywhere, which is frankly very difficult to use
> when building Text. If you cons or append to build up a Text fragment,
> frankly you're doing it wrong.
>
> The pedagogical concern is quite real; remember that many introductory
> language classes have time to present Haskell and the list data type and
> not much else. Showing parsing through pattern matching on strings makes a
> very powerful tool; it's harder to show that with Text.
>
> But even when taking apart Text, the choice of UTF16 internally makes it
> pretty much a worst case for many string manipulation purposes (e.g.
> slicing has to spend linear time scanning the string), due to the existence
> of codepoints outside of plane 0.
>
> The major benefits of Text come from FFI opportunities, but even there if
> you dig into its internals it has to copy out of the array to talk to
> foreign functions because it lives in unpinned memory unlike ByteString.
>
> The workarounds for these limitations all require access to the internals,
> so a Text proposed in an implementation-agnostic manner is less than useful,
> and one supplied with a rigid set of implementation choices seems to
> fossilize the current design.
>
> All of these things make me lean towards a position that it is premature to
> push Text as the one true text representation.
>
> That said, I am very sympathetic to the position that the standard should ensure
> that there are Text equivalents for all of the exposed string operations,
> like read, show, etc, and the various IO primitives, so that a user who is
> savvy to all of these concerns has everything he needs to make his code
> perform well.
>
> Sent from my iPad
>
> On Mar 23, 2012, at 1:32 PM, Brandon Allbery <allbery.b at gmail.com> wrote:
>
> On Fri, Mar 23, 2012 at 13:05, Edward Kmett <ekmett at gmail.com> wrote:
>>
>> Isn't it enough that it is part of the platform?
>
>
> As long as the entire Prelude and large chunks of the bootlibs are based
> around String, String will be preferred.  String as a boxed singly-linked
> list type is therefore a major problem.
>
> --
> brandon s allbery                                      allbery.b at gmail.com
> wandering unix systems administrator (available)     (412) 475-9364 vm/sms



-- 
Push the envelope. Watch it bend.


