[Haskell-cafe] [perl #129843] [LTA] Indexing on a Str throws generic “out of range” message which is less than awesome (“hello”[2])

Tue May 9 10:59:06 UTC 2017

On 08.05.2017 at 23:00, Brandon Allbery wrote:
>
> On Mon, May 8, 2017 at 4:49 PM, Joachim Durchholz <jo at durchholz.org
> <mailto:jo at durchholz.org>> wrote:
>
>     If the mental model for Perl6 strings is "array of characters" though
>
> Perl has never had that mental model, is my point.

Right, I should have written "is supposed to evolve to" instead of "is".
Array of characters may be a useful abstraction to have in Perl6, or not 
(see below).

> It's generally
> imported by folks who come from languages where strings *are* "arrays of
> characters" --- and where that model has a strong tendency to cause
> problems. (See Python 3's struggles with Unicode as an example.  And
> C/C++, well, don't even get me started.

Some of these struggles originate from equating bytes with characters. 
Since Perl6 is more or less a clean slate, it can avoid these.
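
To make the bytes-versus-characters point concrete, here is a minimal sketch in Python 3 (chosen only for illustration; nothing here is Perl6 API). It shows that the same text has one length in code points and different lengths in bytes depending on the encoding. The sample string is an assumption picked for the demonstration.

```python
# Illustration only: the "length" of a string depends on what you count.
text = "na\u00efve"  # "naive" with U+00EF (i with diaeresis) as one code point

utf8 = text.encode("utf-8")        # variable-width encoding
utf16 = text.encode("utf-16-le")   # 2 bytes per code point here, no BOM

print(len(text))    # number of code points
print(len(utf8))    # number of bytes in UTF-8
print(len(utf16))   # number of bytes in UTF-16
```

Code that indexes into `utf8` as if each byte were a character would split U+00EF in the middle, which is the class of bug referred to above.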

Other struggles originate from the structure of Unicode: it defines 
multiple levels of sequences, each useful for different tasks:
- code points
- graphemes
- characters (various normalizations exist)
- word parts (for line breaking)
- words
- sentences
- paragraphs
and possibly a few more.
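
As a rough illustration of how the lowest levels in this list differ, the following Python 3 sketch counts the same text as raw code points, as NFC-normalized code points, and as approximate user-perceived units. The helper `rough_clusters` is hypothetical and deliberately simplistic: it only attaches combining marks to the preceding base code point and is not the full grapheme cluster algorithm from Unicode's text segmentation rules.

```python
import unicodedata


def rough_clusters(s):
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper for illustration; a simplification, not a
    conforming grapheme cluster segmenter.
    """
    clusters = []
    for ch in s:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" + COMBINING ACUTE ACCENT, then "t", then precomposed U+00E9
decomposed = "e\u0301t\u00e9"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                  # code points as written
print(len(composed))                    # code points after NFC normalization
print(len(rough_clusters(decomposed)))  # approximate user-perceived units
```

The first count differs from the other two, even though a reader sees the same three letters in every case; which count is the right one depends on the task at hand.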

Ideally, developers will be able to use the same API structure at each 
level, maybe with the exception of the grapheme level, where Perl6 has its 
native representation (the better the API, the fewer such 
implementation details are visible and relevant to the programmer).

> Bytes stopped being the basis of
> characters even *before* Unicode. C and C++ are still struggling to
> understand that.

I think you're being unfair to them.
The issues are actually well-understood in the C++ arena, as 
demonstrated by the ICU library.
It's just that language evolution is constrained by legacy, plus 
possibly short-sighted decisions by compiler makers. Also, C++ (by 
necessity) evolves more slowly than Unicode. Under these conditions, Unicode 
support in a library is actually preferable to anything inside the 
language; it's enough if the language can interoperate with the library.