[Haskell-i18n] Unicode in source
Simon Marlow
simonmar@microsoft.com
Mon, 26 Aug 2002 10:33:03 +0100
> > > The other interpretation is that all glyphs have widths which are
> > > an integral number of "columns". Western (latin, cyrillic, Greek)
> > > characters are a single column wide, while CJK characters are
> > > typically two columns wide. The (Unix98) wcwidth() function can be
> > > used to obtain the width (in columns) of a given wide character
> > > (wchar_t) in the current locale.
> >
> > I see, I wasn't aware of this, thanks for pointing it out. In this
> > case, we should get some way of obtaining the width in columns of a
> > Char in Haskell and let the layout rule talk about columns, correct?
>
> I would think so. Although it might be preferable to simply require
> line breaks, so that you only need to deal with spaces.
>
> My suspicion is that the existing layout rules were decided with an
> implicit assumption of "one character equals one column". If that
> ceases to be the case, maybe the decision should be revisited.
Allowing characters to span more than one column wouldn't break the
layout rule, as long as the character-to-column mapping is generally
agreed upon across editors and locales. (I think we established that
this is not necessarily always the case, although in practice it should
be.)
Requiring a newline before a new layout context would break *a lot* of
code. You couldn't write 'let x = 42 in x + 1', for example. Sure, a
refinement could be made to allow this kind of thing, but that would
make the layout rule more complex, rather than less. So to extend
gracefully while keeping backwards compatibility, I propose:
- There be a fixed character->column mapping
- Tab stops are every 8 columns
- We recommend that programmers avoid using indentation levels
  which depend on the widths of non-space characters.
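
The first two points can be sketched in Haskell roughly as follows. This is only an illustration: the names charWidth, tabWidth, advance and columns are hypothetical, and the width table is a crude stand-in for whatever fixed mapping would actually be agreed upon (it is not real Unicode East Asian Width data, and it ignores combining characters).

```haskell
import Data.Char (ord)

-- Hypothetical fixed character->column mapping: two columns for a few
-- broad CJK ranges, one column for everything else. A rough
-- approximation for illustration only.
charWidth :: Char -> Int
charWidth c
  | inRange 0x1100 0x115F = 2   -- Hangul Jamo
  | inRange 0x2E80 0xA4CF = 2   -- CJK radicals through Yi
  | inRange 0xAC00 0xD7A3 = 2   -- Hangul syllables
  | inRange 0xF900 0xFAFF = 2   -- CJK compatibility ideographs
  | inRange 0xFF00 0xFF60 = 2   -- fullwidth forms
  | otherwise             = 1
  where
    n = ord c
    inRange lo hi = n >= lo && n <= hi

-- Tab stops are every 8 columns.
tabWidth :: Int
tabWidth = 8

-- Advance a 0-based column past one character. A tab moves to the
-- next tab stop; any other character moves by its fixed width.
advance :: Int -> Char -> Int
advance col '\t' = (col `div` tabWidth + 1) * tabWidth
advance col c    = col + charWidth c

-- The 1-based column at which each character of a line starts.
columns :: String -> [Int]
columns = map (+ 1) . init . scanl advance 0
```

With a mapping like this, the layout rule would compare the values produced by columns rather than raw character counts.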
Following the third recommendation means that your code will look fine
in a proportional font. The compiler could warn about violations quite
easily. Note that it is OK to write 'let x = 42 in x + 1', because the
meaning of the code doesn't depend on the actual indentation level of
the first 'x'.
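
The kind of warning described above could look something like the sketch below. It is heavily simplified and entirely hypothetical: a real compiler would work on lexer output, whereas here the caller supplies the line that opens the layout block, the 0-based index of the block's first token on that line, and any later lines that must align with that token. The names Verdict and checkIndent are made up for the example.

```haskell
import Data.Char (isSpace)

data Verdict = Fine | DependsOnGlyphWidths
  deriving (Eq, Show)

-- checkIndent opener i later
--   opener: the source line containing the first token of a layout block
--   i:      0-based index of that token within the line
--   later:  subsequent lines that must align with that token
-- If nothing aligns with the token, its column never matters. Otherwise
-- the indentation is safe only when whitespace alone precedes the token.
checkIndent :: String -> Int -> [String] -> Verdict
checkIndent opener i later
  | null later                  = Fine
  | all isSpace (take i opener) = Fine
  | otherwise                   = DependsOnGlyphWidths
```

Under these assumptions, checkIndent "let x = 42 in x + 1" 4 [] is Fine, while a 'let' whose second binding is lined up under the first on the next line would be flagged, since its column depends on the width of "let ".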
Cheers,
Simon