[Haskell-i18n] Proposal for a Unicode-safe layout rule
Ben Rudiak-Gould
benrg@dark.darkweb.com
Fri, 1 Aug 2003 19:06:15 -0700 (PDT)
I propose that Haskell's layout rule be changed in the following simple
way:
* We identify a set of "layout-unsafe" Unicode characters which may
  occupy something other than one column in some fixed-width fonts. This
  would include (among other things) combining characters and full-width
  CJK characters. Explicit Unicode escape sequences, if any, should also
  count as layout-unsafe. Anything doubtful should be layout-unsafe.
* A special unknown-column value is added to the set of possible column
  positions.
* All characters following any layout-unsafe character on a source line
  are taken to be at position unknown-column.
* Any time a layout decision requires comparing two column positions and
  one or both of them is unknown-column, the lexer will abort with a
  helpful error message.
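As a rough illustration of the four rules above, here is a minimal Haskell
sketch. The names Column, isLayoutUnsafe, annotate and compareColumns are
hypothetical and not taken from any existing lexer. The classifier is a
deliberately conservative assumption: everything outside printable ASCII
(TAB included) counts as layout-unsafe. The layout-unsafe character itself
still starts at a known column; only the characters after it are affected.

```haskell
-- A column position is either a known 1-based column or unknown-column.
data Column = Known Int | Unknown
  deriving (Eq, Show)

-- Assumed classifier: anything that is not printable ASCII is doubtful,
-- and anything doubtful is layout-unsafe. This also covers TAB.
isLayoutUnsafe :: Char -> Bool
isLayoutUnsafe c = c < ' ' || c > '~'

-- Pair every character of one source line with its column position.
-- Once a layout-unsafe character has been seen, every later character
-- on the line is at unknown-column.
annotate :: String -> [(Char, Column)]
annotate = go (Known 1)
  where
    go _ [] = []
    go col (c : cs) = (c, col) : go (advance col c) cs

    advance Unknown _ = Unknown
    advance (Known n) c
      | isLayoutUnsafe c = Unknown
      | otherwise = Known (n + 1)

-- Compare two column positions for a layout decision. If either one is
-- unknown-column, the comparison fails with an error message instead of
-- guessing.
compareColumns :: Column -> Column -> Either String Ordering
compareColumns (Known a) (Known b) = Right (compare a b)
compareColumns _ _ =
  Left "layout: cannot compare columns after a layout-unsafe character"
```

For example, annotate "a\tb" places 'a' and the TAB at known columns 1 and
2 and places 'b' at unknown-column, so any layout decision that depends on
the position of 'b' would be rejected by compareColumns.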
If TAB is treated as layout-unsafe (as it should be), then this rule change
will break some existing code, but only code that deserves to be broken.
If TAB instead keeps its current special treatment, this change should not
break any existing code. More importantly, the change is safe in the sense
that any program which is correct under the new rule has the same meaning
as it did under the old rule. This is true regardless of which characters
end up in the layout-unsafe set.
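The safety claim can be checked on small inputs with a sketch like the one
below. It is an illustration under simplifying assumptions, not a proof:
the old rule is modelled as "one column per character, starting at 1",
TAB's current special handling is left out, and the names oldColumns,
newColumns and agreesWithOld are hypothetical. The layout-unsafe set is
passed in as an arbitrary predicate, to reflect that the property should
not depend on which characters are chosen.

```haskell
data Column = Known Int | Unknown
  deriving (Eq, Show)

-- Old rule (simplified): every character occupies exactly one column.
oldColumns :: String -> [Int]
oldColumns line = zipWith const [1 ..] line

-- New rule, parameterised by an arbitrary layout-unsafe predicate.
newColumns :: (Char -> Bool) -> String -> [Column]
newColumns unsafe = go (Known 1)
  where
    go _ [] = []
    go col (c : cs) = col : go (next col c) cs

    next (Known n) c | not (unsafe c) = Known (n + 1)
    next _ _ = Unknown

-- Wherever the new rule reports a known column, it is the same column
-- the old rule would have reported. Unknown positions are skipped,
-- because the new rule refuses to use them at all.
agreesWithOld :: (Char -> Bool) -> String -> Bool
agreesWithOld unsafe line =
  and [ n == o | (Known n, o) <- zip (newColumns unsafe line) (oldColumns line) ]
```

Enlarging the layout-unsafe set only turns more positions into
unknown-column; it never changes a known column. In this model, that is
why a program accepted under the new rule is laid out exactly as before.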
-- Ben