[Haskell-i18n] Proposal for a Unicode-safe layout rule

Ben Rudiak-Gould benrg@dark.darkweb.com
Fri, 1 Aug 2003 19:06:15 -0700 (PDT)


I propose that Haskell's layout rule be changed in the following simple
way:

  * We identify a set of "layout-unsafe" Unicode characters which may
    occupy something other than one column in some fixed-width fonts. This
    would include (among other things) combining characters and full-width
    CJK characters. Explicit Unicode escape sequences, if any, should also
    count as layout-unsafe. Anything doubtful should be layout-unsafe.

  * A special unknown-column value is added to the set of possible column
    positions.

  * All characters following any layout-unsafe character on a source line
    are taken to be at position unknown-column.

  * Any time a layout decision requires comparing two column positions and
    one or both of them is unknown-column, the lexer will abort with a
    helpful error message.
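
The four rules above can be sketched in a few lines of Haskell. This is
only an illustration, not part of any existing lexer: the names Column,
layoutUnsafe, columns and compareColumns are hypothetical, and the
classifier is a deliberately conservative assumption (anything that is not
printable ASCII, which includes TAB, counts as layout-unsafe).

```haskell
import Data.Char (isAscii, isPrint)

-- A column position is either a known 1-based column or the special
-- unknown-column value.
data Column = Known Int | Unknown
  deriving (Eq, Show)

-- Assumption for this sketch: only printable ASCII is layout-safe.
-- A real set would be drawn from Unicode character properties.
layoutUnsafe :: Char -> Bool
layoutUnsafe c = not (isAscii c && isPrint c)

-- The column at which each character of one source line starts.
-- Everything after the first layout-unsafe character is Unknown.
columns :: String -> [Column]
columns = go (Known 1)
  where
    go _   []     = []
    go col (c:cs) = col : go (next col c) cs

    next Unknown _ = Unknown
    next (Known n) c
      | layoutUnsafe c = Unknown
      | otherwise      = Known (n + 1)

-- A layout decision compares two columns; if either is Unknown the
-- lexer gives up with an error instead of guessing.
compareColumns :: Column -> Column -> Either String Ordering
compareColumns (Known a) (Known b) = Right (compare a b)
compareColumns _ _ =
  Left "layout: cannot compare columns; a layout-unsafe character \
       \precedes one of the tokens on its line"
```

For example, columns "a\tb" evaluates to [Known 1, Known 2, Unknown]: the
TAB itself starts at a known column, but the character after it does not.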

If TAB is treated as layout-unsafe (as it should be) then this rule change
will break some existing code, but only code that deserves to be broken.
If TAB is treated specially as it currently is, this change should not
break any existing code. More importantly, the change is safe in the sense
that any program which is correct under the new rule has the same meaning
as it did under the old rule. This is true regardless of what characters
end up in the set layout-unsafe.
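
The safety claim can be checked on individual lines with a small sketch.
It assumes the old rule advances by exactly one column for every
layout-safe character and moves TAB to the next 8-column tab stop, and it
reuses the same conservative, hypothetical classifier (non-printable or
non-ASCII means layout-unsafe, so TAB is unsafe here). All names are
illustrative.

```haskell
import Data.Char (isAscii, isPrint)

-- Assumption for this sketch: only printable ASCII is layout-safe.
layoutUnsafe :: Char -> Bool
layoutUnsafe c = not (isAscii c && isPrint c)

-- Old rule: every column is a number. TAB jumps to the next tab stop
-- (stops 8 columns apart, first column is 1); anything else advances by 1.
oldAdvance :: Int -> Char -> Int
oldAdvance n '\t' = n + 8 - (n - 1) `mod` 8
oldAdvance n _    = n + 1

oldColumns :: String -> [Int]
oldColumns = scanl oldAdvance 1

-- New rule: Nothing plays the role of unknown-column.
newAdvance :: Maybe Int -> Char -> Maybe Int
newAdvance Nothing _ = Nothing
newAdvance (Just n) c
  | layoutUnsafe c = Nothing
  | otherwise      = Just (n + 1)

newColumns :: String -> [Maybe Int]
newColumns = scanl newAdvance (Just 1)

-- Wherever the new rule knows a column, it is the column the old rule
-- computed, so any comparison the new rule permits has the old outcome.
agrees :: String -> Bool
agrees s = and [ new == Nothing || new == Just old
               | (old, new) <- zip (oldColumns s) (newColumns s) ]
```

The reason agrees holds is that a known column under the new rule implies
every earlier character on the line was layout-safe, and for those
characters both rules advance by one. Nothing in that argument depends on
which characters the classifier marks unsafe.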

-- Ben