[Haskell-i18n] Proposal for a Unicode-safe layout rule

Glynn Clements glynn.clements@virgin.net
Mon, 4 Aug 2003 12:07:02 +0100


Keith Wansbrough wrote:

> > If TAB is treated as layout-unsafe (as it should be) then this rule change
> > will break some existing code, but only code that deserves to be broken.
> > If TAB is treated specially as it currently is, this change should not
> > break any existing code.
> 
> Alternative: make the column value a partial order: identical white
> space is equal indent, additional white space increases the indent and
> less decreases it, but different white space is incomparable.  In
> other words, you can use whatever mix of spaces and tabs you like, as
> long as you use it consistently.  Eight spaces, four spaces, and one
> tab are all different, but if you always use one tab you're fine.
> Like this:
> 
> -- suggested whitespace partial order for layout
> 
> import Char
> import List
> 
> -- |Opaque indent value type
> data Indent = Indent String deriving Show
> 
> -- |Given a line, compute the indent of that line
> indentOf :: String -> Indent
> indentOf s = Indent (takeWhile isSpace s)
> 
> -- |Compare two indents (NB: partial order)
> cmpIndent :: Indent -> Indent -> Maybe Ordering
> cmpIndent (Indent s1) (Indent s2) | s1 == s2         = Just EQ
>                                   | isPrefixOf s1 s2 = Just LT
>                                   | isPrefixOf s2 s1 = Just GT
>                                   | otherwise        = Nothing
> 
> This way, it doesn't matter how wide any whitespace character is, or
> even what tab width you use.  The system doesn't even need to know -
> and any Unicode whitespace character can be used, not just the Latin-1
> "isSpace" ones.

But note that your algorithm doesn't specify what would happen if you
had used tabs instead of spaces on the last three lines above. If you
allow tabs, you need to know exactly what they represent.

There is nothing wrong with the current layout rules[1]; they don't
need to change.

[1] Regarding tabs, that is; there *is* a problem when you start
dealing with characters which aren't necessarily "one character cell"
wide (e.g. CJK).

-- 
Glynn Clements <glynn.clements@virgin.net>