UTF-8 decoding error

Jan-Willem Maessen jmaessen at alum.mit.edu
Wed Sep 27 15:27:03 EDT 2006


On Sep 27, 2006, at 6:05 AM, Matthew Pocock wrote:

> Fortress (Sun's possibly-not-vaporware HPC language) supports arbitrary
> Unicode chars in code, and has an escape syntax for commonly used things.

I have spent the past week writing Fortress code (which runs in  
parallel, even).  But I'm perhaps a special case. :-)

> Similarly, proof-general/isabelle supports tex-style escapes for symbols &
> greek. It seems to me that a pre-processor that turns human-friendly escapes
> (e.g. \{lambda} rather than some magic number) into Unicode and a slightly
> intelligent IDE (or emacs mode?) would go most of the way to letting us use
> upside-down y's and curly a's with all the visual beauty and editor niceness
> that we have now with ASCII.

In Fortress we spent a *lot* of effort making the "TWiki" syntax as
painless as possible for things we planned to use often (for
example, -> and => turn into Unicode arrows, and the language syntax
is defined in terms of them).  One source of both encouragement and
frustration is the fact that every Unicode code point has an
associated description.  We support using these descriptions, and
various shortenings of them, since the full descriptions are too
verbose for day-to-day use.  The frustration is that the names or
their shortenings are not necessarily unique.  For characters that
occur only in strings this is less critical, but even there a little
effort goes a long way.
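
To make the uniqueness problem concrete, here is a minimal Python sketch.
It is not the Fortress implementation: the shortening rule (any subset of
the words of a name) and the helper names candidates and resolve are
assumptions made up for illustration.  The four names are real Unicode
character names (note the official spelling LAMDA).

```python
import unicodedata

# A tiny, hand-picked subset of Unicode character names (illustrative only).
NAMES = [
    "GREEK SMALL LETTER LAMDA",
    "GREEK CAPITAL LETTER LAMDA",
    "RIGHTWARDS ARROW",
    "RIGHTWARDS DOUBLE ARROW",
]

def candidates(short, names):
    """Return every full name that contains all the words of `short`.

    Hypothetical shortening rule: a shortening is any subset of the
    words of the full name, compared case-insensitively.
    """
    words = short.upper().split()
    return [n for n in names if all(w in n.split() for w in words)]

def resolve(short, names=NAMES):
    """Map a shortening to a character, refusing ambiguous shortenings."""
    hits = candidates(short, names)
    if len(hits) != 1:
        raise LookupError(f"{short!r} matches {len(hits)} names: {hits}")
    return unicodedata.lookup(hits[0])
```

Under this rule "small lamda" and "double arrow" each resolve to a single
character, while "lamda" and "arrow" each match two names and are rejected.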

One heuristic we've used is: "if I do a diff on the ASCII  
representation provided by my version control system, will I be able  
to read the result?"
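
One way to picture that heuristic is a reversible ASCII rendering of the
source.  The sketch below borrows the \{name} escape form from the quoted
message and uses full Unicode character names; the function names to_ascii
and from_ascii are hypothetical, and this is not how Fortress does it.

```python
import re
import unicodedata

def to_ascii(text):
    """Replace each non-ASCII character with a \\{NAME} escape.

    unicodedata.name raises ValueError for characters that have no
    name, so this sketch only handles named characters.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("\\{" + unicodedata.name(ch) + "}")
    return "".join(out)

def from_ascii(text):
    """Turn \\{NAME} escapes back into the characters they name."""
    return re.sub(
        r"\\\{([^}]*)\}",
        lambda m: unicodedata.lookup(m.group(1)),
        text,
    )
```

A line-based diff of the to_ascii output stays readable in any ASCII-only
tool.  The sketch does not handle source text that already contains a
literal backslash followed by an opening brace.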

We of course have a little program which processes an official
Unicode character table (downloaded from the web) plus some
information about our special cases and uses it to generate the
appropriate conversion functions.  This is important because Unicode
is constantly changing (mostly getting bigger).
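
A rough sketch of such a generator in Python follows.  It assumes the
semicolon-separated layout of the Unicode Character Database file
UnicodeData.txt (code point in hex, then the character name); the sample
lines are illustrative stand-ins for the downloaded file, and build_table
and SPECIAL_CASES are hypothetical names, not part of the Fortress tools.

```python
# Illustrative lines in the UnicodeData.txt layout; a real run would
# read the downloaded file instead.
SAMPLE = """\
0085;<control>;Cc;0;B;;;;;N;NEXT LINE (NEL);;;;
03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;;;;;N;GREEK SMALL LETTER LAMBDA;;039B;;039B
2192;RIGHTWARDS ARROW;Sm;0;ON;;;;;N;RIGHT ARROW;;;;
"""

# Hand-maintained special cases; the post mentions -> and => becoming arrows.
SPECIAL_CASES = {"->": 0x2192, "=>": 0x21D2}

def build_table(lines, special_cases):
    """Build a mapping from escape name to code point.

    Only the first two fields of each line are used.  Entries whose
    name field starts with "<" (controls, range markers) are skipped.
    """
    table = dict(special_cases)
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or fields[1].startswith("<"):
            continue
        table[fields[1]] = int(fields[0], 16)
    return table

table = build_table(SAMPLE.splitlines(), SPECIAL_CASES)
```

Regenerating the table from a newer copy of the file picks up new
characters without touching the hand-written special cases.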

-Jan-Willem Maessen
  Fortress developer, Haskell hacker

>
> Matthew
>
> On Wednesday 20 September 2006 21:42, Duncan Coutts wrote:
>> On Wed, 2006-09-20 at 18:14 +0200, Christian Maeder wrote:
>>> How can I convince ghc version 6.5.20060919 to accept latin1 characters
>>> in literals?
>>>
>>> I wish to keep source files (containing umlauts in strings) that can be
>>> compiled by both ghc-6.4.2 and ghc-6.6.
>>
>> You can use numeric escapes like "\222".
>>
>> Duncan
>>
>> _______________________________________________
>> Glasgow-haskell-users mailing list
>> Glasgow-haskell-users at haskell.org
>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
