[Haskell-cafe] getting crazy with character encoding

Brandon S. Allbery KF8NH allbery at ece.cmu.edu
Wed Sep 12 10:53:29 EDT 2007


On Sep 12, 2007, at 10:18 , Andrea Rossato wrote:

> Suppose that, on a Linux system in a UTF-8 locale, you create a file
> with non-ASCII characters. For instance:
> touch abèèè
>
> Now, I would expect that the output of a shell command such as
> "ls ab*"
> would be a string/list of 5 chars. Instead I find it to be a list of 8
> chars...;-)

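That is expected.  The low-level filesystem storage doesn't know
about character sets; it stores filenames as plain bytes, so non-ASCII
names must be encoded in e.g. UTF-8.  In UTF-8 each è takes 2 bytes,
so the name is 2 + 3*2 = 8 bytes, and Haskell hands you each byte as a
separate Char.  8 is therefore correct, and you must do UTF-8
decoding on input because Haskell does not do so automatically.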

This will also be true with readdir(), i.e. getDirectoryContents.
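
To make the decoding step concrete, here is a minimal sketch of a
hand-written UTF-8 decoder.  It assumes, as described above, that the
input String holds one raw byte (0..255) per Char.  The name decodeUtf8
is a hypothetical helper chosen for illustration, not a library
function, and the sketch replaces malformed sequences with U+FFFD but
does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Decode a String whose Chars each hold one raw UTF-8 byte (0..255)
-- into a String of Unicode code points.  Malformed input is replaced
-- with U+FFFD instead of raising an error.  (Hypothetical helper.)
decodeUtf8 :: String -> String
decodeUtf8 [] = []
decodeUtf8 (c:cs)
  | b < 0x80              = c : decodeUtf8 cs        -- plain ASCII byte
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)     -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)     -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)     -- 4-byte sequence
  | otherwise             = '\xFFFD' : decodeUtf8 cs -- stray or invalid byte
  where
    b = ord c

    -- Consume n continuation bytes and fold their low 6 bits into the
    -- payload bits taken from the lead byte.
    multi n lead =
      let (conts, rest) = splitAt n cs
      in if length conts == n && all isCont conts
           then toChar (foldl step lead conts) : decodeUtf8 rest
           else '\xFFFD' : decodeUtf8 cs

    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont x = ord x .&. 0xC0 == 0x80

    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)

    -- chr fails above U+10FFFF, so guard against out-of-range values.
    toChar v
      | v > 0x10FFFF = '\xFFFD'
      | otherwise    = chr v
```

With the filename from the question, the 8 byte-valued Chars
"ab\xC3\xA8\xC3\xA8\xC3\xA8" decode to the 5-character string
"ab\xE8\xE8\xE8", i.e. abèèè.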

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery at kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery at ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH



