[Haskell-cafe] What's the status with unicode characters on haddock ?
Thomas ten Cate
ttencate at gmail.com
Fri Jul 10 09:06:26 EDT 2009
I ran a little experiment of my own, using a GHC HEAD build of a week
or so ago. Here's a hex dump of my test source, so that we can see
that it's really UTF-8.
$ od -xc Test.hs
0000000 6f6d 7564 656c 4d20 6961 206e 6877 7265
m o d u l e M a i n w h e r
0000020 0a65 2d0a 202d 207c 7250 6e69 7374 7420
e \n \n - - | P r i n t s t
0000040 6568 7420 7865 2074 4822 6c65 6f6c 7720
h e t e x t " H e l l o w
0000060 726f 646c 2e22 2d0a 202d 6548 6572 7327
o r l d " . \n - - H e r e ' s
0000100 6120 6520 7275 206f 6973 6e67 202c 82e2
a e u r o s i g n , 342 202
0000120 20ac 5528 322b 4130 2943 202c 6e61 2064
254 ( U + 2 0 A C ) , a n d
0000140 6e61 6520 656c 656d 746e 6f2d 2066 6973
a n e l e m e n t - o f s i
0000160 6e67 203a 88e2 208a 5528 322b 3032 2941
g n : 342 210 212 ( U + 2 2 0 A )
0000200 0a2e 616d 6e69 3a20 203a 4f49 2820 0a29
. \n m a i n : : I O ( ) \n
0000220 616d 6e69 3d20 7020 7475 7453 4c72 206e
m a i n = p u t S t r L n
0000240 4822 6c65 6f6c 7720 726f 646c 0a22
" H e l l o w o r l d " \n
0000256
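To double-check the two multi-byte sequences in the dump above, here is a minimal sketch (the names utf8Bytes, hex2 and odWords are made up for this message, and it assumes a little-endian machine, which is what the byte-swapped words in the `od -x` output suggest). It encodes U+20AC and U+220A by hand and prints the bytes in hex and in octal, the latter being what `od -c` shows for non-printable bytes.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord, toUpper)
import Data.Word (Word8)
import Numeric (showHex, showOct)

-- | Encode one character as UTF-8, following the standard bit layout.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the topmost payload bits.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- | Two-digit lowercase hex rendering of a byte.
hex2 :: Word8 -> String
hex2 b = replicate (2 - length s) '0' ++ s
  where s = showHex b ""

-- | Pair bytes into 16-bit words as `od -x` prints them on a
-- little-endian machine: the second byte of each pair comes first.
-- A lone trailing byte is shown on its own (od's padding is not modelled).
odWords :: [Word8] -> [String]
odWords (a : b : rest) = (hex2 b ++ hex2 a) : odWords rest
odWords [a]            = [hex2 a]
odWords []             = []

main :: IO ()
main = mapM_ report ['\x20AC', '\x220A']
  where
    report c = putStrLn $ unwords
      [ "U+" ++ map toUpper (showHex (ord c) "")
      , "hex:", unwords (map hex2 (utf8Bytes c))
      , "octal:", unwords (map (\b -> showOct b "") (utf8Bytes c))
      ]
```

The octal values it prints for U+20AC (342 202 254) and U+220A (342 210 212) match the dump, and odWords applied to the euro bytes followed by a space gives 82e2 20ac, which is exactly what appears at offsets 0000100 and 0000120.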
Then I invoked
$ haddock -h Test.hs
The generated Main.html contains this tag:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Firefox picks this up: in the View menu, Character Encoding is
set to UTF-8.
Yet I see little blocks instead of the characters from my source file! Why?
$ od -xc Main.html
...
0003220 6120 6520 7275 206f 6973 6e67 202c 2004
a e u r o s i g n , 004
...
0003260 6520 656c 656d 746e 6f2d 2066 6973 6e67
e l e m e n t - o f s i g n
0003300 203a 2004 5528 322b 3032 2941 0a2e 2f3c
: 004 ( U + 2 2 0 A ) . \n < /
It seems that Haddock replaced each of the two characters with a single
0x04 (ASCII end-of-transmission) byte! Apparently you've hit a bug in Haddock.
Since GHC reads Haskell source files as UTF-8, and the HTML file
Haddock produces declares itself to be UTF-8 as well, this is clearly
incorrect behaviour.
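For anyone who wants to check their own generated pages without reading od output by eye, here is a rough sketch. It assumes the bytestring package that ships with GHC, and the names isSuspect and suspectBytes are invented for this message. It lists every control byte other than tab, newline and carriage return, which is where a stray 0x04 would show up.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.Environment (getArgs)

-- | Control bytes that should not normally occur in an HTML page:
-- anything below 0x20 except tab, line feed and carriage return.
isSuspect :: Word8 -> Bool
isSuspect b = b < 0x20 && b `notElem` [0x09, 0x0A, 0x0D]

-- | Zero-based offsets (decimal, unlike od's octal) of suspect bytes.
suspectBytes :: B.ByteString -> [(Int, Word8)]
suspectBytes = filter (isSuspect . snd) . zip [0 ..] . B.unpack

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      contents <- B.readFile path
      mapM_ describe (suspectBytes contents)
    _ -> putStrLn "usage: pass the path of one HTML file"
  where
    describe (off, b) =
      putStrLn ("offset " ++ show off ++ ": byte " ++ show b)
```

Run against the Main.html above, I would expect it to report the two 0x04 bytes; a correctly encoded page should produce no output.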
Thomas