[Haskell-cafe] Gitit - Encoding

Jeremy Shaw jeremy at n-heptane.com
Tue Dec 30 09:14:32 EST 2008


Hello,

I have not looked at the gitit source code, but I have had this
problem in other HAppS applications. The problem is that by default
HAppS does nothing about string encodings. The easy fix is to use
utf-8 and unicode everywhere. ('easy' compared to supporting multiple
encodings).

The goal is to make sure that in gitit, a String is always a list of
unicode code points, and not a list of utf-8 encoded octets. This
means that whenever data comes in or goes out of gitit it needs to be
decoded or encoded.
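To make the distinction concrete, here is a minimal sketch using
Codec.Binary.UTF8.String from utf8-string. The names decoded, encoded
and roundTrip are made up for illustration; they are not from gitit or
HAppS.

```haskell
import qualified Codec.Binary.UTF8.String as UTF8
import Data.Word (Word8)

-- What we want inside the application: one Char per code point.
-- '\233' is U+00E9 (e with acute accent), so this String has 4 Chars.
decoded :: String
decoded = "caf\233"

-- What goes over the wire: UTF-8 octets. U+00E9 becomes the two
-- octets 0xC3 0xA9, so this list has 5 elements.
encoded :: [Word8]
encoded = UTF8.encode decoded

-- Decoding the octets gives back the original code points.
roundTrip :: String
roundTrip = UTF8.decode encoded
```

Here length decoded is 4 while length encoded is 5, which is exactly
the mismatch that shows up as mangled accents when the two are mixed up.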

To transition you need to do at least the following:

1. Set the charset of the outgoing pages so that the browser knows
that the page is supposed to be utf-8:

 For html, this can be done by adding this meta to the <head> of each page:

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

 However, for text/plain, etc, you must set it in the HTTP header
 (which I will cover later). For html, it is still useful to set the
 meta tag though, so that if the page is saved to disk, the encoding
 is not lost.

2. Use the utf8-string library, and make sure that all the
inputs/outputs are decoded/encoded properly.

This probably means patching your copy of HAppS-Server (or copying the
modified functions into gitit). 

For example, lookPairs currently looks like this:

> lookPairs :: RqData [(String,String)]
> lookPairs = asks fst >>= return . map (\(n,vbs)->(n,L.unpack $ inputValue vbs))

As you can see, it just takes the incoming bytes and converts them to
a String, but without doing any decoding. You probably want something
more like:

> lookPairs :: RqData [(String,String)]
> lookPairs = asks fst >>= return . map (\(n,vbs)->(n,Data.ByteString.Lazy.UTF8.toString $ inputValue vbs))

Some of the other look* functions need patching as well.
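The decoding step itself can be tried outside of HAppS. The following
is only a sketch: decodePairs is a hypothetical helper that takes the
raw values as lazy ByteStrings directly, instead of going through
RqData and inputValue as the real look* functions do.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.UTF8 as LU

-- Hypothetical stand-in for the body of a patched look* function:
-- keep the name as is, and decode the raw UTF-8 octets of the value
-- into a String of code points.
decodePairs :: [(String, L.ByteString)] -> [(String, String)]
decodePairs = map (\(n, v) -> (n, LU.toString v))
```

The same one-line change (LU.toString in place of L.unpack) is what
each of the other look* functions would presumably need.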

Similarly, the ToMessage instances need to encode the outgoing data. Consider:

> instance ToMessage Html where
>    toContentType _ = B.pack "text/html"
>    toMessage = L.pack . renderHtml

We really want to make two changes:

> instance ToMessage Html where
>    toContentType _ = B.pack "text/html; charset=UTF-8"            -- add the encoding
>    toMessage = Data.ByteString.Lazy.UTF8.fromString . renderHtml  -- encode the data

3. Make sure that any I/O (readFile, writeFile, etc) uses the utf-8
functions from utf8-string.
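As a rough sketch of what that could look like, the two helpers below
read and write files as raw bytes and do the UTF-8 conversion
explicitly with Data.ByteString.UTF8. The names readFileUtf8 and
writeFileUtf8 are invented for this example; they are not functions
from gitit or HAppS.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as BU

-- Read a file as raw bytes, then decode the UTF-8 octets into a
-- String of code points.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = fmap BU.toString (B.readFile path)

-- Encode a String of code points as UTF-8 octets, then write the
-- raw bytes to the file.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path = B.writeFile path . BU.fromString
```

Going through a strict ByteString also means the whole file is read at
once, which avoids the lazy I/O surprises of the Prelude readFile.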

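If you don't want to patch HAppS-Server, then you could work around it
by doing silliness like this (decodeString is from
Codec.Binary.UTF8.String, first and second are from Control.Arrow):

 do pairs' <- lookPairs
    let pairs = map (first decodeString . second decodeString) pairs'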

but that seems error prone and not a long term solution. The obvious
long term solution is for HAppS to fix its encoding issues. The simple
fix is to hardwire it for utf-8, but a system that would support
arbitrary encodings might be nice?
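For reference, the re-decoding workaround mentioned above can be
sketched stand-alone as follows. It assumes the Strings coming out of
the unpatched lookPairs hold one Char per UTF-8 octet, and it uses
decodeString from Codec.Binary.UTF8.String; redecodePairs is a
hypothetical name.

```haskell
import Control.Arrow ((***))
import qualified Codec.Binary.UTF8.String as UTF8

-- Treat every Char of the incoming Strings as a single UTF-8 octet
-- and decode the result into proper code points, for both the name
-- and the value of each pair.
redecodePairs :: [(String, String)] -> [(String, String)]
redecodePairs = map (UTF8.decodeString *** UTF8.decodeString)
```

Applying this to a String that is already decoded would mangle it
again, which is one reason the workaround is error prone.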

As far as I know, no one has even tried to submit a patch hardwiring
HAppS to use utf-8 -- which seems like a good short-term solution. You
might try posting on the HAppS mailing list and see if such a patch
would be welcome:

http://groups.google.com/group/HAppS

hope this helps.
- jeremy


At Tue, 30 Dec 2008 13:58:15 +0100, 
Arnaud Bailly wrote:
> 
> Hello,
> I have started using Gitit and I am very happy with it and eager to
> start hacking. I am running into a practical problem: character
> encoding. When I edit pages using accented characters (I am French),
> the accents get mangled when the page comes back from the server.
> 
> The raw files are incorrectly encoded. Where should I look to fix
> this issue?
> 
> Thanks
> 
> ps: the wiki is live at http://www.notre-ecole.org
> 
> -- 
> Arnaud Bailly, PhD
> OQube - Software Engineering
> 
> web> http://www.oqube.com
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

