[Haskell-cafe] Encoding issues with LDAP package
wren ng thornton
wren at freegeek.org
Wed May 23 05:33:44 CEST 2012
On 5/22/12 10:30 AM, Vincent Ambo wrote:
> I'm using the LDAP package by John Goerzen to retrieve some information from an Active Directory database. This information includes the full names of my company's employees.
> Many of these names contain characters which aren't part of the standard ASCII set, for example ä å ü ê and so on. When I retrieve those names from the directory (the LDAP package returns them as Strings) the encoding breaks and I get results like "R\195\188diger" instead of "Rüdiger".
> The Active Directory server supports LDAP v2 and v3. I assume the OpenLDAP C API, which is the backend behind the LDAP package, automatically chooses v3 to connect if available (this is speculation, correct me if I'm wrong).
> Since LDAP v3 only speaks UTF8 and ASCII I also assume that the server returns UTF8.
> Is this a known problem in the LDAP package? Or is this related to the OpenLDAP C API? Or even something on the server side?
I haven't used the LDAP package, though I have done a good deal of LDAP
hackery back in the day. Without looking at any of the code involved, it
sounds like the LDAP server is handing off UTF-8 encoded C-style char*
strings, but that the Haskell code is interpreting them byte-by-byte (a la
Data.ByteString.Char8 or similar) rather than properly decoding them into
a list of Char (i.e., Unicode code points).
If you're familiar with the LDAP package and the FFI, it should be easy
to poke into the code and see if that's actually what's going on.
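
If that diagnosis is right, each Char in the returned String holds one raw
byte, and "R\195\188diger" contains the bytes 0xC3 0xBC, which are the UTF-8
encoding of ü. A minimal sketch of a workaround on the caller's side follows.
It assumes the one-byte-per-Char behaviour, which has not been checked
against the LDAP package's source, and the name fixUtf8 is made up for this
example. It uses the bytestring and text packages.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical helper: treat every Char of the input as a single raw
-- byte, then decode the resulting byte sequence as UTF-8.
--
-- Assumption: all Chars are in the range 0-255. BC.pack silently
-- truncates anything larger to 8 bits, so do not apply this to a String
-- that has already been decoded properly.
--
-- lenientDecode replaces invalid byte sequences with U+FFFD instead of
-- throwing an exception.
fixUtf8 :: String -> String
fixUtf8 = T.unpack . TE.decodeUtf8With lenientDecode . BC.pack
```

With this, fixUtf8 "R\195\188diger" yields "R\252diger", i.e. "Rüdiger"
(show prints ü as \252, so use putStrLn to see the actual character). If the
package really does behave this way, the cleaner fix would be to decode at
the FFI boundary inside the package rather than in every caller.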