[Haskell-cafe] HDBC 2.1, UTF8 and Umlauts

John Goerzen jgoerzen at complete.org
Mon May 4 12:19:38 EDT 2009


Guenther Schmidt wrote:
> Hi John,
> 
> thanks for taking the time. It actually is \252 that turned into 
> something else because of my email client, damn the thing.

OK, perhaps we have some confusion here.

Are you saying that you entered the Unicode characters directly into
your Haskell source as literals?  In other words, you did not type:

  backslash two five two

but instead just typed the umlaut on the keyboard?

If so, that won't work directly -- I think.  Maybe somebody can correct
me on this, but my hunch is that the umlaut would be saved as UTF-8 when
you save the .hs file.  Then you will get a String that is supposed to
hold decoded Unicode data but instead holds encoded UTF-8 data.

You could wrap it with Codec.Binary.UTF8.String.decodeString from
utf8-string and see if that helps.  If it does, that'll be your problem.
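
A minimal sketch of that check, assuming the utf8-string package is
installed.  The names "decoded" and "encoded" are made up for
illustration; "encoded" spells out by hand what the String would contain
if the hunch above is right (one Char per UTF-8 byte):

```haskell
import Codec.Binary.UTF8.String (decodeString, encodeString)
import Data.Char (ord)

-- A properly decoded String: one Char per Unicode code point.
-- \252 is a decimal escape for U+00FC (u with umlaut).
decoded :: String
decoded = "G\252nni"

-- Hypothetical mis-encoded String: the two UTF-8 bytes of U+00FC
-- (195 and 188) stored as two separate Chars.
encoded :: String
encoded = "G\195\188nni"

main :: IO ()
main = do
  print (map ord decoded)                  -- code points of the decoded form
  print (map ord encoded)                  -- byte values of the encoded form
  print (decodeString encoded == decoded)  -- decoding recovers the real text
  print (encodeString decoded == encoded)  -- encoding goes the other way
```

If map ord on your own literal shows 195 and 188 where you expected a
single 252, the String is holding encoded bytes and decodeString should
repair it.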

It's a complicated topic, I know.  And the scary thing is that Unicode
makes this all *easier*.
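
To see *how* a value is garbled, a round trip along these lines might
help.  It is only a sketch: it assumes HDBC and HDBC-sqlite3 are
installed, uses an in-memory database (":memory:") and a one-column
table as stand-ins for the real ones, and "codePoints" is a made-up
helper:

```haskell
import Data.Char (ord)
import Database.HDBC
import Database.HDBC.Sqlite3 (connectSqlite3)

-- Hypothetical helper: show a String as numeric code points so the
-- result does not depend on the terminal's encoding.
codePoints :: String -> [Int]
codePoints = map ord

main :: IO ()
main = do
  -- Assumption: an in-memory database instead of "somedatabase".
  dbc <- connectSqlite3 ":memory:"
  _ <- run dbc "CREATE TABLE someTable (name TEXT)" []
  -- Insert the name using the decimal escape rather than a typed umlaut.
  _ <- run dbc "INSERT INTO someTable VALUES (?)" [toSql "G\252nni"]
  commit dbc
  r <- quickQuery' dbc "SELECT * FROM someTable" []
  print r
  -- Print each returned value as code points for comparison.
  mapM_ (print . codePoints . fromSql) (concat r)
  disconnect dbc
```

Comparing the printed code points with those of the String that went in
shows whether a single 252 survived or was turned into something else.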



> 
> I'll do some further investigating and give you some more details when I 
> have them, thanks in advance.
> 
> Günther
> 
> 
> John Goerzen schrieb:
>> On Mon, May 04, 2009 at 04:44:04PM +0200, Guenther Schmidt wrote:
>>   
>>> Hi John,
>>>
>>> I'm trying stuff like:
>>>
>>>    dbc <- connectSqlite3 "somedatabase"
>>>    run dbc "insert into someTable values (?)" [toSql "Günni"].
>>>     
>> So what do you get back after adding:
>>
>>      commit dbc
>>      r <- quickQuery' dbc "select * from someTable" []
>>      print r
>>
>> Just knowing it's garbled doesn't help.  Need to know *how* it's
>> garbled.
>>
>> But the problem is that \374 isn't Unicode at all.  It's ISO-8859-1.
>> You're not actually giving it Unicode data to start with.  I believe
>> the proper sequence is \252.
>>
>> For all I know, \374 may not even be a valid Unicode encoding (haven't
>> tested it).
>>
>> Try \252.
>>
>>
>>   
>>> I also tried:
>>>
>>>    dbc <- connectSqlite3 "somedatabase"
>>>    run dbc "insert into someTable values ('Günni')" [].
>>>
>>> So since this is Haskell code I presume it's in UTF-8, my emacs stores  
>>> all my *.hs files as UTF-8
>>>
>>> In either case the "ü" becomes garbled.
>>>
>>> With the previous version of HDBC, 1.1.6, this worked just fine.
>>>
>>>
>>> It also garbles any Umlauts coming *out*, the source is an UTF-8 sqlite3  
>>> db file.
>>>
>>> Günther
>>>
>>>
>>>
>>> John Goerzen schrieb:
>>>     
>>>> Günther Schmidt wrote:
>>>>   
>>>>       
>>>>> Hi guys,
>>>>>
>>>>> for some reason, any way I try, all the Umlauts get garbled with HDBC 2.1.
>>>>> HDBC 1.1.6 worked fine with any backend (ODBC, Sqlite3, ... what have you).
>>>>>
>>>>> Anybody else had similar problems and knows how to solve this?
>>>>>     
>>>>>         
>>>> You need to be more specific, but it is likely you are trying to send
>>>> something to HDBC that isn't encoded in UTF-8.  HDBC 2.x has a global
>>>> preference for UTF-8 now, actually partly to resolve complaints like this.
>>>>
>>>> If you are feeding it ISO-8859-1 data or somesuch, try giving it UTF-8
>>>> instead.
>>>>
>>>> -- John
>>>>   
>>>>       
>>>     
> 
> 
> 


