[Haskell-cafe] Re: haskelldb + sqlite problem.

John Goerzen jgoerzen at complete.org
Mon Jun 22 17:15:38 EDT 2009


Günther Schmidt wrote:
> Hi Cloud,
> 
> this often occurs when the path to the database includes a non-ascii 
> character.
> 
> In my dev environment, the path to the database deliberately contains an 
> umlaut and the original code base of hdbc.sqlite3 from John Goerzen, 
> version 2.0 & version 2.1 thus does not work.

This is conflating many issues.

I do recall some discussion about data within a database; I don't recall
one about the filename of it, which would certainly be a separate
discussion.  I can see why a connectRaw or some such function could be
useful if you want to pass a raw binary string as the file path to
Sqlite3.  (I don't think any other DB would have a use for such a thing).


> John Goerzen, the author of HDBC has considerably rewritten some parts 
> of his hdbc package to use utf8-string wrapping, which includes wrapping 
>   the connection string, and in my case caused considerable problems, it 
> just wouldn't work. So my solution was to rollback all these changes 
> where he used the utf8-wrapping, which was quite a lot of work. I did 

And unnecessary work at that, if all you cared about was the filename.

I see your point on the filename, and tweaking that would have been a
one-line fix for you.

The mess we had before was this huge cloud of **UNDEFINED BEHAVIOR**
when dealing with anything other than 7-bit ASCII.  Databases could have
some encodings, systems could have encodings, and it was all a huge fiasco.

So with HDBC 2, what we have is:

 * If you want to communicate with the database in a "raw" manner, use
ByteStrings.  If you want a String out of it, convert it yourself.

 * If you want to use Strings to communicate with the databases, these
will automatically be converted to the appropriate Unicode
representation by the library.  For all current database backends, that
means converting them to a UTF-8 CStringLen type of thing, and back.
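
To make the second point concrete, here is a minimal sketch of what turning a Haskell String (a list of Unicode code points) into UTF-8 bytes involves. It is not HDBC's code: the names encodeChar and encodeUtf8 are made up for illustration, it uses only base, and it does not special-case surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                           -- 7-bit ASCII: one byte
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                    -- two bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]           -- three bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]  -- four bytes
  where
    n :: Int
    n = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: the marker bits 10 followed by six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: encode a whole String as UTF-8.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

With this, encodeUtf8 "G\252nther" (U+00FC is u-umlaut) gives eight bytes, with the umlaut encoded as 0xC3 0xBC. Plain ASCII text comes out byte-for-byte unchanged, which is why problems only show up outside 7-bit ASCII.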

> Anyway what you can do, for now, is to put your sqlite3 database file 
> into a location where the path contains no non-ascii characters, that 
> should fix the problem.

His problem is not caused by non-ASCII characters.

> You may experience other, utf8-wrapping related problems, for instance 
> when you want to insert non-ascii strings into varchar columns. They may 
> not come back as you put them in.

They will, unless you are doing something weird like putting Latin1
8-bit text into a String and passing it to HDBC as a String, when the
documentation specifically states that Strings are expected to be in the
Unicode space.  As I recall, that is specifically what you were doing.

That doesn't mean I haven't provided an outlet for you to deal with
things in the Latin1 space (see the ByteString discussion above).

But in truth, HDBC is not a character set conversion library, nor should
it be.  If you have more complex needs than Unicode Strings, use one of
the many quality encoding libraries available for Haskell, and combine
it with the ByteString features in HDBC.
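
As a rough sketch of doing the conversion yourself, the following handles Latin1 by hand using only base and the bytestring package. The function names are hypothetical and nothing here is part of HDBC; for a real application an encoding library would be the better choice. The ByteString produced or consumed here is what you would exchange with the database through the ByteString interface.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, ord)

-- Hypothetical helper: interpret raw bytes as Latin1 text.
-- Every Latin1 byte maps to the Unicode code point with the same number.
latin1ToString :: B.ByteString -> String
latin1ToString = map (chr . fromIntegral) . B.unpack

-- Hypothetical helper: encode a String as Latin1 bytes.
-- Returns Nothing if any character falls outside the Latin1 range.
stringToLatin1 :: String -> Maybe B.ByteString
stringToLatin1 s
  | all ((< 256) . ord) s = Just (B.pack (map (fromIntegral . ord) s))
  | otherwise             = Nothing
```

The Maybe result makes the lossy case explicit: a character such as the euro sign (U+20AC) has no Latin1 byte, so the caller has to decide what to do rather than having it silently mangled.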

Every popular database that I am aware of can either speak UTF-8
directly, or convert transparently to and from it.

So, to summarize:

1) This is not the original poster's problem.

2) HDBC 2 is simpler than HDBC 1, and actually defines behavior in terms
of character sets rather than leaving it completely undefined.

3) HDBC 2 standardizes character sets around UTF-8, the most common
global standard, and structures its API in a way that this is
transparent when you want it to be, and available for manual processing
when you want that.

4) Nothing requires you to use UTF-8, which is why the ByteString API is
there.

5) A one-line patch would have fixed your filename connection issue.

6) If memory serves, your "not getting things back" is because you are
storing non-Unicode data in your Strings, and then using an improper API
to store it.

-- John

