[Haskell] ANN: HDBC (Haskell Database Connectivity)

Tue Jan 3 18:39:35 EST 2006

On Wed, Jan 04, 2006 at 01:08:47AM +0200, Krasimir Angelov wrote:
> Hi John Goerzen,
> 
> I wonder which design decisions are causing you troubles. Could you
> explain this? All features which you mentioned can be added easily to
> HSQL as well. It is better to share the effort on single library
> rather than to have multiple similar libraries. I am willing to work
> on HSQL improvement.

Hi Krasimir,

First off, thank you for all your work on HSQL.  It is great to have a
database layer like that for Haskell, and despite my troubles with it, I
think you have done a wonderful service for the community.  I continue
to have HSQL code in production and have found it a useful tool.

The final thing that prompted me to do this was that the PostgreSQL --
and possibly the Sqlite -- module for HSQL was segfaulting.  I spent
quite a bit of time with gdb and the HSQL code, and even with Simon
Marlow's assistance, was unable to track down the precise cause.  To
make matters worse, the problem was intermittent.  The Haskell program
in question was pure Haskell, and switching it to HDBC solved this
issue.

I also had extremely high memory usage when dealing with large result
sets -- somewhere on the order of 700MB; the same workload consumes about
12MB with HDBC.  My guess from looking briefly at the code is that the
entire result set is being read into memory up front.
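
To illustrate the difference between reading a result set up front and
fetching rows on demand, here is a minimal sketch using only the base
library.  It is not the HDBC or HSQL implementation: the cursor is a
hypothetical in-memory stand-in for a database handle, and the names
newCursor and lazyRows are made up for this example.

```haskell
import Data.IORef (newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical row type; a real database layer would use a richer one.
type Row = [String]

-- Hypothetical stand-in for a database cursor: the returned action hands
-- back the next row, or Nothing once the result set is exhausted.
newCursor :: [Row] -> IO (IO (Maybe Row))
newCursor rows = do
  ref <- newIORef rows
  return $ atomicModifyIORef' ref $ \rs -> case rs of
    []       -> ([], Nothing)
    (r:rest) -> (rest, Just r)

-- Build a lazy list on top of the fetch action: a row is only fetched
-- when the consumer demands the corresponding list cell.
lazyRows :: IO (Maybe Row) -> IO [Row]
lazyRows fetch = unsafeInterleaveIO $ do
  next <- fetch
  case next of
    Nothing -> return []
    Just r  -> do
      rest <- lazyRows fetch
      return (r : rest)

main :: IO ()
main = do
  -- An unbounded "result set": reading it all up front would never finish.
  fetch <- newCursor [[show n, "row " ++ show n] | n <- [1 :: Int ..]]
  rows  <- lazyRows fetch
  mapM_ print (take 3 rows)
```

Because rows are fetched only as the list is consumed, memory use follows
what the consumer retains rather than the size of the result set.  The
trade-off is that the underlying cursor has to stay valid until the list
has been consumed.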

There were a number of other problems as well:

 * No support for prepared queries or for supplying
   replaceable parameters.  (Supported everywhere in HDBC, which removes
   the need for escaping.)  That's really my #1 complaint
   (well, aside from the segfaulting <g>).

 * Escaping function was global, rather than per-DB, which caused some
   trouble with Sqlite3 at least.  (See SF bug 1324873 that I submitted
   on Oct. 12 with no replies since then)

 * No way to retrieve result data by column index instead of column name.

 * HSQL provided no way to see the result set as a lazy list, and
   the public API provided no way to implement that.  (There is 'fetch',
   but it seems that the entire result set was read into memory in
   advance anyway.)

 * The code wasn't very easy to understand.  (This may be just me
   though.)

 * Unclear semantics in multithreaded programs.

 * No testsuite.
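
As an illustration of the replaceable-parameter point in the list above,
here is a small self-contained model of why placeholders remove the need
for escaping: the SQL text and the values travel separately, so a value
is never spliced into the query string.  This is a sketch under stated
assumptions, not the HDBC API; Param, Prepared, prepareSql and bind are
hypothetical names invented for the example.

```haskell
-- Hypothetical parameter type; real database layers have a richer one.
data Param = PText String | PInt Int
  deriving (Eq, Show)

-- A prepared statement: the SQL text plus its number of '?' placeholders.
data Prepared = Prepared { sqlText :: String, placeholders :: Int }
  deriving (Eq, Show)

-- Naive placeholder count: every '?' is treated as a placeholder, so a
-- '?' inside a quoted SQL literal would be miscounted by this sketch.
prepareSql :: String -> Prepared
prepareSql s = Prepared s (length (filter (== '?') s))

-- Pair the statement with its parameters instead of building a new
-- string.  The values are passed along untouched, so no escaping occurs.
bind :: Prepared -> [Param] -> Either String (String, [Param])
bind p ps
  | length ps == placeholders p = Right (sqlText p, ps)
  | otherwise =
      Left ("expected " ++ show (placeholders p)
            ++ " parameters, got " ++ show (length ps))

main :: IO ()
main = do
  let stmt = prepareSql "INSERT INTO people (name, age) VALUES (?, ?)"
  -- The apostrophe in the name needs no special treatment.
  print (bind stmt [PText "O'Brien", PInt 42])
  -- A wrong parameter count is reported instead of producing bad SQL.
  print (bind stmt [PText "O'Brien"])
```

In a real driver the statement text and the parameter list would be
handed to the database separately, which is also why a per-database
escaping function stops being necessary.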

I knew I couldn't fix it the right way in HSQL (since I had trouble
following the code), and it also seemed like these weren't high-priority
issues for you.  (No blame here; it's the same way for me with the code
I maintain.  I can't expect you to fix my bugs in something that's
free.)

In hindsight, I should have contacted you first, and I apologize for not
doing that.  I just sorta sat down to design a DB API that I'd like, and
pretty soon had a working prototype, and then some drivers...  I'm
dangerous when I'm on vacation ;-)

I'm not quite sure where to go from here.  Both packages have features
that the other lacks.  I don't think that it's possible to merge all the
HDBC features into HSQL without a major API and architecture
refactoring.  The HSQL features that HDBC lacks are mostly in progress
already, and I've tried to design the HDBC API with them in mind.

So, I'd invite you to take a look at the HDBC API at

  http://darcs.complete.org/hdbc/doc/Database-HDBC.html

and let me know how you think we might be able to collaborate.

If nothing else, I'm sure we can share ideas.  (Some of that you'll see
in HDBC, I'm sure.)  Perhaps we could even have an HDBC backend for HSQL
and vice versa.

-- John