[Haskell-cafe] HDBC, postgresql, bytestrings and embedded NULLs

Tue Jan 18 04:20:48 CET 2011

On Mon, Jan 17, 2011 at 11:38 PM, John Goerzen <jgoerzen at complete.org> wrote:
> On 01/17/2011 03:16 PM, Michael Snoyman wrote:
>>
>> I've brought up before my problem with the convertible package: it
>> encourages usage of partial functions. I would prefer two typeclasses,
>> one for guaranteed conversions and one for conversions which may fail.
>> In fact, that is precisely why convertible-text[1] exists.
>
> I would be open to making that change in convertible.  The unfortunate
> reality with databases, however, is that many times we put things into
> strings for sending to the DB engine, and get things back from it in the
> form of strings, which must then be parsed into numeric types and the like.
>  We can't, as a matter of type system principles, guarantee that a String
> can be converted to an Integer.  How were you thinking the separation into
> these typeclasses would be applied in the context of databases?
>
>> As a related issue, there are a large number of data constructors in
>> HDBC for SqlValue. I would not argue with the presence of any of them:
>> for your purposes, every one of them is necessary. But for someone
>> writing a cross-backend package with a more limited set of datatypes,
>> it gets to be a problem. I know I can use convertible for this, but
>> see my previous paragraph ;).
>
> How about using an import...hiding statement?  Perhaps even your own module
> that only re-exports the constructors you like?
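
To make the two-typeclass idea concrete, here is a minimal sketch. It assumes MultiParamTypeClasses and FlexibleInstances. The class names ConvertSuccess/ConvertAttempt and the use of Either String for failure are illustrative assumptions, not the actual API of convertible or convertible-text. Values going *to* the database get total conversions. Text coming *back* from it gets a conversion whose possible failure is visible in the type:

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Text.Read (readMaybe)

-- Conversions that can never fail: plain total functions.
class ConvertSuccess a b where
  convertSuccess :: a -> b

-- Conversions that may fail: failure is part of the result type.
class ConvertAttempt a b where
  convertAttempt :: a -> Either String b

-- Widening an Int to an Integer always succeeds.
instance ConvertSuccess Int Integer where
  convertSuccess = toInteger

-- Any Integer can be rendered as text to send to the DB engine.
instance ConvertSuccess Integer String where
  convertSuccess = show

-- Parsing text that came back from the DB engine may fail.
instance ConvertAttempt String Integer where
  convertAttempt s =
    maybe (Left ("not an integer: " ++ show s)) Right (readMaybe s)
```

Under this split, the database layer would only need ConvertAttempt on the result side. Callers are then forced to handle a malformed value rather than hitting a partial function at runtime.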

In Persistent, we already have a very good idea of what the datatype
will be (integral, date/time, etc), and therefore dealing with the raw
bytestring would probably be preferable for us. (This is a bit of a
simplification, but close enough to reality.) I'm not actually
requesting you change HDBC to return ByteStrings, I'm simply stating
that, while a high-level API using Haskell datatypes is correct for
*most* uses, there are use cases where a low-level API makes more
sense.
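
As a rough sketch of what reading the raw ByteString looks like when the column type is already known from the schema: ColumnType, Value and decodeColumn are hypothetical names, not part of Persistent or HDBC. Only Data.ByteString.Char8.readInteger is a real library function here:

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical: the column type the schema already tells us about.
data ColumnType = IntColumn | TextColumn

-- Hypothetical decoded value.
data Value = VInt Integer | VText B.ByteString
  deriving (Eq, Show)

-- Decode raw backend output directly, guided by the known column type,
-- instead of first going through a general-purpose value type.
decodeColumn :: ColumnType -> B.ByteString -> Either String Value
decodeColumn IntColumn raw = case B.readInteger raw of
  Just (n, rest) | B.null rest -> Right (VInt n)
  _                            -> Left ("expected an integer, got " ++ show raw)
decodeColumn TextColumn raw = Right (VText raw)
```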

And the many-constructor issue isn't a matter of convenience or being
overwhelmed by constructors, it's a matter of correctness: if I want
to map a SqlValue onto a UTCTime value, I need to check a number of
different constructors, as opposed to using a ByteString and reading
out the value directly. In Persistent, I always know which type of
DATE or DATETIME was used for creating tables, so I know which version
to anticipate. In theory, the same argument applies to HDBC's
constructors; however, I believe I was previously bitten by a bug
involving time zones.
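
To illustrate the correctness point, here is a sketch that uses a cut-down, hypothetical stand-in for a many-constructor value type. It is not HDBC's SqlValue, only a type of the same general shape. Every time-like constructor needs its own case. A constructor carrying a zone-less LocalTime also forces a guess about the time zone, and that guess is a natural place for time zone bugs to creep in:

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX (POSIXTime, posixSecondsToUTCTime)
import Data.Time.LocalTime (LocalTime, ZonedTime, localTimeToUTC, utc, zonedTimeToUTC)

-- Hypothetical, simplified value type (NOT HDBC's SqlValue).
data Value
  = VUTCTime UTCTime
  | VZonedTime ZonedTime
  | VLocalTime LocalTime   -- carries no time zone information
  | VPOSIXTime POSIXTime
  | VString String

-- Each time-like constructor needs separate handling.
toUTC :: Value -> Either String UTCTime
toUTC (VUTCTime t)   = Right t
toUTC (VZonedTime z) = Right (zonedTimeToUTC z)
toUTC (VLocalTime l) = Right (localTimeToUTC utc l)  -- assumption: stored as UTC
toUTC (VPOSIXTime p) = Right (posixSecondsToUTCTime p)
toUTC (VString s)    = Left ("cannot interpret as a time: " ++ show s)
```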

>> I also don't like using the lazy result functions. I'm sure for many
>> people, they are precisely what is needed. However, in my
>> applications, I try to avoid them whenever possible. I've had bugs crop
>> up because I accidentally used the lazy instead of the strict version of a
>> function. I would prefer using an interface that uses enumerators[2].
>
> It would be pretty simple to add an option to the API to force the use of
> the strict versions of functions in all cases (or perhaps to generate an
> exception if a lazy version is attempted.)  Would that address the concern?
>  Or perhaps separating them into separate modules?

Again, I *personally* think that would be better, but I'm sure many
other HDBC users would consider this a change for the worse: there are
a lot of Haskellers who have no problem with lazy IO, and I don't want
to adversely affect their programming on a whim.
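
For anyone who hasn't hit this class of bug, here is a self-contained sketch of how it can happen. It uses a hypothetical in-memory cursor rather than a real database handle; Cursor, fetchRow and the two fetchAll variants are illustrative names, not HDBC functions. The lazy fetch, built on unsafeInterleaveIO, reads no rows until the list is demanded, so closing the cursor first loses them. In this toy they silently disappear, whereas a real driver might raise an error instead:

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical cursor: rows not yet fetched, plus an "open" flag.
data Cursor = Cursor { pending :: IORef [Int], open :: IORef Bool }

newCursor :: [Int] -> IO Cursor
newCursor rows = Cursor <$> newIORef rows <*> newIORef True

closeCursor :: Cursor -> IO ()
closeCursor c = writeIORef (open c) False

-- Fetch one row; a closed cursor yields nothing further.
fetchRow :: Cursor -> IO (Maybe Int)
fetchRow c = do
  isOpen <- readIORef (open c)
  rows <- readIORef (pending c)
  case rows of
    (r : rs) | isOpen -> writeIORef (pending c) rs >> pure (Just r)
    _                 -> pure Nothing

-- Strict: all rows are read before this action returns.
fetchAllStrict :: Cursor -> IO [Int]
fetchAllStrict c = do
  m <- fetchRow c
  case m of
    Nothing -> pure []
    Just r  -> (r :) <$> fetchAllStrict c

-- Lazy: each row is read only when that part of the list is demanded.
fetchAllLazy :: Cursor -> IO [Int]
fetchAllLazy c = unsafeInterleaveIO $ do
  m <- fetchRow c
  case m of
    Nothing -> pure []
    Just r  -> (r :) <$> fetchAllLazy c
```

With this sketch, fetching with fetchAllStrict and then calling closeCursor yields [1,2,3] for a cursor over [1,2,3], while the same sequence with fetchAllLazy yields []. The two calls look almost identical at the use site, which is why mixing them up is easy.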

> I took a quick look at the enumerators library, but it doesn't seem to have
> the necessary support for handling data that comes from arbitrary C API
> function calls rather than handles or sockets.

It does support this; for prior art, see yaml [1] or yajl-enumerator [2].
I'd be happy to help you with this, as having some enumerator
experience is a big help here.
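
The following minimal, hand-rolled iteratee sketch shows that the data source can be any IO action, such as a wrapper around a C function that returns the next row. It loosely mirrors the enumerator package's design, in which the enumerator never sends end-of-input and the run step does. However, Step, enumFetch, run, sumFirst and listSource are hypothetical names, not that package's API:

```haskell
import Data.IORef

-- Hypothetical iteratee: finished with a result, or waiting for the
-- next row (Nothing means end of input).
data Step r a = Done a | Continue (Maybe r -> IO (Step r a))

-- Enumerator that pulls rows from an arbitrary IO action, e.g. a
-- wrapper around a C call returning the next row or Nothing.
enumFetch :: IO (Maybe r) -> Step r a -> IO (Step r a)
enumFetch _ step@(Done _) = pure step          -- stop pulling once done
enumFetch fetch (Continue k) = do
  m <- fetch
  case m of
    Nothing -> pure (Continue k)               -- exhausted; run sends EOF
    Just r  -> k (Just r) >>= enumFetch fetch

-- Send end-of-input and extract the result, if one was produced.
run :: Step r a -> IO (Maybe a)
run (Done a) = pure (Just a)
run (Continue k) = do
  s <- k Nothing
  pure $ case s of
    Done a     -> Just a
    Continue _ -> Nothing

-- Example iteratee: sum at most n rows, stopping early.
sumFirst :: Int -> Step Int Int
sumFirst = go 0
  where
    go acc 0 = Done acc
    go acc n = Continue $ \m -> pure $ case m of
      Nothing -> Done acc
      Just x  -> go (acc + x) (n - 1)

-- Stand-in for a C API: each call yields the next row, then Nothing.
listSource :: [r] -> IO (IO (Maybe r))
listSource rows = do
  ref <- newIORef rows
  pure $ atomicModifyIORef' ref $ \rs -> case rs of
    []       -> ([], Nothing)
    (x : xs) -> (xs, Just x)
```

Because enumFetch stops calling the source once the iteratee is Done, sumFirst 2 pulls exactly two rows from the backend and leaves the rest unread.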

>> For none of these do I actually think that HDBC should change. I think
>> it is a great library with a well-thought-out API. All I'm saying is
>> that I doubt there will ever be a single high-level API that will suit
>> everyone's need, and I see a huge amount of value in splitting out the
>> low-level code into a separate package. That way, *everyone* can share
>> that code together, *everyone* can find the bugs in it, and *everyone*
>> can benefit from improvements.
>
> Splitting out the backend code is quite reasonable, and actually that was
> one of the goals with the HDBC v2 API.  I would have no objection if people
> take, say, HDBC-postgresql and add a bunch of non-HDBC stuff to it, or even
> break off the C bindings to a separate package and then make HDBC-postgresql
> an interface atop that.
>
> I hope that we can, however, agree upon one low-level database API.  The
> Java, Python, and Perl communities, at least, have.  Failing to do so
> produces unnecessary incompatibility.
>
> I would also hope that this database API would be good enough that there is
> rarely call to bypass it and use a database backend directly.

I agree 100%. The question is whether or not we can all agree on what
is low-level. I think the difference between Haskell and other
languages here is that, while everyone else is basically satisfied
with a cursor approach, in Haskell we have at least two other options:
enumerators and lazy IO. I would be in favor of trying to design
something very low-level (read: more low-level than HDBC) that can
represent any database backend.

On the other hand, I don't think such a system would let your code
switch from SQLite to PostgreSQL, since the two represent data so
differently. In other words, I would like a SQLite package
that presents a SQLite-specific API and the same for PostgreSQL, but
for them to be designed to be very similar from the beginning to make
the job of implementing higher-level APIs easier.
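
One way such a design could look, as a sketch only, is a record of low-level operations parameterized by each backend's native value type. SQLite and PostgreSQL would then keep their own representations, while higher-level helpers like fetchAll are written once. Backend, SQLiteValue and toyBackend are hypothetical names. toyBackend is an in-memory toy that ignores the SQL text and treats every statement as an insert of its parameters:

```haskell
import Data.IORef

-- Hypothetical low-level interface: conn is the backend's connection
-- type, v its own native value type (no shared SqlValue).
data Backend conn v = Backend
  { execute  :: conn -> String -> [v] -> IO ()
  , fetchRow :: conn -> IO (Maybe [v])  -- Nothing once no rows remain
  }

-- A higher-level helper written once against the shared shape.
fetchAll :: Backend conn v -> conn -> IO [[v]]
fetchAll b c = do
  m <- fetchRow b c
  case m of
    Nothing  -> pure []
    Just row -> (row :) <$> fetchAll b c

-- Hypothetical SQLite-style values; SNull models SQL NULL explicitly.
data SQLiteValue = SInteger Integer | SText String | SNull
  deriving (Eq, Show)

-- Toy in-memory backend: stores each parameter list as a row and
-- returns rows in insertion order.
toyBackend :: Backend (IORef [[SQLiteValue]]) SQLiteValue
toyBackend = Backend
  { execute  = \ref _sql params -> modifyIORef ref (++ [params])
  , fetchRow = \ref -> atomicModifyIORef' ref $ \rows -> case rows of
      []       -> ([], Nothing)
      (r : rs) -> (rs, Just r)
  }
```

A PostgreSQL package would supply its own value type and its own Backend value, and fetchAll would work unchanged against it.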

Michael

[1] http://hackage.haskell.org/package/yaml
[2] http://hackage.haskell.org/package/yajl-enumerator