[database-devel] Request for Feedback on COPY IN/OUT support for postgresql-simple

Joey Adams joeyadams3.14159 at gmail.com
Fri Jun 28 11:12:57 CEST 2013


On Thu, Jun 27, 2013 at 11:40 PM, Leon Smith <leon.p.smith at gmail.com> wrote:

> Yeah, eventually adding some level of understanding of the formats would
> be nice, but honestly my immediate need is a higher-performance (and
> somewhat specialized) replication solution; I only need a tiny bit of
> understanding of the data I'm shipping from server to server. But at the
> same time, a complete solution would need to deal with all three formats
> and the numerous flavors thereof, not just CSV. I don't doubt that the
> current FromField/ToField interface isn't particularly well suited to
> dealing with copy data, though. =)
>

In response to a proposal to add COPY escaping functionality to libpq, Tom
Lane pointed out that "one of the key requirements for anything dealing
with COPY is that it be fast" [1].  postgresql-copy-escape as-is provides a
faster way to insert data than multi-row INSERT.  We don't need to support
all of the formats for this.  I didn't even implement COPY TO support since
PostgreSQL returns result sets pretty efficiently through the normal
interface.
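
To give a sense of how little is involved for the default text format, here
is a rough, illustrative sketch of the escaping step. The names
escapeCopyText and copyTextRow are made up and are not necessarily the
interface postgresql-copy-escape exposes, and NULLs are not handled:

    import qualified Data.ByteString.Char8 as B

    -- Escape one field for COPY's default text format: backslash, tab,
    -- newline and carriage return must be backslash-escaped; everything
    -- else passes through untouched.  (A NULL would be sent as \N, which
    -- this sketch does not handle.)
    escapeCopyText :: B.ByteString -> B.ByteString
    escapeCopyText = B.concatMap escapeChar
      where
        escapeChar '\\' = B.pack "\\\\"
        escapeChar '\t' = B.pack "\\t"
        escapeChar '\n' = B.pack "\\n"
        escapeChar '\r' = B.pack "\\r"
        escapeChar c    = B.singleton c

    -- A row is the tab-separated escaped fields followed by a newline.
    copyTextRow :: [B.ByteString] -> B.ByteString
    copyTextRow fields =
        B.intercalate (B.pack "\t") (map escapeCopyText fields)
            `B.append` B.pack "\n"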

Your case is even simpler: send COPY data from one server to another.  This
shouldn't require packing/unpacking fields.

 [1]: http://www.postgresql.org/message-id/19641.1331821069@sss.pgh.pa.us
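
For that relay case, streaming the raw COPY data between two libpq
connections should be enough. Something along these lines with the
postgresql-libpq binding; untested, error handling omitted, and the table
names are placeholders:

    {-# LANGUAGE OverloadedStrings #-}

    import Control.Monad (void)
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Relay raw COPY data from one server to another without looking
    -- inside the rows.
    relayCopy :: PQ.Connection -> PQ.Connection -> IO ()
    relayCopy src dst = do
        void $ PQ.exec src "COPY source_table TO STDOUT"  -- status CopyOut
        void $ PQ.exec dst "COPY dest_table FROM STDIN"   -- status CopyIn
        let loop = do
                row <- PQ.getCopyData src False           -- False = blocking
                case row of
                    PQ.CopyOutRow bs -> PQ.putCopyData dst bs >> loop
                    PQ.CopyOutDone   -> void $ PQ.putCopyEnd dst Nothing
                    _                -> return ()         -- error case elided
        loop
        -- Both connections still have a final command result to collect
        -- (strictly, keep calling getResult until it returns Nothing).
        void $ PQ.getResult src
        void $ PQ.getResult dst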

> As for threadWaitWrite, I understand that, but the code is actually dead
> code at the moment because postgresql-simple doesn't use libpq connections
> in asynchronous mode; it's just there for when postgresql-simple starts
> using async libpq calls. Which is something I would like to do, but it
> would also be a disaster on Windows at the moment. I'm thinking perhaps
> that the most sensible thing to do would be to reimplement the blocking
> libpq calls in Haskell using nonblocking libpq calls, and then only use
> this module on Unix systems. Hopefully that would cut down on the CPP
> hackery.
>

The libpq source may be helpful.  Take a look at PQexecFinish in fe-exec.c,
which reveals some interesting considerations:

 * When PQexec is given multiple SQL commands, it discards all but the last
PGresult, but it combines error messages from these results.

 * When PQexec encounters a result status of PGRES_COPY_*, it stops
returning results, to allow the application to perform the transfer.
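
In postgresql-libpq terms, the result-draining part of an exec-style
wrapper might look roughly like this (an untested sketch; drainResults is a
made-up name, and the error-message concatenation that PQexecFinish does is
left out):

    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Drain pending results the way PQexecFinish does: keep only the last
    -- one, but stop as soon as the server switches into COPY mode so the
    -- application can drive the transfer itself.  (PQexecFinish also
    -- concatenates error messages from the discarded results; omitted.)
    drainResults :: PQ.Connection -> IO (Maybe PQ.Result)
    drainResults conn = go Nothing
      where
        go prev = do
            mres <- PQ.getResult conn
            case mres of
                Nothing  -> return prev              -- no more results
                Just res -> do
                    st <- PQ.resultStatus res
                    case st of
                        PQ.CopyIn  -> return (Just res)
                        PQ.CopyOut -> return (Just res)
                        _          -> go (Just res)  -- discard earlier result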

One thing to bear in mind is that threadWait* can be interrupted on Unix
(that's one motivation for using the async API in the first place).  You'll
have to figure out what 'exec' should do when interrupted while waiting for
a result.  Perhaps it should use PQcancel to cancel the request, then
continue waiting for results.  If interrupted yet again, a subsequent
'exec' will need to call PQgetResult to discard results from the
interrupted query before calling PQsendQuery again.
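
Concretely, I am imagining something like the sketch below: wait in
threadWaitRead so the wait is interruptible, and issue a cancel request if
an exception arrives. This simplified version just sends the cancel and
lets the exception propagate, leaving the leftover results for the next
call to discard; the drainResults sketch above would call it in place of
PQ.getResult. Untested, and getResultInterruptible is a made-up name:

    import Control.Concurrent (threadWaitRead)
    import Control.Exception (onException)
    import Control.Monad (void)
    import qualified Database.PostgreSQL.LibPQ as PQ

    -- Fetch the next result using the nonblocking API, so that the actual
    -- waiting happens in threadWaitRead and can be interrupted on Unix.
    -- On interruption, ask the server to cancel the query and rethrow.
    getResultInterruptible :: PQ.Connection -> IO (Maybe PQ.Result)
    getResultInterruptible conn = wait `onException` cancelRequest
      where
        wait = do
            busy <- PQ.isBusy conn
            if busy
                then do
                    mfd <- PQ.socket conn
                    case mfd of
                        Nothing -> return Nothing        -- connection is bad
                        Just fd -> do
                            threadWaitRead fd            -- interruptible wait
                            void $ PQ.consumeInput conn  -- read what arrived
                            wait
                else PQ.getResult conn                   -- will not block now

        cancelRequest = do
            mc <- PQ.getCancel conn
            case mc of
                Nothing -> return ()
                Just c  -> void $ PQ.cancel c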