[PATCH 0/4] COPY to a UDF: "COPY ... TO FUNCTION ..."

From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 0/4] COPY to a UDF: "COPY ... TO FUNCTION ..."
Date: 2009-11-23 21:34:38
Message-ID: 1259012082-6196-1-git-send-email-dfarina@truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have extended COPY to support using a UDF as a target instead of the
normal 'file' or STDOUT targets. This dovetails nicely with a couple
of extensions I have also written for dblink for the purposes of
enabling direct cross-node bulk loading and replication. Please
peruse the patches (the non-test containing patches also possess
robust human-readable summaries and explanations) that are In-Reply-To
this email for more details.

You can also get these patches from a git repo. These patches are
applied against the history tracked by git.postgresql.org:

git fetch http://fdr.lolrus.org/postgresql.git \
copy-to-function:copy-to-function
git checkout copy-to-function

While the functionality exposed in these patches has appeared robust
and efficient to us at Truviso, the code had ease-of-upstream merging
as a central design point, and as such I have shied away from adding
more invasive functionality that would make the interface less
byzantine/more usable. This was intended to be the most surgical cut
before it seemed likely that this might be interesting to the
PostgreSQL project.

At least one additional datapoint of someone else wanting such a
functionality is seen in this thread:

http://archives.postgresql.org/pgsql-hackers/2009-08/msg00428.php

Open Issues:

* setup/copy/teardown and error handling: as-is it is unhealthily
tempting to use a global variable (as seen in the dblink patches)
to track state between setup/copy/cleanup steps. I'm not sure what
the right aesthetic is to make this a little more controlled than
calling specific functions in exactly the right order.

* 'transition state': similar to an aggregate, it may make sense for
the target of TO FUNCTION to have a context in which it can stash
state, or at least have access to a few constant parameters as it
accepts records. If such functionality existed one might be able
to conveniently rewrite the current COPY ... TO (STDOUT|'file')
behavior to be syntactic sugar for TO FUNCTION behavior, which is
somewhat aesthetically pleasing to me.

* It might be interesting to increase the symmetry of this operation
allowing COPY to bulk load into UDFs. With that in mind, the
design the interfaces may change...or not.

This work is released under the BSD license as utilized by the
PostgreSQL project. The copyright owner is Truviso, Inc in 2009.

Daniel Farina (4):
Add "COPY ... TO FUNCTION ..." support
Add tests for "COPY ... TO FUNCTION ..."
Add dblink functions for use with COPY ... TO FUNCTION ...
Add tests to dblink covering use of COPY TO FUNCTION

contrib/dblink/dblink.c | 190 ++++++++++++++++++++++++
contrib/dblink/dblink.h | 5 +
contrib/dblink/dblink.sql.in | 20 +++
contrib/dblink/expected/dblink.out | 272 +++++++++++++++++++++++++++++++++++
contrib/dblink/sql/dblink.sql | 112 ++++++++++++++
contrib/dblink/uninstall_dblink.sql | 8 +
src/backend/catalog/namespace.c | 21 +++
src/backend/commands/copy.c | 190 +++++++++++++++++++++----
src/backend/executor/spi.c | 2 +-
src/backend/nodes/copyfuncs.c | 2 +-
src/backend/nodes/equalfuncs.c | 2 +-
src/backend/parser/gram.y | 30 +++--
src/include/catalog/namespace.h | 1 +
src/include/nodes/parsenodes.h | 3 +-
src/test/regress/input/copy.source | 38 +++++
src/test/regress/output/copy.source | 69 +++++++++
src/test/regress/regress.c | 56 +++++++
17 files changed, 982 insertions(+), 39 deletions(-)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2009-11-23 21:34:39 [PATCH 1/4] Add "COPY ... TO FUNCTION ..." support
Previous Message Marko Tiikkaja 2009-11-23 21:33:29 Re: Writeable CTE patch