[PATCH 0/4] COPY to a UDF: "COPY ... TO FUNCTION ..."

Lists: pgsql-hackers
From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 0/4] COPY to a UDF: "COPY ... TO FUNCTION ..."
Date: 2009-11-23 21:34:38
Message-ID: 1259012082-6196-1-git-send-email-dfarina@truviso.com

I have extended COPY to support using a UDF as a target instead of the
normal 'file' or STDOUT targets. This dovetails nicely with a couple
of extensions I have also written for dblink for the purpose of
enabling direct cross-node bulk loading and replication. Please
peruse the patches that are In-Reply-To this email for more details;
the non-test patches also carry thorough human-readable summaries and
explanations.
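
For a flavor of the feature, usage looks roughly like this (a sketch;
it assumes a server with these patches applied and an established
dblink connection named 'myconn', and the dblink functions are
introduced in patches 3 and 4):

```sql
-- Put the remote connection into COPY state for the target relation.
SELECT dblink_copy_open('myconn', 'relation_name', false);

-- Stream every row of a local table to the remote node in COPY format.
COPY local_table TO FUNCTION dblink_copy_write;

-- Terminate the remote COPY.
SELECT dblink_copy_end();
```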

You can also get these patches from a git repo. These patches are
applied against the history tracked by git.postgresql.org:

git fetch http://fdr.lolrus.org/postgresql.git \
copy-to-function:copy-to-function
git checkout copy-to-function

While the functionality exposed in these patches has proven robust
and efficient for us at Truviso, ease of merging upstream was a
central design point of the code, so I have shied away from more
invasive additions that would have made the interface less byzantine
and more usable. The intent was to make the most surgical cut
possible until it seemed likely that the PostgreSQL project would
find this interesting.

At least one additional data point of someone else wanting such
functionality can be seen in this thread:

http://archives.postgresql.org/pgsql-hackers/2009-08/msg00428.php

Open Issues:

* setup/copy/teardown and error handling: as-is it is unhealthily
tempting to use a global variable (as seen in the dblink patches)
to track state between setup/copy/cleanup steps. I'm not sure what
the right aesthetic is to make this a little more controlled than
calling specific functions in exactly the right order.

* 'transition state': similarly to an aggregate, it may make sense for
the target of TO FUNCTION to have a context in which it can stash
state, or at least to have access to a few constant parameters as it
accepts records. If such functionality existed, one might be able
to conveniently rewrite the current COPY ... TO (STDOUT|'file')
behavior as syntactic sugar for the TO FUNCTION behavior, which is
somewhat aesthetically pleasing to me.

* It might be interesting to increase the symmetry of this operation
by also allowing COPY to bulk-load from UDFs (COPY ... FROM
FUNCTION). With that in mind, the design of the interfaces may
change...or not.
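
To make the 'transition state' idea concrete, one purely hypothetical
shape it could take (none of this syntax exists in these patches; all
names are invented):

```sql
-- Hypothetical only: a per-COPY context argument, loosely analogous
-- to an aggregate's transition state, set up once and passed to the
-- target function alongside each row.
COPY foo TO FUNCTION my_sink(my_sink_setup('some parameter'));
```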

This work is released under the BSD license as used by the PostgreSQL
project. The copyright owner is Truviso, Inc., 2009.

Daniel Farina (4):
Add "COPY ... TO FUNCTION ..." support
Add tests for "COPY ... TO FUNCTION ..."
Add dblink functions for use with COPY ... TO FUNCTION ...
Add tests to dblink covering use of COPY TO FUNCTION

contrib/dblink/dblink.c | 190 ++++++++++++++++++++++++
contrib/dblink/dblink.h | 5 +
contrib/dblink/dblink.sql.in | 20 +++
contrib/dblink/expected/dblink.out | 272 +++++++++++++++++++++++++++++++++++
contrib/dblink/sql/dblink.sql | 112 ++++++++++++++
contrib/dblink/uninstall_dblink.sql | 8 +
src/backend/catalog/namespace.c | 21 +++
src/backend/commands/copy.c | 190 +++++++++++++++++++++----
src/backend/executor/spi.c | 2 +-
src/backend/nodes/copyfuncs.c | 2 +-
src/backend/nodes/equalfuncs.c | 2 +-
src/backend/parser/gram.y | 30 +++--
src/include/catalog/namespace.h | 1 +
src/include/nodes/parsenodes.h | 3 +-
src/test/regress/input/copy.source | 38 +++++
src/test/regress/output/copy.source | 69 +++++++++
src/test/regress/regress.c | 56 +++++++
17 files changed, 982 insertions(+), 39 deletions(-)


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 1/4] Add "COPY ... TO FUNCTION ..." support
Date: 2009-11-23 21:34:39
Message-ID: 1259012082-6196-2-git-send-email-dfarina@truviso.com

This relatively small change enables all sorts of new and shiny evil
by allowing a function to be specified as the target of COPY; the
function accepts each produced row's content in turn. The function
must take a single INTERNAL-type argument, which is actually a
StringInfo.

Patch highlights:

- Grammar production changes to enable "TO FUNCTION <qualified name>"

- A new enumeration value in CopyDest to indicate this new mode
called COPY_FN.

- CopyStateData's "filename" field has been renamed "destination"
and is now a Node type. Before it was either a string or NULL;
now it may be a RangeVar, a string, or NULL. Some code now has to
go through some minor strVal() unboxing for the regular TO '/file'
behavior.

- Due to the relatively restricted way this function can be called,
it was possible to reduce per-row overhead by computing the
FunctionCallInfo once and then reusing it, rather than going
through one of the convenience functions in the fmgr.

- Add and expose the makeNameListFromRangeVar procedure in
src/backend/catalog/namespace.c, the inverse of
makeRangeVarFromNameList.
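
Concretely, the new grammar is exercised like so (a sketch; the
function name and library path here are stand-ins — patch 2's
regression tests contain a full working example):

```sql
-- The target function must take a single internal argument, through
-- which the implementation passes a StringInfo holding each row.
CREATE FUNCTION my_row_sink (internal)
    RETURNS void
    AS '$libdir/my_sink_lib'   -- hypothetical shared library
    LANGUAGE C;

COPY copytest TO FUNCTION my_row_sink;
COPY BINARY copytest TO FUNCTION my_row_sink;
```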

Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>
---
src/backend/catalog/namespace.c | 21 +++++
src/backend/commands/copy.c | 190 +++++++++++++++++++++++++++++++++-----
src/backend/executor/spi.c | 2 +-
src/backend/nodes/copyfuncs.c | 2 +-
src/backend/nodes/equalfuncs.c | 2 +-
src/backend/parser/gram.y | 30 ++++--
src/include/catalog/namespace.h | 1 +
src/include/nodes/parsenodes.h | 3 +-
8 files changed, 212 insertions(+), 39 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 99c9140..8911e29 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -2467,6 +2467,27 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
}

/*
+ * makeNameListFromRangeVar
+ * Utility routine to convert a RangeVar into a qualified-name list.
+ */
+List *
+makeNameListFromRangeVar(RangeVar *rangevar)
+{
+ List *names = NIL;
+
+ Assert(rangevar->relname != NULL);
+ names = lcons(makeString(rangevar->relname), names);
+
+ if (rangevar->schemaname != NULL)
+ names = lcons(makeString(rangevar->schemaname), names);
+
+ if (rangevar->catalogname != NULL)
+ names = lcons(makeString(rangevar->catalogname), names);
+
+ return names;
+}
+
+/*
* makeRangeVarFromNameList
* Utility routine to convert a qualified-name list into RangeVar form.
*/
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5e95fd7..985505a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -33,6 +33,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "optimizer/planner.h"
+#include "parser/parse_func.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -55,7 +56,8 @@ typedef enum CopyDest
{
COPY_FILE, /* to/from file */
COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
- COPY_NEW_FE /* to/from frontend (3.0 protocol) */
+ COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
+ COPY_FN /* to function */
} CopyDest;

/*
@@ -104,7 +106,8 @@ typedef struct CopyStateData
Relation rel; /* relation to copy to or from */
QueryDesc *queryDesc; /* executable query to copy from */
List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
+ Node *destination; /* filename, or NULL for STDIN/STDOUT, or a
+ * function */
bool binary; /* binary format? */
bool oids; /* include OIDs? */
bool csv_mode; /* Comma Separated Value format? */
@@ -131,6 +134,13 @@ typedef struct CopyStateData
MemoryContext rowcontext; /* per-row evaluation context */

/*
+ * For writing rows out to a function. Used if copy_dest == COPY_FN
+ *
+ * Avoids repeated use of DirectFunctionCall for efficiency.
+ */
+ FunctionCallInfoData output_fcinfo;
+
+ /*
* These variables are used to reduce overhead in textual COPY FROM.
*
* attribute_buf holds the separated, de-escaped text for each field of
@@ -425,9 +435,11 @@ CopySendEndOfRow(CopyState cstate)
{
StringInfo fe_msgbuf = cstate->fe_msgbuf;

+ /* Take care of adding row delimiters */
switch (cstate->copy_dest)
{
case COPY_FILE:
+ case COPY_FN:
if (!cstate->binary)
{
/* Default line termination depends on platform */
@@ -437,6 +449,18 @@ CopySendEndOfRow(CopyState cstate)
CopySendString(cstate, "\r\n");
#endif
}
+ break;
+ case COPY_NEW_FE:
+ case COPY_OLD_FE:
+ /* The FE/BE protocol uses \n as newline for all platforms */
+ if (!cstate->binary)
+ CopySendChar(cstate, '\n');
+ break;
+ }
+
+ switch (cstate->copy_dest)
+ {
+ case COPY_FILE:

(void) fwrite(fe_msgbuf->data, fe_msgbuf->len,
1, cstate->copy_file);
@@ -446,10 +470,6 @@ CopySendEndOfRow(CopyState cstate)
errmsg("could not write to COPY file: %m")));
break;
case COPY_OLD_FE:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->binary)
- CopySendChar(cstate, '\n');
-
if (pq_putbytes(fe_msgbuf->data, fe_msgbuf->len))
{
/* no hope of recovering connection sync, so FATAL */
@@ -459,13 +479,19 @@ CopySendEndOfRow(CopyState cstate)
}
break;
case COPY_NEW_FE:
- /* The FE/BE protocol uses \n as newline for all platforms */
- if (!cstate->binary)
- CopySendChar(cstate, '\n');
-
/* Dump the accumulated row as one CopyData message */
(void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
break;
+ case COPY_FN:
+ FunctionCallInvoke(&cstate->output_fcinfo);
+
+ /*
+ * These are set earlier and are not supposed to change from row to row.
+ */
+ Assert(cstate->output_fcinfo.arg[0] ==
+ PointerGetDatum(cstate->fe_msgbuf));
+ Assert(!cstate->output_fcinfo.argnull[0]);
+ break;
}

resetStringInfo(fe_msgbuf);
@@ -577,6 +603,12 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
bytesread += avail;
}
break;
+ case COPY_FN:
+ /*
+ * COPY FROM a function is disallowed earlier, in DoCopy.
+ */
+ Assert(false);
+ break;
}

return bytesread;
@@ -719,7 +751,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
{
CopyState cstate;
bool is_from = stmt->is_from;
- bool pipe = (stmt->filename == NULL);
+ bool pipe = (stmt->destination == NULL);
List *attnamelist = stmt->attlist;
List *force_quote = NIL;
List *force_notnull = NIL;
@@ -986,6 +1018,14 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
errhint("Anyone can COPY to stdout or from stdin. "
"psql's \\copy command also works for anyone.")));

+ /* Disallow COPY ... FROM FUNCTION (only TO FUNCTION supported) */
+ if (is_from && cstate->destination != NULL &&
+ IsA(cstate->destination, RangeVar))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY FROM does not support functions as sources")));
+
+
if (stmt->relation)
{
Assert(!stmt->query);
@@ -1183,7 +1223,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->client_encoding);

cstate->copy_dest = COPY_FILE; /* default */
- cstate->filename = stmt->filename;
+ cstate->destination = stmt->destination;

if (is_from)
CopyFrom(cstate); /* copy from file to database */
@@ -1225,7 +1265,7 @@ DoCopy(const CopyStmt *stmt, const char *queryString)
static void
DoCopyTo(CopyState cstate)
{
- bool pipe = (cstate->filename == NULL);
+ bool pipe = (cstate->destination == NULL);

if (cstate->rel)
{
@@ -1257,37 +1297,128 @@ DoCopyTo(CopyState cstate)
else
cstate->copy_file = stdout;
}
- else
+ else if (IsA(cstate->destination, String))
{
mode_t oumask; /* Pre-existing umask value */
struct stat st;
+ char *dest_filename = strVal(cstate->destination);

/*
* Prevent write to relative path ... too easy to shoot oneself in the
* foot by overwriting a database file ...
*/
- if (!is_absolute_path(cstate->filename))
+ if (!is_absolute_path(dest_filename))
ereport(ERROR,
(errcode(ERRCODE_INVALID_NAME),
errmsg("relative path not allowed for COPY to file")));

oumask = umask((mode_t) 022);
- cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
+ cstate->copy_file = AllocateFile(dest_filename, PG_BINARY_W);
umask(oumask);

if (cstate->copy_file == NULL)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\" for writing: %m",
- cstate->filename)));
+ dest_filename)));

fstat(fileno(cstate->copy_file), &st);
if (S_ISDIR(st.st_mode))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("\"%s\" is a directory", cstate->filename)));
+ errmsg("\"%s\" is a directory", dest_filename)));
}

+ /* Branch taken in the "COPY ... TO FUNCTION funcname" situation */
+ else if (IsA(cstate->destination, RangeVar))
+ {
+ List *names;
+ FmgrInfo *flinfo;
+ FuncDetailCode fdresult;
+ Oid funcid;
+ Oid rettype;
+ bool retset;
+ int nvargs;
+ Oid *true_typeids;
+ const int nargs = 1;
+ Oid argtypes[] = { INTERNALOID };
+
+ /* Flip copy-action dispatch flag */
+ cstate->copy_dest = COPY_FN;
+
+ /* Make an fcinfo that can be reused and is stored on the cstate. */
+ names = makeNameListFromRangeVar((RangeVar *) cstate->destination);
+ flinfo = palloc0(sizeof *flinfo);
+
+
+ fdresult = func_get_detail(names, NIL, NIL, nargs, argtypes, false,
+ false,
+
+ /* Begin out-arguments */
+ &funcid, &rettype, &retset, &nvargs,
+ &true_typeids, NULL);
+
+ /*
+ * Check to ensure that this is a "normal" function when looked up,
+ * otherwise error.
+ */
+ switch (fdresult)
+ {
+ /* Normal function found; do nothing */
+ case FUNCDETAIL_NORMAL:
+ break;
+
+ case FUNCDETAIL_NOTFOUND:
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_FUNCTION),
+ errmsg("function %s does not exist",
+ func_signature_string(names, nargs, NIL,
+ argtypes))));
+ break;
+
+ case FUNCDETAIL_AGGREGATE:
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("function %s must not be an aggregate",
+ func_signature_string(names, nargs, NIL,
+ argtypes))));
+ break;
+
+ case FUNCDETAIL_WINDOWFUNC:
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("function %s must not be a window function",
+ func_signature_string(names, nargs, NIL,
+ argtypes))));
+ break;
+
+ case FUNCDETAIL_COERCION:
+ /*
+ * Should never be returned by func_get_detail when it is passed
+ * fargs == NIL, as it is here.
+ */
+ Assert(false);
+ break;
+
+ case FUNCDETAIL_MULTIPLE:
+ /*
+ * Only support one signature, thus overloading of a name with
+ * different types should never occur.
+ */
+ Assert(false);
+ break;
+
+ }
+
+ fmgr_info(funcid, flinfo);
+ InitFunctionCallInfoData(cstate->output_fcinfo, flinfo,
+ 1, NULL, NULL);
+ }
+ else
+ /* Unexpected type was found for cstate->destination. */
+ Assert(false);
+
+
PG_TRY();
{
if (cstate->fe_copy)
@@ -1310,13 +1441,13 @@ DoCopyTo(CopyState cstate)
}
PG_END_TRY();

- if (!pipe)
+ if (!pipe && cstate->copy_dest != COPY_FN)
{
if (FreeFile(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not write to file \"%s\": %m",
- cstate->filename)));
+ strVal(cstate->destination))));
}
}

@@ -1342,6 +1473,13 @@ CopyTo(CopyState cstate)
/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
cstate->fe_msgbuf = makeStringInfo();

+ /*
+ * fe_msgbuf is never rebound, so there is only a need to set up the
+ * output_fcinfo once.
+ */
+ cstate->output_fcinfo.arg[0] = PointerGetDatum(cstate->fe_msgbuf);
+ cstate->output_fcinfo.argnull[0] = false;
+
/* Get info about the columns we need to process. */
cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
foreach(cur, cstate->attnumlist)
@@ -1668,7 +1806,7 @@ limit_printout_length(const char *str)
static void
CopyFrom(CopyState cstate)
{
- bool pipe = (cstate->filename == NULL);
+ bool pipe = (cstate->destination == NULL);
HeapTuple tuple;
TupleDesc tupDesc;
Form_pg_attribute *attr;
@@ -1768,19 +1906,21 @@ CopyFrom(CopyState cstate)
{
struct stat st;

- cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_R);
+ cstate->copy_file = AllocateFile(strVal(cstate->destination),
+ PG_BINARY_R);

if (cstate->copy_file == NULL)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\" for reading: %m",
- cstate->filename)));
+ strVal(cstate->destination))));

fstat(fileno(cstate->copy_file), &st);
if (S_ISDIR(st.st_mode))
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("\"%s\" is a directory", cstate->filename)));
+ errmsg("\"%s\" is a directory",
+ strVal(cstate->destination))));
}

tupDesc = RelationGetDescr(cstate->rel);
@@ -2215,7 +2355,7 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from file \"%s\": %m",
- cstate->filename)));
+ strVal(cstate->destination))));
}

/*
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index f045f9c..0914dc9 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1829,7 +1829,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
{
CopyStmt *cstmt = (CopyStmt *) stmt;

- if (cstmt->filename == NULL)
+ if (cstmt->destination == NULL)
{
my_res = SPI_ERROR_COPY;
goto fail;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 8bc72d1..9b39abe 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2485,7 +2485,7 @@ _copyCopyStmt(CopyStmt *from)
COPY_NODE_FIELD(query);
COPY_NODE_FIELD(attlist);
COPY_SCALAR_FIELD(is_from);
- COPY_STRING_FIELD(filename);
+ COPY_NODE_FIELD(destination);
COPY_NODE_FIELD(options);

return newnode;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3d65d8b..6ddf226 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1085,7 +1085,7 @@ _equalCopyStmt(CopyStmt *a, CopyStmt *b)
COMPARE_NODE_FIELD(query);
COMPARE_NODE_FIELD(attlist);
COMPARE_SCALAR_FIELD(is_from);
- COMPARE_STRING_FIELD(filename);
+ COMPARE_NODE_FIELD(destination);
COMPARE_NODE_FIELD(options);

return true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 130e6f4..23331ee 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,8 +251,7 @@ static TypeName *TableFuncTypeName(List *columns);
%type <value> TriggerFuncArg
%type <node> TriggerWhen

-%type <str> copy_file_name
- database_name access_method_clause access_method attr_name
+%type <str> database_name access_method_clause access_method attr_name
index_name name cursor_name file_name cluster_index_specification

%type <list> func_name handler_name qual_Op qual_all_Op subquery_Op
@@ -433,6 +432,8 @@ static TypeName *TableFuncTypeName(List *columns);
%type <ival> opt_frame_clause frame_extent frame_bound


+%type <node> copy_file_or_function_name
+
/*
* Non-keyword token types. These are hard-wired into the "flex" lexer.
* They must be listed first so that their numeric codes do not depend on
@@ -1977,14 +1978,15 @@ ClosePortalStmt:
*****************************************************************************/

CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids
- copy_from copy_file_name copy_delimiter opt_with copy_options
+ copy_from copy_file_or_function_name copy_delimiter opt_with
+ copy_options
{
CopyStmt *n = makeNode(CopyStmt);
n->relation = $3;
n->query = NULL;
n->attlist = $4;
n->is_from = $6;
- n->filename = $7;
+ n->destination = $7;

n->options = NIL;
/* Concatenate user-supplied flags */
@@ -1998,14 +2000,15 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list opt_oids
n->options = list_concat(n->options, $10);
$$ = (Node *)n;
}
- | COPY select_with_parens TO copy_file_name opt_with copy_options
+ | COPY select_with_parens TO copy_file_or_function_name
+ opt_with copy_options
{
CopyStmt *n = makeNode(CopyStmt);
n->relation = NULL;
n->query = $2;
n->attlist = NIL;
n->is_from = false;
- n->filename = $4;
+ n->destination = $4;
n->options = $6;
$$ = (Node *)n;
}
@@ -2021,10 +2024,17 @@ copy_from:
* used depends on the direction. (It really doesn't make sense to copy from
* stdout. We silently correct the "typo".) - AY 9/94
*/
-copy_file_name:
- Sconst { $$ = $1; }
- | STDIN { $$ = NULL; }
- | STDOUT { $$ = NULL; }
+copy_file_or_function_name:
+ Sconst { $$ = (Node *) makeString($1); }
+
+ /*
+ * Note that func_name is not used here because there is no need to
+ * accept the "funcname(TYPES)" construction, as there is only one
+ * valid signature.
+ */
+ | FUNCTION qualified_name { $$ = (Node *) $2; }
+ | STDIN { $$ = NULL; }
+ | STDOUT { $$ = NULL; }
;

copy_options: copy_opt_list { $$ = $1; }
diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h
index d356635..1d801cd 100644
--- a/src/include/catalog/namespace.h
+++ b/src/include/catalog/namespace.h
@@ -94,6 +94,7 @@ extern Oid LookupExplicitNamespace(const char *nspname);

extern Oid LookupCreationNamespace(const char *nspname);
extern Oid QualifiedNameGetCreationNamespace(List *names, char **objname_p);
+extern List *makeNameListFromRangeVar(RangeVar *rangevar);
extern RangeVar *makeRangeVarFromNameList(List *names);
extern char *NameListToString(List *names);
extern char *NameListToQuotedString(List *names);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b34300f..203088c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1293,7 +1293,8 @@ typedef struct CopyStmt
List *attlist; /* List of column names (as Strings), or NIL
* for all columns */
bool is_from; /* TO or FROM */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
+ Node *destination; /* filename, or NULL for STDIN/STDOUT, or a
+ * function */
List *options; /* List of DefElem nodes */
} CopyStmt;

--
1.6.5.3


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 2/4] Add tests for "COPY ... TO FUNCTION ..."
Date: 2009-11-23 21:34:40
Message-ID: 1259012082-6196-3-git-send-email-dfarina@truviso.com

Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>
---
src/test/regress/input/copy.source | 38 +++++++++++++++++++
src/test/regress/output/copy.source | 69 +++++++++++++++++++++++++++++++++++
src/test/regress/regress.c | 56 ++++++++++++++++++++++++++++
3 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 376329d..e5dcd62 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -107,3 +107,41 @@ this is just a line full of junk that would error out if parsed

copy copytest3 to stdout csv header;

+
+-- test copy to function
+
+CREATE FUNCTION copyto_setup_state ()
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+
+CREATE FUNCTION copyto_function (internal)
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+
+CREATE FUNCTION copyto_yield_len ()
+ RETURNS int4
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+
+CREATE FUNCTION copyto_yield_text ()
+ RETURNS text
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+
+CREATE FUNCTION copyto_free ()
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+
+select copyto_setup_state();
+copy copytest to function copyto_function;
+select copyto_yield_len();
+select copyto_yield_text();
+select copyto_free();
+
+select copyto_setup_state();
+copy binary copytest to function copyto_function;
+select copyto_yield_len();
+select copyto_free();
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 5a88d6e..74ea935 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -71,3 +71,72 @@ copy copytest3 to stdout csv header;
c1,"col with , comma","col with "" quote"
1,a,1
2,b,2
+-- test copy to function
+CREATE FUNCTION copyto_setup_state ()
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+CREATE FUNCTION copyto_function (internal)
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+CREATE FUNCTION copyto_yield_len ()
+ RETURNS int4
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+CREATE FUNCTION copyto_yield_text ()
+ RETURNS text
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+CREATE FUNCTION copyto_free ()
+ RETURNS void
+ AS '@libdir@/regress@DLSUFFIX@'
+ LANGUAGE C;
+select copyto_setup_state();
+ copyto_setup_state
+--------------------
+
+(1 row)
+
+copy copytest to function copyto_function;
+select copyto_yield_len();
+ copyto_yield_len
+------------------
+ 76
+(1 row)
+
+select copyto_yield_text();
+ copyto_yield_text
+-------------------------------------------
+ DOS abc\r\ndef 1 +
+ Unix abc\ndef 2 +
+ Mac abc\rdef 3 +
+ esc\\ape a\\r\\\r\\\n\\nb 4+
+
+(1 row)
+
+select copyto_free();
+ copyto_free
+-------------
+
+(1 row)
+
+select copyto_setup_state();
+ copyto_setup_state
+--------------------
+
+(1 row)
+
+copy binary copytest to function copyto_function;
+select copyto_yield_len();
+ copyto_yield_len
+------------------
+ 142
+(1 row)
+
+select copyto_free();
+ copyto_free
+-------------
+
+(1 row)
+
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 0e94e68..a96a085 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -16,6 +16,7 @@
#include "executor/spi.h"
#include "utils/builtins.h"
#include "utils/geo_decls.h"
+#include "utils/memutils.h"


#define P_MAXDIG 12
@@ -34,6 +35,11 @@ extern char *reverse_name(char *string);
extern int oldstyle_length(int n, text *t);
extern Datum int44in(PG_FUNCTION_ARGS);
extern Datum int44out(PG_FUNCTION_ARGS);
+extern Datum copyto_free(PG_FUNCTION_ARGS);
+extern Datum copyto_function(PG_FUNCTION_ARGS);
+extern Datum copyto_setup_state(PG_FUNCTION_ARGS);
+extern Datum copyto_yield_len(PG_FUNCTION_ARGS);
+extern Datum copyto_yield_text(PG_FUNCTION_ARGS);

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
@@ -737,3 +743,53 @@ int44out(PG_FUNCTION_ARGS)
*--walk = '\0';
PG_RETURN_CSTRING(result);
}
+
+/*
+ * copyto testing
+ */
+static StringInfo global_buf;
+
+PG_FUNCTION_INFO_V1(copyto_setup_state);
+
+Datum
+copyto_setup_state(PG_FUNCTION_ARGS)
+{
+ MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+ global_buf = makeStringInfo();
+ MemoryContextSwitchTo(oldcxt);
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(copyto_function);
+
+Datum
+copyto_function(PG_FUNCTION_ARGS)
+{
+ StringInfo copybuf = (StringInfo) PG_GETARG_POINTER(0);
+ appendBinaryStringInfo(global_buf, copybuf->data, copybuf->len);
+ PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(copyto_yield_len);
+
+Datum
+copyto_yield_len(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_INT32(global_buf->len);
+}
+
+PG_FUNCTION_INFO_V1(copyto_yield_text);
+
+Datum
+copyto_yield_text(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(DirectFunctionCall1(textin,
+ CStringGetDatum(global_buf->data)));
+}
+
+PG_FUNCTION_INFO_V1(copyto_free);
+
+Datum
+copyto_free(PG_FUNCTION_ARGS)
+{
+ pfree(global_buf);
+ PG_RETURN_VOID();
+}
--
1.6.5.3


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 3/4] Add dblink functions for use with COPY ... TO FUNCTION ...
Date: 2009-11-23 21:34:41
Message-ID: 1259012082-6196-4-git-send-email-dfarina@truviso.com

This patch enables dblink to be used for the purpose of efficient
bulk-loading via COPY and libpq in combination with the COPY TO
FUNCTION patch.

The following functions were added to accomplish this:

dblink_connection_reset: useful when handling errors and one just
wants to restore a connection to a known state, rolling back as many
transactions as necessary.

dblink_copy_end: completes the COPY

dblink_copy_open: puts a connection into the COPY state. Accepts
connection name, relation name, and binary mode flag.

dblink_copy_write: writes a row to the last connection put in the COPY
state by dblink_copy_open.

Generally speaking, code that uses this will look like the following
(presuming a named connection has already been made):

try:
SELECT dblink_copy_open('myconn', 'relation_name', true);
COPY bar TO FUNCTION dblink_copy_write;

-- since the dblink connection is still in the COPY state, one
-- can even copy some more data in multiple steps...
COPY bar_2 TO FUNCTION dblink_copy_write;

SELECT dblink_copy_end();
finally:
SELECT dblink_connection_reset('myconn');
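
As an implementation note, dblink_copy_open formats the remote COPY
statement into a fixed-size buffer and checks snprintf's return value
for truncation. That pattern can be sketched in isolation (plain C,
outside the backend; NAMEDATALEN stands in for the backend's constant,
and the relation name is assumed to be already identifier-quoted, as
NameListToQuotedString does in the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the backend's NAMEDATALEN (64 in stock builds). */
#define NAMEDATALEN 64

/*
 * Format "COPY <rel> FROM STDIN[ WITH BINARY];" into a caller-provided
 * buffer and report failure on a formatting error or truncation.
 */
int
build_copy_stmt(char *buf, size_t buflen, const char *quoted_relname,
                int isbinary)
{
    int n = snprintf(buf, buflen, "COPY %s FROM STDIN%s;",
                     quoted_relname, isbinary ? " WITH BINARY" : "");

    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* error or would-be truncation */
    return 0;
}
```

A buffer of 64 + NAMEDATALEN bytes, as in the patch, comfortably holds
the static text plus a quoted identifier of bounded length, and the
return-value check catches anything unexpected anyway.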

Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>
---
contrib/dblink/dblink.c | 190 +++++++++++++++++++++++++++++++++++
contrib/dblink/dblink.h | 5 +
contrib/dblink/dblink.sql.in | 20 ++++
contrib/dblink/uninstall_dblink.sql | 8 ++
4 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 72fdf56..d32aeec 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -1722,6 +1722,196 @@ dblink_get_notify(PG_FUNCTION_ARGS)
* internal functions
*/

+/*
+ * Attempts to take the connection into a known state by rolling back
+ * transactions. If unable to restore the connection to a known idle state,
+ * raises an error.
+ */
+PG_FUNCTION_INFO_V1(dblink_connection_reset);
+Datum
+dblink_connection_reset(PG_FUNCTION_ARGS)
+{
+ PGresult *res = NULL;
+ PGconn *conn = NULL;
+ char *conname = NULL;
+ remoteConn *rconn = NULL;
+
+ bool triedonce = false;
+
+ DBLINK_INIT;
+
+ /* must be text */
+ Assert(PG_NARGS() == 1);
+ DBLINK_GET_NAMED_CONN;
+
+ if (!conn)
+ DBLINK_CONN_NOT_AVAIL;
+
+ while (!triedonce)
+ {
+ switch (PQtransactionStatus(conn))
+ {
+ case PQTRANS_IDLE:
+ /* Everything is okay */
+ goto finish;
+ case PQTRANS_ACTIVE:
+ case PQTRANS_INTRANS:
+ case PQTRANS_INERROR:
+ res = PQexec(conn, "ROLLBACK;");
+
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("%s: could not issue ROLLBACK command",
+ PG_FUNCNAME_MACRO)));
+
+ PQclear(res);
+ triedonce = true;
+ break;
+ case PQTRANS_UNKNOWN:
+ elog(ERROR, "%s: connection in unknown transaction state",
+ PG_FUNCNAME_MACRO);
+ }
+ }
+
+finish:
+ PG_RETURN_VOID();
+}
+
+/*
+ * dblink COPY support, procedures and variables
+ */
+static PGconn *dblink_copy_current = NULL;
+
+/*
+ * dblink_copy_open
+ *
+ * Start a COPY into a given relation on the named remote connection.
+ */
+PG_FUNCTION_INFO_V1(dblink_copy_open);
+Datum
+dblink_copy_open(PG_FUNCTION_ARGS)
+{
+ PGresult *res = NULL;
+ PGconn *conn = NULL;
+ char *conname = NULL;
+ remoteConn *rconn = NULL;
+
+ const char *copy_stmt = "COPY %s FROM STDIN%s;";
+ const char *with_binary = " WITH BINARY";
+ const char *quoted_remoted_relname;
+ bool isbinary;
+ int snprintf_retcode;
+
+ /*
+ * Should be large enough to contain any formatted output. Formed by
+ * counting the characters in the static formatting sections plus the
+ * bounded length of identifiers. Some modest padding was added for
+ * paranoia's sake, although all uses of this buffer are checked for
+ * over-length formats anyway.
+ */
+ char buf[64 + NAMEDATALEN];
+
+ DBLINK_INIT;
+
+ /* must be text,text,bool */
+ Assert(PG_NARGS() == 3);
+ DBLINK_GET_NAMED_CONN;
+
+ if (!conn)
+ DBLINK_CONN_NOT_AVAIL;
+
+ /* Read the procedure arguments into primitive values */
+ quoted_remoted_relname = NameListToQuotedString(
+ textToQualifiedNameList(PG_GETARG_TEXT_P(1)));
+ isbinary = PG_GETARG_BOOL(2);
+
+ /*
+ * Query parameterization only handles value-parameters -- of which
+ * identifiers are not considered one of -- so format the string the old
+ * fashioned way. It is very important to quote identifiers for this
+ * reason, as performed previously.
+ */
+ snprintf_retcode = snprintf(buf, sizeof buf, copy_stmt,
+ quoted_remoted_relname,
+ isbinary ? with_binary : "");
+
+ if (snprintf_retcode < 0)
+ elog(ERROR, "could not format dblink COPY query");
+ else if (snprintf_retcode >= sizeof buf)
+ /*
+ * Should not be able to happen, see documentation of the "buf" value
+ * in this procedure.
+ */
+ elog(ERROR, "could not fit formatted dblink COPY query into buffer");
+
+ /*
+ * Run the created query, and check to ensure that PGRES_COPY_IN state has
+ * been achieved.
+ */
+ res = PQexec(conn, buf);
+ if (!res || PQresultStatus(res) != PGRES_COPY_IN)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not start COPY FROM on remote node")));
+ PQclear(res);
+
+ /*
+ * Everything went well; finally bind the global dblink_copy_current to the
+ * connection ready to accept copy data.
+ */
+ dblink_copy_current = conn;
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}
+
+/*
+ * dblink_copy_write
+ *
+ * Write the provided StringInfo to the currently open COPY.
+ */
+PG_FUNCTION_INFO_V1(dblink_copy_write);
+Datum
+dblink_copy_write(PG_FUNCTION_ARGS)
+{
+ StringInfo copybuf = (StringInfo) PG_GETARG_POINTER(0);
+
+ if (dblink_copy_current == NULL)
+ elog(ERROR, "no dblink COPY is currently open");
+
+ if (PQputCopyData(dblink_copy_current, copybuf->data, copybuf->len) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_EXCEPTION),
+ errmsg("could not send row to remote node")));
+
+ PG_RETURN_VOID();
+}
+
+/*
+ * dblink_copy_end
+ *
+ * Finish the currently open COPY.
+ */
+PG_FUNCTION_INFO_V1(dblink_copy_end);
+Datum
+dblink_copy_end(PG_FUNCTION_ARGS)
+{
+ PGresult *res;
+
+ if (dblink_copy_current == NULL)
+ elog(ERROR, "no dblink COPY is currently open");
+
+ /* Check to ensure that termination data was sent successfully */
+ if (PQputCopyEnd(dblink_copy_current, NULL) != 1)
+ elog(ERROR, "COPY end failed");
+
+ do
+ {
+ res = PQgetResult(dblink_copy_current);
+ if (res == NULL)
+ break;
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ elog(ERROR, "COPY failed: %s",
+ PQerrorMessage(dblink_copy_current));
+ PQclear(res);
+ } while (true);
+
+ dblink_copy_current = NULL;
+ PG_RETURN_TEXT_P(cstring_to_text("OK"));
+}

/*
* get_pkey_attnames
diff --git a/contrib/dblink/dblink.h b/contrib/dblink/dblink.h
index 255f5d0..8a2faee 100644
--- a/contrib/dblink/dblink.h
+++ b/contrib/dblink/dblink.h
@@ -59,4 +59,9 @@ extern Datum dblink_build_sql_update(PG_FUNCTION_ARGS);
extern Datum dblink_current_query(PG_FUNCTION_ARGS);
extern Datum dblink_get_notify(PG_FUNCTION_ARGS);

+extern Datum dblink_connection_reset(PG_FUNCTION_ARGS);
+
+extern Datum dblink_copy_open(PG_FUNCTION_ARGS);
+extern Datum dblink_copy_write(PG_FUNCTION_ARGS);
+extern Datum dblink_copy_end(PG_FUNCTION_ARGS);
#endif /* DBLINK_H */
diff --git a/contrib/dblink/dblink.sql.in b/contrib/dblink/dblink.sql.in
index da5dd65..aedca34 100644
--- a/contrib/dblink/dblink.sql.in
+++ b/contrib/dblink/dblink.sql.in
@@ -221,3 +221,23 @@ CREATE OR REPLACE FUNCTION dblink_get_notify(
RETURNS setof record
AS 'MODULE_PATHNAME', 'dblink_get_notify'
LANGUAGE C STRICT;
+
+CREATE OR REPLACE FUNCTION dblink_connection_reset (text)
+RETURNS void
+AS 'MODULE_PATHNAME','dblink_connection_reset'
+LANGUAGE C STRICT;
+
+CREATE OR REPLACE FUNCTION dblink_copy_open (text, text, boolean)
+RETURNS text
+AS 'MODULE_PATHNAME','dblink_copy_open'
+LANGUAGE C STRICT;
+
+CREATE OR REPLACE FUNCTION dblink_copy_write (internal)
+RETURNS void
+AS 'MODULE_PATHNAME','dblink_copy_write'
+LANGUAGE C STRICT;
+
+CREATE OR REPLACE FUNCTION dblink_copy_end ()
+RETURNS text
+AS 'MODULE_PATHNAME','dblink_copy_end'
+LANGUAGE C STRICT;
diff --git a/contrib/dblink/uninstall_dblink.sql b/contrib/dblink/uninstall_dblink.sql
index 45cf13c..465beb7 100644
--- a/contrib/dblink/uninstall_dblink.sql
+++ b/contrib/dblink/uninstall_dblink.sql
@@ -11,6 +11,14 @@ DROP FUNCTION dblink_build_sql_delete (text, int2vector, int4, _text);

DROP FUNCTION dblink_build_sql_insert (text, int2vector, int4, _text, _text);

+DROP FUNCTION dblink_copy_end ();
+
+DROP FUNCTION dblink_copy_open (text, text, boolean);
+
+DROP FUNCTION dblink_copy_write (internal);
+
+DROP FUNCTION dblink_connection_reset (text);
+
DROP FUNCTION dblink_get_pkey (text);

DROP TYPE dblink_pkey_results;
--
1.6.5.3


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Farina <dfarina(at)truviso(dot)com>
Subject: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-23 21:34:42
Message-ID: 1259012082-6196-5-git-send-email-dfarina@truviso.com

Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>
---
contrib/dblink/expected/dblink.out | 272 ++++++++++++++++++++++++++++++++++++
contrib/dblink/sql/dblink.sql | 112 +++++++++++++++
2 files changed, 384 insertions(+), 0 deletions(-)

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index d39aa45..788b2a3 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -872,6 +872,278 @@ SELECT * from dblink_get_notify();
-------------+--------+-------
(0 rows)

+-- test COPY ... TO FUNCTION support
+CREATE SCHEMA dblink_copy_to_function;
+SET search_path = dblink_copy_to_function, public;
+CREATE TABLE xyzzy(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "xyzzy_pkey" for table "xyzzy"
+INSERT INTO xyzzy VALUES (0,'a','{"a0","b0","c0"}');
+INSERT INTO xyzzy VALUES (1,'b','{"a1","b1","c1"}');
+INSERT INTO xyzzy VALUES (2,'c','{"a2","b2","c2"}');
+INSERT INTO xyzzy VALUES (3,'d','{"a3","b3","c3"}');
+INSERT INTO xyzzy VALUES (4,'e','{"a4","b4","c4"}');
+INSERT INTO xyzzy VALUES (5,'f','{"a5","b5","c5"}');
+INSERT INTO xyzzy VALUES (6,'g','{"a6","b6","c6"}');
+INSERT INTO xyzzy VALUES (7,'h','{"a7","b7","c7"}');
+INSERT INTO xyzzy VALUES (8,'i','{"a8","b8","c8"}');
+INSERT INTO xyzzy VALUES (9,'j','{"a9","b9","c9"}');
+CREATE TABLE bar(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "bar_pkey" for table "bar"
+INSERT INTO bar VALUES (100,'w','{"a100","b100","c100"}');
+INSERT INTO bar VALUES (101,'x','{"a101","b101","c101"}');
+CREATE TABLE baz(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "baz_pkey" for table "baz"
+INSERT INTO baz VALUES (102,'y','{"a102","b102","c102"}');
+INSERT INTO baz VALUES (103,'z','{"a103","b103","c103"}');
+CREATE TABLE plugh(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "plugh_pkey" for table "plugh"
+INSERT INTO plugh VALUES (104,'u','{"a102","b102","c102"}');
+INSERT INTO plugh VALUES (105,'v','{"a103","b103","c103"}');
+SELECT dblink_connect('copytofunction','dbname=contrib_regression');
+ dblink_connect
+----------------
+ OK
+(1 row)
+
+SELECT dblink_exec('copytofunction',
+ 'SET search_path = dblink_copy_to_function, public;');
+ dblink_exec
+-------------
+ SET
+(1 row)
+
+-- ensure that original base data is present
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+ a | b | c
+---+---+------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ 3 | d | {a3,b3,c3}
+ 4 | e | {a4,b4,c4}
+ 5 | f | {a5,b5,c5}
+ 6 | g | {a6,b6,c6}
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+(10 rows)
+
+-- try doing a few consecutive copies with one open connection
+SELECT dblink_copy_open('copytofunction', 'xyzzy', false);
+ dblink_copy_open
+------------------
+ OK
+(1 row)
+
+COPY bar TO FUNCTION dblink_copy_write;
+COPY baz TO FUNCTION dblink_copy_write;
+SELECT dblink_copy_end();
+ dblink_copy_end
+-----------------
+ OK
+(1 row)
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+ a | b | c
+-----+---+------------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ 3 | d | {a3,b3,c3}
+ 4 | e | {a4,b4,c4}
+ 5 | f | {a5,b5,c5}
+ 6 | g | {a6,b6,c6}
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+ 100 | w | {a100,b100,c100}
+ 101 | x | {a101,b101,c101}
+ 102 | y | {a102,b102,c102}
+ 103 | z | {a103,b103,c103}
+(14 rows)
+
+-- try doing a binary COPY
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+ dblink_copy_open
+------------------
+ OK
+(1 row)
+
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+ dblink_copy_end
+-----------------
+ OK
+(1 row)
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+ a | b | c
+-----+---+------------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ 3 | d | {a3,b3,c3}
+ 4 | e | {a4,b4,c4}
+ 5 | f | {a5,b5,c5}
+ 6 | g | {a6,b6,c6}
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+ 100 | w | {a100,b100,c100}
+ 101 | x | {a101,b101,c101}
+ 102 | y | {a102,b102,c102}
+ 103 | z | {a103,b103,c103}
+ 104 | u | {a102,b102,c102}
+ 105 | v | {a103,b103,c103}
+(16 rows)
+
+-- try using reset to abort out of a copy state
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+ dblink_copy_open
+------------------
+ OK
+(1 row)
+
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_connection_reset('copytofunction');
+ dblink_connection_reset
+-------------------------
+
+(1 row)
+
+-- should fail, as COPY should have been aborted
+SELECT dblink_copy_end();
+ERROR: COPY end failed
+-- no new data should have appeared
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+ a | b | c
+-----+---+------------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ 3 | d | {a3,b3,c3}
+ 4 | e | {a4,b4,c4}
+ 5 | f | {a5,b5,c5}
+ 6 | g | {a6,b6,c6}
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+ 100 | w | {a100,b100,c100}
+ 101 | x | {a101,b101,c101}
+ 102 | y | {a102,b102,c102}
+ 103 | z | {a103,b103,c103}
+ 104 | u | {a102,b102,c102}
+ 105 | v | {a103,b103,c103}
+(16 rows)
+
+-- should be a no-op, since no transaction should be active at this
+-- point
+SELECT dblink_connection_reset('copytofunction');
+ dblink_connection_reset
+-------------------------
+
+(1 row)
+
+-- generate an error in the remote transaction
+SELECT dblink_exec('copytofunction','BEGIN');
+ dblink_exec
+-------------
+ BEGIN
+(1 row)
+
+SELECT * FROM dblink('copytofunction', 'SELECT 1 / 0') AS t (a int);
+ERROR: division by zero
+CONTEXT: Error occurred on dblink connection named "unnamed": could not execute query.
+-- rollback the errored transaction
+SELECT dblink_connection_reset('copytofunction');
+ dblink_connection_reset
+-------------------------
+
+(1 row)
+
+-- should just work, if reset didn't actually reset the transaction
+-- state an error would result.
+SELECT * FROM dblink('copytofunction', 'SELECT 1;') AS t (a int);
+ a
+---
+ 1
+(1 row)
+
+-- try a really long identifier to test string handling in
+-- dblink_copy_open.  This should exactly hit the NAMEDATALEN - 1
+-- limit (63 characters on most systems).
+create table
+"012345678901234567890123456789012345678901234567890123456789012" (a int);
+-- should put the connection into the COPY state without complaint...
+SELECT dblink_copy_open('copytofunction',
+ '012345678901234567890123456789012345678901234567890123456789012',
+ true);
+ dblink_copy_open
+------------------
+ OK
+(1 row)
+
+COPY (SELECT generate_series(1, 5)) TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+ dblink_copy_end
+-----------------
+ OK
+(1 row)
+
+-- check to see if data made it
+SELECT * FROM
+ "012345678901234567890123456789012345678901234567890123456789012";
+ a
+---
+ 1
+ 2
+ 3
+ 4
+ 5
+(5 rows)
+
+-- PostgreSQL truncates long identifiers and reports this with a NOTICE,
+-- and as of right now dblink does no remote-machine NOTICE handling.
+-- The result is silent truncation to the remote machine's
+-- NAMEDATALEN.
+SELECT dblink_copy_open('copytofunction',
+ '012345678901234567890123456789012345678901234567890123456789012345678',
+ true);
+ dblink_copy_open
+------------------
+ OK
+(1 row)
+
+COPY (SELECT generate_series(6, 10)) TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+ dblink_copy_end
+-----------------
+ OK
+(1 row)
+
+-- check to see if data made it
+SELECT * FROM
+ "012345678901234567890123456789012345678901234567890123456789012";
+ a
+----
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+(10 rows)
+
SELECT dblink_disconnect();
dblink_disconnect
-------------------
diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql
index d0ad876..919fd78 100644
--- a/contrib/dblink/sql/dblink.sql
+++ b/contrib/dblink/sql/dblink.sql
@@ -405,4 +405,116 @@ SELECT notify_name, be_pid = (select t.be_pid from dblink('select pg_backend_pid

SELECT * from dblink_get_notify();

+-- test COPY ... TO FUNCTION support
+CREATE SCHEMA dblink_copy_to_function;
+SET search_path = dblink_copy_to_function, public;
+CREATE TABLE xyzzy(f1 int, f2 text, f3 text[], primary key (f1,f2));
+INSERT INTO xyzzy VALUES (0,'a','{"a0","b0","c0"}');
+INSERT INTO xyzzy VALUES (1,'b','{"a1","b1","c1"}');
+INSERT INTO xyzzy VALUES (2,'c','{"a2","b2","c2"}');
+INSERT INTO xyzzy VALUES (3,'d','{"a3","b3","c3"}');
+INSERT INTO xyzzy VALUES (4,'e','{"a4","b4","c4"}');
+INSERT INTO xyzzy VALUES (5,'f','{"a5","b5","c5"}');
+INSERT INTO xyzzy VALUES (6,'g','{"a6","b6","c6"}');
+INSERT INTO xyzzy VALUES (7,'h','{"a7","b7","c7"}');
+INSERT INTO xyzzy VALUES (8,'i','{"a8","b8","c8"}');
+INSERT INTO xyzzy VALUES (9,'j','{"a9","b9","c9"}');
+
+CREATE TABLE bar(f1 int, f2 text, f3 text[], primary key (f1,f2));
+INSERT INTO bar VALUES (100,'w','{"a100","b100","c100"}');
+INSERT INTO bar VALUES (101,'x','{"a101","b101","c101"}');
+
+CREATE TABLE baz(f1 int, f2 text, f3 text[], primary key (f1,f2));
+INSERT INTO baz VALUES (102,'y','{"a102","b102","c102"}');
+INSERT INTO baz VALUES (103,'z','{"a103","b103","c103"}');
+
+CREATE TABLE plugh(f1 int, f2 text, f3 text[], primary key (f1,f2));
+INSERT INTO plugh VALUES (104,'u','{"a102","b102","c102"}');
+INSERT INTO plugh VALUES (105,'v','{"a103","b103","c103"}');
+
+SELECT dblink_connect('copytofunction','dbname=contrib_regression');
+SELECT dblink_exec('copytofunction',
+ 'SET search_path = dblink_copy_to_function, public;');
+
+-- ensure that original base data is present
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+
+-- try doing a few consecutive copies with one open connection
+SELECT dblink_copy_open('copytofunction', 'xyzzy', false);
+COPY bar TO FUNCTION dblink_copy_write;
+COPY baz TO FUNCTION dblink_copy_write;
+SELECT dblink_copy_end();
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+
+-- try doing a binary COPY
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+
+-- try using reset to abort out of a copy state
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_connection_reset('copytofunction');
+
+-- should fail, as COPY should have been aborted
+SELECT dblink_copy_end();
+
+-- no new data should have appeared
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+
+-- should be a no-op, since no transaction should be active at this
+-- point
+SELECT dblink_connection_reset('copytofunction');
+
+-- generate an error in the remote transaction
+SELECT dblink_exec('copytofunction','BEGIN');
+SELECT * FROM dblink('copytofunction', 'SELECT 1 / 0') AS t (a int);
+
+-- rollback the errored transaction
+SELECT dblink_connection_reset('copytofunction');
+
+-- should just work, if reset didn't actually reset the transaction
+-- state an error would result.
+SELECT * FROM dblink('copytofunction', 'SELECT 1;') AS t (a int);
+
+-- try a really long identifier to test string handling in
+-- dblink_copy_open.  This should exactly hit the NAMEDATALEN - 1
+-- limit (63 characters on most systems).
+create table
+"012345678901234567890123456789012345678901234567890123456789012" (a int);
+
+-- should put the connection into the COPY state without complaint...
+SELECT dblink_copy_open('copytofunction',
+ '012345678901234567890123456789012345678901234567890123456789012',
+ true);
+COPY (SELECT generate_series(1, 5)) TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+
+-- check to see if data made it
+SELECT * FROM
+ "012345678901234567890123456789012345678901234567890123456789012";
+
+-- PostgreSQL truncates long identifiers and reports this with a NOTICE,
+-- and as of right now dblink does no remote-machine NOTICE handling.
+-- The result is silent truncation to the remote machine's
+-- NAMEDATALEN.
+SELECT dblink_copy_open('copytofunction',
+ '012345678901234567890123456789012345678901234567890123456789012345678',
+ true);
+COPY (SELECT generate_series(6, 10)) TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+
+-- check to see if data made it
+SELECT * FROM
+ "012345678901234567890123456789012345678901234567890123456789012";
+
SELECT dblink_disconnect();
--
1.6.5.3


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <dfarina(at)truviso(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-23 22:16:31
Message-ID: 603c8f070911231416j504d35d4wb10d23fcf57034b6@mail.gmail.com

On Mon, Nov 23, 2009 at 4:34 PM, Daniel Farina <dfarina(at)truviso(dot)com> wrote:
> Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>

Thanks for the patch. You may want to take a look at this:

http://wiki.postgresql.org/wiki/Submitting_a_Patch

I'm fuzzy on what problem this is attempting to solve... as mentioned
in the above guidelines, it's usually good to start with some design
discussions before writing/submitting code. Also, we prefer that
patches be submitted as context diffs and that they not be split up
over multiple emails.

Thanks,

...Robert


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-23 23:31:08
Message-ID: 429f3b220911231531v6e268b31j593147aacd416767@mail.gmail.com

On Mon, Nov 23, 2009 at 2:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 23, 2009 at 4:34 PM, Daniel Farina <dfarina(at)truviso(dot)com> wrote:
>> Signed-off-by: Daniel Farina <dfarina(at)truviso(dot)com>
>
> Thanks for the patch.  You may want to take a look at this:
>
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
>
> I'm fuzzy on what problem this is attempting to solve...

It seems somewhat strange that the only destinations COPY supports for
its output stream of bytes are the two modes baked into core Postgres.
This patch allows carefully written UDFs to do whatever they will with
the stream of bytes, such as sending it into a waiting libpq
connection.
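The indirection described here can be sketched outside of Postgres. The following is a minimal standalone C sketch, not code from the patch: names like `CopySink`, `BufState`, and `copy_rows` are illustrative only. It shows the general shape of routing COPY-style row output through a function pointer, so that a file writer, a libpq sender, or any other consumer can sit behind the same interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: a COPY output destination abstracted behind
 * a function pointer, so different consumers can receive the same
 * stream of formatted rows. */
typedef struct CopySink
{
	/* returns 0 on success, nonzero on failure */
	int			(*write_row) (void *state, const char *data, size_t len);
	void	   *state;
} CopySink;

/* One possible sink: append row data into a fixed in-memory buffer. */
typedef struct BufState
{
	char		buf[256];
	size_t		used;
} BufState;

static int
buffer_write_row(void *state, const char *data, size_t len)
{
	BufState   *bs = (BufState *) state;

	if (bs->used + len >= sizeof(bs->buf))
		return -1;				/* would overflow: report failure */
	memcpy(bs->buf + bs->used, data, len);
	bs->used += len;
	bs->buf[bs->used] = '\0';
	return 0;
}

/* The copy loop sees only the sink interface, never the destination. */
static int
copy_rows(CopySink *sink, const char **rows, int nrows)
{
	int			i;

	for (i = 0; i < nrows; i++)
		if (sink->write_row(sink->state, rows[i], strlen(rows[i])) != 0)
			return -1;
	return 0;
}
```

Swapping in a different sink means only providing another `write_row` implementation; the copy loop itself never changes, which is why the overhead of the indirection stays close to a plain function call.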

> as mentioned in the above guidelines, it's usually good to start with some design
> discussions before writing/submitting code.

The patch is derived from functionality that is a non-optional part of
the Truviso Postgres-derived database product; what is submitted here
is an extraction of that work.

> Also, we prefer that patches be submitted as context diffs

I actually remembered this right after I sent it...sorry about that.

> And that they not be split up over multiple emails.

With the possible exception of squashing together the test cases into
their implementing patches, I would say this is at least two patches.
One is to a contrib, the other to core PostgreSQL. It so happens the
core addition makes the contrib changes much more obviously useful.

fdr


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-23 23:31:29
Message-ID: 4B0B1B51.10807@2ndquadrant.com

Robert Haas wrote:
> I'm fuzzy on what problem this is attempting to solve... as mentioned
> in the above guidelines, it's usually good to start with some design
> discussions before writing/submitting code.
This has been through some heavy design discussions with a few PG
hackers you know and some you don't, they just couldn't release the
result until now. As for what it's good for, if you look at what you
can do now with dblink, you can easily move rows between nodes using
dblink_build_sql_insert. This is perfectly fine for small bits of work,
but the performance isn't nearly good enough to do serious replication
with it. The upper level patch here allows using COPY as the mechanism
to move things between them, which is much faster for some use cases
(which includes most of the really useful ones). It dramatically
increases the scale of what you can move around using dblink as the
replication transport.

The lower level patch is needed to build that layer, which is an
immediate proof of its utility. In addition, adding a user-defined
function as a COPY target opens up all sorts of possibilities for things
like efficient ETL implementation. And if this approach is accepted as
a reasonable one, as Dan suggested a next step might even be to
similarly allow passing COPY FROM through a UDF, which has the potential
to provide a new efficient implementation path for some of the custom
input filter requests that pop up here periodically.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-23 23:46:07
Message-ID: 4B0B1EBF.7000301@dunslane.net

Greg Smith wrote:
> Robert Haas wrote:
>> I'm fuzzy on what problem this is attempting to solve... as mentioned
>> in the above guidelines, it's usually good to start with some design
>> discussions before writing/submitting code.
> This has been through some heavy design discussions with a few PG
> hackers you know and some you don't, they just couldn't release the
> result until now. As for what it's good for, if you look at what you
> can do now with dblink, you can easily move rows between nodes using
> dblink_build_sql_insert. This is perfectly fine for small bits of
> work, but the performance isn't nearly good enough to do serious
> replication with it. The upper level patch here allows using COPY as
> the mechanism to move things between them, which is much faster for
> some use cases (which includes most of the really useful ones). It
> dramatically increases the scale of what you can move around using
> dblink as the replication transport.

I recently found myself trying to push data through dblink() and ended
up writing code that called a function on the target, which in turn
called back to the source to select the data and insert it. The speedup
was massive, so I'll be interested to dig into the details here.

> The lower level patch is needed to build that layer, which is an
> immediate proof of its utility. In addition, adding a user-defined
> function as a COPY target opens up all sorts of possibilities for
> things like efficient ETL implementation. And if this approach is
> accepted as a reasonable one, as Dan suggested a next step might even
> be to similarly allow passing COPY FROM through a UDF, which has the
> potential to provide a new efficient implementation path for some of
> the custom input filter requests that pop up here periodically.

I'm also interested in COPY returning rows as text[], which was
discussed recently. Does this help move us towards that?

cheers

andrew


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 00:03:38
Message-ID: 429f3b220911231603pe3c3635v18f729bca5122792@mail.gmail.com

On Mon, Nov 23, 2009 at 3:46 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> I recently found myself trying to push data through dblink() and ended up
> writing code to make a call to the target to call a function which called
> back to the source to select the data and insert it. The speedup was
> massive, so I'll be interested to dig into the details here.

The way the indirection is accomplished here should be very cheap.
Overhead should be comparable to the fwrite() call that is used for
copying to a file...this is why I mentioned that it would be
interesting to make this good enough to be the underlying mechanism of
TO STDOUT/TO 'file' to reduce the overall number of mechanisms used to
perform COPY TO.

> I'm also interested in COPY returning rows as text[], which was discussed
> recently. Does this help move us towards that?

Yes. Take a look at the tests introduced to core PostgreSQL (see patch
2), where instead of returning a text[] I return just a single text of
the verbatim output of the copy. You could imagine making that an SRF
instead. It would have to understand COPY row delimiters in whatever
mode you were operating in, though.

fdr


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 00:06:34
Message-ID: 429f3b220911231606o51f03279ye7736d084576bda9@mail.gmail.com

On Mon, Nov 23, 2009 at 4:03 PM, Daniel Farina <dfarina(at)truviso(dot)com> wrote:
> Yes.  Take a look at the tests introduced to core PostgeSQL (see patch
> 2), where instead of returning a text[] I return just a single text of
> the verbatim output of the copy.  You could imagine making that an SRF
> instead.  It would have to understand COPY row delimiters in whatever
> mode you were operating in, though.

Actually, sorry, I lie, but not in a bad way.... Since COPY operates a
row at a time (rather than as a stream of bytes with arbitrary
boundaries) you could rely on being passed each record one at a time.
You don't have to understand the delimiter. So you could even make a
bytea[][] that contains the binary output, the first dimension being
row number, the second being the bytes themselves. The header would
pose an interesting problem, though.
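A minimal standalone sketch of that row-collecting idea (all names here, such as `RowCollector`, are illustrative and not part of the patch): because each record arrives whole, a collector can store the raw bytes of each row as its own element without knowing anything about the format's delimiters, which is essentially the shape a bytea[][] result would take.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch only: collect each row's raw bytes as its own
 * element, without parsing delimiters -- row number as the first
 * dimension, the bytes themselves as the second. */
typedef struct RowCollector
{
	char	  **rows;			/* one byte buffer per row */
	size_t	   *lens;			/* byte length of each row */
	int			nrows;
	int			capacity;
} RowCollector;

static void
collector_add_row(RowCollector *rc, const char *data, size_t len)
{
	if (rc->nrows == rc->capacity)
	{
		/* grow geometrically; error handling omitted for brevity */
		rc->capacity = rc->capacity ? rc->capacity * 2 : 4;
		rc->rows = realloc(rc->rows, rc->capacity * sizeof(char *));
		rc->lens = realloc(rc->lens, rc->capacity * sizeof(size_t));
	}
	rc->rows[rc->nrows] = malloc(len);
	memcpy(rc->rows[rc->nrows], data, len);
	rc->lens[rc->nrows] = len;
	rc->nrows++;
}
```

Binary-mode rows work just as well as text rows here, since nothing in the collector ever inspects the bytes; the COPY binary header remains the open question noted above.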

fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 00:20:05
Message-ID: 28253.1259022005@sss.pgh.pa.us

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Robert Haas wrote:
>> I'm fuzzy on what problem this is attempting to solve... as mentioned
>> in the above guidelines, it's usually good to start with some design
>> discussions before writing/submitting code.

> This has been through some heavy design discussions with a few PG
> hackers you know and some you don't, they just couldn't release the
> result until now.

Those discussions don't have a lot of credibility if they didn't take
place on the public mailing lists.

pgsql-hackers had some preliminary discussions a couple months back
on refactoring COPY to allow things like this --- see the thread
starting here:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
While I don't think we arrived at any final decisions, I would like
to know how this design fits in with what was discussed then.

regards, tom lane


From: Daniel Farina <dfarina(at)truviso(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 00:25:31
Message-ID: 429f3b220911231625m11f793b0u488fbb0db305ab97@mail.gmail.com

On Mon, Nov 23, 2009 at 4:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> pgsql-hackers had some preliminary discussions a couple months back
> on refactoring COPY to allow things like this --- see the thread
> starting here:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
> While I don't think we arrived at any final decisions, I would like
> to know how this design fits in with what was discussed then.

This seems to be about importing/ingress, whereas this patch is about
exporting/egress... it is an interesting question how much parsing to
do on the ingress side before handing a row to a function, though,
should we try to make these kinds of operations a bit more
symmetrical.

I did consider refactoring COPY, but since it's never clear when we
start a feature whether it will turn out to be a good upstream
candidate, we default to keeping future merges with Postgres tractable,
so I did not take on such a large and artistic task.

fdr


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 01:46:06
Message-ID: 4B0B3ADE.3070302@2ndquadrant.com

Tom Lane wrote:
> Those discussions don't have a lot of credibility if they didn't take
> place on the public mailing lists.
>
You know how people complain about how new contributors are treated
here? Throwing out comments like this, that come off as belittling to
other people's work, doesn't help. All I was suggesting was that Dan
wasn't developing this in complete isolation from the hackers community
as Robert had feared, as will be obvious when we get to:

> pgsql-hackers had some preliminary discussions a couple months back
> on refactoring COPY to allow things like this --- see the thread
> starting here:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
> While I don't think we arrived at any final decisions, I would like
> to know how this design fits in with what was discussed then.
>
The patch provided here is a first step toward what you suggested in the
related "Copy enhancements thread" a few days later, at
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00616.php It's
one way to implement a better decoupled "data transformation" layer on
top of COPY. When Dan showed me an earlier implementation of the basic
idea embodied in this patch (developed independently and earlier
actually), I gave it a thumbs-up as seeming to match the general
direction the community discussion suggested, and encouraged him to work
on getting the code to where it could be released. He started with
output rather than input, mainly because the dblink feature had a
compelling use-case that justified spending time on development for the
company. Rather than keep going into input transformation, this
development checkpoint made a good place to pause and solicit public
feedback, particularly since the input side has additional hurdles to
clear before it can work.

As far as other past discussion here that might be relevant, this patch
includes a direct change to gram.y to support the new syntax. You've
already suggested before that it might be time to update COPY the same
way EXPLAIN and now VACUUM have been overhauled to provide a more
flexible options interface:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00616.php This
patch might be more fuel for that idea.

Emmanuel has given up on the more error-tolerant COPY patch that thread was
associated with, and I haven't heard anything from Andrew about ragged
CSV import either. I think that ultimately those features are useful,
but just exceed what the existing code could be hacked to handle
cleanly. If it's possible to lower the complexity bar to implementing
them by abstracting the transformation into a set of functions, and have
more flexible syntax for the built-in ones the database ships with, that
may be useful groundwork for returning to implementing those features
without bloating COPY's internals (and therefore performance)
intolerably. Dan even suggested in his first note that it might be
possible to write the current STDOUT|file behavior in terms of the new
function interface, which has the potential to make the COPY
implementation cleaner rather than more cluttered (as long as the
performance doesn't suffer).

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 02:37:25
Message-ID: 4B0B46E5.803@dunslane.net

Greg Smith wrote:
> I haven't heard anything from Andrew about ragged CSV import either.
> I think that ultimately those features are useful, but just exceed
> what the existing code could be hacked to handle cleanly.

The patch is attached for your edification/amusement. I have backpatched
it to 8.4 for the client that needed it, and it's working just fine. I
didn't pursue it when it was clear that it was not going to be accepted.
COPY returning text[] would allow us to achieve the same thing, a bit
more verbosely, but it would be a lot more work to develop.

cheers

andrew

Attachment Content-Type Size
raggedcopy.patch text/x-patch 8.4 KB

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 08:29:22
Message-ID: 1259051362.30357.56.camel@hvost1700

On Mon, 2009-11-23 at 18:31 -0500, Greg Smith wrote:
> Robert Haas wrote:
> > I'm fuzzy on what problem this is attempting to solve... as mentioned
> > in the above guidelines, it's usually good to start with some design
> > discussions before writing/submitting code.
> This has been through some heavy design discussions with a few PG
> hackers you know and some you don't, they just couldn't release the
> result until now. As for what it's good for, if you look at what you
> can do now with dblink, you can easily move rows between nodes using
> dblink_build_sql_insert. This is perfectly fine for small bits of work,
> but the performance isn't nearly good enough to do serious replication
> with it. The upper level patch here allows using COPY as the mechanism
> to move things between them, which is much faster for some use cases
> (which includes most of the really useful ones). It dramatically
> increases the scale of what you can move around using dblink as the
> replication transport.
>
> The lower level patch is needed to build that layer, which is an
> immediate proof of its utility. In addition, adding a user-defined
> function as a COPY target opens up all sorts of possibilities for things
> like efficient ETL implementation. And if this approach is accepted as
> a reasonable one, as Dan suggested a next step might even be to
> similarly allow passing COPY FROM through a UDF, which has the potential
> to provide a new efficient implementation path for some of the custom
> input filter requests that pop up here periodically.

Can this easily be extended to do things like

COPY stdin TO udf();
or
COPY udf() FROM stdin;

so that I could write a simple partitioning function, either locally for
partitioned tables or using PL/Proxy for partitioned databases?
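For concreteness, a sketch of what this partitioning case might look like if the proposed syntax grew a COPY ... FROM form taking a row-typed function. Everything here is hypothetical - the table names, the routing function, and the COPY form itself are not part of the posted patch:

```sql
-- Hypothetical per-row partitioning target for a proposed
-- "COPY udf() FROM stdin" form (not implemented by this patch).
CREATE FUNCTION route_measurement(r measurements)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    IF r.logdate < DATE '2009-01-01' THEN
        INSERT INTO measurements_y2008 VALUES (r.*);
    ELSE
        INSERT INTO measurements_y2009 VALUES (r.*);
    END IF;
END;
$$;

-- Proposed invocation, routing each incoming row:
-- COPY route_measurement() FROM stdin;
```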

> --
> Greg Smith 2ndQuadrant Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com
>
>


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Daniel Farina <dfarina(at)truviso(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 08:38:37
Message-ID: 1259051917.30357.60.camel@hvost1700

On Mon, 2009-11-23 at 16:25 -0800, Daniel Farina wrote:
> On Mon, Nov 23, 2009 at 4:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > pgsql-hackers had some preliminary discussions a couple months back
> > on refactoring COPY to allow things like this --- see the thread
> > starting here:
> > http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
> > While I don't think we arrived at any final decisions, I would like
> > to know how this design fits in with what was discussed then.
>
> This seems to be about importing/ingress, whereas this patch is about
> exporting/egress...it is an interesting question how much parsing to
> do on the ingress side before handing a row to a function, though,

My suggestion for

COPY func(rowtype) FROM stdin;

would be to pass the function a fully processed row of that type, with
all fields resolved and converted to the right types.

It may be useful to also have forms like

COPY func(text) FROM stdin;

and

COPY func(bytea[]) FROM stdin;

for getting less processed input.

> should we try to make these kinds of operations a bit more
> symmetrical.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 08:54:26
Message-ID: 7b97c5a40911240054x18773238s55be23237b911cd3@mail.gmail.com

On Tue, Nov 24, 2009 at 12:29 AM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> COPY stdin TO udf();

If stdin becomes (is?) a legitimate source of records, then this patch
will Just Work.

The patch is already quite useful in the COPY (SELECT ...) TO FUNCTION
... scenario.

> COPY udf() FROM stdin;

This is unaddressed, but I think it would be a good idea to consider
enabling this kind of thing prior to application.

fdr


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Daniel Farina <dfarina(at)truviso(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 08:57:02
Message-ID: 7b97c5a40911240057q4bbf9c7ldb5f1bac382c3f42@mail.gmail.com

On Tue, Nov 24, 2009 at 12:38 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> COPY func(rowtype) FROM stdin;

I didn't consider rowtype...I did consider a type list, such as:

COPY func(typea, typeb, typec) FROM ...

That would operate just like a table, but it would be useless for the
data-cleaning case, and would not allow type overloading to be the
mechanism for selecting behavior.

It was just an idea...I don't really know the use cases well enough;
my anticipated use case was updating eagerly materialized quantities
with the UDF.

fdr


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 10:10:48
Message-ID: 162867790911240210n1f95eca8g26bc6b7d9ca89245@mail.gmail.com

Hello

I think this patch may be a good idea, but I am missing a better
function specification. Specifying by name isn't enough - we can
have overloaded functions. This syntax doesn't allow the use of an
explicit cast - from my personal view, the syntax is ugly - and with a
type specification we don't need the keyword FUNCTION.

What about:

variant a) by type specification

COPY table TO foo(varchar, int)

variant b) by value expression - it allows changes in ordering, casting, and constants

COPY table TO foo($3, $1::date, $2::text, CURRENT_DATE, true);

One question:
We have a fast COPY statement - OK; we have a fast function - OK; but
inside the function we have to call a "slow" SQL query. Personally, what
is the advantage?

We need pipes, like:

COPY table TO foo(..) TO table

where foo() would be a transformation function, or a real pipe function.

Regards
Pavel Stehule


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 10:37:37
Message-ID: 7b97c5a40911240237m6e23fb4fh977ab1e4196da776@mail.gmail.com

On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> I think this patch may be a good idea, but I am missing a better
> function specification. Specifying by name isn't enough - we can
> have overloaded functions. This syntax doesn't allow the use of an
> explicit cast - from my personal view, the syntax is ugly - and with a
> type specification we don't need the keyword FUNCTION

As long as things continue to support the INTERNAL-type behavior for
extremely low overhead bulk transfers I am open to suggestions about
how to enrich things...but how would I do so under this proposal?

I am especially fishing for suggestions in the direction of managing
state for the function between rows though...I don't like how the
current design seems to scream "use a global variable."

> We have a fast COPY statement - OK; we have a fast function - OK; but
> inside the function we have to call a "slow" SQL query. Personally, what
> is the advantage?

The implementation here uses a type 'internal' for performance. It
doesn't even recompute the fcinfo because of the very particular
circumstances of how the function is called. It doesn't do a memory
copy of the argument buffer either, to the best of my knowledge. In
the dblink patches you basically stream directly from the disk, format
the COPY bytes, and shove it into a waiting COPY on another postgres
node...there's almost no additional work in-between. All utilized
time would be some combination of the normal COPY byte stream
generation and libpq.

This, of course, presumes that everyone who is interested in building
on this is going to use some UDFs written in C...

>
> We need pipes, like:
>
> COPY table TO foo(..) TO table
>
> where foo() would be a transformation function, or a real pipe function.

I've actually considered this pipe thing with a colleague while
driving home from work...it occurred to us that it would be nice to
have both pipes and tees (basically composition vs. mapping
application of functions over the input) in some form. Not sure what
an elegant way to express that is or how to control it. Since you can
work around this by composing or applying functions on your own in
another function, I'm not sure if that's as high priority for me
personally.

fdr


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 10:50:48
Message-ID: 1259059848.30357.68.camel@hvost1700

On Tue, 2009-11-24 at 02:37 -0800, Daniel Farina wrote:
> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> > Hello
> >
> > I think this patch may be a good idea, but I am missing a better
> > function specification. Specifying by name isn't enough - we can
> > have overloaded functions. This syntax doesn't allow the use of an
> > explicit cast - from my personal view, the syntax is ugly - and with a
> > type specification we don't need the keyword FUNCTION
>
> As long as things continue to support the INTERNAL-type behavior for
> extremely low overhead bulk transfers I am open to suggestions about
> how to enrich things...but how would I do so under this proposal?
>
> I am especially fishing for suggestions in the direction of managing
> state for the function between rows though...I don't like how the
> current design seems to scream "use a global variable."

Can't you use the existing aggregate function design?

CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
SFUNC = sfunc,
STYPE = state_data_type
[ , FINALFUNC = ffunc ]
[ , INITCOND = initial_condition ]
[ , SORTOP = sort_operator ]
)

and maybe add an INITFUNC=, if you need it for dblink-type things
which don't do connection management automatically the way PL/Proxy
does.
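For reference, the aggregate shape being pointed at here - a state-transition function plus a state type - in a concrete, standard form (the names are illustrative):

```sql
-- An ordinary aggregate built from the SFUNC/STYPE pieces above:
-- it folds successive lines into a single newline-joined text value.
CREATE FUNCTION append_line(state text, line text)
RETURNS text LANGUAGE sql AS
$$ SELECT coalesce(state || E'\n', '') || line $$;

CREATE AGGREGATE collect_lines (text) (
    SFUNC = append_line,
    STYPE = text
);
```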

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 10:51:20
Message-ID: 7b97c5a40911240251y46ff0ccdl93a60007081fbd91@mail.gmail.com

On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> Can't you use existing aggregate function design ?
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>    SFUNC = sfunc,
>    STYPE = state_data_type
>    [ , FINALFUNC = ffunc ]
>    [ , INITCOND = initial_condition ]
>    [ , SORTOP = sort_operator ]
> )

Actually, yes. I just thought that this was an idea so crazy that no
one would like it.

fdr


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 10:56:08
Message-ID: 7b97c5a40911240256n3ef924absff51e92523b129d7@mail.gmail.com

On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(at)gmail(dot)com> wrote:
> On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Can't you use existing aggregate function design ?
>>
>> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>>    SFUNC = sfunc,
>>    STYPE = state_data_type
>>    [ , FINALFUNC = ffunc ]
>>    [ , INITCOND = initial_condition ]
>>    [ , SORTOP = sort_operator ]
>> )
>
> Actually, yes.  I just thought that this was an idea so crazy that no
> one would like it.

Oh, and the other elephant in the room: error handling. How to handle
error conditions...try/catch/finally type stuff. Aggregates do not
necessarily provide a slot for this one. I did consider using
aggregates though, but somehow it felt to me like "I need at least a
three-tuple, why not fish around for any random bundling of three
functions..."

After all, I would not want to actually call the nodeAgg stuff to
apply the function anyway...so it'd basically be abused as a
three-tuple of functions.

Also, what if you wanted to, say, replace the mechanism for COPY TO
'file'? It'd be nice to make the following interaction (which uses
some implied global variables) not use such global variables:

BEGIN;
select open_file('/tmp/file', 'w+');
copy foo to function write_to_file;
-- what happens here if COPY aborts? Does the transaction being in
-- the error state mean that files will not get closed?
select close_file();
COMMIT;

fdr


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 11:25:04
Message-ID: 1259061904.30357.88.camel@hvost1700

On Tue, 2009-11-24 at 02:56 -0800, Daniel Farina wrote:
> On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(at)gmail(dot)com> wrote:
> > On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> >> Can't you use existing aggregate function design ?
> >>
> >> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> >> SFUNC = sfunc,
> >> STYPE = state_data_type
> >> [ , FINALFUNC = ffunc ]
> >> [ , INITCOND = initial_condition ]
> >> [ , SORTOP = sort_operator ]
> >> )
> >
> > Actually, yes. I just thought that this was an idea so crazy that no
> > one would like it.

Seems like a natural choice to me - in essence this is an aggregate
function, aggregating over the rows/tuples supplied to it.

> Oh, and the other elephant in the room: error handling. How to handle
> error conditions...try/catch/finally type stuff.

Same as current aggregates - either ignore the error, log it and
continue, or bail out.

> Aggregates do not necessarily provide a slot for this one.

Neither do ordinary functions; we have no "ON ERROR DO ..." clause for
function definitions.

> I did consider using
> aggregates though, but somehow it felt to me like "I need at least a
> three-tuple, why not fish around for any random bundling of three
> functions..."

Why do you need three?

> After all, I would not want to actually call the nodeAgg stuff to
> apply the function anyway...so it'd basically be abused as a
> three-tuple of functions.

Actually it would be best if it could use straight generic functions, so
you could do something like

COPY stdin TO filterfunc(int) TO avg(int);

You can bypass using nodeAgg in your own C functions as an optimisation.

> Also, what if you wanted, say, replace the mechanism for COPY TO
> 'file'? It'd be nice to make the following interaction (which uses
> some implied global variables) not use such global variables:
>
> BEGIN;
> select open_file('/tmp/file', 'w+');
> copy foo to function write_to_file;
> -- what happens here if COPY aborts? Does the transaction being in
> -- the error state mean that files will not get closed?
> select close_file();
> COMMIT;

Pass the file name in as an argument to SFUNC; open it on the first
call and ignore it later (if it stays the same ;)
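A sketch of that pattern, assuming the aggregate-style interface under discussion; open_file and write_file are hypothetical helpers, not existing functions:

```sql
-- Hypothetical SFUNC: the state is an opaque handle for the open file;
-- the file name arrives with every row but is only used on the first call.
CREATE FUNCTION write_line(state integer, filename text, line text)
RETURNS integer LANGUAGE plpgsql AS $$
BEGIN
    IF state IS NULL THEN
        state := open_file(filename, 'w+');  -- hypothetical helper
    END IF;
    PERFORM write_file(state, line);         -- hypothetical helper
    RETURN state;
END;
$$;
```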

For foreign connections, use SQL/MED and pass the handle to the "foreign
data".

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 11:48:52
Message-ID: 7b97c5a40911240348k703996c0r760b943feabe1686@mail.gmail.com

On Tue, Nov 24, 2009 at 3:25 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> On Tue, 2009-11-24 at 02:56 -0800, Daniel Farina wrote:
>> On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(at)gmail(dot)com> wrote:
>> > On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> >> Can't you use existing aggregate function design ?
>> >>
>> >> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>> >>    SFUNC = sfunc,
>> >>    STYPE = state_data_type
>> >>    [ , FINALFUNC = ffunc ]
>> >>    [ , INITCOND = initial_condition ]
>> >>    [ , SORTOP = sort_operator ]
>> >> )
>> >
>> > Actually, yes.  I just thought that this was an idea so crazy that no
>> > one would like it.
>
> Seems like a natural choice to me - in essence this is an aggregate
> function, aggregating over the rows/tuples supplied to it.

Okay, well, maybe that wasn't such a crazy idea after all...

>> Oh, and the other elephant in the room: error handling.  How to handle
>> error conditions...try/catch/finally type stuff.
>
> Same as current aggregates - either ignore the error, log it and
> continue, or bail out.
>[snip]
> Neither do ordinary functions; we have no "ON ERROR DO ..." clause for
> function definitions.

It is assumed most functions do not have side effects outside the
database, so this comes rather for free. The driving use case for
this *is* side effects on other systems. I'm not sure if it's as easy
to use this justification here...normally rollbacks just take care of
all the error handling a function would want. Here I'm not so sure
that is as common a case.

>
>> I did consider using
>> aggregates though, but somehow it felt to me like "I need at least a
>> three-tuple, why not fish around for any random bundling of three
>> functions..."
>
> Why do you need three ?

I'm counting the aggregate prototype itself as part of the bundle,
which would more normally be considered a two-tuple of functions. A
self-referential tuple, I suppose...

>> After all, I would not want to actually call the nodeAgg stuff to
>> apply the function anyway...so it'd basically be abused as a
>> three-tuple of functions.
>
> Actually it would be best if it could use straight generic functions, so
> you could do something like
>
> COPY stdin TO filterfunc(int) TO avg(int);

Generic functions? Do you mean just scalar functions? That'd be
neat, but as I said previously, composition could just be wrapped into
a function of the user's choice. Also, what about use of
multi-function-apply?

COPY stdin TO replicant1(datum) AND replicant2(datum);

You could imagine all sorts of new 2PC evil. But again, one could
just write a little function to absorb the rows and dole them out
without bloating COPY syntax...

I am in no way suggesting that syntax seriously or unseriously.

> Pass the file name in as an argument to SFUNC; open it on the first
> call and ignore it later (if it stays the same ;)

So either you are going to pass it with every row and ignore it, or
create a new initial aggregate state for each COPY TO FUNCTION...how
are you going to get it passed to SFUNC?

fdr


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 12:37:18
Message-ID: 162867790911240437t75aea7c4t9c477178b50a321a@mail.gmail.com

2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> Hello
>>
>> I think this patch may be a good idea, but I am missing a better
>> function specification. Specifying by name isn't enough - we can
>> have overloaded functions. This syntax doesn't allow the use of an
>> explicit cast - from my personal view, the syntax is ugly - and with a
>> type specification we don't need the keyword FUNCTION
>
> As long as things continue to support the INTERNAL-type behavior for
> extremely low overhead bulk transfers I am open to suggestions about
> how to enrich things...but how would I do so under this proposal?
>

Using an INTERNAL type is wrong. It breaks the design of these functions
for ordinary PLs. I don't see any reason why it's necessary.

> I am especially fishing for suggestions in the direction of managing
> state for the function between rows though...I don't like how the
> current design seems to scream "use a global variable."
>
>> We have a fast COPY statement - OK; we have a fast function - OK; but
>> inside the function we have to call a "slow" SQL query. Personally, what
>> is the advantage?
>
> The implementation here uses a type 'internal' for performance.  It
> doesn't even recompute the fcinfo because of the very particular
> circumstances of how the function is called.  It doesn't do a memory
> copy of the argument buffer either, to the best of my knowledge.  In
> the dblink patches you basically stream directly from the disk, format
> the COPY bytes, and shove it into a waiting COPY on another postgres
> node...there's almost no additional work in-between.  All utilized
> time would be some combination of the normal COPY byte stream
> generation and libpq.
>

I understand it, and I dislike it. This design isn't general - it is
far from using a function, and it doesn't use the complete FUNCAPI
interface. I think you need different semantics: you are not using a
function, you are using something like a "stream object". This stream
object can have input and output functions, and its parameters should
be internal (I don't think internal can carry any significant
performance benefit here) or standard. The syntax should be similar to
CREATE AGGREGATE.

The syntax could then be:

COPY table TO streamname(parameters)

COPY table TO filestream('/tmp/foo.dta') ...
COPY table TO dblinkstream(connectionstring) ...

This design is only an idea; the details are not important. What is
important is the limited design: it is not possible to use a PL,
especially an untrusted PL. Using an internal type is a simple hack.

Pavel

> This, of course, presumes that everyone who is interested in building
> on this is going to use some UDFs written in C...
>
>>
>> We need pipes, like:
>>
>> COPY table TO foo(..) TO table
>>
>> where foo() would be a transformation function, or a real pipe function.
>
> I've actually considered this pipe thing with a colleague while
> driving home from work...it occurred to us that it would be nice to
> have both pipes and tees (basically composition vs. mapping
> application of functions over the input) in some form.  Not sure what
> an elegant way to express that is or how to control it.  Since you can
> work around this by composing or applying functions on your own in
> another function, I'm not sure if that's as high priority for me
> personally.
>
> fdr
>


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 13:00:30
Message-ID: 7b97c5a40911240500u20ec1dc3yb86347f1449aed@mail.gmail.com

On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
>> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> Hello
>>>
>>> I think this patch may be a good idea, but I am missing a better
>>> function specification. Specifying by name isn't enough - we can
>>> have overloaded functions. This syntax doesn't allow the use of an
>>> explicit cast - from my personal view, the syntax is ugly - and with a
>>> type specification we don't need the keyword FUNCTION
>>
>> As long as things continue to support the INTERNAL-type behavior for
>> extremely low overhead bulk transfers I am open to suggestions about
>> how to enrich things...but how would I do so under this proposal?
>>
>
> Using an INTERNAL type is wrong. It breaks the design of these functions
> for ordinary PLs. I don't see any reason why it's necessary.
>
>> I am especially fishing for suggestions in the direction of managing
>> state for the function between rows though...I don't like how the
>> current design seems to scream "use a global variable."
>>
>>> We have a fast COPY statement - OK; we have a fast function - OK; but
>>> inside the function we have to call a "slow" SQL query. Personally, what
>>> is the advantage?
>>
>> The implementation here uses a type 'internal' for performance.  It
>> doesn't even recompute the fcinfo because of the very particular
>> circumstances of how the function is called.  It doesn't do a memory
>> copy of the argument buffer either, to the best of my knowledge.  In
>> the dblink patches you basically stream directly from the disk, format
>> the COPY bytes, and shove it into a waiting COPY on another postgres
>> node...there's almost no additional work in-between.  All utilized
>> time would be some combination of the normal COPY byte stream
>> generation and libpq.
>>
>
> I understand it, and I dislike it. This design isn't general - it is
> far from using a function, and it doesn't use the complete FUNCAPI
> interface. I think you need different semantics: you are not using a
> function, you are using something like a "stream object". This stream
> object can have input and output functions, and its parameters should
> be internal (I don't think internal can carry any significant
> performance benefit here) or standard. The syntax should be similar to
> CREATE AGGREGATE.

I think you might be right about this. At the time I was too shy to
add a DDL command for this hack, though. But what I did want is a
form of currying, and that's not easily accomplished in SQL without
extension...

> then syntax should be:
>
> COPY table TO streamname(parameters)
>
> COPY table TO filestream('/tmp/foo.dta') ...
> COPY table TO dblinkstream(connectionstring) ...

I like this one quite a bit...it's a bit like an aggregate, except the
initial condition can be set in a rather function-callish way.

But that does seem to require making a DDL command, which leaves a
nice green field. In particular, we could then make as many hooks,
flags, and options as we wanted, but sometimes there is a paradox of
choice...I just did not want to anticipate on Postgres being friendly
to a new DDL command when writing this the first time.
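Purely for illustration (no such DDL exists, and every name below is invented), the aggregate-like stream definition being discussed might look something like this:

```sql
-- hypothetical syntax, modeled on CREATE AGGREGATE
CREATE COPYSTREAM dblinkstream (text) (
    INITFUNC   = dblink_copy_open,   -- receives the COPY ... TO parameters
    STREAMFUNC = dblink_copy_write,  -- called once per outgoing row
    FINALFUNC  = dblink_copy_close   -- flush and release the connection
);

COPY mytable TO dblinkstream('host=replica dbname=foo');
```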

fdr


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 13:20:57
Message-ID: 1259068857.30357.104.camel@hvost1700
Lists: pgsql-hackers

On Tue, 2009-11-24 at 03:48 -0800, Daniel Farina wrote:
> On Tue, Nov 24, 2009 at 3:25 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2009-11-24 at 02:56 -0800, Daniel Farina wrote:
> >> On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(at)gmail(dot)com> wrote:
> >> > On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> >> >> Can't you use existing aggregate function design ?
> >> >>
> >> >> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
> >> >> SFUNC = sfunc,
> >> >> STYPE = state_data_type
> >> >> [ , FINALFUNC = ffunc ]
> >> >> [ , INITCOND = initial_condition ]
> >> >> [ , SORTOP = sort_operator ]
> >> >> )
> >> >
> >> > Actually, yes. I just thought that this was an idea so crazy that no
> >> > one would like it.
> >
> > seems kind of natural choice for me - in essence this is an aggregate
> > function, aggregating over rows/tuples supplied to it.
>
> Okay, well, maybe that wasn't such a crazy idea after all...
>
> >> Oh, and the other elephant in the room: error handling. How to handle
> >> error conditions...try/catch/finally type stuff.
> >
> > Same as current aggregates - either ignore the error, logi it and
> > continue, or bail out
> >[snip]
> > Neither do ordinary funtions, we have no "ON ERROR DO ..." clause for
> > function definitions
>
> It is assumed most functions do not have side effects outside the
> database, so this is gotten rather for free. The driving use case for
> this *is* side effects on other systems. I'm not sure if it's as easy
> to use this justification here...normally rollbacks just take care of
> all the error handling a function would want. Here I'm not so sure
> that is as common a case.

A cleaner solution for undoing external effects would be an ON ROLLBACK
trigger, or maybe even an extension to BEGIN:

BEGIN WORK ON ROLLBACK RUN externalCleanupFunction();

The ROLLBACK trigger could also be done as a SET parameter inside a session,
so it won't bloat/pollute the system catalogs if changed often.

> >
> >> I did consider using
> >> aggregates though, but somehow it felt to me like "I need at least a
> >> three-tuple, why not fish around for any random bundling of three
> >> functions..."
> >
> > Why do you need three ?
>
> I'm counting the aggregate prototype itself to refer to the bundle,
> which I suppose would be more normally considered a two-tuple of
> functions. This is a self-referential tuple, I suppose...
>
> >> After all, I would not want to actually call the nodeAgg stuff to
> >> apply the function anyway...so it'd basically be abused as a
> >> three-tuple of functions.
> >
> > Actually it would be best if it could use straight generic funtions, so
> > you could do something like
> >
> > COPY stdin TO filterfunc(int) TO avg(int);
>
> Generic functions? Do you mean just scalar functions?

A typo - actually I meant our existing aggregate functions.

> That'd be
> neat, but as I said previously, composition could just be wrapped into
> a function of the user's choice. Also, what about use of
> multi-function-apply?
>
> COPY stdin TO replicant1(datum) AND replicant2(datum);

seems like a rare case, but you could use a wrapper func, something like

CREATE FUNCTION replicants_1_and_2(d datum) RETURNS void AS $$
BEGIN
    PERFORM replicant1(d);
    PERFORM replicant2(d);
END;
$$ LANGUAGE plpgsql;
> You could imagine all sorts of new 2PC evil.

2PC is evil anyway, at least where performance is concerned ;)

> But again, one could
> just write a little function to absorb the rows and dole them out
> without bloating COPY syntax...
>
> I am in no way suggesting that syntax seriously or unseriously.
>
> > pass the file name in as an argument to SFUNC, open it on first call,
> > ignore later (if it stays the same ;)
>
> So either you are going to pass it with every row and ignore it,

That would be my preferred way, yes

> or create a new initial aggregate state for each COPY TO FUNCTION

a third, more hackish way would be to set it as INITCOND = '/file/name' :)

> ...how are you going to get it passed to SFUNC?

keep the file handle in the aggregate node - that node exists to keep state,
and the file handle is surely part of the state.
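The existing aggregate machinery already carries state across rows this way. A minimal sketch using real CREATE AGGREGATE syntax, accumulating rows into a text buffer as a stand-in for a real file handle (the function and aggregate names are invented):

```sql
-- accumulate each incoming line into the aggregate's state value
CREATE FUNCTION copy_accum(state text, line text) RETURNS text AS $$
    SELECT state || line || E'\n'
$$ LANGUAGE sql;

CREATE AGGREGATE copy_sink (text) (
    SFUNC = copy_accum,
    STYPE = text,
    INITCOND = ''
);

-- usage: SELECT copy_sink(t::text) FROM some_table;
```

A real file-backed version would keep an open file handle in the transition state instead of a growing text value.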

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 13:39:21
Message-ID: 162867790911240539v3aa7e091g583aa6e77e6dcfe7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2009/11/24 Daniel Farina <drfarina(at)gmail(dot)com>:
>>> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> Hello
>>>>
>>>> I thing, so this patch is maybe good idea. I am missing better
>>>> function specification. Specification by name isn't enough - we can
>>>> have a overloaded functions. This syntax doesn't allow to use explicit
>>>> cast - from my personal view, the syntax is ugly - with type
>>>> specification we don't need to keyword FUNCTION
>>>
>>> As long as things continue to support the INTERNAL-type behavior for
>>> extremely low overhead bulk transfers I am open to suggestions about
>>> how to enrich things...but how would I do so under this proposal?
>>>
>>
>> using an INTERNAL type is wrong. It breaks design these functions for
>> usual PL. I don't see any reason, why it's necessary.
>>
>>> I am especially fishing for suggestions in the direction of managing
>>> state for the function between rows though...I don't like how the
>>> current design seems to scream "use a global variable."
>>>
>>>> We have a fast copy statement - ok., we have a fast function ok, but
>>>> inside a function we have to call "slow" sql query. Personally What is
>>>> advantage?
>>>
>>> The implementation here uses a type 'internal' for performance.  It
>>> doesn't even recompute the fcinfo because of the very particular
>>> circumstances of how the function is called.  It doesn't do a memory
>>> copy of the argument buffer either, to the best of my knowledge.  In
>>> the dblink patches you basically stream directly from the disk, format
>>> the COPY bytes, and shove it into a waiting COPY on another postgres
>>> node...there's almost no additional work in-between.  All utilized
>>> time would be some combination of the normal COPY byte stream
>>> generation and libpq.
>>>
>>
>> I understand and I dislike it. This design isn't general - or it is
>> far from using a function. It doesn't use complete FUNCAPI interface.
>> I thing so you need different semantic. You are not use a function.
>> You are use some like "stream object". This stream object can have a
>> input, output function, and parameters should be internal (I don't
>> thing, so internal could to carry any significant performance here) or
>> standard. Syntax should be similar to CREATE AGGREGATE.
>
> I think you might be right about this.  At the time I was too shy to
> add a DDL command for this hack, though.  But what I did want is a
> form of currying, and that's not easily accomplished in SQL without
> extension...
>

COPY is a PostgreSQL extension. If there are other related extensions, why not?
PostgreSQL has lots of database objects beyond the SQL standard - see the
fulltext implementation. I am not sure STREAM is a good keyword now, though;
it could collide with STREAM as used by streaming databases.

>> then syntax should be:
>>
>> COPY table TO streamname(parameters)
>>
>> COPY table TO filestream('/tmp/foo.dta') ...
>> COPY table TO dblinkstream(connectionstring) ...
>
> I like this one quite a bit...it's a bit like an aggregate, except the
> initial condition can be set in a rather function-callish way.
>
> But that does seem to require making a DDL command, which leaves a
> nice green field.  In particular, we could then make as many hooks,
> flags, and options as we wanted, but sometimes there is a paradox of
> choice...I just did not want to anticipate on Postgres being friendly
> to a new DDL command when writing this the first time.
>

sure - nobody likes too many changes in gram.y. But a well-designed
general feature with the related SQL enhancements is more acceptable than
a fast, simple hack. Don't be in a hurry. This idea is good - but it needs:

a) a well-designed C API, like:

initialise_functions(fcinfo) -- std fcinfo
consument_process_tuple(fcinfo) -- gets a standard row -- Datum
dvalues[] + Row description
producent_process_tuple(fcinfo) -- returns a standard row -- Datum
dvalues[] + Row description (look at the SRF API)
terminate_funnction(fcinfo)

I am sure this could be similar to the AGGREGATE API,
+ some samples in contrib

b) a well-designed PLPerlu and PLPythonu interface,
+ some samples in the documentation

Regards
Pavel Stehule



From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 14:35:29
Message-ID: 1259073329.30357.125.camel@hvost1700
Lists: pgsql-hackers

On Tue, 2009-11-24 at 05:00 -0800, Daniel Farina wrote:
> On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> > then syntax should be:
> >
> > COPY table TO streamname(parameters)
> >
> > COPY table TO filestream('/tmp/foo.dta') ...
> > COPY table TO dblinkstream(connectionstring) ...

You probably meant

COPY table TO dblinkstream(connectionstring, table)

?

> I like this one quite a bit...it's a bit like an aggregate, except the
> initial condition can be set in a rather function-callish way.
>
> But that does seem to require making a DDL command, which leaves a
> nice green field.

not necessarily DDL, maybe just a "copystream" type and a set of
functions creating objects of that type.

if you make it a proper type with input and output function, then you
can probably use it in statements like this

COPY table TO (select stream::copystream from streams where id = 7);

COPY table TO 'file:/tmp/outfile'::copystream;

COPY table TO 'dblink::<connectstring>'::copystream;

> In particular, we could then make as many hooks,
> flags, and options as we wanted, but sometimes there is a paradox of
> choice...I just did not want to anticipate on Postgres being friendly
> to a new DDL command when writing this the first time.

fulltext lived for quite some time as a set of types and functions before
it was glorified with its own DDL syntax.

It may be good to take the same approach here - do it as a set of types
and functions first, and think about adding DDL once it has stabilised
enough.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 14:50:25
Message-ID: 603c8f070911240650j291c65fek8da45132615e4e0f@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 23, 2009 at 8:46 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> You know how people complain about how new contributors are treated here?
>  Throwing out comments like this, that come off as belittling to other
> people's work, doesn't help.  All I was suggesting was that Dan wasn't
> developing this in complete isolation from the hackers community as Robert
> had feared, as will be obvious when we get to:

I still think it's better to have discussion on the mailing list than
elsewhere. But we're doing that now, so, good.

> As far as other past discussion here that might be relevant, this patch
> includes a direct change to gram.y to support the new syntax.  You've
> already suggested before that it might be time to update COPY the same way
> EXPLAIN and now VACUUM have been overhauled to provide a more flexible
> options interface:
>  http://archives.postgresql.org/pgsql-hackers/2009-09/msg00616.php  This
> patch might be more fuel for that idea.

FWIW, Tom already committed a patch by Emmanuel and myself that did this.

...Robert


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 14:52:07
Message-ID: 162867790911240652j71c9e9f5j9c63ace7e430699@mail.gmail.com
Lists: pgsql-hackers

2009/11/24 Hannu Krosing <hannu(at)2ndquadrant(dot)com>:
> On Tue, 2009-11-24 at 05:00 -0800, Daniel Farina wrote:
>> On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> > then syntax should be:
>> >
>> > COPY table TO streamname(parameters)
>> >
>> > COPY table TO filestream('/tmp/foo.dta') ...
>> > COPY table TO dblinkstream(connectionstring) ...
>
> You probably meant
>
> COPY table TO dblinkstream(connectionstring, table)
>
> ?
>
>> I like this one quite a bit...it's a bit like an aggregate, except the
>> initial condition can be set in a rather function-callish way.
>>
>> But that does seem to require making a DDL command, which leaves a
>> nice green field.
>
> not necessarily DDL, maybe just a "copystream" type and a set of
> functions creating objects of that type.
>
> if you make it a proper type with input and output function, then you
> can probably use it in statements like this
>
> COPY table TO (select stream::copystream from streams where id = 7);
>
> COPY table TO 'file:/tmp/outfile':: copystream;
>
> COPY table TO 'dblink::<connectstring>':: copystream;

it is interesting - but you still have to have DDL for declaring the stream.
It is analogous to a function:

CREATE FUNCTION ....

SELECT 'foo'::regprocedure

but the syntax COPY table TO copystream is a good idea. I like it.

>
>> In particular, we could then make as many hooks,
>> flags, and options as we wanted, but sometimes there is a paradox of
>> choice...I just did not want to anticipate on Postgres being friendly
>> to a new DDL command when writing this the first time.
>
> fulltext lived for quite some time as set of types and functions before
> it was glorified with its own DDL syntax.

What is DDL? A wrapper around an insert into the system catalog.

so we can have a table pg_catalog.copystream

and for initial testing

CREATE OR REPLACE FUNCTION register_copystream(regproc, regproc, regproc ...)

if we are happy with it - then it is one day of work to support a statement

CREATE COPYSTREAM ( ...
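A rough sketch of that catalog-plus-registration-function approach (the table layout and function names are invented):

```sql
-- a catalog of streams: one row per registered copystream
CREATE TABLE copystream (
    strname     name PRIMARY KEY,
    initfunc    regproc,
    consumefunc regproc,
    finalfunc   regproc
);

-- "DDL" as a plain function, until a CREATE COPYSTREAM statement exists
CREATE FUNCTION register_copystream(name, regproc, regproc, regproc)
RETURNS void AS $$
    INSERT INTO copystream VALUES ($1, $2, $3, $4)
$$ LANGUAGE sql;
```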

Regards
Pavel Stehule

>
> It may be good to have the same approach here - do it as a set of types
> and functions first, think about adding DDL once it has stabilised
> enough
>
>
> --
> Hannu Krosing   http://www.2ndQuadrant.com
> PostgreSQL Scalability and Availability
>   Services, Consulting and Training
>
>
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-24 14:56:28
Message-ID: 603c8f070911240656y79415a5asf86b302361ece7f0@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 23, 2009 at 9:37 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Greg Smith wrote:
>>
>> I haven't heard anything from Andrew about ragged CSV import either.  I
>> think that ultimately those features are useful, but just exceed what the
>> existing code could be hacked to handle cleanly.
>
> The patch is attached for your edification/amusement. I have backpatched it
> to 8.4 for the client that needed it, and it's working just fine. I didn't
> pursue it when it was clear that it was not going to be accepted. COPY
> returning text[] would allow us to achieve the same thing, a bit more
> verbosely, but it would be a lot more work to develop.

FWIW, I've somewhat come around to this idea. But I might be the only one.

...Robert


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 02:51:32
Message-ID: 1259117492.6014.113.camel@monkey-cat.sm.truviso.com
Lists: pgsql-hackers

On Tue, 2009-11-24 at 00:54 -0800, Daniel Farina wrote:
> On Tue, Nov 24, 2009 at 12:29 AM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> > COPY stdin TO udf();
>
> If stdin becomes (is?) a legitimate source of records, then this patch
> will Just Work.
>

STDIN is a source of bytes representing a set of records. Currently, the
first argument to COPY is a source or destination of records; and the
second argument is a source or destination of bytes representing a set
of records.

I think we want the first argument to remain a source or destination of
real records with types; that is, a table or perhaps a function. And we
want the second argument to remain a source or destination of bytes;
that is, a file or perhaps a function (albeit not the same kind as the
former function).

> > COPY udf() FROM stdin;
>
> This is unaddressed, but I think it would be a good idea to consider
> enabling this kind of thing prior to application.

This makes much more sense, but it is a very different type of function
from the original proposal (which basically accepts a buffer). I agree
that it sounds useful and would be good for the sake of symmetry.

One use case may be a degree of data cleaning. For instance, you could
use a "looser" function definition, like udf(cstring, cstring, ...),
where all COPY does is break up the records into fields, and the
function can recover from type input failures using subtransactions.
Binary mode could do a similar thing with bytea.
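Per-value recovery of this sort is already expressible in PL/pgSQL, where an EXCEPTION clause runs the block in a subtransaction; a sketch (the function name is invented):

```sql
-- convert text to int, swallowing type-input failures instead of aborting
CREATE FUNCTION safe_int(s text) RETURNS int AS $$
BEGIN
    RETURN s::int;
EXCEPTION WHEN invalid_text_representation OR numeric_value_out_of_range THEN
    RETURN NULL;  -- or record the bad value for later inspection
END;
$$ LANGUAGE plpgsql;
```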

However, I recommend that you don't try to generalize this as a data
cleanup feature that can handle ragged input. That seems like a separate
problem that will distract from the original use case.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 04:16:20
Message-ID: 1259122580.19289.6.camel@jdavis
Lists: pgsql-hackers

On Tue, 2009-11-24 at 14:39 +0100, Pavel Stehule wrote:
> a) good designed C API like:
>
> initialise_functions(fcinfo) -- std fcinfo
> consument_process_tuple(fcinfo) -- gets standard row -- Datum
> dvalues[] + Row description
> producent_process_tuple(fcinfo) -- returns standard row -- Datum
> dvalues[] + Row description (look on SRF API)
> terminate_funnction(fcinfo)
>

Don't you still need the functions to accept an argument of type
internal? Otherwise, we lose the ability to copy a buffer to the dblink
connection, which was the original motivation.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 04:44:26
Message-ID: 5584.1259124266@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> Don't you still need the functions to accept an argument of type
> internal? Otherwise, we lose the ability to copy a buffer to the dblink
> connection, which was the original motivation.

If you do that, then there is no possibility of ever using this feature
except with C-coded functions, which seems to me to remove most of
whatever use-case there was.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 04:45:53
Message-ID: 162867790911242045g1e6b8fbbp88d2769b6fb483af@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Tue, 2009-11-24 at 14:39 +0100, Pavel Stehule wrote:
>> a) good designed C API  like:
>>
>>    initialise_functions(fcinfo) -- std fcinfo
>>    consument_process_tuple(fcinfo) -- gets standard row -- Datum
>> dvalues[] + Row description
>>    producent_process_tuple(fcinfo) -- returns standard row  -- Datum
>> dvalues[] + Row description (look on SRF API)
>>    terminate_funnction(fcinfo)
>>
>
> Don't you still need the functions to accept an argument of type
> internal? Otherwise, we lose the ability to copy a buffer to the dblink
> connection, which was the original motivation.
>

It depends on the design. I don't think internal is necessary. It is
just the wrong design.

Pavel



From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 05:12:20
Message-ID: 7b97c5a40911242112l4c3f6f30m8c0458edd11a646a@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 8:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> It depends on design. I don't thing so internal is necessary. It is
> just wrong design.

Depends on how lean you want to be when doing large COPY...right now
the cost is restricted to having to call a function pointer and a few
branches. If you want to take SQL values, then the semantics of
function calling over a large number of rows is probably notably more
expensive, although I make no argument against the fact that the
non-INTERNAL version would give a lot more people more utility.

fdr


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 05:13:04
Message-ID: 1259125984.19289.20.camel@jdavis
Lists: pgsql-hackers

On Tue, 2009-11-24 at 23:44 -0500, Tom Lane wrote:
> If you do that, then there is no possibility of ever using this feature
> except with C-coded functions, which seems to me to remove most of
> whatever use-case there was.

It fits the use case involving dblink (or dblink-like modules).

Maybe the patch's performance should be tested with and without copying
the buffer, to see if we're losing anything significant. If we can do
almost as well copying the data and passing that as a bytea value to the
function, then I agree that would be better.

I still don't see any reason to force it to be record by record though.
If the point is to push data from a table into a remote table, why
should the copied data be translated out of binary format into a record,
and then back into binary form to send to the remote system?

Currently, the second argument to copy is a source or destination of
bytes, not records. So forcing it to deal with records is inconsistent.

Regards,
Jeff Davis


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 05:31:57
Message-ID: 7b97c5a40911242131j25a1d376r19c04301da8354c3@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 9:13 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

>
> I still don't see any reason to force it to be record by record though.
> If the point is to push data from a table into a remote table, why
> should the copied data be translated out of binary format into a record,
> and then back into binary form to send to the remote system?
>
> Currently, the second argument to copy is a source or destination of
> bytes, not records. So forcing it to deal with records is inconsistent.

You are correct. It so happens as an artifact of how COPY is written
that things are delivered row by row, but at some fundamental level it
would not matter were that not the case...

fdr


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 05:35:59
Message-ID: 162867790911242135q4acfac9xf861814b5a889339@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 8:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> It depends on design. I don't thing so internal is necessary. It is
>> just wrong design.
>
> Depends on how lean you want to be when doing large COPY...right now
> the cost is restricted to having to call a function pointer and a few
> branches.  If you want to take SQL values, then the semantics of
> function calling over a large number of rows is probably notably more
> expensive, although I make no argument against the fact that the
> non-INTERNAL version would give a lot more people more utility.

I believe using an "internal" type minimizes the necessary changes in the COPY
implementation. Using the funcapi needs more work inside COPY - you
have to move some functionality from COPY into the stream functions.
Probably the slowest operation is parsing - calling the input
functions - and that happens once either way. The second slowest operation is
reading from the network - also the same either way. So I don't see many reasons
why a non-internal implementation has to be significantly slower than
your actual implementation. I am sure it needs more work, though.

What is significant: if I integrate COPY and a streaming
function more closely, then I don't need to use a tuplestore or SRF
functions. COPY reads the data directly.

>
> fdr
>


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 05:42:15
Message-ID: 7b97c5a40911242142s178939f5lf6295d008dd40c0b@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 9:35 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
>> On Tue, Nov 24, 2009 at 8:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> It depends on design. I don't thing so internal is necessary. It is
>>> just wrong design.
>>
>> Depends on how lean you want to be when doing large COPY...right now
>> the cost is restricted to having to call a function pointer and a few
>> branches.  If you want to take SQL values, then the semantics of
>> function calling over a large number of rows is probably notably more
>> expensive, although I make no argument against the fact that the
>> non-INTERNAL version would give a lot more people more utility.
>
> I believe so using an "internal" minimalize necessary changes in COPY
> implementation. Using a funcapi needs more work inside COPY -  you
> have to take some functionality from COPY to stream functions.
> Probably the most slow operations is parsing - calling a input
> functions. This is called once every where. Second slow operation is
> reading from network - it is same. So I don't see too much reasons,
> why non internal implementation have to be significant slower than
> your actual implementation. I am sure, so it needs more work.

You are probably right. We could try coercing to bytea and back out
to bytes, although it seems like a superfluous cost to force
*everyone* to pay just to get the same bytes to a network buffer.
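To make the coercion being discussed concrete, here is a minimal sketch of what a non-INTERNAL, bytea-taking target could look like. The function name is invented, and "COPY ... TO FUNCTION" is only the proposed syntax, not anything in core:

```sql
-- Hypothetical bytea-taking row consumer (illustrative only).
-- Each call would receive one COPY-formatted row already coerced into
-- a bytea value - the extra copy/coercion cost being discussed.
CREATE FUNCTION row_sink(data bytea) RETURNS void
    LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'received % bytes', octet_length(data);
END;
$$;

-- COPY foo TO FUNCTION row_sink;  -- proposed syntax, not implemented
```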

fdr


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:09:19
Message-ID: 162867790911242209y26440926q8851b3355963a319@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 9:35 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
>>> On Tue, Nov 24, 2009 at 8:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> It depends on design. I don't thing so internal is necessary. It is
>>>> just wrong design.
>>>
>>> Depends on how lean you want to be when doing large COPY...right now
>>> the cost is restricted to having to call a function pointer and a few
>>> branches.  If you want to take SQL values, then the semantics of
>>> function calling over a large number of rows is probably notably more
>>> expensive, although I make no argument against the fact that the
>>> non-INTERNAL version would give a lot more people more utility.
>>
>> I believe so using an "internal" minimalize necessary changes in COPY
>> implementation. Using a funcapi needs more work inside COPY -  you
>> have to take some functionality from COPY to stream functions.
>> Probably the most slow operations is parsing - calling a input
>> functions. This is called once every where. Second slow operation is
>> reading from network - it is same. So I don't see too much reasons,
>> why non internal implementation have to be significant slower than
>> your actual implementation. I am sure, so it needs more work.
>

"internal" is important (for performance) for aggregation function -
where is protection under repeated alloc/free memory - it work well
and it is +/- ugly hack. We cannot do some things well - simply there
are missing some support. Nobody calculated with very large string,
array concatenation in design time - It is reason, why I am against to
using it.

> You are probably right.  We could try coercing to bytea and back out
> to bytes, although it seems like a superfluous cost to force
> *everyone* to pay just to get the same bytes to a network buffer.
>

I am not sure this is a good analogy. Only a "filestream" or "network"
stream is a stream of bytes. From any more sophisticated stream I am
taking tuples - a database stream, a SOAP stream. I agree that dblink
could return binary-compatible records - but that is one special and
exclusive case. Sure, it is important and has to be accounted for.
Still, I think dblink-to-postgres is another hack and should be
replaced.

> fdr
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:13:32
Message-ID: 1259129612.19289.88.camel@jdavis
Lists: pgsql-hackers

On Tue, 2009-11-24 at 21:42 -0800, Daniel Farina wrote:
> You are probably right. We could try coercing to bytea and back out
> to bytes, although it seems like a superfluous cost to force
> *everyone* to pay just to get the same bytes to a network buffer.

Well, I suppose only performance will tell. Copying a buffer is sure to
be faster than invoking all of the type input/output functions, or even
send/recv, so perhaps it's not a huge penalty.

My disagreement with the row-by-row approach is more semantics than
performance. COPY translates records to bytes and vice-versa, and your
original patch maintains those semantics.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:23:41
Message-ID: 1259130221.19289.109.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
> I believe so using an "internal" minimalize necessary changes in COPY
> implementation. Using a funcapi needs more work inside COPY - you
> have to take some functionality from COPY to stream functions.
> Probably the most slow operations is parsing - calling a input
> functions. This is called once every where. Second slow operation is
> reading from network - it is same. So I don't see too much reasons,
> why non internal implementation have to be significant slower than
> your actual implementation. I am sure, so it needs more work.

I apologize, but I don't understand what you're saying. Can you please
restate with some examples?

It seems like you're advocating that we move records from a table into a
function using COPY. But that's not what COPY normally does: COPY
normally translates records to bytes or bytes to records.

Moving records from a table to a function can be done with:
SELECT myfunc(mytable) FROM mytable;
already. The only problem is if you want initialization/destruction. But
I'm not convinced that COPY is the best tool to provide that.

Moving records from a function to a table can be done with:
INSERT INTO mytable SELECT * FROM myfunc();
And that already works fine.

So what use case are you concerned about?

Regards,
Jeff Davis


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:31:01
Message-ID: 162867790911242231p2141f4acoe0350b9d1ded1b00@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Tue, 2009-11-24 at 21:42 -0800, Daniel Farina wrote:
>> You are probably right.  We could try coercing to bytea and back out
>> to bytes, although it seems like a superfluous cost to force
>> *everyone* to pay just to get the same bytes to a network buffer.
>
> Well, I suppose only performance will tell. Copying a buffer is sure to
> be faster than invoking all of the type input/output functions, or even
> send/recv, so perhaps it's not a huge penalty.
>
> My disagreement with the row-by-row approach is more semantics than
> performance. COPY translates records to bytes and vice-versa, and your
> original patch maintains those semantics.

uff, really

COPY CSV ?

Pavel

>
> Regards,
>        Jeff Davis
>
>


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:35:11
Message-ID: 7b97c5a40911242235v5459c13ex8b7082e18a363b1f@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>> I believe so using an "internal" minimalize necessary changes in COPY
>> implementation. Using a funcapi needs more work inside COPY -  you
>> have to take some functionality from COPY to stream functions.
>> Probably the most slow operations is parsing - calling a input
>> functions. This is called once every where. Second slow operation is
>> reading from network - it is same. So I don't see too much reasons,
>> why non internal implementation have to be significant slower than
>> your actual implementation. I am sure, so it needs more work.
>
> I apologize, but I don't understand what you're saying. Can you please
> restate with some examples?
>
> It seems like you're advocating that we move records from a table into a
> function using COPY. But that's not what COPY normally does: COPY
> normally translates records to bytes or bytes to records.

Perhaps what we want are pluggable transformation functions that can
format the row any way that is desired, with the current behavior
as the default. Putting COPY TO FUNCTION as submitted aside, what
about something like this:

COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);

This is something completely different from what was submitted, so in
some sense:

COPY foo TO FUNCTION dblink_send_row USING
postgres_builtin_formatter(binary = true);

Would compose the two features...

(Again, very, very far from a real syntax suggestion)
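As a rough illustration of what a row-based formatter might look like today, using only existing facilities (all names here are invented, and this sidesteps quoting and escaping entirely):

```sql
-- Illustrative only: a "formatter" as an ordinary function from a row
-- type to text. The USING idea above would let COPY plug something
-- like this in; today you can only call it per-row from SELECT.
CREATE TABLE foo(a int, b text);

CREATE FUNCTION naive_csv(foo) RETURNS text
    LANGUAGE sql AS $$
    SELECT ($1).a::text || ',' || ($1).b  -- no quoting or escaping
$$;

SELECT naive_csv(foo) FROM foo;
```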

fdr


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:36:27
Message-ID: 162867790911242236m64ba8a3eoe53a1e63c4d245e2@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>> I believe so using an "internal" minimalize necessary changes in COPY
>> implementation. Using a funcapi needs more work inside COPY -  you
>> have to take some functionality from COPY to stream functions.
>> Probably the most slow operations is parsing - calling a input
>> functions. This is called once every where. Second slow operation is
>> reading from network - it is same. So I don't see too much reasons,
>> why non internal implementation have to be significant slower than
>> your actual implementation. I am sure, so it needs more work.
>
> I apologize, but I don't understand what you're saying. Can you please
> restate with some examples?
>
> It seems like you're advocating that we move records from a table into a
> function using COPY. But that's not what COPY normally does: COPY
> normally translates records to bytes or bytes to records.
>
> Moving records from a table to a function can be done with:
>  SELECT myfunc(mytable) FROM mytable;
> already. The only problem is if you want initialization/destruction. But
> I'm not convinced that COPY is the best tool to provide that.
>
> Moving records from a function to a table can be done with:
>  INSERT INTO mytable SELECT * FROM myfunc();
> And that already works fine.

It works, but COPY FROM myfunc() should be significantly faster: you
can skip the tuplestore.

Pavel

>
> So what use case are you concerned about?
>
> Regards,
>        Jeff Davis
>
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 06:39:42
Message-ID: 162867790911242239k2d6fc090mdb8edc921fd941b0@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
> On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>>> I believe so using an "internal" minimalize necessary changes in COPY
>>> implementation. Using a funcapi needs more work inside COPY -  you
>>> have to take some functionality from COPY to stream functions.
>>> Probably the most slow operations is parsing - calling a input
>>> functions. This is called once every where. Second slow operation is
>>> reading from network - it is same. So I don't see too much reasons,
>>> why non internal implementation have to be significant slower than
>>> your actual implementation. I am sure, so it needs more work.
>>
>> I apologize, but I don't understand what you're saying. Can you please
>> restate with some examples?
>>
>> It seems like you're advocating that we move records from a table into a
>> function using COPY. But that's not what COPY normally does: COPY
>> normally translates records to bytes or bytes to records.
>
> Perhaps what we want is pluggable transformation functions that can
> format the row any way that is desired, with the current behavior
> being some default.  Putting COPY TO FUNCTION as submitted aside, what
> about something like this:
>
> COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);
>
> This is something completely different than what was submitted, so in
> some aspect:
>
> COPY foo TO FUNCTION dblink_send_row USING
> postgres_builtin_formatter(binary = true);
>
> Would compose the two features...
>

yes - these are two features - and they should be solved independently

Pavel

> (Again, very, very far from a real syntax suggestion)
>
> fdr
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 07:03:45
Message-ID: 162867790911242303p30cef474jf726b990f4d9a2e7@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> 2009/11/25 Daniel Farina <drfarina(at)gmail(dot)com>:
>> On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>>>> I believe so using an "internal" minimalize necessary changes in COPY
>>>> implementation. Using a funcapi needs more work inside COPY -  you
>>>> have to take some functionality from COPY to stream functions.
>>>> Probably the most slow operations is parsing - calling a input
>>>> functions. This is called once every where. Second slow operation is
>>>> reading from network - it is same. So I don't see too much reasons,
>>>> why non internal implementation have to be significant slower than
>>>> your actual implementation. I am sure, so it needs more work.
>>>
>>> I apologize, but I don't understand what you're saying. Can you please
>>> restate with some examples?
>>>
>>> It seems like you're advocating that we move records from a table into a
>>> function using COPY. But that's not what COPY normally does: COPY
>>> normally translates records to bytes or bytes to records.
>>
>> Perhaps what we want is pluggable transformation functions that can
>> format the row any way that is desired, with the current behavior
>> being some default.  Putting COPY TO FUNCTION as submitted aside, what
>> about something like this:
>>
>> COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);
>>
>> This is something completely different than what was submitted, so in
>> some aspect:
>>
>> COPY foo TO FUNCTION dblink_send_row USING
>> postgres_builtin_formatter(binary = true);
>>
>> Would compose the two features...
>>
>
> yes - it is two features - and should be solved independently

it is and it is not (after some thought) - smarter streams should
accept/return tuples. A formatting function makes sense for text
output - there are input/output formatting (text-based/bytea-based)
functions.

I see one possible problem - if the formatter functions are row based,
then you cannot generate a prologue and epilogue part of the file
(e.g. for xml).

Pavel

>
> Pavel
>
>> (Again, very, very far from a real syntax suggestion)
>>
>> fdr
>>
>>
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 07:09:23
Message-ID: 1259132963.19289.166.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-11-25 at 07:31 +0100, Pavel Stehule wrote:
> > My disagreement with the row-by-row approach is more semantics than
> > performance. COPY translates records to bytes and vice-versa, and your
> > original patch maintains those semantics.
>
> uff, really
>
> COPY CSV ?

CSV is like text or binary: just another format to _represent_ records
as a sequence of bytes. A CSV file is not a set of postgresql records
until COPY interprets it as such.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 07:15:15
Message-ID: 1259133315.19289.176.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-11-25 at 07:36 +0100, Pavel Stehule wrote:
> > Moving records from a function to a table can be done with:
> > INSERT INTO mytable SELECT * FROM myfunc();
> > And that already works fine.
>
> It works, but COPY FROM myfunc() should be significantly faster. You
> can skip tuple store.

If SRFs use a tuplestore in that situation, it sounds like that should
be fixed. Why do we need to provide alternate syntax involving COPY?

Regards,
Jeff Davis


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 08:23:20
Message-ID: 162867790911250023t10503a83qa143955e148c9dc9@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 07:36 +0100, Pavel Stehule wrote:
>> > Moving records from a function to a table can be done with:
>> >  INSERT INTO mytable SELECT * FROM myfunc();
>> > And that already works fine.
>>
>> It works, but COPY FROM myfunc() should be significantly faster. You
>> can skip tuple store.
>
> If SRFs use a tuplestore in that situation, it sounds like that should
> be fixed. Why do we need to provide alternate syntax involving COPY?

It isn't a problem of the SRF function design, which allows both
modes - per-row and tuplestore. It is a problem of the INSERT
statement, or rather of the INSERT INTO ... SELECT implementation.

Regards
Pavel

>
> Regards,
>        Jeff Davis
>
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 08:37:10
Message-ID: 1259138230.19289.253.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
> > If SRFs use a tuplestore in that situation, it sounds like that should
> > be fixed. Why do we need to provide alternate syntax involving COPY?
>
> It isn't problem of SRF function design. It allow both mode - row and
> tuplestor.

select * from generate_series(1,1000000000) limit 1;

That statement takes a long time, which indicates to me that it's
materializing the result of the SRF. And there's no insert there.

> This is problem of INSERT statement, resp. INSERT INTO
> SELECT implementation.

If "tmp" is a new table, and "zero" is a table with a million zeros in
it, then:
insert into tmp select 1/i from zero;
fails instantly. That tells me that it's not materializing the result of
the select; rather, it's feeding the rows in one at a time.

Can show me in more detail what you mean? I'm having difficulty
understanding your short replies.

Regards,
Jeff Davis


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 10:26:39
Message-ID: 1259144799.30357.173.camel@hvost1700
Lists: pgsql-hackers

On Tue, 2009-11-24 at 21:13 -0800, Jeff Davis wrote:
> On Tue, 2009-11-24 at 23:44 -0500, Tom Lane wrote:
> > If you do that, then there is no possibility of ever using this feature
> > except with C-coded functions, which seems to me to remove most of
> > whatever use-case there was.
>
> It fits the use case involving dblink (or dblink-like modules).
>
> Maybe the patch's performance should be tested with and without copying
> the buffer, to see if we're losing anything significant. If we can do
> almost as well copying the data and passing that as a bytea value to the
> function, then I agree that would be better.

I'd make this dependent on the function signature:
- if it takes bytea or text, then call it with (binary) rows
- if it takes a rowtype (of some hypothetical table),
then resolve rows to this rowtype
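A sketch of the two target-function shapes this dispatch rule implies (the functions and the table are hypothetical; neither calling convention exists in core):

```sql
CREATE TABLE mytable(i int, t text);

-- Shape 1: takes bytea, so COPY would hand it raw (binary/text) row bytes.
CREATE FUNCTION sink_bytes(chunk bytea) RETURNS void
    LANGUAGE plpgsql AS $$
BEGIN
    NULL;  -- consume one row's bytes
END;
$$;

-- Shape 2: takes a rowtype, so COPY would resolve each row to it first.
CREATE FUNCTION sink_rows(r mytable) RETURNS void
    LANGUAGE plpgsql AS $$
BEGIN
    NULL;  -- consume one typed row
END;
$$;
```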

> I still don't see any reason to force it to be record by record though.
> If the point is to push data from a table into a remote table, why
> should the copied data be translated out of binary format into a record,
> and then back into binary form to send to the remote system?
>
> Currently, the second argument to copy is a source or destination of
> bytes, not records. So forcing it to deal with records is inconsistent.
>
> Regards,
> Jeff Davis
>

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 10:32:17
Message-ID: 162867790911250232u75985e29mba6310358d4b2911@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>> > If SRFs use a tuplestore in that situation, it sounds like that should
>> > be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't problem of SRF function design. It allow both mode - row and
>> tuplestor.
>
> select * from generate_series(1,1000000000) limit 1;
>
> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF. And there's no insert there.

This is a missing optimisation. If I understand correctly, PostgreSQL
waits for the complete result set - so in this case materialisation is
necessary. In your query pg does the materialisation too early.

postgres=# select * from generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)

Time: 59,540 ms
postgres=# select generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)

Time: 1,107 ms

But usually we process all rows from an SRF function - so the problem
with LIMIT isn't significant.

I am testing:

1.
postgres=# select count(*) from generate_series(1,1000000);
count
─────────
1000000
(1 row)

Time: 930,720 ms

2.
postgres=# select count(*) from (select generate_series(1,1000000)) x;
count
─────────
1000000
(1 row)

Time: 276,511 ms

2. is significantly faster than 1 (there is no SRF materialisation)

postgres=# create table foo(a integer);
CREATE TABLE

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 4274,869 ms

postgres=# insert into foo select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 4814,756 ms

postgres=# copy foo to '/tmp/xxx';
COPY 1000000
Time: 1033,078 ms

postgres=# set synchronous_commit to off;
SET

postgres=# copy foo from '/tmp/xxx';
COPY 1000000
Time: 2749,277 ms

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 3948,734 ms

generate_series is fast and simple - but COPY is still about 30% faster

>
>> This is problem of INSERT statement, resp. INSERT INTO
>> SELECT implementation.
>
> If "tmp" is a new table, and "zero" is a table with a million zeros in
> it, then:
>  insert into tmp select 1/i from zero;
> fails instantly. That tells me that it's not materializing the result of
> the select; rather, it's feeding the rows in one at a time.
>

I think materialisation happens every time you use an SQL statement
without a cursor.

> Can show me in more detail what you mean? I'm having difficulty
> understanding your short replies.

I think COPY tab FROM/TO fce() should be used for very large
import/export, where INSERT ... SELECT needs at least one
materialisation.

p.s. I am sorry - I am not a native speaker - which is why I am
writing in short replies.

Pavel

>
> Regards,
>        Jeff Davis
>
>


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 17:43:03
Message-ID: 7b97c5a40911250943u4ff7df92m3215f48352d60959@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 24, 2009 at 10:39 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> yes - it is two features - and should be solved independently

There are some common problems, though. I was thinking about this in
terms of my existing mental model: specifying some parameters up-front
and getting back a stream of records or bytes (depending on which
feature you are referring to) is a form of currying - combined with my
not-complete revulsion at using aggregates as a base for such
functionality....

What if we extended aggregates to support a function as an initial
condition which could be called with parameters when initializing the
aggregate? If you squint at it just right, the current form is that
of a value/constant -- effectively the zero-parameter function.

Here's a gist of what I want to realize:

SELECT (avg())(column_name)
FROM ...

This is a vanilla average. That's not very interesting since avg only
has one default initial value. However, at Truviso we have
encountered a real case where we wanted SUM to be initialized to "0"
instead of "NULL". I had to create a new aggregate with that as an
initial condition, which is fine because we only needed one extra
standard behavior. But perhaps instead it could have been written
this way:

SELECT (sum(0))(column_name)
FROM ...

That way people could get 'zero' rather than NULL when their query
yields no rows. You could also imagine some code out there that may
have a running-sum of sorts, and may want to seed SUM to some
non-zero, non-NULL initial value as set by the application.
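For comparison, the closest thing available today is baking the initial condition in at CREATE AGGREGATE time, which is what the one-off aggregate mentioned above amounts to:

```sql
-- A SUM over int4 that starts from 0 instead of NULL. initcond is
-- fixed at creation time; the proposal here would turn it into a
-- function callable with per-query parameters.
CREATE AGGREGATE sum0(int4) (
    sfunc    = int4pl,  -- built-in int4 addition as transition function
    stype    = int4,
    initcond = '0'
);

-- SELECT sum0(x) FROM t;  -- yields 0, not NULL, when t is empty
```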

At that point we may be able to abuse the aggregate infrastructure to
doing what we want in the case of these COPY extensions more easily...

fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 18:34:53
Message-ID: 22154.1259174093@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>>> If SRFs use a tuplestore in that situation, it sounds like that should
>>> be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't a problem of the SRF design. It allows both modes - row and
>> tuplestore.

> select * from generate_series(1,1000000000) limit 1;

> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF.

Yeah. This is certainly fixable if someone wants to do the legwork of
refactoring ExecMakeTableFunctionResult(). It was done that way
originally just to keep down the complexity of introducing the
function-as-table-source feature at all.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-25 20:56:09
Message-ID: 1259182569.19289.280.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-11-25 at 11:32 +0100, Pavel Stehule wrote:
> 1.
> postgres=# select count(*) from generate_series(1,1000000);
> count
> ─────────
> 1000000
> (1 row)
>
> Time: 930,720 ms
>
> 2.
> postgres=# select count(*) from (select generate_series(1,1000000)) x;
> count
> ─────────
> 1000000
> (1 row)
>
> Time: 276,511 ms
>
> 2. is significantly faster than 1 (there is no SRF materialisation)

I think case #1 can be fixed.

> generate_function is fast and simple - but still COPY is about 30% faster

My quick tests are not consistent enough, so I will have to try with
more data. The times look similar to me so far.

If there is a difference, I wonder what it is?

> I think materialisation happens every time you use any SQL
> statement without a cursor.

I don't think that is true. Here's an expanded version of my previous
example:

create table zero(i int);
create table tmp(j int);
insert into zero select 0 from generate_series(1,1000000); -- all 0
insert into tmp select 1/i from zero; -- error immediately, doesn't wait

The error would take longer if it materialized the table "zero". But
instead, it passes the first tuple to the function for "/" before the
other tuples are read, and gets an error immediately. So no
materialization.

I worry that we're getting further away from the original problem. Let's
allow functions to get the bytes of data from a COPY, like the original
proposal. I am not sure COPY is the best mechanism to move records
around when INSERT ... SELECT already does that.

Regards,
Jeff Davis


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 04:01:33
Message-ID: 162867790911252001m3cd1d08bv585392b8b88fd6d5@mail.gmail.com
Lists: pgsql-hackers

2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 11:32 +0100, Pavel Stehule wrote:
>> 1.
>> postgres=# select count(*) from generate_series(1,1000000);
>>   count
>> ─────────
>>  1000000
>> (1 row)
>>
>> Time: 930,720 ms
>>
>> 2.
>> postgres=# select count(*) from (select generate_series(1,1000000)) x;
>>   count
>> ─────────
>>  1000000
>> (1 row)
>>
>> Time: 276,511 ms
>>
>> 2. is significantly faster than 1 (there is no SRF materialisation)
>
> I think case #1 can be fixed.
>
>> generate_function is fast and simple - but still COPY is about 30% faster
>
> My quick tests are not consistent enough, so I will have to try with
> more data. The times look similar to me so far.
>
> If there is a difference, I wonder what it is?
>
>> I think materialisation happens every time you use any SQL
>> statement without a cursor.
>
> I don't think that is true. Here's an expanded version of my previous
> example:
>
> create table zero(i int);
> create table tmp(j int);
> insert into zero select 0 from generate_series(1,1000000); -- all 0
> insert into tmp select 1/i from zero; -- error immediately, doesn't wait
>
> The error would take longer if it materialized the table "zero". But
> instead, it passes the first tuple to the function for "/" before the
> other tuples are read, and gets an error immediately. So no
> materialization.

This shows nothing.

It works like:

1. EXECUTE SELECT 0 FROM generate_series(1,...);
2. STORE RESULT TO TABLE zero;
3. EXECUTE SELECT 1/i FROM zero;
4. STORE RESULT TO TABLE tmp;

The problem is in the sequential execution. The result is stored to the
destination after execution - so some materialisation is always
necessary.

>
> I worry that we're getting further away from the original problem. Let's
> allow functions to get the bytes of data from a COPY, like the original
> proposal. I am not sure COPY is the best mechanism to move records
> around when INSERT ... SELECT already does that.
>

For a single-case hack I would prefer using a hook, with the feature
stored in contrib. I don't see a general use for this feature.

Regards
Pavel Stehule

> Regards,
>        Jeff Davis
>
>


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Daniel Farina" <drfarina(at)gmail(dot)com>, "Hannu Krosing" <hannu(at)krosing(dot)net>, "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Daniel Farina" <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 05:35:42
Message-ID: 51560.76.125.11.227.1259213742.squirrel@dunslane.net
Lists: pgsql-hackers

On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
>
> I worry that we're getting further away from the original problem. Let's
> allow functions to get the bytes of data from a COPY, like the original
> proposal. I am not sure COPY is the best mechanism to move records
> around when INSERT ... SELECT already does that.
>

I am not at all sure I think that's a good idea, though. We have
pg_read_file() for getting raw bytes from files. Building that into COPY
does not strike me as a good fit.

cheers

andrew


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 05:49:17
Message-ID: 7b97c5a40911252149mbd41c6btd82239bbf9ab0e96@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 25, 2009 at 9:35 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
>>
>> I worry that we're getting further away from the original problem. Let's
>> allow functions to get the bytes of data from a COPY, like the original
>> proposal. I am not sure COPY is the best mechanism to move records
>> around when INSERT ... SELECT already does that.
>>
>
>
> I am not at all sure I think that's a good idea, though. We have
> pg_read_file() for getting raw bytes from files. Building that into COPY
> does not strike me as a good fit.

I think we are speaking of the opposite direction...

fdr


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 07:18:55
Message-ID: 1259219935.19289.505.camel@jdavis
Lists: pgsql-hackers

On Thu, 2009-11-26 at 05:01 +0100, Pavel Stehule wrote:
> It works like:
>
> 1. EXECUTE SELECT 0 FROM generate_series(1,...);
> 2. STORE RESULT TO TABLE zero;
> 3. EXECUTE SELECT 1/i FROM zero;
> 4. STORE RESULT TO TABLE tmp;
>
> The problem is in the sequential execution. The result is stored to the
> destination after execution - so some materialisation is always
> necessary.
>

My example showed that steps 3 and 4 are not executed sequentially, but
are executed together. If 3 was executed entirely before 4, then the
statement:
insert into tmp select 1/i from zero;
would have to read the whole table "zero" before an error is
encountered.

However, the statement errors immediately, showing that steps 3 and 4
are pipelined.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 07:22:10
Message-ID: 1259220130.19289.508.camel@jdavis
Lists: pgsql-hackers

On Thu, 2009-11-26 at 00:35 -0500, Andrew Dunstan wrote:
> On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
> >
> > I worry that we're getting further away from the original problem. Let's
> > allow functions to get the bytes of data from a COPY, like the original
> > proposal. I am not sure COPY is the best mechanism to move records
> > around when INSERT ... SELECT already does that.
> >
>
>
> I am not at all sure I think that's a good idea, though. We have
> pg_read_file() for getting raw bytes from files. Building that into COPY
> does not strike me as a good fit.

I think we're in agreement. All I mean is that the second argument to
COPY should produce/consume bytes and not records. I'm not discussing
the internal implementation at all, only semantics.

In other words, STDIN is not a source of records, it's a source of
bytes; and likewise for STDOUT.
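
To make that concrete, here is a minimal sketch against the patch's
TO FUNCTION syntax; the table and function names are invented, and the
only point is the signature: the target takes bytes, not records:

```sql
-- A byte sink: each chunk of COPY output arrives as raw bytes, with
-- no knowledge of row boundaries.
CREATE TABLE copy_chunks (chunk bytea);

CREATE FUNCTION byte_sink(data bytea) RETURNS void AS $$
    INSERT INTO copy_chunks VALUES ($1);
$$ LANGUAGE sql;

-- Hypothetical usage: the function consumes exactly the byte stream
-- that STDOUT would otherwise receive.
COPY my_table TO FUNCTION byte_sink;
```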

Regards,
Jeff Davis


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Daniel Farina" <drfarina(at)gmail(dot)com>, "Hannu Krosing" <hannu(at)krosing(dot)net>, "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Daniel Farina" <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 15:09:03
Message-ID: 51705.76.125.11.227.1259248143.squirrel@dunslane.net
Lists: pgsql-hackers

On Thu, November 26, 2009 2:22 am, Jeff Davis wrote:
> On Thu, 2009-11-26 at 00:35 -0500, Andrew Dunstan wrote:
>> On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
>> >
>> > I worry that we're getting further away from the original problem.
>> Let's
>> > allow functions to get the bytes of data from a COPY, like the
>> original
>> > proposal. I am not sure COPY is the best mechanism to move records
>> > around when INSERT ... SELECT already does that.
>> >
>>
>>
>> I am not at all sure I think that's a good idea, though. We have
>> pg_read_file() for getting raw bytes from files. Building that into COPY
>> does not strike me as a good fit.
>
> I think we're in agreement. All I mean is that the second argument to
> COPY should produce/consume bytes and not records. I'm not discussing
> the internal implementation at all, only semantics.
>
> In other words, STDIN is not a source of records, it's a source of
> bytes; and likewise for STDOUT.
>

Hmm. I can just imagine wanting to do that as a way of running COPY over
dblink. Are there other use cases?

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-26 15:48:36
Message-ID: 162867790911260748k454e89a1q8cd7850d2669461b@mail.gmail.com
Lists: pgsql-hackers

2009/11/26 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Thu, 2009-11-26 at 05:01 +0100, Pavel Stehule wrote:
>> It works like:
>>
>> 1. EXECUTE SELECT 0 FROM generate_series(1,...);
>> 2. STORE RESULT TO TABLE zero;
>> 3. EXECUTE SELECT 1/i FROM zero;
>> 4. STORE RESULT TO TABLE tmp;
>>
>> The problem is in the sequential execution. The result is stored to the
>> destination after execution - so some materialisation is always
>> necessary.
>>
>
> My example showed that steps 3 and 4 are not executed sequentially, but
> are executed together. If 3 was executed entirely before 4, then the
> statement:
>  insert into tmp select 1/i from zero;
> would have to read the whole table "zero" before an error is
> encountered.

You are right. I checked it with plpgsql functions and a BEFORE trigger:

postgres=# create or replace function generator() returns setof int as
$$begin raise notice 'generator start'; for i in 1..10 loop raise
notice 'generator %', i; return next i; end loop; raise notice
'generator end'; return; end$$ language plpgsql;
CREATE FUNCTION

postgres=# create or replace function rowfce(int) returns int as
$$begin raise notice 'rowfce %i', $1; return $1 + 1; end; $$ language
plpgsql;
CREATE FUNCTION

postgres=# create function trgbody() returns trigger as $$begin raise
notice 'trgbody %', new; return new; end;$$ language plpgsql;
CREATE FUNCTION

postgres=# create trigger xxx before insert on dest for each row
execute procedure trgbody();
CREATE TRIGGER

then I checked

postgres=# insert into dest select rowfce(i) from generator() g(i);
NOTICE: generator start
NOTICE: generator 1
NOTICE: generator 2
NOTICE: generator 3
NOTICE: generator 4
NOTICE: generator 5
NOTICE: generator 6
NOTICE: generator 7
NOTICE: generator 8
NOTICE: generator 9
NOTICE: generator 10
NOTICE: generator end
NOTICE: rowfce 1i
NOTICE: trgbody (2)
NOTICE: rowfce 2i
NOTICE: trgbody (3)
NOTICE: rowfce 3i
NOTICE: trgbody (4)
NOTICE: rowfce 4i
NOTICE: trgbody (5)
NOTICE: rowfce 5i
NOTICE: trgbody (6)
NOTICE: rowfce 6i
NOTICE: trgbody (7)
NOTICE: rowfce 7i
NOTICE: trgbody (8)
NOTICE: rowfce 8i
NOTICE: trgbody (9)
NOTICE: rowfce 9i
NOTICE: trgbody (10)
NOTICE: rowfce 10i
NOTICE: trgbody (11)

So INSERT INTO ... SELECT works well. The problem is in the function
scan implementation.

Regards
Pavel Stehule

>
> However, the statement errors immediately, showing that steps 3 and 4
> are pipelined.
>
> Regards,
>        Jeff Davis
>
>


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-27 02:13:46
Message-ID: 20091127021346.GA2303@fetter.org
Lists: pgsql-hackers

On Thu, Nov 26, 2009 at 10:11:12AM -0500, Andrew Dunstan wrote:
> On Thu, November 26, 2009 2:22 am, Jeff Davis wrote:
> > On Thu, 2009-11-26 at 00:35 -0500, Andrew Dunstan wrote:
> >> On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
> >> >
> >> > I worry that we're getting further away from the original problem.
> >> Let's
> >> > allow functions to get the bytes of data from a COPY, like the
> >> original
> >> > proposal. I am not sure COPY is the best mechanism to move records
> >> > around when INSERT ... SELECT already does that.
> >> >
> >>
> >>
> >> I am not at all sure I think that's a good idea, though. We have
> >> pg_read_file() for getting raw bytes from files. Building that into COPY
> >> does not strike me as a good fit.
> >
> > I think we're in agreement. All I mean is that the second argument to
> > COPY should produce/consume bytes and not records. I'm not discussing
> > the internal implementation at all, only semantics.
> >
> > In other words, STDIN is not a source of records, it's a source of
> > bytes; and likewise for STDOUT.
>
> Hmm. I can just imagine wanting to do that as a way of running COPY over
> dblink. Are there other use cases?

It'd be nice to make this available to the next revision of DBI-Link
and it'll be pretty handy for our SQL/MED whenever that happens.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-27 02:30:08
Message-ID: 7b97c5a40911261830w5187cc49lfcfe93c30cbc9d4e@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 26, 2009 at 6:13 PM, David Fetter <david(at)fetter(dot)org> wrote:
> It'd be nice to make this available to the next revision of DBI-Link
> and it'll be pretty handy for our SQL/MED whenever that happens.

Okay, so this thread sort of wandered into how we could refactor other
elements of COPY. Do we have a good sense of what we should do to the
current patch (or at least the idea represented by it) to get it into
a committable state within finite time?

I think adding a bytea and/or text mode is one such improvement... I
am still reluctant to give up on INTERNAL because the string buffer
passed in the INTERNAL scenario is ideal for C programmers -- the
interface is even simpler than dealing with varlena types. But I
agree that auxiliary modes should exist to enable easier hacking.

The thorniest issue in my mind is how state can be initialized,
retained, and/or modified between calls to the bytestream-acceptance
function.

Arguably it is already in a state where it is no worse than dblink,
which itself has a global hash table to manage state.

Also, if you look carefully at the dblink test suite I submitted,
you'll see an interesting trick: one can COPY from multiple sources
consecutively to a single COPY on a remote node when in text mode
(binary mode has a header that cannot be so neatly catenated). This
is something that's pretty hard to enable with any automatic
startup-work-cleanup approach.
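
The catenation property follows from the formats themselves (a sketch;
the queries are invented):

```sql
-- Text mode emits only newline-terminated rows, so two outputs form
-- one valid payload when catenated:
COPY (SELECT 1) TO STDOUT;   -- emits "1\n"
COPY (SELECT 2) TO STDOUT;   -- emits "2\n"
-- Catenated: "1\n2\n", acceptable to a single COPY ... FROM STDIN.

-- Binary mode begins with the fixed PGCOPY signature and header, so
-- naive catenation confuses the receiving side mid-stream:
COPY (SELECT 1) TO STDOUT WITH BINARY;
```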

fdr


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-27 13:39:43
Message-ID: 1259329183.13774.3185.camel@ebony
Lists: pgsql-hackers

On Tue, 2009-11-24 at 22:13 -0800, Jeff Davis wrote:

> My disagreement with the row-by-row approach is more semantics than
> performance. COPY translates records to bytes and vice-versa, and your
> original patch maintains those semantics.

The bytes <-> records conversion is a costly one. Anything we can do to
avoid that in either direction will be worth it. I would regard
performance as being part/most of the reason to support this.

--
Simon Riggs www.2ndQuadrant.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Daniel Farina <drfarina(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-27 20:55:49
Message-ID: 1259355349.19289.536.camel@jdavis
Lists: pgsql-hackers

On Fri, 2009-11-27 at 13:39 +0000, Simon Riggs wrote:
> On Tue, 2009-11-24 at 22:13 -0800, Jeff Davis wrote:
>
> > My disagreement with the row-by-row approach is more semantics than
> > performance. COPY translates records to bytes and vice-versa, and your
> > original patch maintains those semantics.
>
> The bytes <-> records conversion is a costly one. Anything we can do to
> avoid that in either direction will be worth it. I would regard
> performance as being part/most of the reason to support this.
>

Right. I was responding to the idea that COPY support sending records
from a table to a function, or from a function to a table, which is
something that INSERT/SELECT can already do.

Our use case is a table to a remote table, so it would go something
like:
1. COPY TO WITH BINARY on local node
2. stream output bytes from #1 to remote node
3. COPY FROM WITH BINARY on remote node

The only faster mechanism that I could imagine is sending the records
themselves, which would be machine-dependent.

Regards,
Jeff Davis


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-28 01:28:13
Message-ID: 4B107CAD.2060600@2ndquadrant.com
Lists: pgsql-hackers

Jeff Davis wrote:
> All I mean is that the second argument to
> COPY should produce/consume bytes and not records. I'm not discussing
> the internal implementation at all, only semantics.
>
> In other words, STDIN is not a source of records, it's a source of
> bytes; and likewise for STDOUT.
>
In the context of the read case, I'm not as sure it's so black and
white. While the current situation does map better to a function that
produces a stream of bytes, that's not necessarily the optimal approach
for all situations. It's easy to imagine a function intended for
accelerating bulk loading that is internally going to produce a stream
of already processed records. A good example would be a function that
is actually reading from another database system for the purpose of
converting its data into PostgreSQL. If those were then loaded by a
fairly direct path, that would happen at a much higher rate than if one
had to convert those back into a stream of bytes with delimiters and
then re-parse.

I think there's a very valid use-case for both approaches. Maybe it
just turns into an option, so you can get a faster loading path a
record at a time or just produce a stream of characters, depending on
what your data source maps to better. Something like this:

COPY target FROM FUNCTION foo() WITH RECORDS;
COPY target FROM FUNCTION foo() WITH BYTES;

Would seem to cover both situations. I'd think that the WITH BYTES
situation would just do some basic parsing and then pass the result
through the same basic code path as WITH RECORDS, so having both
available shouldn't increase the size of the implementation that much.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-30 02:23:39
Message-ID: 1259547819.3355.31.camel@jdavis
Lists: pgsql-hackers

On Fri, 2009-11-27 at 20:28 -0500, Greg Smith wrote:
> In the context of the read case, I'm not as sure it's so black and
> white. While the current situation does map better to a function that
> produces a stream of bytes, that's not necessarily the optimal approach
> for all situations. It's easy to imagine a function intended for
> accelerating bulk loading that is internally going to produce a stream
> of already processed records.

The binary COPY mode is one of the closest things I can think of to
"already-processed records". Is binary COPY slow? If so, the only thing
faster would have to be machine-specific, I think.

> I think there's a very valid use-case for both approaches.
...
> COPY target FROM FUNCTION foo() WITH RECORDS;

In what format would the records be?

Also, this still doesn't really answer why INSERT ... SELECT isn't good
enough. If the records really are in their internal format, then
INSERT ... SELECT seems like the way to go.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-30 02:35:45
Message-ID: 1259548545.3355.43.camel@jdavis
Lists: pgsql-hackers

On Thu, 2009-11-26 at 18:30 -0800, Daniel Farina wrote:
> Okay, so this thread sort of wandered into how we could refactor other
> elements of COPY. Do we have a good sense on what we should do to the
> current patch (or at least the idea represented by it) to get it into
> a committable state within finite time?

We're in the middle of a commitfest, so a lot of hackers are
concentrating on other patches. In a week or two, when it winds down,
people will be more willing to make decisions on new proposals and
designs. I still think this thread has been productive.

> I think adding a bytea and/or text mode is one such improvement... I
> am still reluctant to give up on INTERNAL because the string buffer
> passed in the INTERNAL scenario is ideal for C programmers -- the
> interface is even simpler than dealing with varlena types. But I
> agree that auxiliary modes should exist to enable easier hacking.

I like the idea of an internal mode as well. We may need some kind of
performance numbers to justify avoiding the extra memcpy, though.

> The thorniest issue in my mind is how state can be initialized,
> retained, and/or modified between calls to the bytestream-acceptance
> function.
>
> Arguably it is already in a state where it is no worse than dblink,
> which itself has a global hash table to manage state.

The idea of using a separate type of object (e.g. "CREATE COPYSTREAM")
to bundle the init/read/write/end functions together might work. That
also allows room to specify what the functions should accept
(internal/bytea/text).
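
Purely as a strawman for what such an object might look like - none of
this syntax exists, and every name below is invented:

```sql
-- Hypothetical bundling of the lifecycle functions for a COPY target.
CREATE COPYSTREAM remote_loader (
    init  = loader_init,   -- establish connection, start remote COPY
    write = loader_write,  -- accept bytea chunks from COPY
    fini  = loader_fini    -- flush, end remote COPY, clean up
);

COPY local_table TO COPYSTREAM remote_loader;
```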

I think that's the most elegant solution (at least it sounds good to
me), but others might not like the idea of a new object type just for
this feature. Perhaps if it fits nicely within an overall SQL/MED-like
infrastructure, it will be easier to justify.

> Also, if you look carefully at the dblink test suite I submitted,
> you'll see an interesting trick: one can COPY from multiple sources
> consecutively to a single COPY on a remote node when in text mode
> (binary mode has a header that cannot be so neatly catenated). This
> is something that's pretty hard to enable with any automatic
> startup-work-cleanup approach.

What if the network buffer is flushed in the middle of a line? Is that
possible, or is there a guard against that somewhere?

Regards,
Jeff Davis


From: Daniel Farina <drfarina(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-30 02:53:58
Message-ID: 7b97c5a40911291853j1388346cv4051a98404c9d5ef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 29, 2009 at 6:35 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> What if the network buffer is flushed in the middle of a line? Is that
> possible, or is there a guard against that somewhere?

What do you mean? They both catenate onto one stream of bytes; it
shouldn't matter where the flush boundaries are...

It so happens that a convenient property of the textual modes is that
adding more payload is purely concatenative (not true for binary,
where there's a header that would cause confusion to the receiving
side).
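For reference, the property being relied on here is that text-format COPY
output is purely row-delimited. A small illustration (the exact bytes shown
are from memory of the COPY formats, so treat them as a sketch):

```sql
-- Text-mode COPY output is just newline-terminated rows, so the output of
-- several COPY ... TO STDOUT invocations can be fed back to back into a
-- single receiving COPY ... FROM STDIN:
COPY (SELECT 1, 'a') TO STDOUT;   -- emits: 1<TAB>a
COPY (SELECT 2, 'b') TO STDOUT;   -- emits: 2<TAB>b

-- Binary mode, by contrast, begins with the PGCOPY\n\377\r\n\0 signature
-- and header, and ends with a -1 tuple-count trailer, so two binary
-- streams cannot simply be joined.
```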

fdr


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Daniel Farina <drfarina(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-30 03:01:59
Message-ID: 1259550119.3355.45.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2009-11-29 at 18:53 -0800, Daniel Farina wrote:
> On Sun, Nov 29, 2009 at 6:35 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > What if the network buffer is flushed in the middle of a line? Is that
> > possible, or is there a guard against that somewhere?
>
> What do you mean? They both catenate onto one stream of bytes, it
> shouldn't matter where the flush boundaries are...

Nevermind, for some reason I thought you were talking about interleaving
the data rather than concatenating.

Regards,
Jeff Davis


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-11-30 20:14:35
Message-ID: 4B1427AB.4070302@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis wrote:
>> COPY target FROM FUNCTION foo() WITH RECORDS;
>>
>
> In what format would the records be?
>
What internal format did you intend this form to process?

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-05 08:32:04
Message-ID: 7b97c5a40912050032g5651c257q49b9cdca6da82e2d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Jeff Davis wrote:
>
> COPY target FROM FUNCTION foo() WITH RECORDS;
>
>
> In what format would the records be?

As a not-quite aside, I think I have a better syntax suggestion. I
have recently been digging around in the grammar with the changes made
in the following commit:

commit a6833fbb85cb5212a9d8085849e7281807f732a6
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Mon Sep 21 20:10:21 2009 +0000

    Define a new, more extensible syntax for COPY options.

    This is intentionally similar to the recently revised syntax for EXPLAIN
    options, ie, (name value, ...).  The old syntax is still supported for
    backwards compatibility, but we intend that any options added in future
    will be provided only in the new syntax.

    Robert Haas, Emmanuel Cecchet

As it turns out, the following syntax may work pretty well:

COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42))

Basically the func_expr reduction fits very neatly into the
copy_generic_opt_elem reduction:

copy_generic_opt_elem:
        ColLabel copy_generic_opt_arg
            {
                $$ = (Node *) makeDefElem($1, $2);
            }
        | ColLabel func_expr
            {
                $$ = (Node *) $2;
            }
    ;

Now we can use more or less any reasonable number of symbol names and
function calls we desire. This makes life considerably easier, I
think...

We can also try to refactor COPY's internals to take advantage of
these features (and potentially reduce the number of mechanisms).  For
example, the legacy "COPY ... TO '/place' WITH CSV" could perhaps be
more verbosely/generically expressed as:

  COPY ... TO FUNCTION (setup_function to_file('/place'),
                        record_converter csv_formatter,
                        stream_function fwrite,
                        end_function fclose);

We can also add auxiliary symbols for error handling behavior.  For
example, were the COPY to fail for some reason, it might make sense
for "on_error" to call "unlink" to clean up the partially finished
file.

I also have what I think is a somewhat interesting hack.  Consider
some of the functions up there without arguments (presumably they'd be
called with a somewhat fixed contract by the mechanics of COPY
itself): how does one disambiguate them?  Ideally, one could sometimes
use literal arguments (when the return value of that function is to be
threaded through the other specified functions) and at other times
disambiguate overloaded functions via type names.  That would look
something like the following:

  COPY ... TO FUNCTION (setup_function to_file('/place'),
                        record_converter csv_formatter(record),
                        stream_function fwrite(bytea),
                        end_function fclose(text));

I think this is possible to implement without much ambiguity, drawing
on the observation that the COPY statement does not have -- and
probably never will have -- references via Var(iable) nodes, unlike
normal SQL statements such as SELECT, INSERT, et al.  That means we
might be able to disambiguate using the following rules when scanning
the funcexpr's arguments during the semantic analysis phase to figure
out what to do:

* Only literal list items found: it's a function call with the types
of those literals. Ex: my_setup_function('foo'::text, 3)

* Only non-literal list items found: it's type specifiers. Ex:
csv_formatter(record).

* Both literal and non-literal values found: report an error.

This only works because I can think of no case where a non-literal
quantity could be confused with a type name.  If one could name a type
"3", and being forced to double-quote "3" to disambiguate it were just
too ugly, then we'd be at an impasse.  But otherwise I think this may
work quite well.

Common constellations of functions could perhaps be bound together by
a DDL command to reduce the amount of symbol soup going on here, but
that seems like a clean transition strategy for some later time.
Most of the functionality can still be captured with this simple
approach for now...

Also note that factoring out high-performance implementations of
things like csv_formatter (and friends: pg_binary_formatter) will
probably take some time, but ultimately I think all the existing
functionality could be realized as a layer of syntactic sugar over
this mechanism.

fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 01:57:35
Message-ID: 603c8f070912291757u51f7ced5ga48fd1e92f77af5a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 5, 2009 at 3:32 AM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> Jeff Davis wrote:
>>
>> COPY target FROM FUNCTION foo() WITH RECORDS;
>>
>>
>> In what format would the records be?
>
> As a not-quite aside, I think I have a better syntax suggestion.  I
> have recently been digging around in the grammar with the changes made
> in the following commit:
>
> commit a6833fbb85cb5212a9d8085849e7281807f732a6
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date:   Mon Sep 21 20:10:21 2009 +0000
>
>    Define a new, more extensible syntax for COPY options.
>
>    This is intentionally similar to the recently revised syntax for EXPLAIN
>    options, ie, (name value, ...).  The old syntax is still supported for
>    backwards compatibility, but we intend that any options added in future
>    will be provided only in the new syntax.
>
>    Robert Haas, Emmanuel Cecchet
>
> As it turns out, the following syntax may work pretty well:
>
>  COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42))
>
> Basically the func_expr reduction fits very neatly into the
> copy_generic_opt_elem reduction:
>
>    copy_generic_opt_elem:
>                            ColLabel copy_generic_opt_arg
>                                    {
>                                            $$ = (Node *) makeDefElem($1, $2);
>                                    }
>                            | ColLabel func_expr
>                                    {
>                                            $$ = (Node *) $2;
>                                    }
>                    ;
>
> Now we can use more or less any reasonable number of symbol names and
> function calls we desire.  This makes life considerably easier, I
> think...
>
> We can also try to refactor COPY's internals to take advantage of
> these features (and potentially reduce the number of mechanisms.  For
> example, the legacy "COPY ... TO '/place' WITH CSV" perhaps can be
> more verbosely/generically expressed as:
>
>  COPY ... TO FUNCTION (setup_function to_file('/place'),
>                        record_converter csv_formatter,
>                        stream_function fwrite
>                        end_function fclose);
>
> We can also add auxiliary symbols for error handling behavior.  For
> example, were the COPY to fail for some reason maybe it would make
> sense "on_error" to call "unlink" to clean up the partially finished
> file.
>
> I also have what I think is a somewhat interesting hack.  Consider
> some of the functions up there without arguments (presumably they'd be
> called with a somewhat fixed contract the mechanics of COPY itself):
> how does one disambiguate them?  Ideally, one could sometimes use
> literal arguments (when the return value of that function is desired
> to be threaded through the other specified functions) and other times
> it'd be nice to disambiguate functions via type names.  That would
> look something like the following:
>
>  COPY ... TO FUNCTION (setup_function to_file('/place'),
>                        record_converter csv_formatter(record),
>                        stream_function fwrite(bytea),
>                        end_function fclose(text));
>
> I think this is possible to implement without much ambiguity, drawing
> on the observation that the COPY statement does not have -- and
> probably will never have -- references via Var(iable) node, unlike
> normal SQL statements such as SELECT, INSERT, et al.  That means we
> might be able disambiguate using the following rules when scanning the
> funcexpr's arguments during the semantic analysis phase to figure out
> what to do:
>
>  * Only literal list items found: it's a function call with the types
>    of those literals.  Ex: my_setup_function('foo'::text, 3)
>
>  * Only non-literal list items found: it's type specifiers.  Ex:
>    csv_formatter(record).
>
>  * Both literal and non-literal values found: report an error.
>
> This only works because no cases where a non-literal quantity could be
> confused with a type name come to mind.  If one could name a type "3"
> and being forced to double-quote "3" to get your type disambiguated
> was just too ugly, then we are at an impasse.  But otherwise I think
> this may work quite well.
>
> Common constellations of functions could perhaps be bound together
> into a DDL to reduce the amount of symbol soup going on here, but that
> seems like a pretty clean transition strategy at some later time.
> Most of the functionality could still be captured with this simple
> approach for now...
>
> Also note that factoring out high-performance implementations of
> things like csv_formatter (and friends: pg_binary_formatter) will
> probably take some time, but ultimately I think all the existing
> functionality could be realized as a layer of syntactic sugar over
> this mechanism.

I am very fuzzy on where we stand with this patch. There's a link to
the initial version on the commitfest application, but there has been
so much discussion since then that I think there are probably some
revisions to be made, though I can't say I know exactly what they are.

I also checked the git branch, but it does not merge cleanly with master.

Is there a more recent version? Is this still under development? Or
have we decided to go a different direction with it?

Thanks,

...Robert


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 02:23:38
Message-ID: 7b97c5a40912291823v405037b9v30e454b5c7d2c39@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I am very fuzzy on where we stand with this patch.  There's a link to
> the initial version on the commitfest application, but there has been
> so much discussion since then that I think there are probably some
> revisions to be made, though I can't say I know exactly what they are.
>
> I did also check the git branch, but it does not merge cleanly with the master.
>
> Is there a more recent version?  Is this still under development?  Or
> have we decided to go a different direction with it?

I think we've decided to go in a different direction for the
implementation; my last communication suggested a mechanism that would
allow for something cleaner using the copy-options refactoring.  I
wouldn't even attempt to merge it unless there's a big outcry for the
feature as-is, which I doubt there is.  But perhaps it's worthy TODO
fodder.

The mechanics in the email you replied to probably need more feedback to act.

fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 02:48:11
Message-ID: 603c8f070912291848x6cd0bbcck96af0cb75cd1e401@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 9:23 PM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I am very fuzzy on where we stand with this patch.  There's a link to
>> the initial version on the commitfest application, but there has been
>> so much discussion since then that I think there are probably some
>> revisions to be made, though I can't say I know exactly what they are.
>>
>> I did also check the git branch, but it does not merge cleanly with the master.
>>
>> Is there a more recent version?  Is this still under development?  Or
>> have we decided to go a different direction with it?
>
> I think we've decided to go in a different direction for
> implementation, and my last communication was to suggest a mechanism
> that would allow for something more clean using the copy options
> refactoring.  I wouldn't even attempt to merge it unless there's big
> outcry for the feature as-is, which I doubt there is.  But perhaps
> it's worthy TODO fodder.
>
> The mechanics in the email you replied to probably need more feedback to act.

I think there's clear support for a version of COPY that returns rows
like a SELECT statement, particularly for use with CTEs. There seems
to be support both for a mode that returns text[] or something like it
and also for a mode that returns a defined record type. But that all
seems separate from what you're proposing here, which is a
considerably lower-level facility which seems like it would not be of
much use to ordinary users, but might be of some value to tool
implementors - or perhaps you'd disagree with that characterization?

Anyway, my specific reaction to your suggestions in the email that I
quoted is that it seems a bit baroque and that I'm not really sure
what it's useful for in practice. I'm certainly not saying it ISN'T
useful, because I can't believe that you would have gone to the
trouble to work through all of this unless you had some ideas about
nifty things that could be done with it, but I think maybe we need to
back up and start by talking about the problems you're trying to
solve, before we get too far down into a discussion of implementation
details. It doesn't appear to me that's been discussed too much so
far, although there's enough enthusiasm here to make me suspect that
other people may understand it better than I do.

Based on your comments above, I'm going to go ahead and mark this
particular patch as Returned with Feedback, since it seems you don't
intend to continue with it and therefore it won't need to be reviewed
during the January CommitFest. But I'm interested in continuing the
discussion and in any new patches that come out of it.

...Robert


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 02:56:16
Message-ID: 7b97c5a40912291856k21b8a2x6da061aad0ea9089@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think there's clear support for a version of COPY that returns rows
> like a SELECT statement, particularly for use with CTEs.  There seems
> to be support both for a mode that returns text[] or something like it
> and also for a mode that returns a defined record type.  But that all
> seems separate from what you're proposing here, which is a
> considerably lower-level facility which seems like it would not be of
> much use to ordinary users, but might be of some value to tool
> implementors - or perhaps you'd disagree with that characterization?
>

This is in the other direction: freeing COPY from the restriction that
it can only put bytes into two places:

* A network socket (e.g. stdout)
* A file (as superuser)

Instead, it can hand off bytes to an arbitrary UDF that can handle
them in any way.  A clean design should be able to subsume at least
the existing simple behaviors while enabling more, as well as
potentially providing inspiration for how to decouple at least a few
components of COPY, which could benefit the long-term cleanup effort
there.

> Anyway, my specific reaction to your suggestions in the email that I
> quoted is that it seems a bit baroque and that I'm not really sure
> what it's useful for in practice.  I'm certainly not saying it ISN'T
> useful, because I can't believe that you would have gone to the
> trouble to work through all of this unless you had some ideas about
> nifty things that could be done with it, but I think maybe we need to
> back up and start by talking about the problems you're trying to
> solve, before we get too far down into a discussion of implementation
> details.

At Truviso this is used as a piece of our replication solution.  The
submitted patches show how enhancing dblink allows Postgres to copy
directly from one node to another: Truviso uses it to write bytes
directly to a libpq connection (see the dblink patch) held in an open
COPY state, achieving direct cross-node bulk loading for the purposes
of replication.

One could imagine a lot of ETL or data-warehouse offloading
applications enabled by allowing bytes to be handled by arbitrary
code, although this patch achieves nothing that application middleware
could not accomplish: it's just more convenient, and likely more
efficient, than writing middleware to do the same thing.

fdr


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 03:21:13
Message-ID: 1262143273.22866.3324.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2009-12-29 at 21:48 -0500, Robert Haas wrote:
> Anyway, my specific reaction to your suggestions in the email that I
> quoted is that it seems a bit baroque and that I'm not really sure
> what it's useful for in practice. I'm certainly not saying it ISN'T
> useful, because I can't believe that you would have gone to the
> trouble to work through all of this unless you had some ideas about
> nifty things that could be done with it, but I think maybe we need to
> back up and start by talking about the problems you're trying to
> solve, before we get too far down into a discussion of implementation
> details. It doesn't appear to me that's been discussed too much so
> far, although there's enough enthusiasm here to make me suspect that
> other people may understand it better than I do.

Dan made the use case fairly clear:

"I have extended COPY to support using a UDF as a target instead of the
normal 'file' or STDOUT targets. This dovetails nicely with a couple
of extensions I have also written for dblink for the purposes of
enabling direct cross-node bulk loading and replication."
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01555.php

David Fetter had some ideas as well:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01826.php

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 04:11:18
Message-ID: 603c8f070912292011r1fac6429kf7985e1bb2ad3022@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 9:56 PM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think there's clear support for a version of COPY that returns rows
>> like a SELECT statement, particularly for use with CTEs.  There seems
>> to be support both for a mode that returns text[] or something like it
>> and also for a mode that returns a defined record type.  But that all
>> seems separate from what you're proposing here, which is a
>> considerably lower-level facility which seems like it would not be of
>> much use to ordinary users, but might be of some value to tool
>> implementors - or perhaps you'd disagree with that characterization?
>>
>
> This is in the other direction: freeing COPY from the restriction that
> it can only put bytes into two places:
>
> * A network socket (e.g. stdout)
> * A file (as superuser)

Oh, duh. Actually, that seems like a pretty solid idea.

I fear that to make this really useful we would need to define some
new SQL syntax, like:

CREATE [OR REPLACE] COPY TARGET name (STARTUP function_name, STREAM
function_name, SHUTDOWN function_name);
DROP COPY TARGET name;
GRANT USAGE ON COPY TARGET TO ...;

COPY ... TO/FROM TARGET name (generic_option_list) WITH (options);

We could define the startup function to get the parameter list as a
list of DefElems and return an internal structure of its own devising
which would then be passed to the stream and shutdown functions.
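To make the proposal concrete, a hypothetical dblink-style target under
this syntax might look like the following (all object, function, and option
names are illustrative assumptions, not an existing API):

```sql
-- Superuser defines the target once; the functions take/return the
-- internal state type and so are not directly callable by users.
CREATE COPY TARGET dblink_out
    (STARTUP dblink_copy_startup,
     STREAM  dblink_copy_stream,
     SHUTDOWN dblink_copy_shutdown);

GRANT USAGE ON COPY TARGET dblink_out TO loader_role;

-- The startup function would receive (connstr ..., relation ...) as a
-- list of DefElems and return internal state for the stream/shutdown calls.
COPY t TO TARGET dblink_out (connstr 'host=replica', relation 't')
    WITH (FORMAT text);
```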

It might be possible to do this without introducing a notion of an
explicit object, but there are a couple of problems with that. First,
it requires the user to specify a lot of details on every COPY
invocation, which is awkward. Second, there's a security issue to
think about here. If we were just copying to a UDF that took a string
as an argument, that would be fine, but it's not safe to let
unprivileged users directly invoke functions that take a
type-internal argument. Introducing an explicit object type would
allow the creation of copy targets to be restricted to superusers but
then granted out to whomever the superuser chooses.

Thoughts?

...Robert


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 04:44:49
Message-ID: 1262148289.22866.3446.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2009-12-29 at 23:11 -0500, Robert Haas wrote:
> I fear that to make this really useful we would need to define some
> new SQL syntax, like:
>
> CREATE [OR REPLACE] COPY TARGET name (STARTUP function_name, STREAM
> function_name, SHUTDOWN function_name);
> DROP COPY TARGET name;
> GRANT USAGE ON COPY TARGET TO ...;
>
> COPY ... TO/FROM TARGET name (generic_option_list) WITH (options);

Similar ideas were already suggested:

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01601.php
http://archives.postgresql.org/pgsql-hackers/2009-11/msg01610.php

Regardless, I think there needs to be a way to pass arguments to the
functions (at least the startup one). The obvious use case is to pass
the destination table name, so that you don't have to define a separate
target for each destination table.

Regards,
Jeff Davis


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 04:47:44
Message-ID: 7b97c5a40912292047y14e1121arfafa98b8ff579e28@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 8:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It might be possible to do this without introducing a notion of an
> explicit object, but there are a couple of problems with that.  First,
> it requires the user to specify a lot of details on every COPY
> invocation, which is awkward.  Second, there's a security issue to
> think about here.  If we were just copying to a UDF that took a string
> as an argument, that would be fine, but it's not safe to let
> unprivileged users to directly invoke functions that take a
> type-internal argument.  Introducing an explicit object type would
> allow the creation of copy targets to be restricted to super-users but
> then granted out to whom the super-user chooses.
>
> Thoughts?

Agreed on the internal type and superuser access -- indeed, if one
were to refactor the two existing COPY modes into external functions,
the stdout behavior would be marked SECURITY DEFINER and the to-file
functions would be superuser-only.  (Interesting note: that means
copy.c could theoretically lose its special superuser check for this
mode, relying on standard function permissions...)

I was mostly satisfied with a byzantine but semantically simple
interface until the idea matures some more -- perhaps in practice --
to inform the most useful kind of convenience layer to support.  I
personally don't see a strong reason to rush into defining such an
interface just yet, although one important interface we would have to
define is the contract a user must follow to implement their very own
fully featured COPY output mode.

Still, the patch as submitted is quite far from achieving one of my
main litmus-test goals: subsumption of the existing COPY behavior.
Particularly thorny was how to support the copying-to-a-file
semantics, but I believe the copy-options patch provides a nice
avenue to solve this problem, as one can call a function in the
options list and somehow pass the return value of that initializer --
which may include a file handle -- to the byte-handling function.

Finally, I think a valid point was made that the patch is much more
powerful for end users if it supports byte arrays, and there are some
open questions as to whether that should be the only/primary supported
mode.  I personally like the INTERNAL-type interface, as dealing with
the StringInfo buffer used by the current COPY code is very convenient
from C, and the current implementation is not very complicated yet
avoids unnecessary copying/value coercions; but I agree there is
definite value in enabling the use of more normal data types.

fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:00:34
Message-ID: 603c8f070912292100i7b99bdddj46a59c96bca9e4b0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 11:44 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2009-12-29 at 23:11 -0500, Robert Haas wrote:
>> I fear that to make this really useful we would need to define some
>> new SQL syntax, like:
>>
>> CREATE [OR REPLACE] COPY TARGET name (STARTUP function_name, STREAM
>> function_name, SHUTDOWN function_name);
>> DROP COPY TARGET name;
>> GRANT USAGE ON COPY TARGET TO ...;
>>
>> COPY ... TO/FROM TARGET name (generic_option_list) WITH (options);
>
> Similar ideas were already suggested:
>
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg01601.php
> http://archives.postgresql.org/pgsql-hackers/2009-11/msg01610.php

Sorry, it's been a while since I've read through this thread and I am
not as up on it as perhaps I should be. I generally agree with those
ideas, although I think that trying to make the existing aggregate
interface serve this purpose will probably turn out to be trying to
make a square peg fit a round hole.
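For concreteness, the quoted COPY TARGET proposal might be exercised
roughly like this. All of the object and function names below
(dblink_target, dblink_copy_open, and so on) are invented for the sake
of the example, not taken from any patch:

```sql
-- Hypothetical usage of the proposed syntax: a superuser defines and
-- grants out a target once; unprivileged users then just name it.
CREATE COPY TARGET dblink_target
    (STARTUP dblink_copy_open,
     STREAM dblink_copy_write,
     SHUTDOWN dblink_copy_close);

GRANT USAGE ON COPY TARGET dblink_target TO loader_role;

COPY measurements TO TARGET dblink_target
    (connstr 'host=replica dbname=test', relation 'measurements')
    WITH (FORMAT csv);
```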

> Regardless, I think there needs to be a way to pass arguments to the
> functions (at least the startup one). The obvious use case is to pass
> the destination table name, so that you don't have to define a separate
> target for each destination table.

Agreed, note that I suggested a way to do that.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:11:48
Message-ID: 603c8f070912292111s1e7e90fei86b27c1ca4fa2815@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 11:47 PM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Tue, Dec 29, 2009 at 8:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It might be possible to do this without introducing a notion of an
>> explicit object, but there are a couple of problems with that.  First,
>> it requires the user to specify a lot of details on every COPY
>> invocation, which is awkward.  Second, there's a security issue to
>> think about here.  If we were just copying to a UDF that took a string
>> as an argument, that would be fine, but it's not safe to let
>> unprivileged users to directly invoke functions that take a
>> type-internal argument.  Introducing an explicit object type would
>> allow the creation of copy targets to be restricted to super-users but
>> then granted out to whom the super-user chooses.
>>
>> Thoughts?
>
> Agree on the type internal and superuser access -- indeed, if one were
> to refactor the two existing COPY modes into external functions, the
> stdout behavior would be marked with SECURITY DEFINER and the to-file
> functions would only be superuser-accessible.  (Interesting note: that
> means copy.c could theoretically lose the special check for superuser
> privilege for this mode, relying on standard function permissions...).
>
> I was mostly satisfied with a byzantine but otherwise semantically
> simple interface until the idea matures some more -- perhaps in
> practice -- to inform the most useful kind of convenience to support
> it.  I don't see a strong reason personally to rush into defining such
> an interface just yet, although an important interface we would have
> to define is contract a user would have to follow to enable their very
> own fully featured COPY output mode.

I think we should try to put an interface layer in place in the first
iteration. I don't really see this as having much value without that.
If we implement this strictly as a series of COPY options, we're
going to be committed to supporting that interface forever whether it
turns out to be good for anything or not. Since any such interface
would pretty much have to be superuser-only, I'm inclined to think
that there is not enough bang for the buck to justify the ongoing
maintenance effort.

One way to figure out whether we've defined the COPY TARGET interface
correctly is to put together a few sample implementations and see
whether they seem functional and useful, without too many lacunas. I
think it should be possible to assess from a few such implementations
whether we have basically the right design.

> Still, the patch as-submitted is quite far from achieving one of my
> main litmus test goals: subsumption of existing COPY behavior.
> Particularly thorny was how to support the copying-to-a-file
> semantics, but I believe that the copy-options patch provides a nice
> avenue to solve this problem, as one can call a function in the
> options list and somehow pass the return value of that initializer --
> which may include a file handle -- to the byte-handling function.

It may be useful to consider whether the current behavior could be
re-implemented using some proposed extension mechanism, but I would
not put much emphasis on trying to really subsume the current behavior
into such an implementation. Certainly, we will need to continue to
support the existing syntax forever, so the existing functionality
will need to be special-cased at least to that extent.

> Finally, I think a valid point was made that the patch is much more
> powerful to end users if it supports byte arrays, and there are some
> open questions as to whether this should be the only/primary supported
> mode.  I personally like the INTERNAL-type interface, as dealing with
> the StringInfo buffer used by current COPY code is very convenient
> from C and the current implementation is not very complicated yet
> avoids unnecessary copying/value coercions, but I agree there is
> definitely value in enabling the use of more normal data types.

I agree that needs some further thought. Again, a few sample
implementations might help provide clarity here. I agree that the
StringInfo representation is a bit random, but OTOH I mostly see
someone installing a few COPY TARGET implementations into their DB and
then using them over and over again. Creating a new COPY TARGET
should be relatively rare, so if they have to be written in C, we may
not care that much. On the flip side we should be looking for some
concrete gain from putting that restriction into the mix.

...Robert


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:26:46
Message-ID: 7b97c5a40912292126t437866eam3b1a8a1ebb446f0a@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 9:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think we should try to put an interface layer in place in the first
> iteration.  I don't really see this as having much value without that.
>  If we implement this strictly as a series of COPY options, we're
> going to be committed to supporting that interface forever whether it
> turns out to be good for anything or not.  Since any such interface
> would pretty much have to be superuser-only, I'm inclined to think
> that there is not enough bang for the buck to justify the ongoing
> maintenance effort.

Interestingly, I came to the opposite conclusion you did: I thought
supporting new top-level syntax might be the more invasive and
support-heavy change as opposed to just changing the option handling
grammar to support funcall-looking things on the right-hand-side.

True, the semantics and interfaces between symbols would need to be
somewhat frozen and documented in either case. I do see value in
supporting a notion of "this constellation of functions is OK, even if
one of them does take type INTERNAL", whereas my proposal has no
mechanism to address such a constellation: everything is independent,
which does not capture all the information necessary to determine if
it's safe to execute a particular combination of functions to perform
a COPY.
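To illustrate the safety concern, one plausible shape for such a
constellation is sketched below. The signatures are illustrative
guesses, not taken from the patch; the point is that functions taking
INTERNAL must never be directly callable by unprivileged users, which
is why registering them as a grouped unit matters:

```sql
-- Illustrative signatures for one COPY-target "constellation".
-- The INTERNAL arguments mean none of these are safe to expose to
-- unprivileged callers individually; only the grouped registration
-- can vouch that this particular combination is safe to run.
CREATE FUNCTION my_copy_startup(options text) RETURNS internal
    AS 'my_copy_target' LANGUAGE C;
CREATE FUNCTION my_copy_stream(state internal, buf internal) RETURNS void
    AS 'my_copy_target' LANGUAGE C;
CREATE FUNCTION my_copy_shutdown(state internal) RETURNS void
    AS 'my_copy_target' LANGUAGE C;
```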

I think you may be right about this, so we could basically move most
things from the COPY option list into a new DDL option list...I am
more ambivalent, but it seems that would require a catalog, and so on,
which is why I did not leap to do that initially for the very
support-reasons you mentioned.

> One way to figure out whether we've defined the COPY TARGET interface
> correctly is to put together a few sample implementations and see
> whether they seem functional and useful, without too many lacunas.  I
> think it should be possible to assess from a few such implementations
> whether we have basically the right design.

Okay, I was simply less optimistic than that (that we could get it
right so easily), and was content to put out something that had
reasonable design but perhaps byzantine (but well-defined) interfaces
and let complaints/wishes drive completion.

But given that the main difference between your proposal and mine is
to move things out of COPY's option list and into a toplevel DDL
option list, under the covers things are pretty samey, except I would
imagine your proposal requires committing to a new catalog(?) which
also enables addressing the combination of functions as a unit and a
new top-level DDL (but avoids committing to many new COPY options).

> I agree that needs some further thought.  Again, a few sample
> implementations might help provide clarity here.  I agree that the
> StringInfo representation is a bit random, but OTOH I mostly see
> someone installing a few COPY TARGET implementations into their DB and
> then using them over and over again.  Creating a new COPY TARGET
> should be relatively rare, so if they have to be written in C, we may
> not care that much.  On the flip side we should be looking for some
> concrete gain from putting that restriction into the mix.

I was originally inclined to agree, but I believe there are others
who feel the most interesting and useful applications are only
captured if userland types are supported, and I can see good reason
for that belief...

Even something as simple as authoring "tee" for COPY will require
writing C here, whereas when supporting userland types people can hack
out a PL function that calls some other existing C-written functions
(we can rather trivially provide a BYTEA veneer on top of the
INTERNAL version of a function expressly for this purpose...)
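Such a veneer could be quite thin. As a sketch (both function names
are hypothetical, and the C-level writer is assumed to already exist),
a bytea-taking wrapper lets PL authors compose with the existing
machinery without writing any C themselves:

```sql
-- Hypothetical BYTEA veneer: a PL-callable entry point that simply
-- delegates each buffer to an assumed C-level writer function.
CREATE FUNCTION dblink_copy_write(data bytea) RETURNS void AS $$
BEGIN
    PERFORM dblink_copy_write_internal(data);  -- assumed C function
END;
$$ LANGUAGE plpgsql;
```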

fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:46:31
Message-ID: 603c8f070912292146y23e91a5dp2fc7a967e5908a5@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 30, 2009 at 12:26 AM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> I think you may be right about this, so we could basically move most
> things from the COPY option list into a new DDL option list...I am
> more ambivalent, but it seems that would require a catalog, and so on,
> which is why I did not leap to do that initially for the very
> support-reasons you mentioned.

Adding a catalog doesn't bother me a bit. That doesn't really cost
anything. What I'm concerned about in terms of maintenance is that
anything we support in 8.5 will likely have to be supported in 8.6,
8.7, 8.8, 8.9, and 8.10, and probably beyond that. If we do something
that turns out to be CLEARLY a bad idea, then we MIGHT eventually be
able to get rid of it. But we probably won't be that (un)lucky.
We're more likely to do something that's sorta-useful, but makes
future enhancements that we care about more difficult. And then we'll
be stuck with it. That's really what I care about avoiding.

In other words, it's OK if we add something now that may need to be
further extended in the future, but it's BAD if we add something now
that we want to change incompatibly or take out in the future.

> But given that the main difference between your proposal and mine is
> to move things out of COPY's option list and into a toplevel DDL
> option list, under the covers things are pretty samey, except I would
> imagine your proposal requires committing to a new catalog(?) which
> also enables addressing the combination of functions as a unit and a
> new top-level DDL (but avoids committing to many new COPY options).

I agree with that summary. I don't think the catalog is a problem
(though we need to make sure to get the dependency handling correct).
The additional DDL syntax is more likely to draw objections, although
perhaps unsurprisingly I don't have a problem with it personally.

>> I agree that needs some further thought.  Again, a few sample
>> implementations might help provide clarity here.  I agree that the
>> StringInfo representation is a bit random, but OTOH I mostly see
>> someone installing a few COPY TARGET implementations into their DB and
>> then using them over and over again.  Creating a new COPY TARGET
>> should be relatively rare, so if they have to be written in C, we may
>> not care that much.  On the flip side we should be looking for some
>> concrete gain from putting that restriction into the mix.
>
> I was originally inclined to agree, but I believe there are others
> who feel the most interesting and useful applications are only
> captured if userland types are supported, and I can see good reason
> for that belief...
>
> Even something as simple as authoring "tee" for COPY will require
> writing C here, whereas when supporting userland types people can hack
> out a PL function that calls some other existing C-written functions
> (we can rather trivially provide a BYTEA veneer on top of the
> INTERNAL version of a function expressly for this purpose...)

Sure. If you think you can make it work using bytea, that seems
clearly better than using an internal type, all things being equal.

...Robert


From: Daniel Farina <drfarina(at)acm(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:56:04
Message-ID: 7b97c5a40912292156u13719546pbddd2f43557e3d71@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 29, 2009 at 9:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Sure.  If you think you can make it work using bytea, that seems
> clearly better than using an internal type, all things being equal.

I think both are perfectly viable and can be supported simultaneously,
actually...I simply like INTERNAL because the mechanism and passed
data structure for when you *do* want to write a C function is just
really crisp and simple, and is not going to be a source of overhead.

fdr


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 05:58:55
Message-ID: 603c8f070912292158p72740e57x62ea70df3470d74b@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 30, 2009 at 12:56 AM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Tue, Dec 29, 2009 at 9:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Sure.  If you think you can make it work using bytea, that seems
>> clearly better than using an internal type, all things being equal.
>
> I think both are perfectly viable and can be supported simultaneously,
> actually...I simply like INTERNAL because the mechanism and passed
> data structure for when you *do* want to write a C function is just
> really crisp and simple, and is not going to be a source of overhead.

OK. I think we'll have to see the proposed patch before we can really
make a final judgement on this, but hopefully I've at least given you
enough feedback/food for thought to move this forward in a meaningful
way...

...Robert


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date: 2009-12-30 06:00:25
Message-ID: 1262152825.22866.3527.camel@jdavis
Lists: pgsql-hackers

On Wed, 2009-12-30 at 00:00 -0500, Robert Haas wrote:
> I generally agree with those
> ideas, although I think that trying to make the existing aggregate
> interface serve this purpose will probably turn out to be trying to
> make a square peg fit a round hole.

I didn't think that they intended to actually use CREATE AGGREGATE,
although I could be mistaken.

Regards,
Jeff Davis