dblink bulk operations

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: dblink bulk operations
Date: 2009-08-06 15:11:58
Message-ID: 4A7AF2BE.2050205@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Last night I needed to move a bunch of data from an OLTP database to an
archive database, and used dblink with a bunch of insert statements.
Since I was moving about 4m records this was distressingly but not
surprisingly slow. It set me wondering why we don't build more support
for libpq operations into dblink, like transactions and prepared
queries, and maybe COPY too. It would be nice to be able to do something
like:

select dblink_connect('dbh','dbname=foo');
select dblink_begin('dbh');
select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can
we do this?
select dblink_commit('dbh');
select dblink_disconnect('dbh');

Does this seem worthwhile and doable, or am I smoking crack?

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dblink bulk operations
Date: 2009-08-06 15:21:56
Message-ID: b42b73150908060821q5f5c5e2el93ffe36f1d936928@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 6, 2009 at 11:11 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an
> archive database, and used dblink with a bunch of insert statements. Since I
> was moving about 4m records this was distressingly but not surprisingly
> slow. It set me wondering why we don't build more support for libpq
> operations into dblink, like transactions and prepared queries, and maybe
> COPY too. It would be nice to be able to do something like:
>
>   select dblink_connect('dbh','dbname=foo');
>   select dblink_begin('dbh');

you can always exec a sql 'begin'.

>   select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
>   select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can
>   we do this?

The answer to this I think is yes, but not quite that way. Much
better I think is to use 8.4 variable argument functions, use
parametrized features off libpq always, and use the binary protocol
when possible. This does end up running much faster, and easier to
use...(we've done exactly that for our in house stuff). IIRC you can
parameterize 'execute', so the above should work for prepared queries
as well.

If we get the ability to set specific OIDs for types, I can remove
some of the hacks we have to send text for composites and arrays of
composites.

select * from pqlink_exec(connstr, 'select $1 + $2', 3, 4) as R(v int);
v
---
7
(1 row)

merlin


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dblink bulk operations
Date: 2009-08-06 15:44:28
Message-ID: 20090806154428.GJ4185@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 06, 2009 at 11:11:58AM -0400, Andrew Dunstan wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an
> archive database, and used dblink with a bunch of insert statements.
> Since I was moving about 4m records this was distressingly but not
> surprisingly slow. It set me wondering why we don't build more support
> for libpq operations into dblink, like transactions and prepared
> queries, and maybe COPY too. It would be nice to be able to do something
> like:
>
> select dblink_connect('dbh','dbname=foo');
> select dblink_begin('dbh');
> select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
> select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can
> we do this?
> select dblink_commit('dbh');
> select dblink_disconnect('dbh');
>
>
> Does this seem worthwhile and doable, or am I smoking crack?

For what it's worth, DBI-Link provides a lot of this.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dblink bulk operations
Date: 2009-08-06 16:28:15
Message-ID: 4A7B049F.2040805@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter wrote:
>
> For what it's worth, DBI-Link provides a lot of this.
>
>
>

Indeed, but that assumes that perl+DBI+DBD::Pg is available, which is by
no means always the case. If we're going to have a dblink module ISTM it
should be capable of reasonable bulk operations.

cheers

andrew


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dblink bulk operations
Date: 2009-08-06 16:37:58
Message-ID: 20090806163758.GK4185@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 06, 2009 at 12:28:15PM -0400, Andrew Dunstan wrote:
> David Fetter wrote:
>>
>> For what it's worth, DBI-Link provides a lot of this.
>
> Indeed, but that assumes that perl+DBI+DBD::Pg is available, which
> is by no means always the case. If we're going to have a dblink
> module ISTM it should be capable of reasonable bulk operations.

I didn't mean to suggest that you should use DBI-Link, just that it's
a requirement that's come up in very similar contexts to that of
dblink.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dblink bulk operations
Date: 2009-08-06 16:49:01
Message-ID: b42b73150908060949y36ffbad3s63b29611b747d049@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 6, 2009 at 11:11 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an
> archive database, and used dblink with a bunch of insert statements. Since I
> was moving about 4m records this was distressingly but not surprisingly
> slow. It set me wondering why we don't build more support for libpq
> operations into dblink, like transactions and prepared queries, and maybe
> COPY too. It would be nice to be able to do something like:
>
>   select dblink_connect('dbh','dbname=foo');
>   select dblink_begin('dbh');
>   select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
>   select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can
>   we do this?
>   select dblink_commit('dbh');
>   select dblink_disconnect('dbh');

thinking about this some more, you can get pretty close with vanilla
dblink with something like (i didn't test):

select dblink_exec('dbh', 'prepare xyz as insert into foo select ($1::foo).*');
select dblink_exec('dbh', 'execute xyz(' || my_foo::text || ')');

This maybe defeats a little bit of what you are trying to achieve
(especially performance), but is much easier to craft for basically
any table as long as the fields match. The above runs into problems
with quoting (composite with bytea in it), but works ok most of the
time.

If you want faster/better, dblink need to be factored to parametrize
queries and, if possible, use binary.

merlin