Re: pg_dump.c

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_dump.c
Date: 2011-09-08 19:20:14
Message-ID: 4E69156E.5060509@dunslane.net


In the "refactoring Large C files" discussion one of the biggest files
Bruce mentioned is pg_dump.c. There has been discussion in the past of
turning lots of the knowledge currently embedded in this file into a
library, which would make it available to other clients (e.g. psql). I'm
not sure what a reasonable API for that would look like, though. Does
anyone have any ideas?

cheers

andrew


From: Pavel Golub <pavel(at)microolap(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-09 05:25:34
Message-ID: 326099617.20110909082534@gf.microolap.com

Hello, Andrew.

You wrote:

AD> In the "refactoring Large C files" discussion one of the biggest files
AD> Bruce mentioned is pg_dump.c. There has been discussion in the past of
AD> turning lots of the knowledge currently embedded in this file into a
AD> library, which would make it available to other clients (e.g. psql).

+1
It would be great to have a library with such functionality!

AD> I'm
AD> not sure what a reasonable API for that would look like, though. Does
AD> anyone have any ideas?

AD> cheers

AD> andrew

--
With best wishes,
Pavel mailto:pavel(at)gf(dot)microolap(dot)com


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 14:25:07
Message-ID: 20110911142507.GB18580@fetter.org

On Thu, Sep 08, 2011 at 03:20:14PM -0400, Andrew Dunstan wrote:
>
> In the "refactoring Large C files" discussion one of the biggest
> files Bruce mentioned is pg_dump.c. There has been discussion in the
> past of turning lots of the knowledge currently embedded in this
> file into a library, which would make it available to other clients
> (e.g. psql). I'm not sure what a reasonable API for that would look
> like, though. Does anyone have any ideas?

Here's a sketch.

In essence, libpgdump should have the following areas of functionality:

- Discover the user-defined objects in the database.
- Tag each as pre-data, data, and post-data.
- Make available the dependency graph of the user-defined objects in the database.
- Enable the mechanical selection of subgraphs which may or may not be connected.
- Discover parallelization capability, if available.
- Dump requested objects of an arbitrary subset of the database,
optionally using such capability.
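
A minimal sketch of how the tagging and subgraph-selection items above
might look in C. Every name here (DumpSection, ObjectKind, DumpObject,
section_of, select_with_deps) is hypothetical and is not taken from
pg_dump; the kind-to-section mapping covers only three object kinds for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical section tags for the pre-data / data / post-data split. */
typedef enum { SECTION_PRE_DATA, SECTION_DATA, SECTION_POST_DATA } DumpSection;

/* Hypothetical object kinds; a real library would know many more. */
typedef enum { OBJ_TABLE, OBJ_TABLE_DATA, OBJ_INDEX } ObjectKind;

#define MAX_DEPS 4

typedef struct
{
    int        id;              /* index of this object in the array */
    ObjectKind kind;
    int        deps[MAX_DEPS];  /* ids of objects this one depends on */
    int        ndeps;           /* number of valid entries in deps[] */
} DumpObject;

/* Tag an object with its section, based only on its kind. */
static DumpSection
section_of(ObjectKind kind)
{
    switch (kind)
    {
        case OBJ_TABLE:      return SECTION_PRE_DATA;
        case OBJ_TABLE_DATA: return SECTION_DATA;
        case OBJ_INDEX:      return SECTION_POST_DATA;
    }
    return SECTION_PRE_DATA;    /* not reached for valid kinds */
}

/*
 * Select one object plus everything it transitively depends on.
 * selected[] must have one entry per object, initially all false.
 */
static void
select_with_deps(const DumpObject *objs, int id, bool *selected)
{
    if (selected[id])
        return;                 /* already visited */
    selected[id] = true;
    for (int i = 0; i < objs[id].ndeps; i++)
    {
        assert(i < MAX_DEPS);
        select_with_deps(objs, objs[id].deps[i], selected);
    }
}
```

Selecting an index this way pulls in its table but not the table's
data, which shows how a selected subgraph need not cover every section.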

Then there are questions of scope, which I'm straddling the fence about.
Should there be separate libraries to transform and restore?

A thing I'd really like to have in a libdump would be to have the
RDBMS-specific parts as loadable modules, but that, too, could be way
out of scope for this.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 15:26:59
Message-ID: CA+TgmoY=eFj+Z+MjG95YEm3t=zksXN7BtEbnAk4AXpTr=Fs7yQ@mail.gmail.com

On Thu, Sep 8, 2011 at 3:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> In the "refactoring Large C files" discussion one of the biggest files Bruce
> mentioned is pg_dump.c. There has been discussion in the past of turning
> lots of the knowledge currently embedded in this file into a library, which
> would make it available to other clients (e.g. psql). I'm not sure what a
> reasonable API for that would look like, though. Does anyone have any ideas?

A good start might be to merge together more of pg_dump and pg_dumpall
than is presently the case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 16:18:40
Message-ID: 4E6CDF60.9000904@dunslane.net

On 09/11/2011 10:25 AM, David Fetter wrote:
> On Thu, Sep 08, 2011 at 03:20:14PM -0400, Andrew Dunstan wrote:
>> In the "refactoring Large C files" discussion one of the biggest
>> files Bruce mentioned is pg_dump.c. There has been discussion in the
>> past of turning lots of the knowledge currently embedded in this
>> file into a library, which would make it available to other clients
>> (e.g. psql). I'm not sure what a reasonable API for that would look
>> like, though. Does anyone have any ideas?
> Here's a sketch.
>
> In essence, libpgdump should have the following areas of functionality:
>
> - Discover the user-defined objects in the database.
> - Tag each as pre-data, data, and post-data.
> - Make available the dependency graph of the user-defined objects in the database.
> - Enable the mechanical selection of subgraphs which may or may not be connected.
> - Discover parallelization capability, if available.
> - Dump requested objects of an arbitrary subset of the database,
> optionally using such capability.
>
> Then there are questions of scope, which I'm straddling the fence about.
> Should there be separate libraries to transform and restore?
>
> A thing I'd really like to have in a libdump would be to have the
> RDBMS-specific parts as loadable modules, but that, too, could be way
> out of scope for this.
>
>

In the first place, this isn't an API, it's a description of
functionality. A C library's API is expressed in its header files.

Also, I think you have seriously misunderstood the intended scope of the
library. Dumping and restoring, parallelization, and so on are not in
the scope I was thinking of. I think those are very properly the
property of pg_dump.c and friends. The only part I was thinking of
moving to a library was the discovery part, which is in fact a very
large part of pg_dump.c.

One example of what I'd like to provide is something like this:

    char *pg_get_create_sql(PGconn *conn, Oid object, Oid catalog_class,
                            bool pretty);

Which would give you the sql to create an object, optionally pretty
printing it.

Another is:

    char *pg_get_select(PGconn *conn, Oid table_or_view, bool pretty,
                        const char *alias);

which would generate a select statement for all the fields in a given
table, with an optional alias prefix.
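
As a rough illustration of the string-building half of pg_get_select,
here is a sketch that takes the column names directly instead of
looking them up through a PGconn and an OID. The names build_select and
append_quoted are hypothetical, and the sketch assumes "alias prefix"
means a table alias that qualifies each column, which is only one
possible reading.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append src to dst as a double-quoted SQL identifier, doubling any
 * embedded double quotes. Returns a pointer to the new end of dst.
 */
static char *
append_quoted(char *dst, const char *src)
{
    *dst++ = '"';
    for (; *src; src++)
    {
        if (*src == '"')
            *dst++ = '"';
        *dst++ = *src;
    }
    *dst++ = '"';
    *dst = '\0';
    return dst;
}

/*
 * Build a SELECT over the given columns, optionally qualified by alias.
 * Returns a malloc'd string the caller must free, or NULL on bad input
 * or allocation failure.
 */
char *
build_select(const char *table, const char *const *cols, int ncols,
             const char *alias)
{
    if (table == NULL || cols == NULL || ncols <= 0)
        return NULL;

    /* Worst case: every character of an identifier is a quote. */
    size_t alias_len = alias ? 2 * strlen(alias) + 2 : 0;
    size_t size = strlen("SELECT ") + strlen(" FROM ") + strlen(" AS ")
                  + 2 * strlen(table) + 2 + alias_len + 1;

    for (int i = 0; i < ncols; i++)
        size += alias_len + 1 + 2 * strlen(cols[i]) + 2 + 2;

    char *sql = malloc(size);
    if (sql == NULL)
        return NULL;

    char *p = sql;
    p += sprintf(p, "SELECT ");
    for (int i = 0; i < ncols; i++)
    {
        if (i > 0)
            p += sprintf(p, ", ");
        if (alias)
        {
            p = append_quoted(p, alias);
            *p++ = '.';
        }
        p = append_quoted(p, cols[i]);
    }
    p += sprintf(p, " FROM ");
    p = append_quoted(p, table);
    if (alias)
    {
        p += sprintf(p, " AS ");
        p = append_quoted(p, alias);
    }
    return sql;
}
```

Pretty-printing is left out; a version living in a library would
presumably fetch the column list from the catalogs in one query.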

For the purposes of pg_dump, perhaps we'd want to move all the getFoo()
functions in pg_dump.c into the library, along with a couple of bits
from common.c like getSchemaData().

(Kinda thinking out loud here.)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 18:50:06
Message-ID: 10792.1315767006@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> One example of what I'd like to provide is something like this:

> char *pg_get_create_sql(PGconn *conn, Oid object, Oid catalog_class,
> bool pretty);

> Which would give you the sql to create an object, optionally pretty
> printing it.

I think the major problem with creating a decent API here is that
"the SQL to create an object" is only a small part ... almost a trivial
part ... of what pg_dump needs to know about it. It's also aware of
ownership, permissions, schema membership, dependencies, etc etc.
I'm not sure about a reasonable representation for all that.

In particular, I think that discovering a safe dump order for a selected
set of objects is a pretty key portion of pg_dump's functionality.
Do we really want to assume that that needn't be included in a
hypothetical library?
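
To make "a safe dump order" concrete: it amounts to a topological sort
of the dependency graph. The sketch below uses Kahn's algorithm over a
small adjacency matrix. It is an illustration only, not a description
of how pg_dump orders objects; safe_dump_order and MAX_OBJECTS are
hypothetical names, and it sorts all n objects rather than a selected
subset.

```c
#include <stdbool.h>

#define MAX_OBJECTS 64

/*
 * dep[i][j] is true when object i depends on object j, so j must be
 * dumped before i. On success, fills order[] with all n object indexes
 * in a safe order and returns true. Returns false if n is too large or
 * the dependencies contain a cycle.
 */
bool
safe_dump_order(int n, bool dep[][MAX_OBJECTS], int *order)
{
    int  pending[MAX_OBJECTS];          /* unmet dependencies per object */
    bool done[MAX_OBJECTS] = {false};
    int  count = 0;

    if (n < 0 || n > MAX_OBJECTS)
        return false;

    for (int i = 0; i < n; i++)
    {
        pending[i] = 0;
        for (int j = 0; j < n; j++)
            if (dep[i][j])
                pending[i]++;
    }

    while (count < n)
    {
        int next = -1;

        /* Pick the lowest-numbered object with no unmet dependencies. */
        for (int i = 0; i < n; i++)
        {
            if (!done[i] && pending[i] == 0)
            {
                next = i;
                break;
            }
        }
        if (next < 0)
            return false;               /* everything left is in a cycle */

        done[next] = true;
        order[count++] = next;

        /* Objects that depended on 'next' have one fewer unmet dependency. */
        for (int i = 0; i < n; i++)
            if (dep[i][next])
                pending[i]--;
    }
    return true;
}
```

The cycle case is the hard part in practice: a caller has to decide
how to break the loop, and this sketch only reports that one exists.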

Other issues include:

* pg_dump's habit of assuming that the SQL is being generated to work
with a current server as target, even when dumping from a much older
server. It's not clear to me that other clients for a library would
want that behavior ... but catering to multiple output versions would
kick the complexity up by an order of magnitude.

* a lot of other peculiar things that pg_dump does in the name of
backwards compatibility or robustness of the output script, which again
aren't necessarily useful for other purposes. An example here is the
choice to treat tablespace of a table as a separate property that's
not specified in the base CREATE TABLE command, so that the script
doesn't fail completely if the target database hasn't got such a
tablespace.

* performance. Getting the data retail per-object, as the above API
implies, would utterly suck. You have to think a little more carefully
about the integration between the discovery phase and the output phase,
as in there has to be a good deal of it.
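
To illustrate the tablespace point above: the script sets the
tablespace in its own statement ahead of a plain CREATE TABLE, so a
missing tablespace makes only that one statement fail. The sketch
below is a simplified illustration, not pg_dump's code.
emit_create_table is a hypothetical name, and the table, column list
and tablespace are assumed to be already quoted as needed.

```c
#include <stdio.h>

/*
 * Write a two-statement script for one table into buf. The tablespace
 * is set separately so that CREATE TABLE itself never names it. Pass
 * NULL for tablespace to use the database default. Returns what
 * snprintf returns: the length the full script needs, excluding the
 * terminating NUL.
 */
int
emit_create_table(char *buf, size_t size, const char *table,
                  const char *columns, const char *tablespace)
{
    if (tablespace != NULL)
        return snprintf(buf, size,
                        "SET default_tablespace = %s;\n"
                        "CREATE TABLE %s (%s);\n",
                        tablespace, table, columns);

    return snprintf(buf, size,
                    "SET default_tablespace = '';\n"
                    "CREATE TABLE %s (%s);\n",
                    table, columns);
}
```

A client that wants one self-contained CREATE TABLE statement would
need different output, which is the mismatch described above.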

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 19:18:13
Message-ID: 4E6D0975.1010804@dunslane.net

On 09/11/2011 02:50 PM, Tom Lane wrote:
> In particular, I think that discovering a safe dump order for a selected
> set of objects is a pretty key portion of pg_dump's functionality.
> Do we really want to assume that that needn't be included in a
> hypothetical library?

Maybe. Who else would need it?

> Other issues include:
>
> * pg_dump's habit of assuming that the SQL is being generated to work
> with a current server as target, even when dumping from a much older
> server. It's not clear to me that other clients for a library would
> want that behavior ... but catering to multiple output versions would
> kick the complexity up by an order of magnitude.

Good point. Maybe what we need to think about instead is adding some
backend functions to do the sort of things I want. That would avoid
version issues and have the advantage that it would be available to all
clients, as well as avoiding the possible performance issues you mention.

cheers

andrew


From: Rob Wultsch <wultsch(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump.c
Date: 2011-09-11 19:33:28
Message-ID: CAGdn2uiZEo7j6yyCSxfOydUsAEe7dbrZPCSN96DGX+w17P1HOg@mail.gmail.com

On Sun, Sep 11, 2011 at 9:18 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 09/11/2011 10:25 AM, David Fetter wrote:
>>
>> On Thu, Sep 08, 2011 at 03:20:14PM -0400, Andrew Dunstan wrote:
>>>
>>> In the "refactoring Large C files" discussion one of the biggest
>>> files Bruce mentioned is pg_dump.c. There has been discussion in the
>>> past of turning lots of the knowledge currently embedded in this
>>> file into a library, which would make it available to other clients
>>> (e.g. psql). I'm not sure what a reasonable API for that would look
>>> like, though. Does anyone have any ideas?
>>
>> Here's a sketch.
>>
>> In essence, libpgdump should have the following areas of functionality:
>>
>> - Discover the user-defined objects in the database.
>> - Tag each as pre-data, data, and post-data.
>> - Make available the dependency graph of the user-defined objects in the
>> database.
>> - Enable the mechanical selection of subgraphs which may or may not be
>> connected.
>> - Discover parallelization capability, if available.
>> - Dump requested objects of an arbitrary subset of the database,
>>   optionally using such capability.
>>
>> Then there are questions of scope, which I'm straddling the fence about.
>> Should there be separate libraries to transform and restore?
>>
>> A thing I'd really like to have in a libdump would be to have the
>> RDBMS-specific parts as loadable modules, but that, too, could be way
>> out of scope for this.
>>
>>
>
> In the first place, this isn't an API, it's a description of functionality.
> A C library's API is expressed in its header files.
>
> Also, I think you have seriously misunderstood the intended scope of the
> library. Dumping and restoring, parallelization, and so on are not in the
> scope I was thinking of. I think those are very properly the property of
> pg_dump.c and friends. The only part I was thinking of moving to a library
> was the discovery part, which is in fact a very large part of pg_dump.c.
>
> One example of what I'd like to provide is something like this:
>
>    char *pg_get_create_sql(PGconn *conn, Oid object, Oid catalog_class,
>                            bool pretty);
>
> Which would give you the sql to create an object, optionally pretty printing
> it.
>
> Another is:
>
>    char *pg_get_select(PGconn *conn, Oid table_or_view, bool pretty,
>                        const char *alias);
>
> which would generate a select statement for all the fields in a given table,
> with an optional alias prefix.
>
> For the purposes of pg_dump, perhaps we'd want to move all the getFoo()
> functions in pg_dump.c into the library, along with a couple of bits from
> common.c like getSchemaData().
>
> (Kinda thinking out loud here.)
>
> cheers
>
> andrew
>
>
>

For whatever it is worth, the "SHOW CREATE TABLE" command in MySQL is
well loved. Having the functionality to generate SQL in the server can
be very nice.

--
Rob Wultsch
wultsch(at)gmail(dot)com