Re: Suggestion for --truncate-tables to pg_restore

Lists: pgsql-hackers
From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Suggestion for --truncate-tables to pg_restore
Date: 2012-09-20 17:24:49
Message-ID: 1348161889.25476.1@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I've had problems using pg_restore --data-only when
restoring individual schemas (which contain data which
has had bad things done to it). --clean does not work
well because of dependent objects in other schemas.

Patch to the docs attached (before I go and do any
real coding.)

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore.sgml.diff text/x-patch 1.1 KB

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-09-21 15:54:05
Message-ID: 1348242845.21480.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> I've had problems using pg_restore --data-only when
> restoring individual schemas (which contain data which
> has had bad things done to it). --clean does not work
> well because of dependent objects in other schemas.

Before doing any more work I want to report on the
discussions that took place at the code sprint at
Postgres Open in Chicago. Because I'm going to add
in additional thoughts I've had and to avoid mis-representing
anybody's opinion I'll not mention who said what.
Feel free to step forward and claim Ingenious Ideas
as your own. Likewise I apologize if lack of attribution
makes it more difficult to discern (my) uninformed drivel
from intelligent insight.

----

First, the problem:

Begin with the following structure:

CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);

CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo.
But schemaA.foo has (bad) data in it that must first
be removed. It would seem that using

pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem, it would drop schemaA.foo,
recreate it, and then restore the data. But this does
not work. schemaA.foo does not drop because it's
got a dependent database object, schemaB.bar.

Of course there are manual work-arounds. One of these
is truncating schemaA.foo and then doing a pg_restore
with --data-only. The manual work-arounds become
increasingly burdensome as you need to restore more
tables. The case that motivated me was an attempt
to restore the data in an entire schema, one which
contained a significant number of tables.

So, the idea here is to be able to do a data-only
restore, first truncating the data in the tables
being restored to remove the existing corrupted data.

The proposal is to add a --truncate-tables option
to pg_restore.

----

There were some comments on syntax.

I proposed to use -u as a short option. This was
thought confusing, given it's use in other
Unix command line programs (mysql). Since there's
no obvious short option, forget it. Just have
a long option.

Another choice is to avoid introducing yet another
option and instead overload --clean so that when
doing a --data-only restore --clean truncates tables
and otherwise --clean retains the existing behavior of
dropping and re-creating the restored objects.

(I tested pg_restore with 9.1 and when --data-only is
used --clean is ignored, it does not even produce a warning.
This is arguably a bug.)

----

More serious objections were raised regarding semantics.

What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo
(id PRIMARY KEY, data INT);

CREATE TABLE schemaB.bar
(id INT CONSTRAINT "bar_on_foo" REFERENCES foo
, moredata INT);

With a case like this, in most real-world situations, you'd
have to use pg_restore with --disable-triggers if you wanted
to use --data-only and --truncate-tables. The possibility of
foreign key referential integrity corruption is obvious.

Aside: Unless you're restoring databases in their entirety
the pg_restore --disable-triggers option makes it easy to
introduce foreign key referential integrity corruption.
In fact, since pg_restore does not wrap it's operations
in one big transaction, it's easy to attempt restoration
of a portion of a database, have part of the process
succeed and part of it fail (due to either schema
or data dependencies), and be left off worse
than before you started. The pg_restore docs might
benefit from a big fat warning regarding
attempts to restore less than an entire database.

So, the discussion went, pg_restore is just another
application and introducing more options
which could lead to corruption of referential integrity is
a bad idea.

But pg_restore should not be thought of as just another
front-end. It should be thought of as a data recovery
tool. Recovering some data and being left with referential
integrity problems is better than having no data. This
is true even if, due to different users owning different
schemas and so forth, nobody knows exactly what
might be broken.

Yes, but we can do better. (The unstated sub-text being that
we don't want to introduce an inferior feature which
will then need to be supported forever.)

How could we do better:

Here I will record only the ideas related to restore,
although there was some mention of dump as well.

There has apparently been some discussion of writing
a foreign data wrapper which would operate on a database
dump. This might (in ways that are not immediately
obvious to me) address this issue.

The restore process could, based on what table data needs
restoration, look at foreign key dependencies and produce a
list of the tables which all must be restored into order to
ensure foreign key referential integrity. In the case of
restoration into a empty database the foreign key
dependences must be calculated from the dump. (An
"easy" way to do this would be to create
all the database objects in some temporary place and query
the system catalogs to produce the dependency graph.)
In the case of restoration into an
existing database the foreign key dependences should
come from the database into which the data is to be restored.
(This is necessary to capture dependences which may have
been introduced after the dump was taken.)

The above applies to data-only restoration. When restoring the
database schema meta-information (object definition) a similar
graph of database object dependences must be produced and used
to determine what needs to be restored.

But when doing a partial data-only restore there is more
to data integrity than just foreign key referential integrity.
Other constraints and triggers ensure other sorts of
data integrity rules. It is not enough to leave
triggers turned on when restoring data. Data not
restored may validate against restored data in triggers
fired only on manipulation of the un-restored table content.
The only solution I can see is to also include in the
computed set of tables which require restoration those
tables having triggers which reference any of the restored
data.

Just how far should pg_restore go in attempting to
preserve data integrity?

----

Two things are clear:

The current table and schema oriented options for backing
up and restoring portions of databases are flawed with
respect to data integrity.

Life is complicated.

Where should I go from here? I am not now in a position to
pursue anything more complicated than completing the code to
add a --truncate-tables option to pg_restore. Should I
finish this and send in a patch?

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-09-23 05:19:07
Message-ID: 1348377547.29247.1@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
> On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
>
> > I've had problems using pg_restore --data-only when
> > restoring individual schemas (which contain data which
> > has had bad things done to it). --clean does not work
> > well because of dependent objects in other schemas.

Since there wasn't much more to do I've gone ahead
and written the patch. Works for me.

Against git master.
Passes regression tests, but there's no regression
tests for pg_restore so this does not say much.
Since there's no regression tests I've not written one.

Since this is a real patch for application I've given
it a new name (it's not a v2).

Truncate done right before COPY, since that's what
the parallel restores do.

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_truncate.patch text/x-patch 0 bytes

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-09-23 05:24:27
Message-ID: 1348377867.968.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Whoops. Do over. Sent the wrong file.

On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
> On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
> >
> > > I've had problems using pg_restore --data-only when
> > > restoring individual schemas (which contain data which
> > > has had bad things done to it). --clean does not work
> > > well because of dependent objects in other schemas.
>
> Since there wasn't much more to do I've gone ahead
> and written the patch. Works for me.
>
> Against git master.
> Passes regression tests, but there's no regression
> tests for pg_restore so this does not say much.
> Since there's no regression tests I've not written one.
>
> Since this is a real patch for application I've given
> it a new name (it's not a v2).
>
> Truncate done right before COPY, since that's what
> the parallel restores do.

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_truncate.patch text/x-patch 5.4 KB

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-09-24 01:52:07
Message-ID: 1348451527.32124.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is version 2. The sgml did not build.

On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:
> Whoops. Do over. Sent the wrong file.
>
> On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
> > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
> > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
> > >
> > > > I've had problems using pg_restore --data-only when
> > > > restoring individual schemas (which contain data which
> > > > has had bad things done to it). --clean does not work
> > > > well because of dependent objects in other schemas.
> >
> > Since there wasn't much more to do I've gone ahead
> > and written the patch. Works for me.
> >
> > Against git master.
> > Passes regression tests, but there's no regression
> > tests for pg_restore so this does not say much.
> > Since there's no regression tests I've not written one.
> >
> > Since this is a real patch for application I've given
> > it a new name (it's not a v2).
> >
> > Truncate done right before COPY, since that's what
> > the parallel restores do.
>
>
> Karl <kop(at)meme(dot)com>
> Free Software: "You don't pay back, you pay forward."
> -- Robert A. Heinlein
>
>

------quoted attachment------
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_truncate_v2.patch text/x-patch 5.4 KB

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-10-17 02:48:06
Message-ID: 1350442086.7618.4@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is version 3.

The convention seems to be to leave the operator at the
end of the line when breaking long lines, so do that.
Add extra () -- make operator precedence explicit and
have indentation reflect operator precedence.

On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:

> On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:

> > On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
> > > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
> > > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
> > > >
> > > > > I've had problems using pg_restore --data-only when
> > > > > restoring individual schemas (which contain data which
> > > > > has had bad things done to it). --clean does not work
> > > > > well because of dependent objects in other schemas.
> > >
> > > Since there wasn't much more to do I've gone ahead
> > > and written the patch. Works for me.
> > >
> > > Against git master.
> > > Passes regression tests, but there's no regression
> > > tests for pg_restore so this does not say much.
> > > Since there's no regression tests I've not written one.
> > >
> > > Since this is a real patch for application I've given
> > > it a new name (it's not a v2).
> > >
> > > Truncate done right before COPY, since that's what
> > > the parallel restores do.

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_truncate_v3.patch text/x-patch 5.4 KB

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-12 17:45:44
Message-ID: 1352742344.21373.4@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is version 4. Version 3 no longer
built against head.

On 10/16/2012 09:48:06 PM, Karl O. Pinc wrote:
>
> On 09/23/2012 08:52:07 PM, Karl O. Pinc wrote:
>
> > On 09/23/2012 12:24:27 AM, Karl O. Pinc wrote:
>
> > > On 09/23/2012 12:19:07 AM, Karl O. Pinc wrote:
> > > > On 09/21/2012 10:54:05 AM, Karl O. Pinc wrote:
> > > > > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
> > > > >
> > > > > > I've had problems using pg_restore --data-only when
> > > > > > restoring individual schemas (which contain data which
> > > > > > has had bad things done to it). --clean does not work
> > > > > > well because of dependent objects in other schemas.
> > > >
> > > > Since there wasn't much more to do I've gone ahead
> > > > and written the patch. Works for me.
> > > >
> > > > Against git master.
> > > > Passes regression tests, but there's no regression
> > > > tests for pg_restore so this does not say much.
> > > > Since there's no regression tests I've not written one.
> > > >
> > > > Since this is a real patch for application I've given
> > > > it a new name (it's not a v2).
> > > >
> > > > Truncate done right before COPY, since that's what
> > > > the parallel restores do.
>
>
> Karl <kop(at)meme(dot)com>
> Free Software: "You don't pay back, you pay forward."
> -- Robert A. Heinlein
>
>

------quoted attachment------
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_truncate_v4.patch text/x-patch 5.4 KB

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-21 05:53:23
Message-ID: CAK3UJRHFHN_q6AOib7mtchwS3KXdDGyPg5awWAHOF3R=1DdjOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Karl,
I signed on to review this patch for the current CF. Most of the
background for the patch seems to be in the message below, so I'm
going to respond to this one first.

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:
>
>> I've had problems using pg_restore --data-only when
>> restoring individual schemas (which contain data which
>> has had bad things done to it). --clean does not work
>> well because of dependent objects in other schemas.

OK.

> ----
>
> First, the problem:
>
> Begin with the following structure:
>
> CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);
>
> CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;
>
> Then, by accident, somebody does:
>
> UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;
>
> So, you want to restore the data into schemaA.foo.
> But schemaA.foo has (bad) data in it that must first
> be removed. It would seem that using
>
> pg_restore --clean -n schemaA -t foo my_pg_dump_backup
>
> would solve the problem, it would drop schemaA.foo,
> recreate it, and then restore the data. But this does
> not work. schemaA.foo does not drop because it's
> got a dependent database object, schemaB.bar.

Right.

> Of course there are manual work-arounds. One of these
> is truncating schemaA.foo and then doing a pg_restore
> with --data-only.

Simply doing TRUNCATE manually was the first thought that occurred to
me when I saw your example.

> The manual work-arounds become
> increasingly burdensome as you need to restore more
> tables. The case that motivated me was an attempt
> to restore the data in an entire schema, one which
> contained a significant number of tables.

TBH, I didn't find the example above particularly compelling for
demonstrating the need for this feature. If you've just got one table
with dependent views which needs to be restored, it's pretty easy to
manually TRUNCATE and have pg_restore --data-only reload the table.
(And easy enough to combine the truncate and restore into a single
transaction in case anything goes wrong, if need be.)

But I'm willing to grant that this proposed feature is potentially as
useful as existing restore-jiggering options like --disable-triggers.
And I guess I could see that if you're really stuck having to perform
a --data-only restore of many tables, this feature could come in
handy.

> So, the idea here is to be able to do a data-only
> restore, first truncating the data in the tables
> being restored to remove the existing corrupted data.
>
> The proposal is to add a --truncate-tables option
> to pg_restore.
>
> ----
>
> There were some comments on syntax.
>
> I proposed to use -u as a short option. This was
> thought confusing, given it's use in other
> Unix command line programs (mysql). Since there's
> no obvious short option, forget it. Just have
> a long option.

Agreed.

> Another choice is to avoid introducing yet another
> option and instead overload --clean so that when
> doing a --data-only restore --clean truncates tables
> and otherwise --clean retains the existing behavior of
> dropping and re-creating the restored objects.

I like the --truncate-tables flag idea better than overloading
--clean, since it makes the purpose immediately apparent.

> (I tested pg_restore with 9.1 and when --data-only is
> used --clean is ignored, it does not even produce a warning.
> This is arguably a bug.)

+1 for having pg_restore bail out with an error if the user specifies
--data-only --clean. By the same token, --clean and --truncate-tables
together should also raise a "not allowed" error.

> ----
>
> More serious objections were raised regarding semantics.
>
> What if, instead, the initial structure looked like:
>
> CREATE TABLE schemaA.foo
> (id PRIMARY KEY, data INT);
>
> CREATE TABLE schemaB.bar
> (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
> , moredata INT);
>
> With a case like this, in most real-world situations, you'd
> have to use pg_restore with --disable-triggers if you wanted
> to use --data-only and --truncate-tables. The possibility of
> foreign key referential integrity corruption is obvious.

Why would --disable-triggers be used in this example? I don't think
you could use --truncate-tables to restore only table "foo", because
you would get:

ERROR: cannot truncate a table referenced in a foreign key constraint

(At least, I think so, not having tried with the actual patch.) You
could use --disable-triggers when restoring "bar". Though if you're
just enabling that option for performance purposes, and are unable to
guarantee that the restore will leave the tables in a consistent
state, well then it seems like you shouldn't use the option.

> Aside: Unless you're restoring databases in their entirety
> the pg_restore --disable-triggers option makes it easy to
> introduce foreign key referential integrity corruption.

Yup, and I think the docs could do more to warn users about
--disable-triggers in particular. And I see you've submitted a
separate patch along those lines.

> In fact, since pg_restore does not wrap it's operations
> in one big transaction, it's easy to attempt restoration
> of a portion of a database, have part of the process
> succeed and part of it fail (due to either schema
> or data dependencies), and be left off worse
> than before you started.

That's what the --single-transaction option is for.

> So, the discussion went, pg_restore is just another
> application and introducing more options
> which could lead to corruption of referential integrity is
> a bad idea.

I do agree that increasing the ways for pg_restore to be a foot-gun is
a Bad Idea.

> But pg_restore should not be thought of as just another
> front-end. It should be thought of as a data recovery
> tool.

I don't totally agree that charter for pg_restore should be a "data
recovery tool" (i.e. general purpose), but for the sake of this patch
we can leave that aside.

> Recovering some data and being left with referential
> integrity problems is better than having no data.

Well, this is really a judgment call, and I have a hunch that many
admins would actually choose "none of the above". And I think this
point gets to the crux of whether the --truncate-tables option will be
useful as-is.

For your first example, --truncate-tables seems to have some use, so
that the admin isn't forced to recreate various views which may be
dependent on the table. (Though it's not too difficult to work around
this case today.)

For the second example involving FKs, I'm a little bit more hesitant
about endorsing the use of --truncate-tables combined with
--disable-triggers to potentially allow bogus FKs. I know this is
possible to some extent today using the --disable-triggers option, but
it makes me nervous to be adding a mode to pg_restore if it's
primarily designed to work together with --disable-triggers as a
larger foot-gun.

> This
> is true even if, due to different users owning different
> schemas and so forth, nobody knows exactly what
> might be broken.
>
> Yes, but we can do better. (The unstated sub-text being that
> we don't want to introduce an inferior feature which
> will then need to be supported forever.)
>
> How could we do better:
>
> Here I will record only the ideas related to restore,
> although there was some mention of dump as well.
>
> There has apparently been some discussion of writing
> a foreign data wrapper which would operate on a database
> dump. This might (in ways that are not immediately
> obvious to me) address this issue.

That's an interesting idea. If you could SELECT directly out of a dump
file via FDW, you could handle the restore process purely in SQL. But
not directly relevant to this patch.

> The restore process could, based on what table data needs
> restoration, look at foreign key dependencies and produce a
> list of the tables which all must be restored into order to
> ensure foreign key referential integrity.

One mode of operation of pg_restore is outputting to a file or pipe
for subsequent processing, which of course wouldn't work with this
idea of having the restore be dependent on the target database
structure.

> [snip more discussion of pg_restore possibly reordering objects and ensuring integrity]

For the purposes of actually completing a review of the patch in
question, I'm going to avoid further blue-sky speculation here.

Just a few initial comments about the doc portion of the patch:

+ This option is only relevant when performing a data-only

If we are going to restrict the --truncate-tables option to be used
with --data-only, "only allowed" may be more clear than "only
relevant".

+ <para>
+ The <option>--disable-triggers</option> will almost always
+ always need to be used in conjunction with this option to
+ disable check constraints on foreign keys.
+ </para>

For the first example you posted, of a view dependent on a table which
needed to be restored, this advice would not be accurate. IMO it's a
little dangerous advising users to "almost always" use a foot-gun like
--disable-triggers.

I'm out of time for today, and I haven't had a chance to actually try
out the patch, but I wanted to send off my thoughts so far. I should
have a chance for another look later this week.

Josh


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-21 10:48:56
Message-ID: 1353494936.3737.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote:
> Hi Karl,
> I signed on to review this patch for the current CF.

I noticed. Thanks very much.

> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> > First, the problem:
> >
> > Begin with the following structure:
> >
> > CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);
> >
> > CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;
> >
> > Then, by accident, somebody does:
> >
> > UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;
> >
> > So, you want to restore the data into schemaA.foo.
> > But schemaA.foo has (bad) data in it that must first
> > be removed. It would seem that using
> >
> > pg_restore --clean -n schemaA -t foo my_pg_dump_backup
> >
> > would solve the problem, it would drop schemaA.foo,
> > recreate it, and then restore the data. But this does
> > not work. schemaA.foo does not drop because it's
> > got a dependent database object, schemaB.bar.

> TBH, I didn't find the example above particularly compelling for
> demonstrating the need for this feature. If you've just got one table
> with dependent views which needs to be restored, it's pretty easy to
> manually TRUNCATE and have pg_restore --data-only reload the table.
> (And easy enough to combine the truncate and restore into a single
> transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem. (But see below
regarding foreign keys.) The dependent view
was an example, but there can also be REFERENCES constraints and
trigger related constraints involving other tables in other schemas.
The easiest work-around I can think of here is destroying all the
triggers and constraints, either everything in the whole db
or doing some work to be more selective, truncating all the schema's
tables. doing a data-only restore of the
schema, and then pg_restore --data-only, and then re-creating
all the triggers and constraints.

>
> But I'm willing to grant that this proposed feature is potentially as
> useful as existing restore-jiggering options like --disable-triggers.
> And I guess I could see that if you're really stuck having to perform
> a --data-only restore of many tables, this feature could come in
> handy.

I think so. See above.

>
> > So, the idea here is to be able to do a data-only
> > restore, first truncating the data in the tables
> > being restored to remove the existing corrupted data.
> >
> > The proposal is to add a --truncate-tables option
> > to pg_restore.
> >
> > ----

> > (I tested pg_restore with 9.1 and when --data-only is
> > used --clean is ignored, it does not even produce a warning.
> > This is arguably a bug.)
>
> +1 for having pg_restore bail out with an error if the user specifies
> --data-only --clean. By the same token, --clean and --truncate-tables
> together should also raise a "not allowed" error.

OT:
After looking at the code I found a number of "conflicting"
option combinations are not tested for or rejected. I can't
recall what they are now. The right way to fix these would be
to send in a separate patch for each "conflict" all attached
to one email/commitfest item? Or one patch that just gets
adjusted until it's right?

>
> > ----
> >
> > More serious objections were raised regarding semantics.
> >
> > What if, instead, the initial structure looked like:
> >
> > CREATE TABLE schemaA.foo
> > (id PRIMARY KEY, data INT);
> >
> > CREATE TABLE schemaB.bar
> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
> > , moredata INT);
> >
> > With a case like this, in most real-world situations, you'd
> > have to use pg_restore with --disable-triggers if you wanted
> > to use --data-only and --truncate-tables. The possibility of
> > foreign key referential integrity corruption is obvious.
>
> Why would --disable-triggers be used in this example? I don't think
> you could use --truncate-tables to restore only table "foo", because
> you would get:
>
> ERROR: cannot truncate a table referenced in a foreign key
> constraint
>
> (At least, I think so, not having tried with the actual patch.) You
> could use --disable-triggers when restoring "bar".

I tried it and you're quite right. (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process. :-) My assumption was that since triggers
were turned off that constraints, being triggers, would be off as
well and therefore tables with foreign keys could be truncated.
Obviously not, since the I get the above error.

I just tried it. --disable-triggers does not disable constraints.

>
> > Recovering some data and being left with referential
> > integrity problems is better than having no data.
>
> Well, this is really a judgment call, and I have a hunch that many
> admins would actually choose "none of the above". And I think this
> point gets to the crux of whether the --truncate-tables option will
> be
> useful as-is.

Well, yes. "None of the above" is best. :) In my case I had munged
the data in the one important schema and wanted to restore. The
setting is an academic one and a lot of cruft gets left laying
around in the other schemas, some of which consist entirely of cruft.
Responsibility for the non-main schema content is distributed.

In the interest of getting things working again quickly I wished to
restore the important schema quickly, and didn't want to have to sort
through the cruft ahead of time. In other words, I was entirely
willing to break things and pick up the pieces afterwords.

>
> For your first example, --truncate-tables seems to have some use, so
> that the admin isn't forced to recreate various views which may be
> dependent on the table. (Though it's not too difficult to work around
> this case today.)

As an aside: I never have an issue with this, having planned ahead.
I'm curious what the not-too-difficult work-around is that you have
in mind. I'm not coming up with a tidy way to, e.g, pull a lot
of views out of a dump.

> For the second example involving FKs, I'm a little bit more hesitant
> about endorsing the use of --truncate-tables combined with
> --disable-triggers to potentially allow bogus FKs. I know this is
> possible to some extent today using the --disable-triggers option,
> but
> it makes me nervous to be adding a mode to pg_restore if it's
> primarily designed to work together with --disable-triggers as a
> larger foot-gun.

This is the crux of the issue, and where I was thinking of
taking this patch. I very often am of the mindset that
foreign keys are no more or less special than other
more complex data integrity rules enforced with triggers.
(I suppose others might say the same regards to integrity
enforced at the application level.) This naturally
inclines me to think that one more way, beyond
--disable-triggers, to break integrity is no big deal.

But I quite see your point. Is it possible to get
resolution on this issue before either of us do any
more work in the direction of foreign keys?

An additional foot-gun, --disable-constraints,
seems like the natural progression in this direction.
Constraints, unlike triggers (?), can, in theory,
be fired at any time to check data content so perhaps
providing a way to test existing constraints against
db content would be a way to mitigate the foot-gun-ness
and drive an after-the-fact data cleanup process.

--disable-constraints seems like an entirely separate
patch so maybe we can stop the FK related issue right
here? (Although I would appreciate feedback regards
whether such an option might be accepted, at minimum
I'd like to get this out of my brain.)

>
> Just a few initial comments about the doc portion of the patch:
>
> + This option is only relevant when performing a data-only
>
> If we are going to restrict the --truncate-tables option to be used
> with --data-only, "only allowed" may be more clear than "only
> relevant".

Make sense to me. My intent here was to use the language
used elsewhere in the docs but I'm happy to go in a better direction.
(And perhaps later submit more patches to move other parts of the docs
in that direction.)

>
> + <para>
> + The <option>--disable-triggers</option> will almost always
> + always need to be used in conjunction with this option to
> + disable check constraints on foreign keys.
> + </para>
>
> For the first example you posted, of a view dependent on a table
> which
> needed to be restored, this advice would not be accurate. IMO it's a
> little dangerous advising users to "almost always" use a foot-gun
> like
> --disable-triggers.

Sorta brings us back to the sticking point above....
And sounds like, at the moment at least, this paragraph can
be deleted.

> I'm out of time for today, and I haven't had a chance to actually try
> out the patch, but I wanted to send off my thoughts so far. I should
> have a chance for another look later this week.

Thanks for the work.

I'll hold off on submitting any revisions to the patch for the moment.

One thing you'll want to pay attention to is the point
in the restore process at which the truncation is done.
In the current version each table is truncated immediately
before being copied. It might or might not be better to do
all the truncation up-front, in the fashion of --clean.
I would appreciate some guidance on this.

In case it's helpful I'm attaching two files
I used to test the foreign key issue.

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_test.sql text/x-sql 391 bytes
pg_restore_test.sh application/x-shellscript 206 bytes

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-23 19:54:43
Message-ID: CAK3UJRHx6auRt5aSnx0OeJMZeK3Gr+MZFnH=CKofQBRfnGVykQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:

>> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
>> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> OT:
> After looking at the code I found a number of "conflicting"
> option combinations are not tested for or rejected. I can't
> recall what they are now. The right way to fix these would be
> to send in a separate patch for each "conflict" all attached
> to one email/commitfest item? Or one patch that just gets
> adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

>> > ----
>> >
>> > More serious objections were raised regarding semantics.
>> >
>> > What if, instead, the initial structure looked like:
>> >
>> > CREATE TABLE schemaA.foo
>> > (id PRIMARY KEY, data INT);
>> >
>> > CREATE TABLE schemaB.bar
>> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
>> > , moredata INT);
>> >
>> > With a case like this, in most real-world situations, you'd
>> > have to use pg_restore with --disable-triggers if you wanted
>> > to use --data-only and --truncate-tables. The possibility of
>> > foreign key referential integrity corruption is obvious.
>>
>> Why would --disable-triggers be used in this example? I don't think
>> you could use --truncate-tables to restore only table "foo", because
>> you would get:
>>
>> ERROR: cannot truncate a table referenced in a foreign key
>> constraint
>>
>> (At least, I think so, not having tried with the actual patch.) You
>> could use --disable-triggers when restoring "bar".
>
> I tried it and you're quite right. (I thought I'd tried this
> before, but clearly I did not -- proving the utility of the review
> process. :-) My assumption was that since triggers
> were turned off that constraints, being triggers, would be off as
> well and therefore tables with foreign keys could be truncated.
> Obviously not, since the I get the above error.
>
> I just tried it. --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

pg_restore -1 --truncate-tables --disable-triggers --table=foo \
--data-only my.dump ...

you would get the complaint

ERROR: cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

ALTER TABLE bar DISABLE TRIGGER ALL

done, since "bar" isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with "bar", after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring "foo" and "bar" together.

>> For your first example, --truncate-tables seems to have some use, so
>> that the admin isn't forced to recreate various views which may be
>> dependent on the table. (Though it's not too difficult to work around
>> this case today.)
>
> As an aside: I never have an issue with this, having planned ahead.
> I'm curious what the not-too-difficult work-around is that you have
> in mind. I'm not coming up with a tidy way to, e.g, pull a lot
> of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

pg_restore --list --file=my.dump > output.list
# manually edit file "output.list", select only entries pertaining
# to the table and dependent view
pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

>> For the second example involving FKs, I'm a little bit more hesitant
>> about endorsing the use of --truncate-tables combined with
>> --disable-triggers to potentially allow bogus FKs. I know this is
>> possible to some extent today using the --disable-triggers option,
>> but
>> it makes me nervous to be adding a mode to pg_restore if it's
>> primarily designed to work together with --disable-triggers as a
>> larger foot-gun.
>
> This is the crux of the issue, and where I was thinking of
> taking this patch. I very often am of the mindset that
> foreign keys are no more or less special than other
> more complex data integrity rules enforced with triggers.
> (I suppose others might say the same regards to integrity
> enforced at the application level.) This naturally
> inclines me to think that one more way, beyond
> --disable-triggers, to break integrity is no big deal.
>
> But I quite see your point. Is it possible to get
> resolution on this issue before either of us do any
> more work in the direction of foreign keys?

I think the patch has some promise with at least one use-case (view(s)
dependent on table which needs to be reloaded, as discussed above).
With the other use-case we have been discussing, of reloading a table
referenced by other table(s)'s FKs, whether --truncate-tables is
helpful is less clear to me, at least in the patch's current state.
(See also bottom.)

> An additional foot-gun, --disable-constraints,
> seems like the natural progression in this direction.
> Constraints, unlike triggers (?), can, in theory,
> be fired at any time to check data content so perhaps
> providing a way to test existing constraints against
> db content would be a way to mitigate the foot-gun-ness
> and drive an after-the-fact data cleanup process.
>
> --disable-constraints seems like an entirely separate
> patch so maybe we can stop the FK related issue right
> here? (Although I would appreciate feedback regards
> whether such an option might be accepted, at minimum
> I'd like to get this out of my brain.)

I'm not sure I follow exactly how you envision --disable-constraints
would work, but it does seem a separate issue from the
--truncate-tables option at hand.

> One thing you'll want to pay attention to is the point
> in the restore process at which the truncation is done.
> In the current version each table is truncated immediately
> before being copied. It might or might not be better to do
> all the truncation up-front, in the fashion of --clean.
> I would appreciate some guidance on this.

IMO --truncate-tables is, at a minimum, going to need to truncate
tables up-front and in the appropriate order to avoid the annoying
"ERROR: cannot truncate a table referenced in a foreign key
constraint", at least when avoiding that error is possible.

Let's go back to your example of two tables, with "bar" referencing
"foo". This existing --clean functionality works fine:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore -1 --clean --use-list=output.list ...

But this won't work, since pg_restore attempts to truncate and restore
foo separately from bar:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore --truncate-tables --data-only -1 --use-list=output.list ...

i.e. will run into "ERROR: cannot truncate a table referenced in a
foreign key constraint".

> In case it's helpful I'm attaching two files
> I used to test the foreign key issue.

Thanks, I do appreciate seeing testcases scripts like this, since it
can neatly demonstrate the intended use-case of the feature, and help
bring to light anything that's missing. I tried your testcase, and
got:

pg_restore: [archiver (db)] could not execute query: ERROR: cannot
truncate a table referenced in a foreign key constraint
DETAIL: Table "bar" references "foo".

as discussed above. If you'd like to advertise this feature as being
handy for reloading a table referenced by other FKs, I'd be interested
to see a testcase demonstrating this use, along with any changes to
the patch (e.g. moving TRUNCATEs to the start) which might be needed.

Josh


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: "Karl O(dot) Pinc" <kop(at)meme(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-26 18:06:56
Message-ID: CA+TgmoY7MZatD7qiR97hQDOz-Pxg+vaKmrKXdm36=RS=mtH58A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt <schmiddy(at)gmail(dot)com> wrote:
> TBH, I didn't find the example above particularly compelling for
> demonstrating the need for this feature. If you've just got one table
> with dependent views which needs to be restored, it's pretty easy to
> manually TRUNCATE and have pg_restore --data-only reload the table.
> (And easy enough to combine the truncate and restore into a single
> transaction in case anything goes wrong, if need be.)
>
> But I'm willing to grant that this proposed feature is potentially as
> useful as existing restore-jiggering options like --disable-triggers.
> And I guess I could see that if you're really stuck having to perform
> a --data-only restore of many tables, this feature could come in
> handy.

I think I would come down on the other side of this. We've never
really been able to get --clean work properly in all scenarios, and it
seems likely that a similar fate will befall this option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-26 21:51:05
Message-ID: 1353966665.29451.1@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/26/2012 12:06:56 PM, Robert Haas wrote:
> On Wed, Nov 21, 2012 at 12:53 AM, Josh Kupershmidt
> <schmiddy(at)gmail(dot)com> wrote:
> > TBH, I didn't find the example above particularly compelling for
> > demonstrating the need for this feature. If you've just got one
> table
> > with dependent views which needs to be restored, it's pretty easy
> to
> > manually TRUNCATE and have pg_restore --data-only reload the table.
> > (And easy enough to combine the truncate and restore into a single
> > transaction in case anything goes wrong, if need be.)
> >
> > But I'm willing to grant that this proposed feature is potentially
> as
> > useful as existing restore-jiggering options like
> --disable-triggers.
> > And I guess I could see that if you're really stuck having to
> perform
> > a --data-only restore of many tables, this feature could come in
> > handy.
>
> I think I would come down on the other side of this. We've never
> really been able to get --clean work properly in all scenarios, and
> it
> seems likely that a similar fate will befall this option.

Where I would like to go with this is to first introduce,
as a new patch, an ALTER TABLE option to disable a
constraint. Something like

ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname";

This would mark the constraint NOT VALID, as if the
constraint were created with the NOT VALID option.
After a constraint is UNVALIDATEd the existing

ALTER TABLE foo VALIDATE CONSTRAINT "constraintname";

feature would turn the constraint on and check the data.

With UNVALIDATE CONSTRAINT, pg_restore could first turn
off all the constraints concerning tables to be restored,
truncate the tables, restore the data, turn the
constraints back on and re-validate the constraints.
No need to worry about ordering based on a FK referential
dependency graph or loops in such a graph (due to
DEFERRABLE INITIALLY DEFERRED).

This approach would allow the content of a table or
tables to be restored regardless of dependent objects
or FK references and preserve FK referential integrity.
Right? I need some guidance here from someone who
knows more than I do.

There would likely need to be a pg_restore option like
--disable-constraints to invoke this functionality,
but that can be worked out later.
Likewise, I see an update and a delete trigger in
pg_triggers associated with the referenced table
in REFERENCES constraints, but no trigger for
truncate. So making a constraint NOT VALID may
not be as easy as it seems.

I don't know what the problems are with --clean
but I would like to know if this appears
a promising approach. If so I can pursue it,
although I make no promises. (I sent in
the --disable-triggers patch because it seemed
easy and I'm not sure where a larger project fits
into my life.)

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

P.S. An outstanding question regards --truncate-tables
is whether it should drop indexes before truncate
and re-create them after restore. Sounds like it should.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: Josh Kupershmidt <schmiddy(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-26 22:42:40
Message-ID: CA+Tgmobrs2R4VK9Re2J370x+be+G0n4fi22TBEGdLC3TJa2VUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> Where I would like to go with this is to first introduce,
> as a new patch, an ALTER TABLE option to disable a
> constraint. Something like
>
> ALTER TABLE foo UNVALIDATE CONSTRAINT "constraintname";

This doesn't really make sense, because constraints that are not
validated are still enforced for new rows. This thus wouldn't save
anything performance-wise. We should perhaps have two more states:
not enforced but blindly assumed true, and not enforced and not
assumed true either. But currently, we don't.

> I don't know what the problems are with --clean
> but I would like to know if this appears
> a promising approach. If so I can pursue it,
> although I make no promises. (I sent in
> the --disable-triggers patch because it seemed
> easy and I'm not sure where a larger project fits
> into my life.)

I'm not really sure what the issues were any more; but I think they
may have had to do with dependencies between different objects messing
things up, which I think is likely to be a problem for this proposal
as well.

> P.S. An outstanding question regards --truncate-tables
> is whether it should drop indexes before truncate
> and re-create them after restore. Sounds like it should.

Well, that would improve performance, but it also makes the behavior
of object significantly different from what one might expect from the
name. One of the problems here is that there seem to be a number of
slightly-different things that one might want to do, and it's not
exactly clear what all of them are, or whether a reasonable number of
options can cater to all of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Karl O(dot) Pinc" <kop(at)meme(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-27 02:45:08
Message-ID: CAK3UJRFggRHvgnpadTMTKcTW_r_J25StLChYP+NfYBcDAme68w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
>> P.S. An outstanding question regards --truncate-tables
>> is whether it should drop indexes before truncate
>> and re-create them after restore. Sounds like it should.
>
> Well, that would improve performance, but it also makes the behavior
> of object significantly different from what one might expect from the
> name. One of the problems here is that there seem to be a number of
> slightly-different things that one might want to do, and it's not
> exactly clear what all of them are, or whether a reasonable number of
> options can cater to all of them.

Another problem: attempting to drop a unique constraint or primary key
(if we're counting these as indexes to be dropped and recreated, which
they should be if the goal is reasonable restore performance) which is
referenced by another table's foreign key will cause:
ERROR: cannot drop constraint xxx on table yyy
because other objects depend on it

and as discussed upthread, it would be impolite for pg_restore to
presume it should monkey with dropping+recreating other tables'
constraints to work around this problem, not to mention impossible
when pg_restore is not connected to the target database.

It is a common administrative task to selectively restore some
existing tables' contents from a backup, and IIRC was the impetus for
this patch. Instead of adding a bunch of options to pg_restore,
perhaps a separate tool specific to this task would be the way to go.
It could handle the minutiae of truncating, dropping and recreating
constraints and indexes of the target tables, and dealing with FKs
sensibly, without worrying about conflicts with existing pg_restore
options and behavior.

Josh


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-27 03:30:48
Message-ID: 1353987048.29451.4@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:
> On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> >> P.S. An outstanding question regards --truncate-tables
> >> is whether it should drop indexes before truncate
> >> and re-create them after restore. Sounds like it should.
> >
> > Well, that would improve performance, but it also makes the
> behavior
> > of object significantly different from what one might expect from
> the
> > name. One of the problems here is that there seem to be a number
> of
> > slightly-different things that one might want to do, and it's not
> > exactly clear what all of them are, or whether a reasonable number
> of
> > options can cater to all of them.
>
> Another problem: attempting to drop a unique constraint or primary
> key
> (if we're counting these as indexes to be dropped and recreated,
> which
> they should be if the goal is reasonable restore performance) which
> is
> referenced by another table's foreign key will cause:
> ERROR: cannot drop constraint xxx on table yyy
> because other objects depend on it
>
> and as discussed upthread, it would be impolite for pg_restore to
> presume it should monkey with dropping+recreating other tables'
> constraints to work around this problem, not to mention impossible
> when pg_restore is not connected to the target database.

I'm thinking impossible because it's impossible to know
what the existing FKs are without a db connection. Impossible is
a problem. You may have another reason why it's impossible.

> It is a common administrative task to selectively restore some
> existing tables' contents from a backup, and IIRC was the impetus for
> this patch.

Yes. (And aside from listing tables individually it'd be nice
to restore tables per schema.)

It's also a bit surprising that restoring table content
is so hard/unsupported, given a db of more than minimal
complexity.

> Instead of adding a bunch of options to pg_restore,
> perhaps a separate tool specific to this task would be the way to go.
> It could handle the minutiae of truncating, dropping and recreating
> constraints and indexes of the target tables, and dealing with FKs
> sensibly, without worrying about conflicts with existing pg_restore
> options and behavior.

Per above, the tool would then either require a db connection
or at least a dump which contains the system catalogs.

I'm afraid I don't have a clear picture of what such a tool
would look like, if it does not look a lot like pg_restore.
I would like to have such a tool. I'm not certain how
much I'd be able to contribute toward making one.

Meanwhile it sounds like the --truncate-tables patch
is looking less and less desirable. I'm ready for
rejection, but will soldier on in the interest of
not wasting other people work on this, if given
direction to move forward.

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Josh Kupershmidt <schmiddy(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-27 03:48:02
Message-ID: 1353988082.29451.6@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/26/2012 09:30:48 PM, Karl O. Pinc wrote:
> On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:

> > It is a common administrative task to selectively restore some
> > existing tables' contents from a backup, and IIRC was the impetus
> for
> > this patch.
>
> Yes. (And aside from listing tables individually it'd be nice
> to restore tables per schema.)

As long as I'm daydreaming it'd be nice to be able to
restore a table, data and schema, and have available
the various combinations of: new table name, different
owner, different schema, different db. Without having
to edit a file by hand.

Of course I've not done the brain work involved in
figuring out just what this would mean in terms
of related objects like triggers, constraints,
indexes and so forth. But then who doesn't want
a pony? :-)

Regards,

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-12-05 03:26:47
Message-ID: CAK3UJRGgHpJOEq8yp2H+umHPUO84F60fNN2vJAEnUyuQ-ya-6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sorry for the delay in following up here.

On Mon, Nov 26, 2012 at 8:30 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:
>> On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> wrote:
>> > On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
>> >> P.S. An outstanding question regards --truncate-tables
>> >> is whether it should drop indexes before truncate
>> >> and re-create them after restore. Sounds like it should.
>> >
>> > Well, that would improve performance, but it also makes the
>> behavior
>> > of object significantly different from what one might expect from
>> the
>> > name. One of the problems here is that there seem to be a number
>> of
>> > slightly-different things that one might want to do, and it's not
>> > exactly clear what all of them are, or whether a reasonable number
>> of
>> > options can cater to all of them.
>>
>> Another problem: attempting to drop a unique constraint or primary
>> key
>> (if we're counting these as indexes to be dropped and recreated,
>> which
>> they should be if the goal is reasonable restore performance) which
>> is
>> referenced by another table's foreign key will cause:
>> ERROR: cannot drop constraint xxx on table yyy
>> because other objects depend on it
>>
>> and as discussed upthread, it would be impolite for pg_restore to
>> presume it should monkey with dropping+recreating other tables'
>> constraints to work around this problem, not to mention impossible
>> when pg_restore is not connected to the target database.
>
> I'm thinking impossible because it's impossible to know
> what the existing FKs are without a db connection. Impossible is
> a problem. You may have another reason why it's impossible.

Yes, that's what I meant.

> Meanwhile it sounds like the --truncate-tables patch
> is looking less and less desirable. I'm ready for
> rejection, but will soldier on in the interest of
> not wasting other people work on this, if given
> direction to move forward.

Well, as far as I was able to tell, the use-case where this patch
worked without trouble was limited to restoring a table, or schema
with table(s), that:
a.) has some view(s) dependent on it
b.) has no other tables with FK references to it, so that we don't run into:
ERROR: cannot truncate a table referenced in a foreign key constraint
c.) is not so large that it takes forever for data to be restored
with indexes and constraints left intact
d.) and whose admin does not want to use --clean plus a list-file
which limits pg_restore to the table and its views

I was initially hoping that the patch would be more useful for
restoring a table with FKs pointing to it, but it seems the only
reliable way to do this kind of selective restore with pg_restore is
with --clean and editing the list-file. Editing the list-file is
certainly tedious and prone to manual error, but I'm not sure this
particular patch has a wide enough use-case to alleviate that pain
significantly.

Josh


From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-12-05 04:40:29
Message-ID: 1354682429.2831.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/04/2012 09:26:47 PM, Josh Kupershmidt wrote:
> Sorry for the delay in following up here.

No problem at all.

> Well, as far as I was able to tell, the use-case where this patch
> worked without trouble was limited to restoring a table, or schema
> with table(s), that:
> a.) has some view(s) dependent on it
> b.) has no other tables with FK references to it, so that we don't
> run into:
> ERROR: cannot truncate a table referenced in a foreign key
> constraint
> c.) is not so large that it takes forever for data to be restored
> with indexes and constraints left intact
> d.) and whose admin does not want to use --clean plus a list-file
> which limits pg_restore to the table and its views
>
> I was initially hoping that the patch would be more useful for
> restoring a table with FKs pointing to it, but it seems the only
> reliable way to do this kind of selective restore with pg_restore is
> with --clean and editing the list-file. Editing the list-file is
> certainly tedious and prone to manual error, but I'm not sure this
> particular patch has a wide enough use-case to alleviate that pain
> significantly.

I think there must be a reliable way to restore tables with FKs
pointing to them, but getting pg_restore to do it seems perilous; at
least given your expectations for the behavior of pg_restore in the
context of the existing option set.

As with you, I am not inclined to add another option to pg_restore
unless it's really useful. (Pg_restore already has gobs of options,
to the point where it's getting sticky.) I don't think this
patch meets the utility bar. It might be able to be re-worked into
something useful, or it might need to evolve into some sort of new
restore utility per your thoughts. For now better to reject it until
the right/comprehensive way to proceed is figured out.

Thanks for all your work. I hope that this has at least
moved the discussion forward and not been a waste of everybody's
time. I would like to see a "better" way of restoring
tables that are FK reference targets.

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein