Re: Suggestion for --truncate-tables to pg_restore

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-23 19:54:43
Message-ID: CAK3UJRHx6auRt5aSnx0OeJMZeK3Gr+MZFnH=CKofQBRfnGVykQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 21, 2012 at 5:48 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:

>> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
>> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> OT:
> After looking at the code I found a number of "conflicting"
> option combinations are not tested for or rejected. I can't
> recall what they are now. The right way to fix these would be
> to send in a separate patch for each "conflict" all attached
> to one email/commitfest item? Or one patch that just gets
> adjusted until it's right?

Typically I think it's easiest for the reviewer+committer to consider
a bunch of such similar changes altogether in one patch, rather than
in a handful of separate patches, though that's just my own
preference.

>> > ----
>> >
>> > More serious objections were raised regarding semantics.
>> >
>> > What if, instead, the initial structure looked like:
>> >
>> > CREATE TABLE schemaA.foo
>> > (id PRIMARY KEY, data INT);
>> >
>> > CREATE TABLE schemaB.bar
>> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
>> > , moredata INT);
>> >
>> > With a case like this, in most real-world situations, you'd
>> > have to use pg_restore with --disable-triggers if you wanted
>> > to use --data-only and --truncate-tables. The possibility of
>> > foreign key referential integrity corruption is obvious.
>>
>> Why would --disable-triggers be used in this example? I don't think
>> you could use --truncate-tables to restore only table "foo", because
>> you would get:
>>
>> ERROR: cannot truncate a table referenced in a foreign key
>> constraint
>>
>> (At least, I think so, not having tried with the actual patch.) You
>> could use --disable-triggers when restoring "bar".
>
> I tried it and you're quite right. (I thought I'd tried this
> before, but clearly I did not -- proving the utility of the review
> process. :-) My assumption was that since triggers
> were turned off that constraints, being triggers, would be off as
> well and therefore tables with foreign keys could be truncated.
> Obviously not, since the I get the above error.
>
> I just tried it. --disable-triggers does not disable constraints.

Just to be clear, I believe the problem in this example is that
--disable-triggers does not disable any foreign key constraints of
other tables when you are restoring a single table. So with:

pg_restore -1 --truncate-tables --disable-triggers --table=foo \
--data-only my.dump ...

you would get the complaint

ERROR: cannot truncate a table referenced in a foreign key constraint

which is talking about bar's referencing foo, and there was no

ALTER TABLE bar DISABLE TRIGGER ALL

done, since "bar" isn't being restored. I don't have a quibble with
this existing behavior -- you are instructing pg_restore to only mess
with "bar", after all. See below, though, for a comparison of --clean
and --truncate-tables when restoring "foo" and "bar" together.

>> For your first example, --truncate-tables seems to have some use, so
>> that the admin isn't forced to recreate various views which may be
>> dependent on the table. (Though it's not too difficult to work around
>> this case today.)
>
> As an aside: I never have an issue with this, having planned ahead.
> I'm curious what the not-too-difficult work-around is that you have
> in mind. I'm not coming up with a tidy way to, e.g, pull a lot
> of views out of a dump.

Well, for the first example, there was one table and only one view
which depended on the table, so manually editing the list file like
so:

pg_restore --list --file=my.dump > output.list
# manually edit file "output.list", select only entries pertaining
# to the table and dependent view
pg_restore -1 --clean --use-list=output.list ...

isn't too arduous, though it would become so if you had more dependent
views to worry about.

I'm willing to count this use-case as a usability win for
--truncate-tables, since with that option the restore can be boiled
down to a short and sweet:

pg_restore -1 --data-only --truncate-tables --table=mytable my.dump ...

Though note this may not prove practical for large tables, since
--data-only leaves constraints and indexes intact during the restore,
and can be massively slower compared to adding the constraints only
after the data has been COPYed in, as pg_restore otherwise does.

>> For the second example involving FKs, I'm a little bit more hesitant
>> about endorsing the use of --truncate-tables combined with
>> --disable-triggers to potentially allow bogus FKs. I know this is
>> possible to some extent today using the --disable-triggers option,
>> but
>> it makes me nervous to be adding a mode to pg_restore if it's
>> primarily designed to work together with --disable-triggers as a
>> larger foot-gun.
>
> This is the crux of the issue, and where I was thinking of
> taking this patch. I very often am of the mindset that
> foreign keys are no more or less special than other
> more complex data integrity rules enforced with triggers.
> (I suppose others might say the same regards to integrity
> enforced at the application level.) This naturally
> inclines me to think that one more way, beyond
> --disable-triggers, to break integrity is no big deal.
>
> But I quite see your point. Is it possible to get
> resolution on this issue before either of us do any
> more work in the direction of foreign keys?

I think the patch has some promise with at least one use-case (view(s)
dependent on table which needs to be reloaded, as discussed above).
With the other use-case we have been discussing, of reloading a table
referenced by other table(s)'s FKs, whether --truncate-tables is
helpful is less clear to me, at least in the patch's current state.
(See also bottom.)

> An additional foot-gun, --disable-constraints,
> seems like the natural progression in this direction.
> Constraints, unlike triggers (?), can, in theory,
> be fired at any time to check data content so perhaps
> providing a way to test existing constraints against
> db content would be a way to mitigate the foot-gun-ness
> and drive an after-the-fact data cleanup process.
>
> --disable-constraints seems like an entirely separate
> patch so maybe we can stop the FK related issue right
> here? (Although I would appreciate feedback regards
> whether such an option might be accepted, at minimum
> I'd like to get this out of my brain.)

I'm not sure I follow exactly how you envision --disable-constraints
would work, but it does seem a separate issue from the
--truncate-tables option at hand.

> One thing you'll want to pay attention to is the point
> in the restore process at which the truncation is done.
> In the current version each table is truncated immediately
> before being copied. It might or might not be better to do
> all the truncation up-front, in the fashion of --clean.
> I would appreciate some guidance on this.

IMO --truncate-tables is, at a minimum, going to need to truncate
tables up-front and in the appropriate order to avoid the annoying
"ERROR: cannot truncate a table referenced in a foreign key
constraint", at least when avoiding that error is possible.

Let's go back to your example of two tables, with "bar" referencing
"foo". This existing --clean functionality works fine:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore -1 --clean --use-list=output.list ...

But this won't work, since pg_restore attempts to truncate and restore
foo separately from bar:

# edit list to only include lines mentioning "foo" and "bar"
pg_restore --truncate-tables --data-only -1 --use-list=output.list ...

i.e. will run into "ERROR: cannot truncate a table referenced in a
foreign key constraint".

> In case it's helpful I'm attaching two files
> I used to test the foreign key issue.

Thanks, I do appreciate seeing testcases scripts like this, since it
can neatly demonstrate the intended use-case of the feature, and help
bring to light anything that's missing. I tried your testcase, and
got:

pg_restore: [archiver (db)] could not execute query: ERROR: cannot
truncate a table referenced in a foreign key constraint
DETAIL: Table "bar" references "foo".

as discussed above. If you'd like to advertise this feature as being
handy for reloading a table referenced by other FKs, I'd be interested
to see a testcase demonstrating this use, along with any changes to
the patch (e.g. moving TRUNCATEs to the start) which might be needed.

Josh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2012-11-23 20:36:05 Re: review: Deparsing DDL command strings
Previous Message Heikki Linnakangas 2012-11-23 18:00:33 Re: [BUG] False indication in pg_stat_replication.sync_state