Re: Fixing pg_dump

Lists: pgsql-hackers
From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Fixing pg_dump
Date: 2004-06-24 03:58:52
Message-ID: 40DA517C.2030406@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK,

I think it might save me some time if I get some guidance on how we
should modify pg_dump to fix the owner/grants issue.

I intend to make new archives created with 7.5 pg_dump have the fix, and
restoring pre 7.5 binary dumps will have exactly the previous
behaviour. The reason for this is that extracting the acls and owners
to the end requires scanning the entire archive twice - not necessarily
something we want to do (is it?)

So, I have to make acls and owners completely independent
DumpableObjects, right? Then I can get them sorted to the end, etc., etc.

I also intend to implement a flag that makes it use the sql standard set
session_authorization behaviour.

Is all that the way to go?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-24 15:53:45
Message-ID: 14760.1088092425@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> I intend to make new archives created with 7.5 pg_dump have the fix, and
> restoring pre 7.5 binary dumps will have exactly the previous
> behaviour. The reason for this is that extracting the acls and owners
> to the end requires scanning the entire archive twice - not necessarily
> something we want to do (is it?)

I might be wrong about this, but I had the idea that the entire archive
TOC gets loaded into memory, and so there'd be no noticeable penalty in
scanning it twice.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-25 03:25:21
Message-ID: 40DB9B21.1010508@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>I intend to make new archives created with 7.5 pg_dump have the fix, and
>> restoring pre 7.5 binary dumps will have exactly the previous
>>behaviour. The reason for this is that extracting the acls and owners
>>to the end requires scanning the entire archive twice - not necessarily
>>something we want to do (is it?)
>
> I might be wrong about this, but I had the idea that the entire archive
> TOC gets loaded into memory, and so there'd be no noticeable penalty in
> scanning it twice.

Well, it probably gets scanned twice to do -c mode?

But...it seems kind of hacky to scan it again for owners and privs - are
you sure you want me to go that way?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-25 03:30:55
Message-ID: 1389.1088134255@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> But...it seems kind of hacky to scan it again for owners and privs - are
> you sure you want me to go that way?

If there's not a big performance penalty, sure. Being fully compatible
with existing archive files is a sufficient win to justify sins much
worse than this one.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-25 04:01:47
Message-ID: 1683.1088136107@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Out of interest - have you developed an opinion on the
> pg_get_serial_sequence thing?

Will apply as soon as I get a chance --- I'd intended to clean out the
patch queue today, but didn't get past Magnus' stuff.

> Also, what's the latest I can get pg_dump fixes in?

Roughly speaking, sizeof(patch)/time_to_freeze is a constant.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-25 04:03:57
Message-ID: 40DBA42D.1090105@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> If there's not a big performance penalty, sure. Being fully compatible
> with existing archive files is a sufficient win to justify sins much
> worse than this one.

Out of interest - have you developed an opinion on the
pg_get_serial_sequence thing? Also, what's the latest I can get pg_dump
fixes in?

Chris


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-27 08:22:05
Message-ID: 40DE83AD.2060207@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>But...it seems kind of hacky to scan it again for owners and privs - are
>>you sure you want me to go that way?
>
> If there's not a big performance penalty, sure. Being fully compatible
> with existing archive files is a sufficient win to justify sins much
> worse than this one.

Ah, crap.

I tried adding the extra scan in and it as all well and good up until
the second where I realised that the TocEntry struct has no field that
allows me to know the correct way of finding the full descriptor of each
object.

For example, consider an operator. It's not enough for me to know the
name of the operator, I also must know the type of its operands.

That given, I see no way to implement this using a second scan, except
perhaps rewriting the drop command...which is extremely dodgy!

I'm running out of time unfortunately, and I need to know from you
whether I should go back to my work on making owner and acl TOC entries
fully independent? All this means is that people restoring pre-7.5
binary dumps into 7.5 will not get the owner fixes... But people using
the binary format to upgrade seems like a pretty rare case to me!

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-27 14:42:46
Message-ID: 11914.1088347366@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> I tried adding the extra scan in and it as all well and good up until
> the second where I realised that the TocEntry struct has no field that
> allows me to know the correct way of finding the full descriptor of each
> object.

Ugh. Definitely an oversight. Don't suppose you want to think about
pulling the name out of the DROP command ;-) ?

> I'm running out of time unfortunately, and I need to know from you
> whether I should go back to my work on making owner and acl TOC entries
> fully independent? All this means is that people restoring pre-7.5
> binary dumps into 7.5 will not get the owner fixes... But people using
> the binary format to upgrade seems like a pretty rare case to me!

Hardly --- for instance, people using large objects have no other
choice.

You can *not* change pg_restore in a way that will make it impossible
for such people to restore their dumps (and no, I don't think it will
fly to tell someone after the fact they should have used 7.5 pg_dump...)

Maybe it's sufficient to have a backwards-compatibility mode in which
the SET SESSION AUTH commands still get issued same-as-ever. In fact,
you could just automatically do that if you see the archive version is
too old to have ALTER OWNER support.

On the whole though, I think editing the DROP commands might be the best
way. Are there any cases where that would actually not work?

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 01:45:53
Message-ID: 40DF7851.2060001@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Ugh. Definitely an oversight. Don't suppose you want to think about
> pulling the name out of the DROP command ;-) ?

Yeah, I've already done it - it's ugleeey, but it works :P

>>I'm running out of time unfortunately, and I need to know from you
>>whether I should go back to my work on making owner and acl TOC entries
>>fully independent? All this means is that people restoring pre-7.5
>>binary dumps into 7.5 will not get the owner fixes... But people using
>>the binary format to upgrade seems like a pretty rare case to me!
>
>
> Hardly --- for instance, people using large objects have no other
> choice.

Yes they can - they can just use the 7.5 pg_dump to upgrade...

> You can *not* change pg_restore in a way that will make it impossible
> for such people to restore their dumps (and no, I don't think it will
> fly to tell someone after the fact they should have used 7.5 pg_dump...)

I didn't say they won't work, I just said that they will not have a fix
for the owner/acl issue - I'll just check the version of their binary
dump and if it doesn't have independent acls and owners, I will just
restore it using the old set session authorization method.

> Maybe it's sufficient to have a backwards-compatibility mode in which
> the SET SESSION AUTH commands still get issued same-as-ever. In fact,
> you could just automatically do that if you see the archive version is
> too old to have ALTER OWNER support.

That's what I was suggesting :) Peter E wanted me to keep a set session
auth mode around anyway for sql standard compatibilty - is that still a
requirement (it doesn't seem hard to do). However...I think that if you
ask 100 postgres users if they want to be able to restore their own
backups, or have sql standard backups, I know which one 100 of them will
choose :)

Actually, this brings up another point - people occasionally complain on
the list that pg_dump is not considered important enough :( ie. Is
there any good reason we cannot backport the entire new pg_dump to the
7.4 branch, and change the 3 small things that prevent its output
restoring to 7.4. This is because there are many 7.4 users who cannot
restore their own backups to 7.4 in an emergency, and have resorted to
using CVS pg_dump and running sed scripts over it...

> On the whole though, I think editing the DROP commands might be the best
> way. Are there any cases where that would actually not work?

There's a couple of tricks.

* Drop commands for TYPEs have 'CASCADE' on the end (has that always
been true)

* I currently assume that the last two characters in a drop command are
; and \n. I'm not sure if this has always been the case. Maybe I
should make it loop until it removes the trailing semi-colon.

* Do we no longer worry about the SCHEMA AUTHORIZATION clause? I might
set it to keep being issued in 'sql standard mode', but otherwise we
cannot use it in dumps any more.

* I have to special case VIEWS and SEQUENCES to use ALTER TABLE...OWNER TO

* On the second scan I'm not sure what to use as the comment above each
owner and acl bit - I'll make it the same as the master object.

* On the second scan I have to hard-code the list of object types that
need to have owners and acls dumped.

Other than that it works really well :) I just have to tie off some
loose ends and make really sure that my drop command extraction is safe
from buffer errors, etc. Hopefully I'll be able to submit tonight.

Chris


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 02:09:01
Message-ID: 40DF7DBD.2050905@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> * Do we no longer worry about the SCHEMA AUTHORIZATION clause? I might
> set it to keep being issued in 'sql standard mode', but otherwise we
> cannot use it in dumps any more.

Actually, that's not true - I'm being silly. We can use the
AUTHORIZATION clause instead of ALTER SCHEMA ... OWNER TO :)

Another question:

We currently fully qualify DROP command with the namespace so that drops
will not accidentally modify the system catalogs. Shouldn't this also
be necessary on ALL non-CREATE commands?

ie. All ALTER, GRANT and REVOKE commands?

Otherwise, if the create table command associated with each of these
fails (for whatever reason), the script could happily carry on and
modify the system catalog tables?

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 02:09:08
Message-ID: 16266.1088388548@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> Actually, this brings up another point - people occasionally complain on
> the list that pg_dump is not considered important enough :( ie. Is
> there any good reason we cannot backport the entire new pg_dump to the
> 7.4 branch, and change the 3 small things that prevent its output
> restoring to 7.4.

Lack of testing. Maybe after 7.5 has been out for a while, we'll have
enough trust in current pg_dump to think of doing that, but on the other
hand it would be somewhat moot at that point. The main point though is
that 7.4 is supposed to be a *stable* branch, and dropping a rewritten
pg_dump that hasn't even been through beta yet into a stable branch is
Simply Not Done.

> * Drop commands for TYPEs have 'CASCADE' on the end (has that always
> been true)

Yeek. That's got to be a hangover from pre-dependency-chasing days.
Let's lose it in our current output, at least.

> * I currently assume that the last two characters in a drop command are
> ; and \n. I'm not sure if this has always been the case. Maybe I
> should make it loop until it removes the trailing semi-colon.

I'd go for "remove while the last char is either ; or \n"; should cover
all cases.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 02:18:55
Message-ID: 16353.1088389135@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> We currently fully qualify DROP command with the namespace so that drops
> will not accidentally modify the system catalogs. Shouldn't this also
> be necessary on ALL non-CREATE commands?
> Otherwise, if the create table command associated with each of these
> fails (for whatever reason), the script could happily carry on and
> modify the system catalog tables?

I don't buy it. There's a tradeoff here between certainty of doing what
you want and having a script that is easy to edit. DROP is a dangerous
weapon and we should be circumspect about applying it, but ALTER OWNER
etc are much less so.

Also, the point about qualifying the DROP is that you do not know
whether the object is there initially, and so you could be dropping
the wrong thing even in non-error cases. The scenario where the CREATE
fails is much less probable.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 03:20:17
Message-ID: 40DF8E71.1040302@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I don't buy it. There's a tradeoff here between certainty of doing what
> you want and having a script that is easy to edit. DROP is a dangerous
> weapon and we should be circumspect about applying it, but ALTER OWNER
> etc are much less so.
>
> Also, the point about qualifying the DROP is that you do not know
> whether the object is there initially, and so you could be dropping
> the wrong thing even in non-error cases. The scenario where the CREATE
> fails is much less probable.

OK, does it matter then if I leave the ALTER OWNER statement qualified?
It would be a pain in the butt to parse off the existing DROP
command's namespace qualification.

I've run into an ACLs problem now. This is the situation:

ALTER TABLE public.forums_threads OWNER TO usadmin;

--
-- Name: forums_threads; Type: ACL; Schema: public; Owner: usadmin
--

REVOKE ALL ON TABLE forums_threads FROM PUBLIC;
REVOKE ALL ON TABLE forums_threads FROM usadmin;
SET SESSION AUTHORIZATION brett;
GRANT ALL ON TABLE forums_threads TO brett WITH GRANT OPTION;
RESET SESSION AUTHORIZATION;

It fails trying to grant brett the GRANT OPTION as the user brett. I
haven't yet looked into the logic behind dumping these acls in common.c,
but it seems that maybe it's a hangover from historical alter owner
commands on that relation? ie. The acl is still there from when brett
used to own that table?

Do you still plan to fix that? I'll see if there's anything I can do in
pg_dump to detect the condition and fix it if you like.

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 03:21:51
Message-ID: 16822.1088392911@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> ... The acl is still there from when brett used to own that table?
> Do you still plan to fix that?

Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was
going to, or already did, submit some ACL-hacking code to help). That
is, ALTER OWNER should adjust the ACL so that grants made by/to the
former owner now appear to be by/to the new owner.

However, there's still the problem that a dump from a 7.4 or older
database might exhibit the inconsistent ACL entries. Not very sure
how to cope ...

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 03:27:15
Message-ID: 40DF9013.3070409@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>* Drop commands for TYPEs have 'CASCADE' on the end (has that always
>>been true)
>
> Yeek. That's got to be a hangover from pre-dependency-chasing days.
> Let's lose it in our current output, at least.

I think it's necessary due to the circular dependency between types and
their I/O functions.

Chris


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing pg_dump
Date: 2004-06-28 03:52:52
Message-ID: 40DF9614.5050502@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was
> going to, or already did, submit some ACL-hacking code to help). That
> is, ALTER OWNER should adjust the ACL so that grants made by/to the
> former owner now appear to be by/to the new owner.
>
> However, there's still the problem that a dump from a 7.4 or older
> database might exhibit the inconsistent ACL entries. Not very sure
> how to cope ...

How about "if the grant is to user blah, from user blah and blah is not
the owner of the object, grant as default user, not user blah"?

Chris