Re: alter table only ... drop constraint broken in HEAD

Lists: pgsql-hackers
From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: alter table only ... drop constraint broken in HEAD
Date: 2011-10-05 21:53:58
Message-ID: CAFaPBrRjrZGS26RQ_WFugh=paQV8B=TmJ=ndATWDsbtvxc0LhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tldr:

Seems to be broken by
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4da99ea4231e3d8bbf28b666748c1028e7b7d665
:
commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665
Author: Robert Haas <rhaas(at)postgresql(dot)org>
Date: Mon Jun 27 10:27:17 2011 -0400

Avoid having two copies of the HOT-chain search logic.

Full story:

While playing with the non inheritable constraints patch
(https://commitfest.postgresql.org/action/patch_view?id=611) I noticed
the following sequence was broken:
create table colors (c text check (not null));
create table colors_i () inherits (colors);
alter table only colors drop constraint colors_check;
ERROR: relation 16412 has non-inherited constraint "colors_check"

Naturally I assumed it was the patches fault, but after further
digging turns out it was not. I bisected it down to the above commit
(for those that have not tried git bisect and git bisect run its great
for this kind of thing).

The basic problem seems to be after we update pg_constraint to
decrement inhcount for a child table we call
CommandCounterIncrement(); then we fetch the next row from
pg_constraint... which happens to be the row we just updated. So we
try to decrement it again, only now its at 0 which shouldn't happen so
we error out.

The simple fix seemed to be to move the CommandCounterIncrement()
outside of the while(... systable_getnext()) loop. Im not sure if
that's the right thing to-do or if there is some other bug here...

Attachment Content-Type Size
alter_drop_constr_hot.patch text/x-patch 3.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-06 02:29:27
Message-ID: CA+TgmoaEq=bkbirOKcCqbGLv200VA+sp9GaiX1aNaHRZLNMCyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 5, 2011 at 5:53 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> tldr:
>
> Seems to be broken by
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4da99ea4231e3d8bbf28b666748c1028e7b7d665
> :
> commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665
> Author: Robert Haas <rhaas(at)postgresql(dot)org>
> Date:   Mon Jun 27 10:27:17 2011 -0400
>
>    Avoid having two copies of the HOT-chain search logic.

Hmm... that's pretty terrible. Yikes. That commit wasn't intended to
change any behavior, just to clean up the code, so I think the right
thing to do here is figure out how I changed the behavior without
meaning too, rather than to start patching up all the places that
might have been affected by whatever the behavior change was. I'm too
tired to figure this out right now, but I'll spend some time staring
at it tomorrow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-06 13:24:42
Message-ID: CA+Tgmob47Xu4ZbHjdLFQ2Tzbz7nMXbPdsLRH_j0wc-RVaKbdPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 5, 2011 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 5:53 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> tldr:
>>
>> Seems to be broken by
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4da99ea4231e3d8bbf28b666748c1028e7b7d665
>> :
>> commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665
>> Author: Robert Haas <rhaas(at)postgresql(dot)org>
>> Date:   Mon Jun 27 10:27:17 2011 -0400
>>
>>    Avoid having two copies of the HOT-chain search logic.
>
> Hmm... that's pretty terrible.  Yikes.  That commit wasn't intended to
> change any behavior, just to clean up the code, so I think the right
> thing to do here is figure out how I changed the behavior without
> meaning too, rather than to start patching up all the places that
> might have been affected by whatever the behavior change was.  I'm too
> tired to figure this out right now, but I'll spend some time staring
> at it tomorrow.

Oh, I see the problem, and I now agree that it's the DROP CONSTRAINT
code that is buggy.

Here's what happened. In the old code, when index_getnext() is walking
a HOT chain, before returning each tuple, it remembers the XMAX of
that tuple and the offset of the next tuple in the chain. So in the
following scenario, we can fail to walk the entire HOT chain:

1. index_getnext() returns what is currently the last tuple in the
chain, remembering the xmax and next tid
2. the caller, or some other process, performs a HOT update of that tuple
3. now index_getnext() is called again, but it uses the remembered
xmax and tid, which are now out-of-date

The new code, on the other hand, doesn't remember the xmax and tid;
instead, it waits until the next call to index_getnext(), and then
fetches them from the previous-returned tuple at that time. So it
always sees the most current values and, therefore, walks the entire
chain.

In this particular case, the DROP CONSTRAINT code is counting on the
fact that it won't see its own updates, even though it is bumping the
command counter after each one, which is clearly wrong. I don't think
this was really safe even under the old coding, because there's no
guarantee that any particular update will be HOT, so we might have
found our own update anyway; it was just less likely.

I took a look around for other places that might have this same
problem and didn't find any, but I'm not too confident that that means
there are none there, since there are a fair number of things that can
call CommandCounterIncrement() indirectly. shdepReassignOwned() does
a direct CCI from within a scan, but ISTM that any update we do there
would produce a new tuple that doesn't match the scan, so that should
be OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-07 03:38:09
Message-ID: CAFaPBrS_LUmiR1+tdh_0bXoOqJQ8N+cxSqg+ubx=2no2V99YOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 6, 2011 at 07:24, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 10:29 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Oct 5, 2011 at 5:53 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>>> tldr:
>>>
>>> Seems to be broken by
>>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4da99ea4231e3d8bbf28b666748c1028e7b7d665
>>> :
>>> commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665
>>> Author: Robert Haas <rhaas(at)postgresql(dot)org>
>>> Date:   Mon Jun 27 10:27:17 2011 -0400
>>>
>>>    Avoid having two copies of the HOT-chain search logic.
>>

> Oh, I see the problem, and I now agree that it's the DROP CONSTRAINT
> code that is buggy.

Want me to roll this fix in as part of the alter table only constraint
patch? Or keep it split out? We might want to backpatch to at least
8.3 where HOT was introduced (yes looks like the bug existed back
then). I suppose its a fairly narrow chance to hit this bug so I could
see the argument for not back patching...

> I took a look around for other places that might have this same
> problem and didn't find any, but I'm not too confident that that means
> there are none there, since there are a fair number of things that can
> call CommandCounterIncrement() indirectly.

I skimmed for the direct easy to find ones as well. Didn't see any
other than the one you note below.

>  shdepReassignOwned() does
> a direct CCI from within a scan, but ISTM that any update we do there
> would produce a new tuple that doesn't match the scan, so that should
> be OK.

Well its on purpose so I hope its ok :-)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-07 13:53:53
Message-ID: CA+TgmoYgfLVcO-2MEsKJi0A8Qxyfkz-=SHFSf7SfMj3W3V83nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 6, 2011 at 11:38 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> Oh, I see the problem, and I now agree that it's the DROP CONSTRAINT
>> code that is buggy.
>
> Want me to roll this fix in as part of the alter table only constraint
> patch? Or keep it split out? We might want to backpatch to at least
> 8.3 where HOT was introduced (yes looks like the bug existed back
> then). I suppose its a fairly narrow chance to hit this bug so I could
> see the argument for not back patching...

Yeah, I'm not inclined to back-patch it. The chance of hitting this
in older versions must be very small, or somebody would have noticed
by now. If we get a report from the field, we can always back-patch
it then, but right now it doesn't seem worth taking any risks for.

On a related note, your fix seems slightly fragile to me ... because
we're pulling a CCI out of the innermost loop, but a CCI can still
happen inside that same loop if we recurse, because the recursive call
will do one before returning. Now, maybe that's OK, because the
recursive call in that case will just be deleting the tuple, so there
won't be a new version for us to stumble over. The only way we could
trip up in that case is if there were two identically named
constraints. We'd have to visit the first tuple, update it, then
visit the second tuple, recurse (thus incrementing the command
counter), and then visit the updated version of the first tuple. And
that should be impossible, because we've got code to disallow multiple
constraints on the same relation with the same name (though no unique
index, for some reason). Still, that's a long chain of reasoning, so
I'm wondering if we can't come up with something that is more
obviously correct.

If we're confident that the inner loop here should never iterate more
than once (i.e. the lack of a unique index is not an ominous sign)
then maybe we should just rewrite this so that the inner loop scans
until it finds a match and then terminates. Then, outside the loop,
we check whether a tuple was found and if so process it - but without
ever going back to look for another one. See attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
drop-constraint.patch application/octet-stream 4.2 KB

From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-07 15:19:32
Message-ID: CAFaPBrSBp93_adjei64sM4mCGVeDoJ=_EGYYchz2Xqu_01fhAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 7, 2011 at 07:53, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> The only way we could
> trip up in that case is if there were two identically named
> constraints.  We'd have to visit the first tuple, update it, then
> visit the second tuple, recurse (thus incrementing the command
> counter), and then visit the updated version of the first tuple.  And
> that should be impossible, because we've got code to disallow multiple
> constraints on the same relation with the same name (though no unique
> index, for some reason).

Surely an oversight...

>  Still, that's a long chain of reasoning, so
> I'm wondering if we can't come up with something that is more
> obviously correct.
>
> If we're confident that the inner loop here should never iterate more
> than once (i.e. the lack of a unique index is not an ominous sign)
> then maybe we should just rewrite this so that the inner loop scans
> until it finds a match and then terminates.  Then, outside the loop,
> we check whether a tuple was found and if so process it - but without
> ever going back to look for another one.  See attached.

I eyeballed it and it does indeed seem simpler. My only thought is
perhaps we should add that missing unique index on (conrelid,
conname). If we are not going to support duplicate names in the code,
we might as well enforce it. No?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-07 15:50:00
Message-ID: CA+TgmobzURTVi2sHoQvV2o1D_rGasmC+D2tK+NQQ3oMf_mEL4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 7, 2011 at 11:19 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Fri, Oct 7, 2011 at 07:53, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> The only way we could
>> trip up in that case is if there were two identically named
>> constraints.  We'd have to visit the first tuple, update it, then
>> visit the second tuple, recurse (thus incrementing the command
>> counter), and then visit the updated version of the first tuple.  And
>> that should be impossible, because we've got code to disallow multiple
>> constraints on the same relation with the same name (though no unique
>> index, for some reason).
>
> Surely an oversight...
>
>>  Still, that's a long chain of reasoning, so
>> I'm wondering if we can't come up with something that is more
>> obviously correct.
>>
>> If we're confident that the inner loop here should never iterate more
>> than once (i.e. the lack of a unique index is not an ominous sign)
>> then maybe we should just rewrite this so that the inner loop scans
>> until it finds a match and then terminates.  Then, outside the loop,
>> we check whether a tuple was found and if so process it - but without
>> ever going back to look for another one.  See attached.
>
> I eyeballed it and it does indeed seem simpler. My only thought is
> perhaps we should add that missing unique index on (conrelid,
> conname). If we are not going to support duplicate names in the code,
> we might as well enforce it. No?

Not sure. There could be performance or other ramifications to that.
For now I'm more interested in fixing this particular bug than I am in
getting into a wider world of re-engineering...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-07 16:45:29
Message-ID: CAFaPBrTE-rn51bi0Mbx1pOuRqX1K5T4k9Keu8PqjMDoeiExdNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 7, 2011 at 09:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 7, 2011 at 11:19 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> My only thought is
>> perhaps we should add that missing unique index on (conrelid,
>> conname). If we are not going to support duplicate names in the code,
>> we might as well enforce it. No?
>
> Not sure.  There could be performance or other ramifications to that.
> For now I'm more interested in fixing this particular bug than I am in
> getting into a wider world of re-engineering...

Yeah, looking at the code a bit closer we would also want to fix
various places to take advantage of the index. Seems like it could be
a big win when you have thousands of constraints (albeit only in the
add/drop case).

If I find the time maybe Ill submit something along these lines for
the next commit fest.

Thanks!


From: Greg Stark <stark(at)mit(dot)edu>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-09 15:17:45
Message-ID: CAM-w4HNminB2QNhx0pQwmQ9+1N9x+Rw1Fsz-PdgjgZ9v6Wh4gQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 7, 2011 at 5:45 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> If I find the time maybe Ill submit something along these lines for
> the next commit fest.
>

So i just picked up the non-inherited constraints patch and quickly
ran into the same problem and found this thread.

I think it makes sense to hold off on this patch until these issues
are resolved. Because we really do need to test the cases when adding
or removing child tables that have constraints with the same name as
non-inherited parent tables. And I'm not sure what will happen in
these cases once these issues are resolved.

--
greg


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-09 17:56:08
Message-ID: CAFaPBrQiQorDDHrA6e+0CW62Be61ThTgvhPZq41O8HbPdPfNuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 9, 2011 at 09:17, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Fri, Oct 7, 2011 at 5:45 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> If I find the time maybe Ill submit something along these lines for
>> the next commit fest.
>>
>
> So i just picked up the non-inherited constraints patch and quickly
> ran into the same problem and found this thread.
>
> I think it makes sense to hold off on this patch until these issues
> are resolved. Because we really do need to test the cases when adding
> or removing child tables that have constraints with the same name as
> non-inherited parent tables. And I'm not sure what will happen in
> these cases once these issues are resolved.

Doesn't someone just need to commit Roberts patch? I suppose it could
do with a better review than my eyeballing... Maybe thats where the
hang up is?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: alter table only ... drop constraint broken in HEAD
Date: 2011-10-10 03:48:43
Message-ID: CA+TgmoYa2jLbaJ13fTC0KyacacPdpT0AR_UMAQ55gz0a=vCJ+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 9, 2011 at 1:56 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> So i just picked up the non-inherited constraints patch and quickly
>> ran into the same problem and found this thread.
>>
>> I think it makes sense to hold off on this patch until these issues
>> are resolved. Because we really do need to test the cases when adding
>> or removing child tables that have constraints with the same name as
>> non-inherited parent tables. And I'm not sure what will happen in
>> these cases once these issues are resolved.
>
> Doesn't someone just need to commit Roberts patch?

Yeah, I've just been mostly AFK for ~53 hours. It's committed now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company