Quick Links

Re: ALTER TABLE lock strength reduction patch is unsafe

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject:	ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-16 22:54:48
Message-ID:	28389.1308264888@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

If you set up a pgbench test case that hits the database with a lot of
concurrent selects and non-exclusive-locking ALTER TABLEs, 9.1 soon
falls over. For example:

$ cat foo.script
alter table pgbench_accounts set (fillfactor = 100);
SELECT abalance FROM pgbench_accounts WHERE aid = 525212;

$ createdb bench
$ pgbench -i -s 10 bench
...
$ pgbench -c 50 -t 1000000 -f foo.script bench
starting vacuum...end.
Client 10 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
Client 5 aborted in state 1: ERROR: cache lookup failed for relation 46260
Client 44 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
Client 3 aborted in state 1: ERROR: relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 45 aborted in state 1: ERROR: could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 15 aborted in state 1: ERROR: cache lookup failed for relation 46260
Client 34 aborted in state 1: ERROR: could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 43 aborted in state 1: ERROR: cache lookup failed for relation 46260
Client 49 aborted in state 1: ERROR: relation "pgbench_accounts" does not exist
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 12 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
Client 23 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
Client 14 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
Client 6 aborted in state 1: ERROR: could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 11 aborted in state 1: ERROR: could not open relation with OID 46260
LINE 1: SELECT abalance FROM pgbench_accounts WHERE aid = 525212;
^
Client 4 aborted in state 0: ERROR: relation "pgbench_accounts" does not exist
... etc etc ...

On my four-core workstation, the failures are infrequent at up to 30
clients but come pretty fast and furious at 50.

What is happening here is this:

1. Some backend commits an ALTER TABLE and sends out an sinval message.

2. In response, some other backend starts to reload its relcache entry
for pgbench_accounts when it begins its next command. It does an
indexscan with SnapshotNow on pg_class to find the updated pg_class row.

3. Meanwhile, some third backend commits another ALTER TABLE, updating
the pg_class row another time. Since we have removed the
AccessExclusiveLock that all variants of ALTER TABLE used to take, this
commit can happen while backend #2 is in process of scanning pg_class.

4. Backend #2 visits the new, about-to-be-committed version of
pgbench_accounts' pg_class row just before backend #3 commits.
It sees the row as not good and keeps scanning. By the time it
reaches the previous version of the row, however, backend #3
*has* committed. So that version isn't good according to SnapshotNow
either.

5. Thus, backend #2 fails to find any version of the pg_class row
that satisfies SnapshotNow, and it reports an error. Depending on just
when this happens during the cache load process, you can get any of
the errors displayed above, or probably some other ones.

The particular case I'm showing here only updates pg_class, but other
non-exclusive-lock variants of ALTER TABLE can probably provoke similar
failures with respect to other catalogs, leading to yet different
misbehaviors.

In typical cases where both versions of the row are on the same page,
the window for the concurrent commit to happen is very narrow --- that's
why you need so many clients to make it happen easily. With enough
clients there's a good chance of losing the CPU between tuple visits.
But of course Murphy's Law says this will happen in production
situations even if the load isn't so high.

I believe that this is fundamentally unavoidable so long as we use
SnapshotNow to read catalogs --- which is something we've talked about
changing, but it will require a pretty major R&D effort to make it
happen. In the meantime, we have to go back to using
AccessExclusiveLock for table alterations. It doesn't help to have
a lower lock level if that means that concurrent transactions will
unpredictably fail instead of waiting.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-16 23:15:14
Message-ID:	BANLkTim2CxBVAg0qZ+u0Hg_FNyb0QD3sRw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> In typical cases where both versions of the row are on the same page,
> the window for the concurrent commit to happen is very narrow --- that's
> why you need so many clients to make it happen easily. With enough
> clients there's a good chance of losing the CPU between tuple visits.
> But of course Murphy's Law says this will happen in production
> situations even if the load isn't so high.

Thanks for doing the research on this. Much better we know now than
enter production like this.

> I believe that this is fundamentally unavoidable so long as we use
> SnapshotNow to read catalogs --- which is something we've talked about
> changing, but it will require a pretty major R&D effort to make it
> happen. In the meantime, we have to go back to using
> AccessExclusiveLock for table alterations. It doesn't help to have
> a lower lock level if that means that concurrent transactions will
> unpredictably fail instead of waiting.

If we were to change ALTER TABLE so it takes a session lock rather
than a normal lock, then we can commit the change and then wait until
the relcache invalidations have been received before we release the
lock. That way we would be able to avoid the concurrent issues you
describe.

Or alternatively we could just re-scan if we can't find a valid row
when building the cache. We have time in the failure path...

Do you have stomach for any this in 9.1?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 02:32:12
Message-ID:	BANLkTimGYm1H5aZgiPJ-6sXT8bB-S6wNRg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I believe that this is fundamentally unavoidable so long as we use
> SnapshotNow to read catalogs --- which is something we've talked about
> changing, but it will require a pretty major R&D effort to make it
> happen. In the meantime, we have to go back to using
> AccessExclusiveLock for table alterations. It doesn't help to have
> a lower lock level if that means that concurrent transactions will
> unpredictably fail instead of waiting.

Ouch.

I wonder if we could avoid this anomaly by taking a throwaway MVCC
snapshot at the beginning of each system catalog scan and using it
just for the duration of that scan. If nothing that has touched the
catalog commits while the scan is open, then this is logically
equivalent to SnapshotNow. If something does commit in mid-scan, then
we might not get the latest version of the row, but we should end up
with exactly one. If it's not the latest one, we'll do the rebuild
again upon seeing the next sinval message; in the meantime, the
version we're using mustn't be too intolerably bad or it was an error
not to use AccessExclusiveLock in the first place.

IIUC, the problem with this approach is not correctness but
performance. Taking snapshots is (currently) expensive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 08:32:46
Message-ID:	BANLkTi=YxA4oY5KQkURpTtfGNgzfCV8PmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> 2. In response, some other backend starts to reload its relcache entry
> for pgbench_accounts when it begins its next command. It does an
> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>
> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
> the pg_class row another time. Since we have removed the
> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
> commit can happen while backend #2 is in process of scanning pg_class.

This part is the core of the problem:

We must not be able to update the catalog entry while a relcache
rebuild scan is in place.

So I'm prototyping something that allows
LockRelationDefinitionOid(targetRelId, ShareLock);

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 08:40:12
Message-ID:	BANLkTim+529wVosOMoaP=v_-MD0Rhk+GfQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 9:32 AM, simon <simon(at)2ndquadrant(dot)com> wrote:
> On Thu, Jun 16, 2011 at 11:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> 2. In response, some other backend starts to reload its relcache entry
>> for pgbench_accounts when it begins its next command. It does an
>> indexscan with SnapshotNow on pg_class to find the updated pg_class row.
>>
>> 3. Meanwhile, some third backend commits another ALTER TABLE, updating
>> the pg_class row another time. Since we have removed the
>> AccessExclusiveLock that all variants of ALTER TABLE used to take, this
>> commit can happen while backend #2 is in process of scanning pg_class.
>
> This part is the core of the problem:
>
> We must not be able to update the catalog entry while a relcache rebuild scan is in place.
>
> So I'm prototyping something that allows
> LockRelationDefinitionOid(targetRelId, ShareLock);

Similar to the way we lock a relation for extension, as a sub-lock of
the main relation lock.

Relcache rebuilds use a ShareLock, ALTER TABLE uses an ExclusiveLock.

I've written the patch, just need to test later today.... gotta step out now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 17:22:40
Message-ID:	19729.1308331360@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I believe that this is fundamentally unavoidable so long as we use
>> SnapshotNow to read catalogs --- which is something we've talked about
>> changing, but it will require a pretty major R&D effort to make it
>> happen.

> Ouch.

> I wonder if we could avoid this anomaly by taking a throwaway MVCC
> snapshot at the beginning of each system catalog scan and using it
> just for the duration of that scan. If nothing that has touched the
> catalog commits while the scan is open, then this is logically
> equivalent to SnapshotNow. If something does commit in mid-scan, then
> we might not get the latest version of the row, but we should end up
> with exactly one. If it's not the latest one, we'll do the rebuild
> again upon seeing the next sinval message; in the meantime, the
> version we're using mustn't be too intolerably bad or it was an error
> not to use AccessExclusiveLock in the first place.

Yeah, this seems like a possibly workable direction to explore. I like
this better than what Simon is proposing, because it would fix the
generic issue for all types of catalog SnapshotNow scans.

> IIUC, the problem with this approach is not correctness but
> performance. Taking snapshots is (currently) expensive.

Yeah. After mulling it for awhile, what about this idea: we could
redefine SnapshotNow as a snapshot type that includes a list of
transactions-in-progress, somewhat like an MVCC snapshot, but we don't
fill that list from the PGPROC array. Instead, while running a scan
with SnapshotNow, anytime we determine that a particular XID is
still-in-progress, we add that XID to the snapshot's list.
Subsequently, the SnapshotNow code assumes that XID to be
still-in-progress without consulting its actual state. We reset the XID
list to empty when starting a new SnapshotNow scan. (We might be able
to do so less often than that, like only when we do
AcceptInvalidationMessages, but it's not clear to me that there's any
real benefit in hanging onto the state longer.)

This costs no performance; if anything it should be faster than now,
because we'll be replacing expensive transaction state probes with
relatively-cheap searches of an XID array that should almost always
be quite short.

With this approach, we would have no serialization anomalies from single
transactions committing while a scan is in progress. There could be
anomalies resulting from considering an earlier XID to be in-progress
while a later XID is considered committed (because we didn't observe
it until later). So far as I can see offhand, the impact of that would
be that there might be multiple versions of a tuple that are considered
good, but never that there would be no version considered good (so long
as the other XIDs simply updated the tuple and didn't delete it). I
think this would be all right, since the scan would just seize on the
first good version it finds. As you argue above, if that's not good
enough for our purposes then the updater(s) should have taken a stronger
lock.

I am not, however, particularly pleased with the idea of trying to make
this work in 9.1. I still think that we should back off the attempt
to reduce lock strength in 9.1, and take it up for 9.2. We need to be
stabilizing 9.1 for release, not introducing new untested mechanisms in
it.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 18:24:09
Message-ID:	BANLkTikgOY+1AnXueY6Ws2qe3fiXOOQo0Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, this seems like a possibly workable direction to explore. I like
> this better than what Simon is proposing, because it would fix the
> generic issue for all types of catalog SnapshotNow scans.

It would also avoid adding more lock manager traffic which - as
recently discussed on the relevant threads - turns out to be a
significant performance bottleneck for us right now on some workloads.

>> IIUC, the problem with this approach is not correctness but
>> performance. Taking snapshots is (currently) expensive.
>
> Yeah. After mulling it for awhile, what about this idea: we could
> redefine SnapshotNow as a snapshot type that includes a list of
> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
> fill that list from the PGPROC array. Instead, while running a scan
> with SnapshotNow, anytime we determine that a particular XID is
> still-in-progress, we add that XID to the snapshot's list.
> Subsequently, the SnapshotNow code assumes that XID to be
> still-in-progress without consulting its actual state. We reset the XID
> list to empty when starting a new SnapshotNow scan. (We might be able
> to do so less often than that, like only when we do
> AcceptInvalidationMessages, but it's not clear to me that there's any
> real benefit in hanging onto the state longer.)

I think that something like that might possibly work, but what if the
XID array overflows?

A while back I proposed the idea of a "lazy" snapshot, by which I had
in mind something similar to what you are suggesting but different in
detail. Initially, when asked to acquire a snapshot, the snapshot
manager acknowledges having taken one but does not actually do any
work. As long as it sees only XIDs that either precede the oldest XID
still running anywhere in the cluster, or have aborted, it can provide
answers that are 100% correct without any further data. If it ever
sees a newer, non-aborted XID then it goes and really gets an MVCC
snapshot at that point, which it can uses from that point onward. I
think that it might be possible to make such a system work even for
MVCC snapshots generally, but even if not, it might be sufficient for
this purpose. Unlike your approach, it would avoid both the "see no
rows" and the "see multiple rows" cases, which might be thought an
advantage.

> I am not, however, particularly pleased with the idea of trying to make
> this work in 9.1. I still think that we should back off the attempt
> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
> stabilizing 9.1 for release, not introducing new untested mechanisms in
> it.

I like this feature a lot, but it's hard to imagine that any of the
fixes anyone has so far suggested can be implemented without
collateral damage. Nor is there any certainty that this is the last
bug.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 18:44:13
Message-ID:	21540.1308336253@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. After mulling it for awhile, what about this idea: we could
>> redefine SnapshotNow as a snapshot type that includes a list of
>> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
>> fill that list from the PGPROC array. Instead, while running a scan
>> with SnapshotNow, anytime we determine that a particular XID is
>> still-in-progress, we add that XID to the snapshot's list.

> I think that something like that might possibly work, but what if the
> XID array overflows?

Well, you repalloc it bigger. In either this idea or yours below,
I fear SnapshotNow snaps will have to become dynamically-allocated
structures instead of being simple references to a shared constant
object. (This is because we can sometimes do a SnapshotNow scan when
another one is already in progress, and we couldn't let the inner one
change the outer one's state.) That's not really a performance problem;
one more palloc to do a catalog scan is a non-issue. But it is likely
to be a large notational change compared to what we've got now.

> A while back I proposed the idea of a "lazy" snapshot, by which I had
> in mind something similar to what you are suggesting but different in
> detail. Initially, when asked to acquire a snapshot, the snapshot
> manager acknowledges having taken one but does not actually do any
> work. As long as it sees only XIDs that either precede the oldest XID
> still running anywhere in the cluster, or have aborted, it can provide
> answers that are 100% correct without any further data. If it ever
> sees a newer, non-aborted XID then it goes and really gets an MVCC
> snapshot at that point, which it can uses from that point onward. I
> think that it might be possible to make such a system work even for
> MVCC snapshots generally, but even if not, it might be sufficient for
> this purpose. Unlike your approach, it would avoid both the "see no
> rows" and the "see multiple rows" cases, which might be thought an
> advantage.

Hmm, yeah, I think this idea is probably better than mine, just because
of the less dubious semantics. I don't see how you'd make it work for
generic MVCC scans, because the behavior will be "the database state as
of some hard-to-predict time after the scan starts", which is not what
we want for MVCC. But it ought to be fine for SnapshotNow.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:08:06
Message-ID:	BANLkTimSSU-v_dRTrdbyJnJn5oNkFm6r3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 7:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Jun 17, 2011 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, this seems like a possibly workable direction to explore. I like
>> this better than what Simon is proposing, because it would fix the
>> generic issue for all types of catalog SnapshotNow scans.
>
> It would also avoid adding more lock manager traffic which - as
> recently discussed on the relevant threads - turns out to be a
> significant performance bottleneck for us right now on some workloads.

>> I am not, however, particularly pleased with the idea of trying to make
>> this work in 9.1. I still think that we should back off the attempt
>> to reduce lock strength in 9.1, and take it up for 9.2. We need to be
>> stabilizing 9.1 for release, not introducing new untested mechanisms in
>> it.
>
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

Not so. The extra locking would only occur on the first lock
acquisition after DDL operations occur. If that was common then your
other performance patch would not be an effective optimisation. There
is no additional locking from what I've proposed in the common code
path - that's why we have a relcache.

Any effects from the additional locking will only be felt by people
issuing a stream of DDL statements against a table. Even assuming
there are some effects of real note.

So there is no "collateral damage" and additional locking is a viable
solution for 9.1.

It's possible that we may have a better solution in 9.2+ but then
we've said that before and have it never happen, many times.

Having spent a few hours mulling through this, I think there is a
reasonable solution for 9.1 and I continue to work on it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:09:15
Message-ID:	BANLkTimr9txspNJD7kxRndLxMRsMD_6UpA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hmm, yeah, I think this idea is probably better than mine, just because
> of the less dubious semantics. I don't see how you'd make it work for
> generic MVCC scans, because the behavior will be "the database state as
> of some hard-to-predict time after the scan starts", which is not what
> we want for MVCC. But it ought to be fine for SnapshotNow.

Department of second thoughts: I think I see a problem.

Suppose we have a tuple that has not been updated for a long time.
Its XMIN is committed and all-visible, and its XMAX is invalid. As
we're scanning the table (transaction T1), we see that tuple and say,
oh, it's visible. Now, another transaction (T2) begins, updates the
tuple, and commits. Our scan then reaches the page where the new
tuple is located, and says, oh, this is recent, I'd better take a real
snapshot. Of course, the new snapshot can see the new version of the
tuple, too. Of course, if T1 had taken its snapshot before starting
the scan, the second tuple would have been invisible. But since we
didn't take it until later, after T2 had already committed, we see a
duplicate.

That's still no worse than your idea, which rests on the theory that
duplicates don't matter anyway, but the case for it being better is a
lot thinner. I'd sure prefer something that had less crazy semantics
than either of these ideas, if we can think of something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:15:34
Message-ID:	BANLkTi=0uPvo1B1yGVKEEg2J+g-7L124RA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Not so. The extra locking would only occur on the first lock
> acquisition after DDL operations occur. If that was common then your
> other performance patch would not be an effective optimisation. There
> is no additional locking from what I've proposed in the common code
> path - that's why we have a relcache.

The extra locking would also occur when *initially* building relcache
entries. In other words, this would increase - likely quite
significantly - the overhead of backend startup. It's not going to be
sufficient to do this just for pg_class; I think you'll have to do it
for pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
pg_rewrite, and maybe a few others I'm not thinking of right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:34:26
Message-ID:	BANLkTikRgCpvi9keJhojETzZP5nYBk1HTQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 17, 2011 at 3:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Not so. The extra locking would only occur on the first lock
>> acquisition after DDL operations occur. If that was common then your
>> other performance patch would not be an effective optimisation. There
>> is no additional locking from what I've proposed in the common code
>> path - that's why we have a relcache.
>
> The extra locking would also occur when *initially* building relcache
> entries. In other words, this would increase - likely quite
> significantly - the overhead of backend startup. It's not going to be
> sufficient to do this just for pg_class; I think you'll have to do it
> for pg_attribute, pg_attrdef, pg_constraint, pg_index, pg_trigger,
> pg_rewrite, and maybe a few others I'm not thinking of right now.

Nothing you say here is accurate, regrettably.

The "extra locking" would be one call to the lock manager per
relation. Taken in shared mode, so it doesn't block.

I see no reason at all to have separate locks for each catalog table,
since it's the relation lock that is the top level lock.

Locking is a very well known solution to such problems. We use it
everywhere and we can use it here, and now.

I think you'd better wait to see the patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:45:15
Message-ID:	22858.1308339915@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Department of second thoughts: I think I see a problem.

Um, yeah, so that doesn't really work any better than my idea.

On further reflection, there's a problem at a higher level than this
anyway. Even if we can get a single SnapshotNow scan to produce
guaranteed-self-consistent results, that doesn't ensure consistency
between the results of scans occurring serially. An example here is
ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
writers. However, suppose that a concurrent relcache load fetches the
pg_attribute row, notes that it has atthasdef = true, and then the ALTER
commits before we start to scan pg_attrdef. The consistency checks in
AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
rightly so. We could lobotomize those checks, but it doesn't feel right
to do so; and anyway there may be other cases that are harder to kluge up.

So really we need consistency across *at least* one entire relcache load
cycle. We could maybe arrange to take an MVCC snap (or some lighter
weight version of that) at the start, and use that for all the resulting
scans, but I think that would be notationally messy. It's not clear
that it'd solve everything anyhow. There are parts of a relcache entry
that we fetch only on-demand, so they are typically loaded later than
the core items, and probably couldn't use the same snapshot. Worse,
there are lots of places where we assume that use of catcache entries or
direct examination of the catalogs will yield results consistent with
the relcache.

I suspect these latter problems will impact Simon's idea as well.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 19:54:16
Message-ID:	BANLkTimUhqA+dpkpX1xFryfZ8OmVK00b7w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 3:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Department of second thoughts: I think I see a problem.
>
> Um, yeah, so that doesn't really work any better than my idea.
>
> On further reflection, there's a problem at a higher level than this
> anyway. Even if we can get a single SnapshotNow scan to produce
> guaranteed-self-consistent results, that doesn't ensure consistency
> between the results of scans occurring serially. An example here is
> ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
> writers. However, suppose that a concurrent relcache load fetches the
> pg_attribute row, notes that it has atthasdef = true, and then the ALTER
> commits before we start to scan pg_attrdef. The consistency checks in
> AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
> rightly so. We could lobotomize those checks, but it doesn't feel right
> to do so; and anyway there may be other cases that are harder to kluge up.
>
> So really we need consistency across *at least* one entire relcache load
> cycle. We could maybe arrange to take an MVCC snap (or some lighter
> weight version of that) at the start, and use that for all the resulting
> scans, but I think that would be notationally messy. It's not clear
> that it'd solve everything anyhow. There are parts of a relcache entry
> that we fetch only on-demand, so they are typically loaded later than
> the core items, and probably couldn't use the same snapshot. Worse,
> there are lots of places where we assume that use of catcache entries or
> direct examination of the catalogs will yield results consistent with
> the relcache.
>
> I suspect these latter problems will impact Simon's idea as well.

I suspect we're going to be told that they don't.

I suspect I'm not going to believe it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 20:41:08
Message-ID:	1308343136-sup-6017@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of vie jun 17 13:22:40 -0400 2011:

> With this approach, we would have no serialization anomalies from single
> transactions committing while a scan is in progress. There could be
> anomalies resulting from considering an earlier XID to be in-progress
> while a later XID is considered committed (because we didn't observe
> it until later). So far as I can see offhand, the impact of that would
> be that there might be multiple versions of a tuple that are considered
> good, but never that there would be no version considered good (so long
> as the other XIDs simply updated the tuple and didn't delete it). I
> think this would be all right, since the scan would just seize on the
> first good version it finds. As you argue above, if that's not good
> enough for our purposes then the updater(s) should have taken a stronger
> lock.

Hmm, would there be a problem if a scan on catalog A yields results from
supposedly-running transaction X but another scan on catalog B yields
result from transaction Y? (X != Y) For example, a scan on pg_class
says that there are N triggers but scanning pg_trigger says N-1?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 21:02:41
Message-ID:	24369.1308344561@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I like this feature a lot, but it's hard to imagine that any of the
> fixes anyone has so far suggested can be implemented without
> collateral damage. Nor is there any certainty that this is the last
> bug.

And in fact, here's something else to worry about: consider pg_dump.
pg_dump is pretty heavily reliant on backend catalog-interpretation code
(such as ruleutils.c) that mostly runs on SnapshotNow time. But it also
does a fair amount of work on the basis of its own inspection of the
catalogs, which is done according to the serializable snapshot it gets
at the beginning of the dump run. If these two views of a table's
schema aren't consistent, you might get a pg_dump error, but it's at
least as likely that you'll get a silently incorrect dump.

pg_dump tries to minimize the risk by taking AccessShareLock right away
on each table it's going to dump. This is not perfect but it at least
results in a narrow window for conflicting table changes to occur.
However, that strategy has been blown out of the water by the ALTER
TABLE lock strength reduction. There is now a *very* wide window for
concurrent ALTERs to occur and possibly break the dump results.

As far as I can see, the only simple way to return pg_dump to its
previous level of safety while retaining this patch is to make it take
ShareUpdateExclusiveLocks, so that it will still block all forms of
ALTER TABLE. This is rather unpleasant, since it will also block
autovacuum for the duration of the dump.

In the long run, we really ought to fix things so that ruleutils.c
runs on the transaction snapshot, but that's a massive rewrite that is
certainly not getting done for 9.1, and will likely result in
considerable code bloat :-(.

(BTW, I just noticed that dumpSchema does a pretty fair amount of work
before it gets around to calling getTables, which is where said locks
get taken. Seems like we'd better rearrange the order of operations
there...)

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 21:08:25
Message-ID:	24498.1308344905@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, would there be a problem if a scan on catalog A yields results from
> supposedly-running transaction X but another scan on catalog B yields
> result from transaction Y? (X != Y) For example, a scan on pg_class
> says that there are N triggers but scanning pg_trigger says N-1?

Yeah, I came to that same conclusion downthread.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 21:17:03
Message-ID:	1308345410-sup-542@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of vie jun 17 17:08:25 -0400 2011:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Hmm, would there be a problem if a scan on catalog A yields results from
> > supposedly-running transaction X but another scan on catalog B yields
> > result from transaction Y? (X != Y) For example, a scan on pg_class
> > says that there are N triggers but scanning pg_trigger says N-1?
>
> Yeah, I came to that same conclusion downthread.

Something is seriously wrong with my email :-(

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 21:25:01
Message-ID:	BANLkTi=bWPuNuqYebE2EJVhYqUXfsdnzGw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 5:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> As far as I can see, the only simple way to return pg_dump to its
> previous level of safety while retaining this patch is to make it take
> ShareUpdateExclusiveLocks, so that it will still block all forms of
> ALTER TABLE. This is rather unpleasant, since it will also block
> autovacuum for the duration of the dump.

I have been thinking for a while now that it would be sensible to make
vacuum use a different lock type, much as we do for relation
extension. DROP TABLE and CLUSTER and at least some forms of ALTER
TABLE and maybe a few other things like CREATE INDEX would need to
grab that lock in addition to the ones they already acquire, but a
whole lot of other things wouldn't. In particular, it's currently not
possible to lock a table against SELECT without also locking it
against VACUUM - and booting off any auto-vacuum worker that happens
to already be processing it. If you imagine a large table with a
bunch of short-duration exclusive locks, it's not too hard to see how
you can end up with VACUUM starvation.

But that's not something I want to do in 9.1, and I doubt it would
completely solve this problem anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 21:36:22
Message-ID:	25184.1308346582@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I have been thinking for a while now that it would be sensible to make
> vacuum use a different lock type, much as we do for relation
> extension.

Hmm. I had just been toying with the idea of introducing a new
user-visible locking level to allow separation of anti-vacuum locks from
anti-schema-alteration locks. But I think you're probably right that it
could be done as a specialized LockTag. That would make it not easily
user-accessible, but it's hard to think of reasons for users to lock out
vacuum anyway, unless they want to lock out everything via
AccessExclusiveLock.

> ... In particular, it's currently not
> possible to lock a table against SELECT without also locking it
> against VACUUM

Well, it still wouldn't be, since AccessExclusiveLock certainly had
better lock out vacuum. As said above, I think the important thing
is to distinguish vacuum from schema changes.

> But that's not something I want to do in 9.1,

Definitely.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 22:34:49
Message-ID:	BANLkTi=ynQJVvy6KtN=UGi-BO4ipTDoJ2w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> 4. Backend #2 visits the new, about-to-be-committed version of
> pgbench_accounts' pg_class row just before backend #3 commits.
> It sees the row as not good and keeps scanning. By the time it
> reaches the previous version of the row, however, backend #3
> *has* committed. So that version isn't good according to SnapshotNow
> either.

Why isn't this a danger for every pg_class update? For example, it
would seem that if VACUUM updates relpages/reltuples, it would be
prone to this same hazard.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-17 22:41:24
Message-ID:	29703.1308350484@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 4. Backend #2 visits the new, about-to-be-committed version of
>> pgbench_accounts' pg_class row just before backend #3 commits.
>> It sees the row as not good and keeps scanning. By the time it
>> reaches the previous version of the row, however, backend #3
>> *has* committed. So that version isn't good according to SnapshotNow
>> either.

> <thinks some more>

> Why isn't this a danger for every pg_class update? For example, it
> would seem that if VACUUM updates relpages/reltuples, it would be
> prone to this same hazard.

VACUUM does that with an in-place, nontransactional update. But yes,
this is a risk for every transactional catalog update.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-19 21:13:09
Message-ID:	BANLkTi=a0dEjCT0BLj6iZ3t4f=bHJ5KJTA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> 4. Backend #2 visits the new, about-to-be-committed version of
>>> pgbench_accounts' pg_class row just before backend #3 commits.
>>> It sees the row as not good and keeps scanning. By the time it
>>> reaches the previous version of the row, however, backend #3
>>> *has* committed. So that version isn't good according to SnapshotNow
>>> either.
>
>> <thinks some more>
>
>> Why isn't this a danger for every pg_class update? For example, it
>> would seem that if VACUUM updates relpages/reltuples, it would be
>> prone to this same hazard.
>
> VACUUM does that with an in-place, nontransactional update. But yes,
> this is a risk for every transactional catalog update.

Well, after various efforts to fix the problem, I notice that there
are two distinct problems brought out by your test case.

One of them is caused by my patch, one of them was already there in
the code - this latter one is actually the hardest to fix.

It took me about an hour to fix the first bug, but its taken a while
of thinking about the second before I realised it was a pre-existing
bug.

The core problem is, as you observed that a pg_class update can cause
rows to be lost with concurrent scans.
We scan pg_class in two ways: to rebuild a relcache entry based on a
relation's oid (easy fix). We also scan pg_class to resolve the name
to oid mapping. The name to oid mapping is performed *without* a lock
on the relation, since we don't know which relation to lock. So the
name lookup can fail if we are in the middle of a pg_class update.
This is an existing potential bug in Postgres unrelated to my patch.
Ref: SearchCatCache()

I've been looking at ways to lock the relation name and namespace
prior to the lookup (or more precisely the hash), but its worth
discussing whether we want that at all?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 08:48:49
Message-ID:	BANLkTinOuCxFFex8P4=S0Jf3=ZZWtEMYng@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jun 19, 2011 at 10:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> pre-existing bug

I've been able to reproduce the bug on 9.0 stable as well, using the
OP's test script.
Multiple errors like this...

ERROR: relation "pgbench_accounts" does not exist
STATEMENT: alter table pgbench_accounts set (fillfactor = 100);

Presume we want to fix both bugs, not just the most recent one so will
continue with both patches.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 13:14:23
Message-ID:	BANLkTim5t+83pW9F5w6Ab_7GCqTL0kmxVA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Jun 17, 2011 at 11:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> 4. Backend #2 visits the new, about-to-be-committed version of
>>>> pgbench_accounts' pg_class row just before backend #3 commits.
>>>> It sees the row as not good and keeps scanning. By the time it
>>>> reaches the previous version of the row, however, backend #3
>>>> *has* committed. So that version isn't good according to SnapshotNow
>>>> either.
>>
>>> <thinks some more>
>>
>>> Why isn't this a danger for every pg_class update? For example, it
>>> would seem that if VACUUM updates relpages/reltuples, it would be
>>> prone to this same hazard.
>>
>> VACUUM does that with an in-place, nontransactional update. But yes,
>> this is a risk for every transactional catalog update.
>
> Well, after various efforts to fix the problem, I notice that there
> are two distinct problems brought out by your test case.
>
> One of them is caused by my patch, one of them was already there in
> the code - this latter one is actually the hardest to fix.
>
> It took me about an hour to fix the first bug, but its taken a while
> of thinking about the second before I realised it was a pre-existing
> bug.
>
> The core problem is, as you observed that a pg_class update can cause
> rows to be lost with concurrent scans.
> We scan pg_class in two ways: to rebuild a relcache entry based on a
> relation's oid (easy fix). We also scan pg_class to resolve the name
> to oid mapping. The name to oid mapping is performed *without* a lock
> on the relation, since we don't know which relation to lock. So the
> name lookup can fail if we are in the middle of a pg_class update.
> This is an existing potential bug in Postgres unrelated to my patch.
> Ref: SearchCatCache()
>
> I've been looking at ways to lock the relation name and namespace
> prior to the lookup (or more precisely the hash), but its worth
> discussing whether we want that at all?

If this is a pre-existing bug, then it's not clear to me why we need
to do anything about it at all right now. I mean, it would be nice to
have a fix, but it's hard to imagine that any proposed fix will be
low-risk, and I don't remember user complaints about this. I continue
to think that the root of the problem here is that SnapshotNow is Evil
(TM). If we get rid of that, then this problem goes away, but that
strikes me as a long-term project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 13:24:47
Message-ID:	BANLkTin_m_XHbgVm7yPh_o72cP4x1YYVkg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 2:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now. I mean, it would be nice to
> have a fix, but it's hard to imagine that any proposed fix will be
> low-risk, and I don't remember user complaints about this. I continue
> to think that the root of the problem here is that SnapshotNow is Evil
> (TM). If we get rid of that, then this problem goes away, but that
> strikes me as a long-term project.

There are 2 bugs, one caused by my patch in 9.1, one that is pre-existing.

The 9.1 bug can be fixed easily. I will edit my patch down and repost
here shortly.

The pre-existing bug is slightly harder/contentious because we have to
lock the name of a possible relation, even before we know it exists.
I've been looking to implement that as a lock on the uint32 hash of
the relation's name and namespace. I'm looking for opinions ranging
from fix-now-and-backpatch thru to ignore and discuss for 9.2.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 13:42:51
Message-ID:	4DFF080B020000250003E92F@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> I'm looking for opinions ranging from fix-now-and-backpatch thru
> to ignore and discuss for 9.2.

If it's a pre-existing bug I would think that one option would be to
put it into the next bug-fix release of each supported major release
in which it is manifest. Of course, if it is *safe* to work it into
9.1, that'd be great.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 16:57:29
Message-ID:	BANLkTimeEd0xzO1tBC4EGqvs+0i82vrd2g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 9:42 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> I'm looking for opinions ranging from fix-now-and-backpatch thru
>> to ignore and discuss for 9.2.
>
> If it's a pre-existing bug I would think that one option would be to
> put it into the next bug-fix release of each supported major release
> in which it is manifest. Of course, if it is *safe* to work it into
> 9.1, that'd be great.

I'm currently on the other end of the spectrum: ignore and consider for 9.2.

But that's mostly based on the belief that there isn't going to be a
way of fixing this that isn't far too invasive to back-patch. Should
that turn out to be incorrect, that's a different matter, of course...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 17:21:59
Message-ID:	4DFF3B67020000250003E9CF@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I'm currently on the other end of the spectrum: ignore and
> consider for 9.2.

I guess part of my point was that if we can't safely get something
into the initial 9.1 release, it doesn't mean we necessarily need to
wait for 9.2. Bugs can be fixed along the way in minor releases. A
9.1.1 fix might give us more time to work through details and ensure
that it is a safe and well-targeted fix.

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 21:56:19
Message-ID:	5961.1308606979@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>> relation's oid (easy fix). We also scan pg_class to resolve the name
>> to oid mapping. The name to oid mapping is performed *without* a lock
>> on the relation, since we don't know which relation to lock. So the
>> name lookup can fail if we are in the middle of a pg_class update.
>> This is an existing potential bug in Postgres unrelated to my patch.

> If this is a pre-existing bug, then it's not clear to me why we need
> to do anything about it at all right now.

Yeah. This behavior has been there since day zero, and there have been
very few complaints about it. But note that there's only a risk for
pg_class updates, not any other catalog, and there is exactly one kind
of failure with very predictable consequences. The ALTER TABLE patch
has greatly expanded the scope of the issue, and that *is* a regression
compared to prior releases.

BTW, it seems to me that this issue is closely related to what Noah is
trying to fix here:
http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 22:55:35
Message-ID:	BANLkTi=SCuC9mmMh3DgmLZDh32SGOiGaKQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>>> relation's oid (easy fix). We also scan pg_class to resolve the name
>>> to oid mapping. The name to oid mapping is performed *without* a lock
>>> on the relation, since we don't know which relation to lock. So the
>>> name lookup can fail if we are in the middle of a pg_class update.
>>> This is an existing potential bug in Postgres unrelated to my patch.
>
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences.

I agree we shouldn't do anything about the name lookups for 9.1
That is SearchCatCache using RELNAMENSP lookups, to be precise, as
well as triggers and few other similar call types.

> The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

I agree the scope for RELOID errors increased with my 9.1 patch. I'm
now happy with the locking patch (attached), which significantly
reduces the scope - back to the original error scope, in my testing.

I tried to solve both, but I think that's a step too far given the timing.

It seems likely that there will be objections to this patch. All I
would say is that issuing a stream of ALTER TABLEs against the same
table is not a common situation; if it were we would have seen more of
the pre-existing bug. ALTER TABLE command encompasses many subcommands
and we should evaluate each subcommand differently when we decide what
to do.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
relation_definition_lock.v3.patch	application/octet-stream	10.1 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-20 23:03:00
Message-ID:	BANLkTinmADCTafq4Ko-aYep-MtG+Eu-_xQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 11:55 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> I agree we shouldn't do anything about the name lookups for 9.1
> That is SearchCatCache using RELNAMENSP lookups, to be precise, as
> well as triggers and few other similar call types.

Name lookups give ERRORs that look like this...

relation "pgbench_accounts" does not exist

>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.
>
> I agree the scope for RELOID errors increased with my 9.1 patch.

Which originally generated ERRORs like

cache lookup failed for relation 16390
or
could not open relation with OID 16390

which should now be absent from the server log.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 13:37:11
Message-ID:	BANLkTikpkyKBSY41A_cLDW2QE7rbGLqv=g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>>> relation's oid (easy fix). We also scan pg_class to resolve the name
>>> to oid mapping. The name to oid mapping is performed *without* a lock
>>> on the relation, since we don't know which relation to lock. So the
>>> name lookup can fail if we are in the middle of a pg_class update.
>>> This is an existing potential bug in Postgres unrelated to my patch.
>
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah. This behavior has been there since day zero, and there have been
> very few complaints about it. But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences. The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

It's not entirely clear to me how many additional failure cases we've
bought ourselves with this patch. The particular one you've
demonstrated seems pretty similar to the on we already had, although
possibly the window for it is wider. Did you run the same test you
used on 9.1 on 9.0 for comparison?

> BTW, it seems to me that this issue is closely related to what Noah is
> trying to fix here:
> http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

I think there are two general problems here. One, we use SnapshotNow
to scan system catalogs, and SnapshotNow semantics are a mess. Two,
even if we used an MVCC snapshot to scan the system catalogs, it's not
necessarily safe or correct to latch onto an old row version, because
the row update is combined with other actions that are not MVCC-safe,
like removing files on disk. Breaking it down a bit more:

A. The problem with using a DDL lock level < AccessExclusiveLock,
AFAICS, is entirely due to SnapshotNow semantics. Any row version we
can possibly see is OK, but we had better see exactly one. Or at
least not less than one.

B. The problem with name lookups failing in the middle of a pg_class
update is also entirely due to SnapshotNow semantics.

C. The problem Noah is complaining about is NOT due to SnapshotNow
semantics. If someone does BEGIN; DROP TABLE foo; CREATE TABLE foo();
COMMIT, it is no longer OK for anyone to be looking at the old foo.
An MVCC snapshot is enough to guarantee that we'll see some row, but
examining the old one won't cut it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 13:40:16
Message-ID:	BANLkTimOb+t7UNQNGaheaLJPKw-PyewWYw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.
>
> I tried to solve both, but I think that's a step too far given the timing.
>
> It seems likely that there will be objections to this patch. All I
> would say is that issuing a stream of ALTER TABLEs against the same
> table is not a common situation; if it were we would have seen more of
> the pre-existing bug. ALTER TABLE command encompasses many subcommands
> and we should evaluate each subcommand differently when we decide what
> to do.

Well, my principal objection is that I think heavyweight locking is an
excessively expensive solution to this problem. I think the patch is
simple enough that I wouldn't object to applying it on those grounds
even at this late date, but I bet if we do some benchmarking on the
right workload we'll find a significant performance regression.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 15:06:22
Message-ID:	14084.1308668782@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. This behavior has been there since day zero, and there have been
>> very few complaints about it. But note that there's only a risk for
>> pg_class updates, not any other catalog, and there is exactly one kind
>> of failure with very predictable consequences. The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> It's not entirely clear to me how many additional failure cases we've
> bought ourselves with this patch. The particular one you've
> demonstrated seems pretty similar to the on we already had, although
> possibly the window for it is wider.

It's not so much the probability of failure that is bothering me, as
the variety of possible symptoms. There was exactly one failure mode
before, namely "no such relation". I'm not sure how many possible
symptoms there are now, but there's a lot, and most of them are going
to be weird "what the heck was that??" behaviors. If we let 9.1 ship
like this, we are going to be creating a support headache. Even worse,
knowing that those bugs exist will tempt us to write off reports of
weird cache lookup failures as being instances of this problem, when
closer investigation might show that they're something else.

Please note that this position should not be regarded as support for
Simon's proposed patch. I still think the right decision is to revert
the ALTER TABLE feature, mainly because I do not believe this is the
last bug in it. And the fact that there's a pre-existing bug with a
vaguely similar symptom is no justification for introducing more bugs.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 15:12:35
Message-ID:	14208.1308669155@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The ALTER TABLE patch
>> has greatly expanded the scope of the issue, and that *is* a regression
>> compared to prior releases.

> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> now happy with the locking patch (attached), which significantly
> reduces the scope - back to the original error scope, in my testing.

> I tried to solve both, but I think that's a step too far given the timing.

> It seems likely that there will be objections to this patch.

Yup, you're right. Having read this patch, I have absolutely zero
confidence in it. It introduces some locks in random places, with no
rhyme or reason that I can see. There is no reason to think that this
is a complete solution, and considerable reason to think that it isn't
(notably, the RELOID syscache is hardly the only one at risk). Worse,
it's adding more locking in performance-critical places, which seems
to me to severely degrade the argument for the original feature,
namely that it was supposed to give us *less* locking.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 17:13:08
Message-ID:	1308675968-sup-3892@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of mar jun 21 09:40:16 -0400 2011:
> On Mon, Jun 20, 2011 at 6:55 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > I agree the scope for RELOID errors increased with my 9.1 patch. I'm
> > now happy with the locking patch (attached), which significantly
> > reduces the scope - back to the original error scope, in my testing.
> >
> > I tried to solve both, but I think that's a step too far given the timing.
> >
> > It seems likely that there will be objections to this patch. All I
> > would say is that issuing a stream of ALTER TABLEs against the same
> > table is not a common situation; if it were we would have seen more of
> > the pre-existing bug. ALTER TABLE command encompasses many subcommands
> > and we should evaluate each subcommand differently when we decide what
> > to do.
>
> Well, my principal objection is that I think heavyweight locking is an
> excessively expensive solution to this problem. I think the patch is
> simple enough that I wouldn't object to applying it on those grounds
> even at this late date, but I bet if we do some benchmarking on the
> right workload we'll find a significant performance regression.

Yeah, taking a hw lock at each relcache item build is likely to be
prohibitively expensive. Heck, relation extension is expensive already
in some loads. (I'm guessing that things will tank when there's a
relcache reset). Still, this seems to be an overall better approach to
the problem than what's been proposed elsewhere. Maybe we can do
something with a map of relations that are protected by a bunch of
lwlocks instead. (We could use that for relation extension too; that'd
rock.)

The patch may be simple, but is it complete? I think you need to have
lock acquisition on create rule and create index too. Right now it only
has locks on trigger stuff and alter table.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 18:24:31
Message-ID:	1308680556-sup-3287@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011:

> Please note that this position should not be regarded as support for
> Simon's proposed patch. I still think the right decision is to revert
> the ALTER TABLE feature, mainly because I do not believe this is the
> last bug in it. And the fact that there's a pre-existing bug with a
> vaguely similar symptom is no justification for introducing more bugs.

Note that this feature can be disabled by tweaking
AlterTableGetLockLevel so that it always returns AccessExclusive.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-21 18:37:42
Message-ID:	5540.1308681462@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of mar jun 21 11:06:22 -0400 2011:
>> Please note that this position should not be regarded as support for
>> Simon's proposed patch. I still think the right decision is to revert
>> the ALTER TABLE feature, mainly because I do not believe this is the
>> last bug in it. And the fact that there's a pre-existing bug with a
>> vaguely similar symptom is no justification for introducing more bugs.

> Note that this feature can be disabled by tweaking
> AlterTableGetLockLevel so that it always returns AccessExclusive.

I think Simon had also hacked a couple of other places such as CREATE
TRIGGER, but yeah, I was thinking of just lobotomizing that function
with an #ifdef. When and if we get these problems worked out, it'll
be easy to re-enable the feature.

regards, tom lane

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-22 15:26:21
Message-ID:	BANLkTinkzstqMRer81D_8y_Q+NrkwWt=gg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 21, 2011 at 4:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The ALTER TABLE patch
>>> has greatly expanded the scope of the issue, and that *is* a regression
>>> compared to prior releases.
>
>> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
>> now happy with the locking patch (attached), which significantly
>> reduces the scope - back to the original error scope, in my testing.
>
>> I tried to solve both, but I think that's a step too far given the timing.
>
>> It seems likely that there will be objections to this patch.
>
> Yup, you're right. Having read this patch, I have absolutely zero
> confidence in it. It introduces some locks in random places, with no
> rhyme or reason that I can see.

There is something to be salvaged here and I have additional analysis
and suggestions. Please read all the way through.

We already allowed INHERIT/dis INHERIT with a lower lock level than
AccessExclusiveLock and have received no complaints to speak of. We
aren't discussing moving that operation to AccessExclusiveLock in the
light of this discussion, nor are we discussing TYPE OF operations. I
don't think its sensible to reset those operations to
AccessExclusiveLock.

The patch has extensive comments that explain the carefully analysed
and tested locations for the locking.

We take an ExclusiveLock on the RelationDefinition during an ALTER TABLE.
We take a ShareLock on CatCache and RelCache rebuilds.

The only time these can block is during an ALTER TABLE with a lower
lock than AccessExclusiveLock that occurs immediately after other DDL.
The locking has been placed to minimise the lock window for simple
ALTER TABLE statements.

> There is no reason to think that this
> is a complete solution, and considerable reason to think that it isn't
> (notably, the RELOID syscache is hardly the only one at risk).

The whole of the relcache rebuild is protected, not just access to
pg_class, so that is safe, no matter which catalog tables are
accessed.

Catcache does look more complex, so your fears here are worth looking
into. I was already doing that since my last post and have now
finished.

Based on the above, I think we should limit the patch to these areas:

1. Simple operations on pg_class and/or pg_attribute. This includes
case AT_SetStatistics: ATTNAME catcache
case AT_SetOptions: ATTNAME catcache
case AT_ResetOptions: ATTNAME catcache
case AT_SetStorage: ATTNAME catcache
case AT_SetRelOptions: RELOID catcache
case AT_ResetRelOptions: RELOID catcache

2. FK VALIDATION which touches CONSTROID

Limiting the scope of the patch to those areas provides good benefit,
whilst at the same time limiting our risk footprint. Doing that means
we can evaluate the patch against those aspects.

For people that still think there is still risk, I've added a
parameter called "relaxed_ddl_locking" which defaults to "off". That
way people can use this feature in an informed way without exposing us
to support risks from its use.

[Patch needs minor docs and test documentation changes prior to commit]

> Worse,
> it's adding more locking in performance-critical places, which seems
> to me to severely degrade the argument for the original feature,
> namely that it was supposed to give us *less* locking.

I think we should be clear that this adds *one* lock/unlock for each
object for each session, ONCE only after each transaction that
executed a DDL statement against an object. So with 200 sessions, a
typical ALTER TABLE statement will cause 400 locks. The current lock
manager maxes out above 50,000 locks per second, so any comparative
analysis will show this is a minor blip on even a heavy workload.

It isn't in a performance critical place and the lock held in case of
a rebuild is a ShareLock so won't block, accept against an ALTER TABLE
statement.

This is a marked *reduction* in locking overhead, in comparison with
ALTER TABLE being an AccessExclusiveLock which would make the SQL
statement wait for potentially a very long time.

With relaxed_ddl_locking = off (default) there is zero effect.

So that is not a valid argument to refuse this patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
relaxed_ddl_locking.v4.patch	application/octet-stream	13.3 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-23 19:59:27
Message-ID:	BANLkTikdh2noLuwKPgGdSsm3cAJOi79_yA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> For people that still think there is still risk, I've added a
> parameter called "relaxed_ddl_locking" which defaults to "off". That
> way people can use this feature in an informed way without exposing us
> to support risks from its use.

I can't get excited about adding a GUC that says "you can turn this
off if you want but don't blame us if it breaks". That just doesn't
seem like good software engineering to me.

> I think we should be clear that this adds *one* lock/unlock for each
> object for each session, ONCE only after each transaction that
> executed a DDL statement against an object. So with 200 sessions, a
> typical ALTER TABLE statement will cause 400 locks. The current lock
> manager maxes out above 50,000 locks per second, so any comparative
> analysis will show this is a minor blip on even a heavy workload.

I think that using locks to address this problem is the wrong
approach. I think the right fix is to use something other than
SnapshotNow to do the system catalog scans. However, if we were going
to use locking, then we'd need to ensure that:

(1) No transaction which has made changes to a system catalog may
commit while some other transaction is in the middle of scanning that
catalog.
(2) No transaction which has made changes to a set of system catalogs
may commit while some other transaction is in the midst of fetching
interrelated data from a subset of those catalogs.

It's important to note, I think, that the problem here doesn't occur
at the time that the table rows are updated, because those rows aren't
visible yet. The problem happens at commit time. So unless I'm
missing something, ALTER TABLE would only need to acquire the relation
definition lock after it has finished all of its other work. In fact,
it doesn't even really need to acquire it then, either. It could just
queue up a list of relation definition locks to acquire immediately
before commit, which would be advantageous if the transaction went on
to abort, since in that case we'd skip the work of acquiring the locks
at all.

In fact, without doing that, this patch would undo much of the point
of the original ALTER TABLE lock strength reduction:

begin;
alter table foo alter column a set storage plain;
<start a new psql session in another window>
select * from foo;

In master, the SELECT completes without blocking. With this patch
applied, the SELECT blocks, just as it did in 9.0, because it can't
rebuild the relcache entry without getting the relation definition
lock, which the alter table will hold until commit. In fact, the same
thing happens even if foo has been accessed previously in the same
session. If there is already an open *transaction* that has accessed
foo, then, it seems, it can continue to do so until it commits. But
as soon as it tries to start a new transaction, it blocks again. I
don't quite understand why that is - I didn't think we invalidated the
entire relcache on every commit. But that's what happens.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 19:46:17
Message-ID:	BANLkTik8KAbV3Ka5jgjHy5hpdFOF+fWmDQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 23, 2011 at 8:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> For people that still think there is still risk, I've added a
>> parameter called "relaxed_ddl_locking" which defaults to "off". That
>> way people can use this feature in an informed way without exposing us
>> to support risks from its use.
>
> I can't get excited about adding a GUC that says "you can turn this
> off if you want but don't blame us if it breaks". That just doesn't
> seem like good software engineering to me.

Nobody is suggesting we fix the pre-existing bug - we're just going to
leave it. That doesn't sound like good software engineering either,
but I'm not going to complain because I understand.

We're in a bind here and I'm trying to suggest practical ways out, as
well as cater for the levels of risk people are willing to accept. I
don't think we need that parameter at all, but if you do then I'm
willing to tolerate it.

>> I think we should be clear that this adds *one* lock/unlock for each
>> object for each session, ONCE only after each transaction that
>> executed a DDL statement against an object. So with 200 sessions, a
>> typical ALTER TABLE statement will cause 400 locks. The current lock
>> manager maxes out above 50,000 locks per second, so any comparative
>> analysis will show this is a minor blip on even a heavy workload.
>
> I think that using locks to address this problem is the wrong
> approach. I think the right fix is to use something other than
> SnapshotNow to do the system catalog scans.

I agree with that, but its a much bigger job than you think and I
really doubt that you will do it for 9.2, definitely not for 9.1.

There are so many call points, I would not bother personally to
attempt a such an critical and widely invasive fix.

So I'd put that down as a Blue Sky will-fix-one-day thing.

> However, if we were going
> to use locking, then we'd need to ensure that:
>
> (1) No transaction which has made changes to a system catalog may
> commit while some other transaction is in the middle of scanning that
> catalog.
> (2) No transaction which has made changes to a set of system catalogs
> may commit while some other transaction is in the midst of fetching
> interrelated data from a subset of those catalogs.
>
> It's important to note, I think, that the problem here doesn't occur
> at the time that the table rows are updated, because those rows aren't
> visible yet. The problem happens at commit time. So unless I'm
> missing something, ALTER TABLE would only need to acquire the relation
> definition lock after it has finished all of its other work. In fact,
> it doesn't even really need to acquire it then, either. It could just
> queue up a list of relation definition locks to acquire immediately
> before commit, which would be advantageous if the transaction went on
> to abort, since in that case we'd skip the work of acquiring the locks
> at all.

That would work if the only thing ALTER TABLE does is write. There are
various places where we read catalogs and all of those catalogs need
to be relationally integrated. So the only safe way, without lots of
incredibly detailed analysis (which you would then find fault with) is
to lock at the top and hold the lock throughout most of the
processing. That's the safe thing to do, at least.

I do already use the point you make in the patch, where it's used to
unlock before and then relock after the FK constraint check, so that
the RelationDefinition lock is never held for long periods in any code
path.

> In fact, without doing that, this patch would undo much of the point
> of the original ALTER TABLE lock strength reduction:
>
> begin;
> alter table foo alter column a set storage plain;
> <start a new psql session in another window>
> select * from foo;
>
> In master, the SELECT completes without blocking. With this patch
> applied, the SELECT blocks, just as it did in 9.0, because it can't
> rebuild the relcache entry without getting the relation definition
> lock, which the alter table will hold until commit.

It's not the same at all. Yes, they are both locks; what counts is the
frequency and duration of locking.

With AccessExclusiveLock we prevent all SELECTs from accessing the
table while we queue for the lock, which can be blocked behind a long
running query. So the total outage to the application for one minor
change can be hours - unpredictably. (Which is why I never wrote the
ALTER TABLE set ndistinct myself, even though I suggested it, cos its
mostly unusable).

With the reduced locking level AND this patch the *rebuild* can be
blocked behind the DDL. But the DDL itself is not blocked in the same
way, so the total outage is much reduced and the whole set of actions
goes through fairly quickly.

If you only submit one DDL operation then the rebuild has nothing to
block against, so the whole thing goes through smoothly.

> In fact, the same
> thing happens even if foo has been accessed previously in the same
> session. If there is already an open *transaction* that has accessed
> foo, then, it seems, it can continue to do so until it commits. But
> as soon as it tries to start a new transaction, it blocks again. I
> don't quite understand why that is - I didn't think we invalidated the
> entire relcache on every commit.

We don't invalidate the relcache on every commit.

> But that's what happens.

Test case please. I don't understand the problem you're describing.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 20:00:27
Message-ID:	BANLkTin49Z+7BSO5NHu4ay6w3jn=rAM1Yw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Test case please. I don't understand the problem you're describing.

S1: select * from foo;
S2: begin;
S2: alter table foo alter column a set storage plain;
S1: select * from foo;
<blocks>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 20:27:22
Message-ID:	BANLkTimQ03Ref-eXUChQOFVxEAPvP6f7_w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Test case please. I don't understand the problem you're describing.
>
> S1: select * from foo;
> S2: begin;
> S2: alter table foo alter column a set storage plain;
> S1: select * from foo;
> <blocks>

Er,,.yes, that what locks do. Where is the bug?

We have these choices of behaviour
1. It doesn't error and doesn't block - not possible for 9.1, probably
not for 9.2 either
2. It doesn't block, but may throw an error sometimes - the reported bug
3. It blocks in some cases for short periods where people do repeated
DDL, but never throws errors - this patch
4. Full scale locking - human sacrifice, cats and dogs, living
together, mass hysteria

If you want to avoid the blocking, then don't hold open the transaction.

Do this

S1: select * from foo
S2: alter table.... run in its own transaction
S1: select * from foo

Doesn't block, no errors. Which is exactly what most people do on
their production servers. The ALTER TABLE statements we're talking
about are not schema changes. They don't need to be coordinated with
other DDL.

This patch has locking, but its the most reduced form of locking that
is available for a non invasive patch for 9.1

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 21:08:30
Message-ID:	BANLkTi=cy+VyyrR=GtMBH5tf_q=gzeVtGQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Test case please. I don't understand the problem you're describing.
>>
>> S1: select * from foo;
>> S2: begin;
>> S2: alter table foo alter column a set storage plain;
>> S1: select * from foo;
>> <blocks>
>
> Er,,.yes, that what locks do. Where is the bug?

I didn't say it was a bug. I said that the additional locking you
added acted to remove the much of the benefit of reducing the lock
strength in the first place. If a brief exclusive lock (such as would
be taken by ALTER TABLE SET STORAGE in 9.0 and prior release) wasn't
acceptable, a brief exclusive lock on a different lock tag that blocks
most of the same operations isn't going to be any better. You might
still see some improvement in the cases where some complicated
operation like scanning or rewriting the table can be performed
without holding the relation definition lock, and then only get the
relation definition lock afterward. But for something like the above
example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
your previous email) the benefit is going to be darned close to zero.

> This patch has locking, but its the most reduced form of locking that
> is available for a non invasive patch for 9.1

I don't like the extra lock manager traffic this adds to operations
that aren't even performing DDL, I'm not convinced that it is safe,
the additional locking negates a significant portion of the benefit of
the original patch, and Tom's made a fairly convincing argument that
even with this pg_dump will still become significantly less reliable
than heretofore even if we did apply it. In my view, what we need to
do is revert to using AccessExclusiveLock until some of these other
details are worked out. I wasn't entirely convinced that that was
necessary when Tom first posted this email, but having thought through
the issues and looked over your patch, it's now my position that that
is the best way forward. The fact that your proposed patch appears
not to be thinking very clearly about when locks need to be taken and
released only adds to my feeling that we are in for a bad time if we
go down this route.

In short: -1 from me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 21:28:23
Message-ID:	BANLkTim9ArQZrNbS+NgMGQTp6P-e08v2ug@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 10:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> Test case please. I don't understand the problem you're describing.
>>>
>>> S1: select * from foo;
>>> S2: begin;
>>> S2: alter table foo alter column a set storage plain;
>>> S1: select * from foo;
>>> <blocks>
>>
>> Er,,.yes, that what locks do. Where is the bug?
>
> I didn't say it was a bug. I said that the additional locking you
> added acted to remove the much of the benefit of reducing the lock
> strength in the first place. If a brief exclusive lock (such as would
> be taken by ALTER TABLE SET STORAGE in 9.0 and prior release) wasn't
> acceptable, a brief exclusive lock on a different lock tag that blocks
> most of the same operations isn't going to be any better. You might
> still see some improvement in the cases where some complicated
> operation like scanning or rewriting the table can be performed
> without holding the relation definition lock, and then only get the
> relation definition lock afterward. But for something like the above
> example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
> your previous email) the benefit is going to be darned close to zero.
>
>> This patch has locking, but its the most reduced form of locking that
>> is available for a non invasive patch for 9.1
>
> I don't like the extra lock manager traffic this adds to operations
> that aren't even performing DDL, I'm not convinced that it is safe,
> the additional locking negates a significant portion of the benefit of
> the original patch, and Tom's made a fairly convincing argument that
> even with this pg_dump will still become significantly less reliable
> than heretofore even if we did apply it. In my view, what we need to
> do is revert to using AccessExclusiveLock until some of these other
> details are worked out. I wasn't entirely convinced that that was
> necessary when Tom first posted this email, but having thought through
> the issues and looked over your patch, it's now my position that that
> is the best way forward. The fact that your proposed patch appears
> not to be thinking very clearly about when locks need to be taken and
> released only adds to my feeling that we are in for a bad time if we
> go down this route.
>
> In short: -1 from me.

I've explained all of the above points to you already and you're wrong.

I don't think you understand this patch at all, nor even the original feature.

Locking the table is completely different from locking the definition
of a table.

Do you advocate that all ALTER TABLE operations use
AccessExclusiveLock, or just the operations I added? If you see
problems here, then you must also be willing to increase the lock
strength of things such as INHERITS, and back patch them to where the
same problems exist. You'll wriggle out of that, I'm sure. There are
regrettably, many bugs here and they can't be fixed in the simple
manner you propose.

It's not me you block Robert, I'm not actually a user and I will sleep
well whatever happens, happy that I tried to resolve this. Users watch
and remember.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-24 21:43:47
Message-ID:	BANLkTin6CYn1cCSa=ptN-vm19L6XiN_THQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 5:28 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I've explained all of the above points to you already and you're wrong.

We're going to have to agree to disagree on that point.

> Do you advocate that all ALTER TABLE operations use
> AccessExclusiveLock, or just the operations I added? If you see
> problems here, then you must also be willing to increase the lock
> strength of things such as INHERITS, and back patch them to where the
> same problems exist. You'll wriggle out of that, I'm sure. There are
> regrettably, many bugs here and they can't be fixed in the simple
> manner you propose.

I think there is quite a lot of difference between realizing that we
can't fix every problem, and deciding to put out a release that adds a
whole lot more of them that we have no plans to fix.

> It's not me you block Robert, I'm not actually a user and I will sleep
> well whatever happens, happy that I tried to resolve this. Users watch
> and remember.

If you are proposing that I should worry about a posse of angry
PostgreSQL users hunting me down (or abandoning the product) because I
agreed with Tom Lane on the necessity of reverting one of your
patches, then I'm willing to take that chance. For one thing, there's
a pretty good chance they'll go after Tom first. For two things,
there's at least an outside chance I might be rescued by an
alternative posse who supports our tradition of putting out high
quality releases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-26 16:16:27
Message-ID:	BANLkTinhkr=vFe5=vMn6e_ynaPFA_+A=7w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 24, 2011 at 10:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> It's not me you block Robert, I'm not actually a user and I will sleep
>> well whatever happens, happy that I tried to resolve this. Users watch
>> and remember.
>
> If you are proposing that I should worry about a posse of angry
> PostgreSQL users hunting me down (or abandoning the product) because I
> agreed with Tom Lane on the necessity of reverting one of your
> patches, then I'm willing to take that chance. For one thing, there's
> a pretty good chance they'll go after Tom first. For two things,
> there's at least an outside chance I might be rescued by an
> alternative posse who supports our tradition of putting out high
> quality releases.

Not sure where anger and violence entered the discussion, but a posse
does sound amusing.

If such groups ever met, I'm sure both would support the tradition of
high quality, but I suspect they might define it differently and that
would be the source of the problem. Much like Life of Brian.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-27 08:11:52
Message-ID:	BANLkTikduzKLNSwO=L51kf49pV00DgfKuw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 8:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Department of second thoughts: I think I see a problem.
>
> Um, yeah, so that doesn't really work any better than my idea.
>
> On further reflection, there's a problem at a higher level than this
> anyway. Even if we can get a single SnapshotNow scan to produce
> guaranteed-self-consistent results, that doesn't ensure consistency
> between the results of scans occurring serially. An example here is
> ALTER COLUMN DROP DEFAULT, which is currently imagined to impact only
> writers. However, suppose that a concurrent relcache load fetches the
> pg_attribute row, notes that it has atthasdef = true, and then the ALTER
> commits before we start to scan pg_attrdef. The consistency checks in
> AttrDefaultFetch() will complain about a missing pg_attrdef entry, and
> rightly so. We could lobotomize those checks, but it doesn't feel right
> to do so; and anyway there may be other cases that are harder to kluge up.

Locking the whole definition is at least one way of solving this
problem. My locking fix does that.

> So really we need consistency across *at least* one entire relcache load
> cycle. We could maybe arrange to take an MVCC snap (or some lighter
> weight version of that) at the start, and use that for all the resulting
> scans, but I think that would be notationally messy. It's not clear
> that it'd solve everything anyhow. There are parts of a relcache entry
> that we fetch only on-demand, so they are typically loaded later than
> the core items, and probably couldn't use the same snapshot. Worse,
> there are lots of places where we assume that use of catcache entries or
> direct examination of the catalogs will yield results consistent with
> the relcache.
>
> I suspect these latter problems will impact Simon's idea as well.

I think you're probably right, or at least, the suspicion is not
something I can address quickly enough to be safe.

I will revert to the AccessExclusiveLocks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-06-27 08:46:53
Message-ID:	BANLkTik+wEm2EWRTMPG4kzeGnO0kfgAfnA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 17, 2011 at 6:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 16, 2011 at 6:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I believe that this is fundamentally unavoidable so long as we use
>>> SnapshotNow to read catalogs --- which is something we've talked about
>>> changing, but it will require a pretty major R&D effort to make it
>>> happen.
>
>> Ouch.
>
>> I wonder if we could avoid this anomaly by taking a throwaway MVCC
>> snapshot at the beginning of each system catalog scan and using it
>> just for the duration of that scan. If nothing that has touched the
>> catalog commits while the scan is open, then this is logically
>> equivalent to SnapshotNow. If something does commit in mid-scan, then
>> we might not get the latest version of the row, but we should end up
>> with exactly one. If it's not the latest one, we'll do the rebuild
>> again upon seeing the next sinval message; in the meantime, the
>> version we're using mustn't be too intolerably bad or it was an error
>> not to use AccessExclusiveLock in the first place.
>
> Yeah, this seems like a possibly workable direction to explore. I like
> this better than what Simon is proposing, because it would fix the
> generic issue for all types of catalog SnapshotNow scans.
>
>> IIUC, the problem with this approach is not correctness but
>> performance. Taking snapshots is (currently) expensive.
>
> Yeah. After mulling it for awhile, what about this idea: we could
> redefine SnapshotNow as a snapshot type that includes a list of
> transactions-in-progress, somewhat like an MVCC snapshot, but we don't
> fill that list from the PGPROC array. Instead, while running a scan
> with SnapshotNow, anytime we determine that a particular XID is
> still-in-progress, we add that XID to the snapshot's list.
> Subsequently, the SnapshotNow code assumes that XID to be
> still-in-progress without consulting its actual state. We reset the XID
> list to empty when starting a new SnapshotNow scan. (We might be able
> to do so less often than that, like only when we do
> AcceptInvalidationMessages, but it's not clear to me that there's any
> real benefit in hanging onto the state longer.)
>
> This costs no performance; if anything it should be faster than now,
> because we'll be replacing expensive transaction state probes with
> relatively-cheap searches of an XID array that should almost always
> be quite short.
>
> With this approach, we would have no serialization anomalies from single
> transactions committing while a scan is in progress. There could be
> anomalies resulting from considering an earlier XID to be in-progress
> while a later XID is considered committed (because we didn't observe
> it until later). So far as I can see offhand, the impact of that would
> be that there might be multiple versions of a tuple that are considered
> good, but never that there would be no version considered good (so long
> as the other XIDs simply updated the tuple and didn't delete it). I
> think this would be all right, since the scan would just seize on the
> first good version it finds. As you argue above, if that's not good
> enough for our purposes then the updater(s) should have taken a stronger
> lock.

I liked this idea, so began to prototype the code. My rough hack is
attached, for the record.

One thing that occurs to me about this is that SnapshotNow with or
without these changes returns the latest committed row and ignores
in-progress changes.

Accepting an older version of the definition will always be
potentially dangerous. I can't see a way of doing this that doesn't
require locking - for changes such as new constraints we need to wait
until in progress changes are complete.

So maybe this idea is worth doing, but I don't think it helps us much
reduce lock levels for DDL.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
snapshotnow_consistent.v1.patch	application/octet-stream	6.5 KB

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-11-29 16:50:28
Message-ID:	201111291650.pATGoSJ06716@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> > This costs no performance; if anything it should be faster than now,
> > because we'll be replacing expensive transaction state probes with
> > relatively-cheap searches of an XID array that should almost always
> > be quite short.
> >
> > With this approach, we would have no serialization anomalies from single
> > transactions committing while a scan is in progress. ?There could be
> > anomalies resulting from considering an earlier XID to be in-progress
> > while a later XID is considered committed (because we didn't observe
> > it until later). ?So far as I can see offhand, the impact of that would
> > be that there might be multiple versions of a tuple that are considered
> > good, but never that there would be no version considered good (so long
> > as the other XIDs simply updated the tuple and didn't delete it). ?I
> > think this would be all right, since the scan would just seize on the
> > first good version it finds. ?As you argue above, if that's not good
> > enough for our purposes then the updater(s) should have taken a stronger
> > lock.
>
> I liked this idea, so began to prototype the code. My rough hack is
> attached, for the record.
>
> One thing that occurs to me about this is that SnapshotNow with or
> without these changes returns the latest committed row and ignores
> in-progress changes.
>
> Accepting an older version of the definition will always be
> potentially dangerous. I can't see a way of doing this that doesn't
> require locking - for changes such as new constraints we need to wait
> until in progress changes are complete.
>
> So maybe this idea is worth doing, but I don't think it helps us much
> reduce lock levels for DDL.

Just confirming we decided not to persue this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-11-30 04:42:10
Message-ID:	CA+TgmobygoSoxFjxuPcw4H7s5Ub+Zj9LJGMYc7QsgzVHgeNzqQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Just confirming we decided not to persue this.

Doesn't sound like it.

I've been thinking a lot about the more general problem here - namely,
that allowing catalog changes without an access exclusive lock is
unsafe - and trying to come up with a good solution, because I'd
really like to see us make this work. It strikes me that there are
really two separate problems here.

1. If you are scanning a system catalog using SnapshotNow, and a
commit happens at just the right moment, you might see two versions of
the row or zero rather than one. You'll see two if you scan the old
row version, then the concurrent transaction commits, and then you
scan the new one. You'll see zero if you scan the new row (ignoring
it, since it isn't committed yet) and then the concurrent transaction
commits, and then you scan the old row. On a related note, if you
scan several interrelated catalogs (e.g. when building a relcache
entry) and a transaction that has made matching modifications to
multiple catalogs commits midway through, you might see the old
version of the data from one table and the new version of the data
from some other table. Using AccessExclusiveLock fixes the problem
because we guarantee that anybody who wants to read those rows first
takes some kind of lock, which they won't be able to do while the
updater holds AccessExclusiveLock.

2. Other backends may have data in the relcache or catcaches which
won't get invalidated until they do AcceptInvalidationMessages().
That will always happen no later than the next time they lock the
relation, so if you are holding AccessExclusiveLock then you should be
OK: no one else holds any lock, and they'll have to go get one before
doing anything interesting. But if you are holding any weaker lock,
there's nothing to force AcceptInvalidationMessages to happen before
you reuse those cache entries.

In both cases, name lookups are an exception: we don't know what OID
to lock until we've done the name lookup.

Random ideas for solving the first problem:

1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've
rejected this before because of the cost involved, but maybe with
Pavan's patch and some of the other improvements that are on the way
we could make this cheap enough to be acceptable.
1-B. Don't allow transactions that have modified system catalog X to
commit while we're scanning system catalog X; make the scans take some
kind of shared lock and commits an exclusive lock. This seems awfully
bad for concurrency, though, not to mention the overhead of taking and
releasing all those locks even when nothing conflicts.
1-C. Wrap a retry loop around every place where we scan a system
catalog. If a transaction that has written rows in catalog X commits
while we're scanning catalog X, discard the results of the first scan
and declare a do-over (or maybe something more coarse-grained, or at
the other extreme even more fine-grained). If we're doing several
interrelated scans then the retry loop must surround the entire unit,
and enforce a do-over of the whole operation if commits occur in any
of the catalogs before the scan completes.

Unfortunately, any of these seem likely to require a Lot of Work,
probably more than can be done in time for 9.2. However, perhaps it
would be feasible to hone in on a limited subset of the cases (e.g.
name lookups, which are mainly done in only a handful of places, and
represent an existing bug) and implement the necessary code changes
just for those cases. Then we could generalize that pattern to other
cases as time and energy permit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-16 12:07:52
Message-ID:	CA+U5nM+ybP2jXoXBg5v9OpWAJaOwfUHakytj9LiLnQfWz=YF3Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 30, 2011 at 4:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It strikes me that there are really two separate problems here.
>
> 1. If you are scanning a system catalog using SnapshotNow, and a
> commit happens at just the right moment, you might see two versions of
> the row or zero rather than one. You'll see two if you scan the old
> row version, then the concurrent transaction commits, and then you
> scan the new one. You'll see zero if you scan the new row (ignoring
> it, since it isn't committed yet) and then the concurrent transaction
> commits, and then you scan the old row.

That is a bug and one we should fix. I supplied a patch for that,
written to Tom's idea for how to solve it.

I will apply that, unless there are objections.

> 2. Other backends may have data in the relcache or catcaches which
> won't get invalidated until they do AcceptInvalidationMessages().
> That will always happen no later than the next time they lock the
> relation, so if you are holding AccessExclusiveLock then you should be
> OK: no one else holds any lock, and they'll have to go get one before
> doing anything interesting. But if you are holding any weaker lock,
> there's nothing to force AcceptInvalidationMessages to happen before
> you reuse those cache entries.

Yes, inconsistent cache entries will occur if we allow catalog updates
while the table is already locked by others.

The question is whether that is a problem in all cases.

With these ALTER TABLE subcommands, I don't see any problem with
different backends seeing different values.

/*
* These subcommands affect general strategies for performance
* and maintenance, though don't change the semantic results
* from normal data reads and writes. Delaying an ALTER TABLE
* behind currently active writes only delays the point where
* the new strategy begins to take effect, so there is no
* benefit in waiting. In this case the minimum restriction
* applies: we don't currently allow concurrent catalog
* updates.
*/
case AT_SetStatistics:
// only used by ANALYZE, which is shut out by the ShareUpdateExclusiveLock

case AT_ClusterOn:
case AT_DropCluster:
// only used by CLUSTER, which is shut out because it needs AccessExclusiveLock

case AT_SetRelOptions:
case AT_ResetRelOptions:
case AT_SetOptions:
case AT_ResetOptions:
case AT_SetStorage:
// not critical

case AT_ValidateConstraint:
// not a problem if some people think its invalid when it is valid

So ISTM that we can allow reduced lock levels for the above commands
if we fix SnapshotNow.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-16 13:38:53
Message-ID:	CA+TgmoYSXyFJUnFOuXkojQkDDyyV6rJK4SjTRD6kCoV6ovb8yg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> That is a bug and one we should fix. I supplied a patch for that,
> written to Tom's idea for how to solve it.
>
> I will apply that, unless there are objections.

I remember several attempts at that, but I don't remember any that
didn't meet with objections. Do you have a link?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-16 13:54:37
Message-ID:	CA+U5nMK5k_8aNc1OgE=CM5XjWWJDKo5MbG678vpUTLH2XH_qdA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> That is a bug and one we should fix. I supplied a patch for that,
>> written to Tom's idea for how to solve it.
>>
>> I will apply that, unless there are objections.
>
> I remember several attempts at that, but I don't remember any that
> didn't meet with objections. Do you have a link?

My patch to implement SnapshotNow correctly, from Jun 27 on this
thread was never reviewed or commented upon by anybody. That was
probably because it only fixes one of the problems, not all of them.

But it does fix a current bug and that's why I'm asking now if there
are objections to committing it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-17 03:22:17
Message-ID:	CA+Tgmob7Ly6L9pKK_2W_-KR4VbT9PHxdndt3YyfRdwzg4y88Vg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 16, 2011 at 8:54 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Dec 16, 2011 at 1:38 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Dec 16, 2011 at 7:07 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> That is a bug and one we should fix. I supplied a patch for that,
>>> written to Tom's idea for how to solve it.
>>>
>>> I will apply that, unless there are objections.
>>
>> I remember several attempts at that, but I don't remember any that
>> didn't meet with objections. Do you have a link?
>
> My patch to implement SnapshotNow correctly, from Jun 27 on this
> thread was never reviewed or commented upon by anybody. That was
> probably because it only fixes one of the problems, not all of them.

Well, I think it was mostly because you didn't sound terribly
optimistic about the approach: "So maybe this idea is worth doing, but
I don't think it helps us much reduce lock levels for DDL." And also
because you described the patch as a "rough hack", and not something
you thought ready to commit.

I am also not entirely sure I believe that this is plugging all the
failure cases. I think that it may just be making the failure cases
more obscure, rather than really getting rid of them. Consider
something like the following:

T1: Update row version A, creating new row version B.
T2: Begin scanning the catalog in question. We happen to encounter
row version B first. We remember T1's XID as in progress, but don't
see the row since T1 hasn't committed.
T1: Rollback.
T3: Update row version A, creating new row version C.
T3: Commit.
T2: Scan now comes to row version A; we don't see that version either,
since T3 is committed.

I don't think there's any guarantee that T2's scan will see tuples
inserted after the start of the scan. If I'm correct about that, and
I'm pretty sure it's true for sequential scans anyway, then T2's scan
might end without seeing C either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-17 11:11:58
Message-ID:	CA+U5nM+N25Ygf5fKxOg3-coR02ZYUoZRu=GbkPn1Oi8pkk4-Bg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Dec 17, 2011 at 3:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I am also not entirely sure I believe that this is plugging all the
> failure cases. I think that it may just be making the failure cases
> more obscure, rather than really getting rid of them. Consider
> something like the following:
>
> T1: Update row version A, creating new row version B.
> T2: Begin scanning the catalog in question. We happen to encounter
> row version B first. We remember T1's XID as in progress, but don't
> see the row since T1 hasn't committed.
> T1: Rollback.
> T3: Update row version A, creating new row version C.
> T3: Commit.
> T2: Scan now comes to row version A; we don't see that version either,
> since T3 is committed.
>
> I don't think there's any guarantee that T2's scan will see tuples
> inserted after the start of the scan. If I'm correct about that, and
> I'm pretty sure it's true for sequential scans anyway, then T2's scan
> might end without seeing C either.

A simpler example shows it better. Tom's idea prevents 2 rows being
returned, but doesn't prevent zero rows.

That's a useful improvement, but not the only thing.

ISTM we can run a scan as we do now, without locking. If scan returns
zero rows we scan again, but take a definition lock first. ALTER TABLE
holds the definition lock while running, which won't be a problem
because we would only use that on shorter AT statements.

Definition lock being the type of lock described earlier on this
thread, that we already have code for. So we have code for all the
parts we need to make this work.

So that means we'd be able to plug the gaps noted, without reducing
performance on the common code paths.

I don't see any objections made so far remain valid with that approach.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-19 17:05:09
Message-ID:	CA+TgmoYCAcka193syVqiwDpCbxbwcLr6-7NfS19uCrScup4+sg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> A simpler example shows it better. Tom's idea prevents 2 rows being
> returned, but doesn't prevent zero rows.

I don't think it prevents either one. Suppose we read and return a
tuple, and then someone HOT-updates it and commits. SnapshotNow is
very happy to also return the updated version of that same tuple on
the next iteration of the scan. See commit
c0f03aae0469e758964faac0fb741685170c39a5.

> That's a useful improvement, but not the only thing.
>
> ISTM we can run a scan as we do now, without locking. If scan returns
> zero rows we scan again, but take a definition lock first. ALTER TABLE
> holds the definition lock while running, which won't be a problem
> because we would only use that on shorter AT statements.
>
> Definition lock being the type of lock described earlier on this
> thread, that we already have code for. So we have code for all the
> parts we need to make this work.
>
> So that means we'd be able to plug the gaps noted, without reducing
> performance on the common code paths.
>
> I don't see any objections made so far remain valid with that approach.

I think a retry loop is a possibly useful tool for solving this
problem, but I'm very skeptical about the definition locks, unless we
can arrange to have them taken just before commit (regardless of when
during the transaction ALTER TABLE was executed) and released
immediately afterwards. What I think might be even better is
something along the lines of what Noah Misch did RangeVarGetRelid --
don't try to lock out concurrent activity, just detect it and redo the
operation if anything bad happens while we're in process. In this
case, that would mean having a mechanism to know whether any
concurrent transaction had updated the catalog we're scanning during
the scan.

Yet another option, which I wonder whether we're dismissing too
lightly, is to just call GetSnapshotData() and do the scan using a
plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow,
but is it expensive enough to worry about?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-20 03:10:02
Message-ID:	20111220031002.GA26050@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote:
> Yet another option, which I wonder whether we're dismissing too
> lightly, is to just call GetSnapshotData() and do the scan using a
> plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow,
> but is it expensive enough to worry about?

Good point. For the most part, we already regard a catalog scan as too
expensive for bulk use, hence relcache and catcache. That's not license to
slow them down recklessly, but it's worth discovering how much of a hit we'd
actually face. I created a function that does this in a loop:

HeapTuple t;

CatalogCacheFlushCatalog(ProcedureRelationId);
t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
if (!HeapTupleIsValid(t))
elog(ERROR, "cache lookup failed for function 42");
ReleaseSysCache(t);

Then, I had pgbench call the function once per client with various numbers of
clients and a loop iteration count such that the total number of scans per run
was always 19200000. Results for master and for a copy patched to use MVCC
snapshots in catcache.c only:

2 clients, master: 4:30.66elapsed
4 clients, master: 4:26.82elapsed
32 clients, master: 4:25.30elapsed
2 clients, master: 4:25.67elapsed
4 clients, master: 4:26.58elapsed
32 clients, master: 4:26.40elapsed
2 clients, master: 4:27.54elapsed
4 clients, master: 4:26.60elapsed
32 clients, master: 4:27.20elapsed
2 clients, mvcc-catcache: 4:35.13elapsed
4 clients, mvcc-catcache: 4:30.40elapsed
32 clients, mvcc-catcache: 4:37.91elapsed
2 clients, mvcc-catcache: 4:28.13elapsed
4 clients, mvcc-catcache: 4:27.06elapsed
32 clients, mvcc-catcache: 4:32.84elapsed
2 clients, mvcc-catcache: 4:32.47elapsed
4 clients, mvcc-catcache: 4:24.35elapsed
32 clients, mvcc-catcache: 4:31.54elapsed

I see roughly a 2% performance regression. However, I'd expect any bulk
losses to come from increased LWLock contention, which just doesn't
materialize in a big way on this 2-core box. If anyone would like to rerun
this on a larger machine, I can package it up for reuse.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-20 04:13:57
Message-ID:	9451.1324354437@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote:
>> Yet another option, which I wonder whether we're dismissing too
>> lightly, is to just call GetSnapshotData() and do the scan using a
>> plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow,
>> but is it expensive enough to worry about?

That might actually be workable ...

> I created a function that does this in a loop:

> HeapTuple t;

> CatalogCacheFlushCatalog(ProcedureRelationId);
> t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
> if (!HeapTupleIsValid(t))
> elog(ERROR, "cache lookup failed for function 42");
> ReleaseSysCache(t);

... but this performance test seems to me to be entirely misguided,
because it's testing a situation that isn't going to occur much in the
field, precisely because the syscache should prevent constant reloads of
the same syscache entry.

Poking around a bit, it looks to me like one of the bigger users of
non-cache-fronted SnapshotNow scans is dependency.c. So maybe testing
the speed of large cascaded drops would be a more relevant test case.
For instance, create a schema containing a few thousand functions, and
see how long it takes to drop the schema.

Another thing that would be useful to know is what effect such a change
would have on the time to run the regression tests with
CLOBBER_CACHE_ALWAYS. That has nothing to do with any real-world
performance concerns, but it would be good to see if we're going to
cause a problem for the long-suffering buildfarm member that does that.

regards, tom lane

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-20 16:23:02
Message-ID:	20111220162302.GA5979@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > I created a function that does this in a loop:
>
> > HeapTuple t;
>
> > CatalogCacheFlushCatalog(ProcedureRelationId);
> > t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
> > if (!HeapTupleIsValid(t))
> > elog(ERROR, "cache lookup failed for function 42");
> > ReleaseSysCache(t);
>
> ... but this performance test seems to me to be entirely misguided,
> because it's testing a situation that isn't going to occur much in the
> field, precisely because the syscache should prevent constant reloads of
> the same syscache entry.

> [ideas for more-realistic tests]

Granted, but I don't hope to reliably measure a change in a macro-benchmark
after seeing a rickety 2% change in a micro-benchmark.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2011-12-20 16:34:58
Message-ID:	20007.1324398898@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
>> ... but this performance test seems to me to be entirely misguided,
>> because it's testing a situation that isn't going to occur much in the
>> field, precisely because the syscache should prevent constant reloads of
>> the same syscache entry.

>> [ideas for more-realistic tests]

> Granted, but I don't hope to reliably measure a change in a macro-benchmark
> after seeing a rickety 2% change in a micro-benchmark.

No, I'm not sure about that at all. In particular I think that
CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs
could be a larger part of a more-realistic test.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 17:09:16
Message-ID:	CA+U5nMKKSSPwMJ1eTZYoyC80pHMGd_G0CjSb_vaSbQFta3o3nQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Dec 20, 2011 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
>> On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
>>> ... but this performance test seems to me to be entirely misguided,
>>> because it's testing a situation that isn't going to occur much in the
>>> field, precisely because the syscache should prevent constant reloads of
>>> the same syscache entry.
>
>>> [ideas for more-realistic tests]
>
>> Granted, but I don't hope to reliably measure a change in a macro-benchmark
>> after seeing a rickety 2% change in a micro-benchmark.
>
> No, I'm not sure about that at all. In particular I think that
> CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs
> could be a larger part of a more-realistic test.

Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
the start of each scan iff SnapshotNow is passed as the scan's
snapshot. It's fairly brief but seems to do the trick.

Assuming that is the right approach, some timings

10,000 functions individually
Create 15.7s 14.3s 14.8s 14.8s
Drop 11.9s 11.7 11.6s 12.0s

10,000 functions in a schema
Create ... same ...
Drop Cascade 2.5s

That seems OK to me.

Any feedback?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
snapshotnow_consistent.v4.patch	text/x-patch	3.7 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 17:17:31
Message-ID:	23724.1325524651@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Assuming that is the right approach, some timings

Um ... timings of what?

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 17:50:13
Message-ID:	CA+U5nM+_h-6w9DwKQDym3684hSvy34yfAhRa5z3vtT+JYyrdDw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> Assuming that is the right approach, some timings
>
> Um ... timings of what?

Apologies for being terse, no problem to give a full explanation.

You suggested earlier that putting calls to GetSnapshotData() in place
of using SnapshotNow might increase the time taken to do catalog
scans. That matters most in places where stuff isn't cached, one such
place you suggested might be the dependency check code, so a test of
dropping thousands of functions might show up any bad regressions.

From what I can see, there are no bad performance regressions from
making SnapshotNow use MVCC.

We're benefiting in 9.2 from good reductions in GetSnapshotData()
contention and improved performance anyway, so it looks like a
reasonable balance.

I'm proposing that we apply the above patch (v4) to allow SnapshotNow
scans to return consistent, trustable results. That fixes some of the
corner case bugs we've seen and paves the way for me to re-enable the
lock reduction code.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 18:06:26
Message-ID:	20120102180626.GA23436@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
> the start of each scan iff SnapshotNow is passed as the scan's
> snapshot. It's fairly brief but seems to do the trick.

That's a neat trick. However, if you start a new SnapshotNow scan while one is
ongoing, the primordial scan's snapshot will change mid-stream.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 18:41:31
Message-ID:	CA+U5nMJmHizgiH2TZQWvxftMYjT1Td-WLHB5-PsaoSVQeh9cBA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
>> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
>> the start of each scan iff SnapshotNow is passed as the scan's
>> snapshot. It's fairly brief but seems to do the trick.
>
> That's a neat trick. However, if you start a new SnapshotNow scan while one is
> ongoing, the primordial scan's snapshot will change mid-stream.

Do we ever do that? (and if so, Why?!? or perhaps just Where?)

We can use more complex code if required, but we'll be adding
complexity and code into the main path that I'd like to avoid.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:13:00
Message-ID:	27994.1325531580@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
>>> Attached patch makes SnapshotNow into an MVCC snapshot,

>> That's a neat trick. However, if you start a new SnapshotNow scan while one is
>> ongoing, the primordial scan's snapshot will change mid-stream.

> Do we ever do that?

Almost certainly yes. For example, a catcache load may invoke catcache
or relcache reload operations on its way to opening the table or index
needed to fetch the desired row.

I think you can only safely do this if each caller has its own snapshot
variable, a la SnapshotDirty, and that's going to be hugely more
invasive.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:14:38
Message-ID:	28045.1325531678@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Um ... timings of what?

> Apologies for being terse, no problem to give a full explanation.

But you still didn't. I wanted to know what those numbers were and how
they show that there's not a performance regression. Presumably you
meant that some were "before" and some "after", but they were not so
labeled.

regards, tom lane

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:25:25
Message-ID:	20120102192525.GB23436@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote:
> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
> >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
> >> the start of each scan iff SnapshotNow is passed as the scan's
> >> snapshot. It's fairly brief but seems to do the trick.
> >
> > That's a neat trick. ?However, if you start a new SnapshotNow scan while one is
> > ongoing, the primordial scan's snapshot will change mid-stream.
>
> Do we ever do that? (and if so, Why?!? or perhaps just Where?)

I hacked up your patch a bit, as attached, to emit a WARNING for any nested
use of SnapshotNow. This made 97/127 test files fail. As one example,
RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its
SnapshotNow scan. That may need a detoast, which itself runs a scan.

> We can use more complex code if required, but we'll be adding
> complexity and code into the main path that I'd like to avoid.

Agreed.

Attachment	Content-Type	Size
find-overlap-snapshotnow.patch	text/plain	4.8 KB

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:33:28
Message-ID:	1325532677-sup-1833@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of lun ene 02 16:25:25 -0300 2012:
> On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote:
> > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
> > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
> > >> the start of each scan iff SnapshotNow is passed as the scan's
> > >> snapshot. It's fairly brief but seems to do the trick.
> > >
> > > That's a neat trick. ?However, if you start a new SnapshotNow scan while one is
> > > ongoing, the primordial scan's snapshot will change mid-stream.
> >
> > Do we ever do that? (and if so, Why?!? or perhaps just Where?)
>
> I hacked up your patch a bit, as attached, to emit a WARNING for any nested
> use of SnapshotNow. This made 97/127 test files fail. As one example,
> RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its
> SnapshotNow scan. That may need a detoast, which itself runs a scan.

Uh, I thought detoasting had its own visibility test function .. I mean,
otherwise, what is HeapTupleSatisfiesToast for?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:39:09
Message-ID:	20120102193909.GC23436@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote:
>
> Excerpts from Noah Misch's message of lun ene 02 16:25:25 -0300 2012:
> > On Mon, Jan 02, 2012 at 06:41:31PM +0000, Simon Riggs wrote:
> > > On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > > On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
> > > >> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
> > > >> the start of each scan iff SnapshotNow is passed as the scan's
> > > >> snapshot. It's fairly brief but seems to do the trick.
> > > >
> > > > That's a neat trick. ?However, if you start a new SnapshotNow scan while one is
> > > > ongoing, the primordial scan's snapshot will change mid-stream.
> > >
> > > Do we ever do that? (and if so, Why?!? or perhaps just Where?)
> >
> > I hacked up your patch a bit, as attached, to emit a WARNING for any nested
> > use of SnapshotNow. This made 97/127 test files fail. As one example,
> > RelationBuildRuleLock() does TextDatumGetCString() for every tuple of its
> > SnapshotNow scan. That may need a detoast, which itself runs a scan.
>
> Uh, I thought detoasting had its own visibility test function .. I mean,
> otherwise, what is HeapTupleSatisfiesToast for?

The SnapshotNow scan was actually to build the relcache entry for the toast
table before scanning the toast table itself. Stack trace:

#0 InitSnapshotNowIfNeeded (snap=0xb7e8c0) at snapmgr.c:216
#1 0x000000000048f773 in index_beginscan_internal (indexRelation=0x7f9a091e5cc8, nkeys=2, norderbys=0, snapshot=0xb7e8c0) at indexam.c:295
#2 0x000000000048f7e9 in index_beginscan (heapRelation=0x7f9a091a6c60, indexRelation=0xb7e8c0, snapshot=0xb7e8c0, nkeys=152984776, norderbys=0) at indexam.c:238
#3 0x000000000048e1de in systable_beginscan (heapRelation=0x7f9a091a6c60, indexId=<value optimized out>, indexOK=<value optimized out>, snapshot=0xb7e8c0, nkeys=2, key=0x7fffe211b610) at genam.c:289
#4 0x00000000007287db in RelationBuildTupleDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:455
#5 RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:880
#6 0x000000000072a050 in RelationIdGetRelation (relationId=2838) at relcache.c:1567
#7 0x00000000004872c0 in relation_open (relationId=2838, lockmode=1) at heapam.c:910
#8 0x0000000000487328 in heap_open (relationId=12052672, lockmode=152984776) at heapam.c:1052
#9 0x000000000048a737 in toast_fetch_datum (attr=<value optimized out>) at tuptoaster.c:1586
#10 0x000000000048bf74 in heap_tuple_untoast_attr (attr=0xb7e8c0) at tuptoaster.c:135
#11 0x0000000000737d77 in pg_detoast_datum_packed (datum=0xb7e8c0) at fmgr.c:2266
#12 0x00000000006f0fb0 in text_to_cstring (t=0xb7e8c0) at varlena.c:135
#13 0x0000000000727083 in RelationBuildRuleLock (relation=0x7f9a091b3818) at relcache.c:673
#14 0x000000000072909e in RelationBuildDesc (targetRelId=<value optimized out>, insertIt=1 '\001') at relcache.c:886
#15 0x000000000072a050 in RelationIdGetRelation (relationId=11119) at relcache.c:1567
#16 0x00000000004872c0 in relation_open (relationId=11119, lockmode=0) at heapam.c:910
#17 0x0000000000487483 in relation_openrv_extended (relation=<value optimized out>, lockmode=<value optimized out>, missing_ok=<value optimized out>) at heapam.c:1011
#18 0x000000000048749c in heap_openrv_extended (relation=0xb7e8c0, lockmode=152984776, missing_ok=5 '\005') at heapam.c:1110
#19 0x000000000053570c in parserOpenTable (pstate=0xc885f8, relation=0xc88258, lockmode=1) at parse_relation.c:835
#20 0x000000000053594d in addRangeTableEntry (pstate=0xc885f8, relation=0xc88258, alias=0x0, inh=1 '\001', inFromCl=1 '\001') at parse_relation.c:901
#21 0x0000000000524dbf in transformTableEntry (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:445
#22 transformFromClauseItem (pstate=0xc885f8, n=0xc88258, top_rte=0x7fffe211bd78, top_rti=0x7fffe211bd84, relnamespace=0x7fffe211bd70, containedRels=0x7fffe211bd68) at parse_clause.c:683
#23 0x0000000000526124 in transformFromClause (pstate=0xc885f8, frmList=<value optimized out>) at parse_clause.c:129
#24 0x000000000050c321 in transformSelectStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:896
#25 transformStmt (pstate=0xb7e8c0, parseTree=0xc88340) at analyze.c:186
#26 0x000000000050d583 in parse_analyze (parseTree=0xc88340, sourceText=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at analyze.c:94
#27 0x00000000006789c5 in pg_analyze_and_rewrite (parsetree=0xc88340, query_string=0xc87828 "table pg_stat_user_tables;", paramTypes=0x0, numParams=0) at postgres.c:583
#28 0x0000000000678c77 in exec_simple_query (query_string=0xc87828 "table pg_stat_user_tables;") at postgres.c:941
#29 0x0000000000679997 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=0xbe03c0 "nm") at postgres.c:3881
#30 0x0000000000633d41 in BackendRun () at postmaster.c:3587
#31 BackendStartup () at postmaster.c:3272
#32 ServerLoop () at postmaster.c:1350
#33 0x0000000000635330 in PostmasterMain (argc=1, argv=0xbdc170) at postmaster.c:1110
#34 0x00000000005d206b in main (argc=1, argv=0xbdc170) at main.c:199

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:42:01
Message-ID:	CA+U5nMJk99ogge2ouWh-8nYZz9Fjr7mhWgNLbR78pTo98vdC6A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
>>>> Attached patch makes SnapshotNow into an MVCC snapshot,
>
>>> That's a neat trick. However, if you start a new SnapshotNow scan while one is
>>> ongoing, the primordial scan's snapshot will change mid-stream.
>
>> Do we ever do that?
>
> Almost certainly yes. For example, a catcache load may invoke catcache
> or relcache reload operations on its way to opening the table or index
> needed to fetch the desired row.

Ah, of course. I was thinking they would be rare by design.

> I think you can only safely do this if each caller has its own snapshot
> variable, a la SnapshotDirty, and that's going to be hugely more
> invasive.

OK, will look into it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:46:48
Message-ID:	CA+U5nM+WNtfOUEZOPM1sjgbgveNg1J-7iPEAzqok7xxSLgRrbg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Mon, Jan 2, 2012 at 5:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Um ... timings of what?
>
>> Apologies for being terse, no problem to give a full explanation.
>
> But you still didn't. I wanted to know what those numbers were and how
> they show that there's not a performance regression. Presumably you
> meant that some were "before" and some "after", but they were not so
> labeled.

All timings were "after" applying the patch. Since all of the tests
had very acceptable absolute values I didn't test without-patch.

Anyway, looks like we need to bin that and retest with new patch when it comes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:49:51
Message-ID:	1325533773-sup-6881@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Noah Misch's message of lun ene 02 16:39:09 -0300 2012:
>
> On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote:

> > Uh, I thought detoasting had its own visibility test function .. I mean,
> > otherwise, what is HeapTupleSatisfiesToast for?
>
> The SnapshotNow scan was actually to build the relcache entry for the toast
> table before scanning the toast table itself. Stack trace:

Oh, right, that makes sense.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 19:54:33
Message-ID:	CA+U5nMJeavFtk9N4w-a6Sd0W7scmH=aweY+YwO1B2HGmVLdfAQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 7:49 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>
> Excerpts from Noah Misch's message of lun ene 02 16:39:09 -0300 2012:
>>
>> On Mon, Jan 02, 2012 at 04:33:28PM -0300, Alvaro Herrera wrote:
>
>> > Uh, I thought detoasting had its own visibility test function .. I mean,
>> > otherwise, what is HeapTupleSatisfiesToast for?
>>
>> The SnapshotNow scan was actually to build the relcache entry for the toast
>> table before scanning the toast table itself. Stack trace:
>
> Oh, right, that makes sense.

Certainly does. Thanks Noah, much appreciated.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 20:20:08
Message-ID:	1310.1325535608@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> But you still didn't. I wanted to know what those numbers were and how
>> they show that there's not a performance regression. Presumably you
>> meant that some were "before" and some "after", but they were not so
>> labeled.

> All timings were "after" applying the patch. Since all of the tests
> had very acceptable absolute values I didn't test without-patch.

What is a "very acceptable absolute value", and how do you know it's
acceptable if you don't know what the previous performance was? This
reasoning makes no sense to me at all.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 21:07:15
Message-ID:	CA+U5nMKySRh2Kq9i8Up6JeZz6n3k4OWpy=xwe5GP609FtHpkYg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 8:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Mon, Jan 2, 2012 at 7:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> But you still didn't. I wanted to know what those numbers were and how
>>> they show that there's not a performance regression. Presumably you
>>> meant that some were "before" and some "after", but they were not so
>>> labeled.
>
>> All timings were "after" applying the patch. Since all of the tests
>> had very acceptable absolute values I didn't test without-patch.
>
> What is a "very acceptable absolute value", and how do you know it's
> acceptable if you don't know what the previous performance was? This
> reasoning makes no sense to me at all.

I don't need to do things twice before deciding I enjoyed it the first
time. A low value showed that there were no problems, to me.

If you want to see more or discuss, that's cool, no problem. But no
need to beat me up for not guessing correctly the level of rigour that
would be acceptable to you. Now I have the first test result of your
requirements, I will be able to judge further test results against the
required standard.

As I've said, this is all invalid now anyway.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-02 21:17:34
Message-ID:	CA+U5nMK2v=7djJJ9E7-pggs4dyO2UH3HgYQBz66An9-tqrxm4A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I think I can avoid doing things that way and keep it non-invasive.

If we
#define SnapshotNow (GetSnapshotNow())

and make GetSnapshotNow() reuse the static SnapshotNowData if possible.
If not available we can allocate a new snapshot in TopTransactionContext.

We can free the snapshot at the end of scan.

That hides most of the complexity without causing any problems, ISTM.

Working on this now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 00:12:21
Message-ID:	CA+U5nMJ2Li8yps0OuEbNyoV64jfW2nw3mYXQNLPQNkTGG9XJcA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 9:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, Jan 2, 2012 at 7:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
>>>>> Attached patch makes SnapshotNow into an MVCC snapshot,
>>
>>>> That's a neat trick. However, if you start a new SnapshotNow scan while one is
>>>> ongoing, the primordial scan's snapshot will change mid-stream.
>>
>>> Do we ever do that?
>>
>> Almost certainly yes. For example, a catcache load may invoke catcache
>> or relcache reload operations on its way to opening the table or index
>> needed to fetch the desired row.
>>
>> I think you can only safely do this if each caller has its own snapshot
>> variable, a la SnapshotDirty, and that's going to be hugely more
>> invasive.
>
>
> I think I can avoid doing things that way and keep it non-invasive.
>
> If we
> #define SnapshotNow (GetSnapshotNow())
>
> and make GetSnapshotNow() reuse the static SnapshotNowData if possible.
> If not available we can allocate a new snapshot in TopTransactionContext.
>
> We can free the snapshot at the end of scan.
>
> That hides most of the complexity without causing any problems, ISTM.
>
> Working on this now.

Options for implementing this are

1) add FreeSnapshotNowIfNeeded() to every heap_endscan and index_endscan
Small code footprint, touches every scan

2) change all heap_scans that uses SnapshotNow into a systable_scan
and add FreeSnapshotNowIfNeeded() to systable_endscan
35 call points, touches only system table scans

3) code specific calls to create a snapshot for each SnapshotNow call,
similar to InitDirtySnapshot
185 call points, very fine grained control, but probably overkill

(2) seems the right way to do this, with selective use of (3) and has
the side benefit of a reasonable code rationalisation

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 01:18:41
Message-ID:	CA+U5nM+u0Dy-EcLUGgVsyjQLA6K-1Wpe+g9eUfoEMCGirYhnng@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 2, 2012 at 6:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, Jan 2, 2012 at 6:06 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Mon, Jan 02, 2012 at 05:09:16PM +0000, Simon Riggs wrote:
>>> Attached patch makes SnapshotNow into an MVCC snapshot, initialised at
>>> the start of each scan iff SnapshotNow is passed as the scan's
>>> snapshot. It's fairly brief but seems to do the trick.
>>
>> That's a neat trick. However, if you start a new SnapshotNow scan while one is
>> ongoing, the primordial scan's snapshot will change mid-stream.
>
> Do we ever do that? (and if so, Why?!? or perhaps just Where?)

Just for the record, yes we do run multiple catalog scans in some
parts of the code.

So I can see how we might trigger 4 nested scans, using cache
replacement while scanning, so best assume more, with no guarantee of
them being neatly stacked for pop/push type access.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 04:40:38
Message-ID:	20120103044038.GA4070@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote:
> Just for the record, yes we do run multiple catalog scans in some
> parts of the code.
>
> So I can see how we might trigger 4 nested scans, using cache
> replacement while scanning, so best assume more, with no guarantee of
> them being neatly stacked for pop/push type access.

Yeah, I wouldn't want to commit to a nesting limit. However, I _would_ have
expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot()
is adequate for a great deal of the backend, after all. In what sort of
situation do catalog scans not strictly nest?

Thanks,
nm

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 10:47:17
Message-ID:	CA+U5nMLpfx-NpC8fD6-7EEV5LxKLmM7YjZKLGicV91U3Rg+kaA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote:
>> Just for the record, yes we do run multiple catalog scans in some
>> parts of the code.
>>
>> So I can see how we might trigger 4 nested scans, using cache
>> replacement while scanning, so best assume more, with no guarantee of
>> them being neatly stacked for pop/push type access.
>
> Yeah, I wouldn't want to commit to a nesting limit.

Agreed

> However, I _would_ have
> expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot()
> is adequate for a great deal of the backend, after all. In what sort of
> situation do catalog scans not strictly nest?

It's possible. Making assumptions about what is possible bit me
before, as you showed.

I've seen code where we are explicitly running two concurrent scans.

I don't want to put unnecessary and subtle assumptions into catalog
scan code so I'm working on the assumption that endscan may not be
called in strict FIFO order. The difference is fairly minor and
doesn't restrict us in other ways.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 15:24:52
Message-ID:	CA+Tgmobsah2y8K3XU4syw2gVo5+r7MxTaMsaPBS_aSRYKh9tBA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 5:47 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Tue, Jan 03, 2012 at 01:18:41AM +0000, Simon Riggs wrote:
>>> Just for the record, yes we do run multiple catalog scans in some
>>> parts of the code.
>>>
>>> So I can see how we might trigger 4 nested scans, using cache
>>> replacement while scanning, so best assume more, with no guarantee of
>>> them being neatly stacked for pop/push type access.
>>
>> Yeah, I wouldn't want to commit to a nesting limit.
>
> Agreed
>
>> However, I _would_ have
>> expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot()
>> is adequate for a great deal of the backend, after all. In what sort of
>> situation do catalog scans not strictly nest?
>
> It's possible. Making assumptions about what is possible bit me
> before, as you showed.
>
> I've seen code where we are explicitly running two concurrent scans.
>
> I don't want to put unnecessary and subtle assumptions into catalog
> scan code so I'm working on the assumption that endscan may not be
> called in strict FIFO order. The difference is fairly minor and
> doesn't restrict us in other ways.

I feel like the first thing we should be doing here is some
benchmarking. If we change just the scans in dependency.c and then
try the test case Tom suggested (dropping a schema containing a large
number of functions), we can compare the patched code with master and
get an idea of whether the performance is acceptable. If it is,
changing everything else is a mostly mechanical process that we can
simply grind through. If it's not, I'd rather learn that before we
start grinding.

I started to do this before Christmas, but then I ... went on
vacation. Here's Perl script that can be used to generate an SQL
script to create as many functions as you'd like to try dropping at
once, in case it's helpful. The idea is to run the resulting SQL
script through psql and then test the speed of "DROP SCHEMA
lots_of_functions CASCADE;".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
lots_of_functions.pl	application/octet-stream	196 bytes

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 16:17:26
Message-ID:	CA+U5nMJMcwHfJ6aGrrWD+bm4+Sd3HRU9=iL5P=wta2GfR_cwKQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

Yes, I've done this and it takes 2.5s to drop 10,000 functions using
an MVCC snapshot.

That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.

We can do a lot of tests but at the end its a human judgement. Is 100%
correct results from catalog accesses worth having when the real world
speed of it is not substantially very good? (Whether its x1000000
times slower or not is not relevant if it is still fast enough).

Anybody with that many DDL statements probably cares about doing
various operations with lower lock levels.

Fastpath locking will slow down DDL but we didn't measure the
performance slow down there. We understood the benefit and were
willing to pay the price.

So I'll proceed for now with the patch, which isn't as simple as you think.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 17:39:16
Message-ID:	CA+TgmobpQzJ16+5NH3tMLSZ5_K4V8zqJwpFZQ_+5BRjbtimGrw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 11:17 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I feel like the first thing we should be doing here is some
>> benchmarking. If we change just the scans in dependency.c and then
>> try the test case Tom suggested (dropping a schema containing a large
>> number of functions), we can compare the patched code with master and
>> get an idea of whether the performance is acceptable.
>
> Yes, I've done this and it takes 2.5s to drop 10,000 functions using
> an MVCC snapshot.
>
> That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.
>
> We can do a lot of tests but at the end its a human judgement. Is 100%
> correct results from catalog accesses worth having when the real world
> speed of it is not substantially very good? (Whether its x1000000
> times slower or not is not relevant if it is still fast enough).

Sure. But I don't see why that means it wouldn't be nice to know
whether or not it is in fact a million times slower. If Tom's
artificial worst case is a million times slower, then probably there
are some cases we care about more that are going to be measurably
impacted, and we're going to want to think about what to do about
that. We can wait until you've finished the patch before we do that
testing, or we can do it now and maybe get some idea whether the
approach is likely to be viable or whether it's going to require some
adjustment before we actually go trawl through all that code.

On my laptop, dropping a schema with 10,000 functions using commit
d5448c7d31b5af66a809e6580bae9bd31448bfa7 takes 400-500 ms. If my
laptop is the same speed as your laptop, that would mean a 5-6x
slowdown, but of course that's comparing apples and oranges... in any
event, if the real number is anywhere in that ballpark, it's probably
a surmountable obstacle, but I'd like to know rather than guessing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 17:55:58
Message-ID:	22534.1325613358@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.

> We can do a lot of tests but at the end its a human judgement. Is 100%
> correct results from catalog accesses worth having when the real world
> speed of it is not substantially very good? (Whether its x1000000
> times slower or not is not relevant if it is still fast enough).

That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000
functions is probably fine, but that is an artificial test case whose
only real interest is to benchmark what a change in SnapshotNow scans
might cost us. In the real world it's hard to guess what fraction of a
real workload might consist of such scans, but I suspect there are some
where a significant increase in that cost might hurt. So in my mind the
point of the exercise is to find out how much the cost increased, and
I'm just baffled as to why you won't use a benchmark case to, um,
benchmark.

Another point that requires some thought is that switching SnapshotNow
to be MVCC-based will presumably result in a noticeable increase in each
backend's rate of wanting to acquire snapshots. Hence, more contention
in GetSnapshotData can be expected. A single-threaded test case doesn't
prove anything at all about what that might cost under load.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:13:46
Message-ID:	CA+TgmoZf5kktkvo8bGf8rEyNzL_OU6iHy8D0Q=tq3mgrWsqv-g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.
>
>> We can do a lot of tests but at the end its a human judgement. Is 100%
>> correct results from catalog accesses worth having when the real world
>> speed of it is not substantially very good? (Whether its x1000000
>> times slower or not is not relevant if it is still fast enough).
>
> That argument is just nonsense AFAICT. Yes, 2.5 s to drop 10000
> functions is probably fine, but that is an artificial test case whose
> only real interest is to benchmark what a change in SnapshotNow scans
> might cost us. In the real world it's hard to guess what fraction of a
> real workload might consist of such scans, but I suspect there are some
> where a significant increase in that cost might hurt. So in my mind the
> point of the exercise is to find out how much the cost increased, and
> I'm just baffled as to why you won't use a benchmark case to, um,
> benchmark.

Ditto.

> Another point that requires some thought is that switching SnapshotNow
> to be MVCC-based will presumably result in a noticeable increase in each
> backend's rate of wanting to acquire snapshots. Hence, more contention
> in GetSnapshotData can be expected. A single-threaded test case doesn't
> prove anything at all about what that might cost under load.

This is obviously true at some level, but I'm not sure that it really
matters. It's not that difficult to construct a test case where we
have lots of people concurrently reading a table, or reading many
tables, or writing a table, or writing many tables, but what kind of
realistic test case involves enough DDL for any of this to matter? If
you're creating or dropping tables, for example, the filesystem costs
are likely to be a much bigger problem than GetSnapshotData(), to the
point where you probably can't get enough concurrency for
GetSnapshotData() to matter. Maybe you could find a problem case
involving creating or dropping lots and lots of functions
concurrently, or something like that, but who does that? You'd have
to be performing operations on hundreds of non-table SQL objects per
second, and it is hard for me to imagine why anyone would be doing
that. Maybe I'm not imaginative enough?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:14:59
Message-ID:	1325614296-sup-9131@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of mar ene 03 12:24:52 -0300 2012:

> I feel like the first thing we should be doing here is some
> benchmarking. If we change just the scans in dependency.c and then
> try the test case Tom suggested (dropping a schema containing a large
> number of functions), we can compare the patched code with master and
> get an idea of whether the performance is acceptable. If it is,
> changing everything else is a mostly mechanical process that we can
> simply grind through. If it's not, I'd rather learn that before we
> start grinding.

If there are many call sites, maybe it'd be a good idea to use a
semantic patcher tool such as Coccinelle instead of doing it one by one.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:24:41
Message-ID:	23111.1325615081@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Another point that requires some thought is that switching SnapshotNow
>> to be MVCC-based will presumably result in a noticeable increase in each
>> backend's rate of wanting to acquire snapshots. Hence, more contention
>> in GetSnapshotData can be expected. A single-threaded test case doesn't
>> prove anything at all about what that might cost under load.

> This is obviously true at some level, but I'm not sure that it really
> matters. It's not that difficult to construct a test case where we
> have lots of people concurrently reading a table, or reading many
> tables, or writing a table, or writing many tables, but what kind of
> realistic test case involves enough DDL for any of this to matter?

Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
I refer you to the parser, the planner, execution functions for arrays,
records, enums, any sort of relcache reload, etc etc etc. Yes, some
of that is masked by backend-internal caching, some of the time, but
it's folly to just assume that there are no SnapshotNow scans during
normal queries.

None of this is necessarily grounds to reject a patch along the proposed
lines. I'm just asking for some benchmarking effort to establish what
the costs might be, rather than naively hoping they are negligible.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:33:42
Message-ID:	CA+TgmoaNehGhOTVc58bXLE0aL2efnNhDFS7-JyeVFhwPo3UQ8Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 1:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
> I refer you to the parser, the planner, execution functions for arrays,
> records, enums, any sort of relcache reload, etc etc etc. Yes, some
> of that is masked by backend-internal caching, some of the time, but
> it's folly to just assume that there are no SnapshotNow scans during
> normal queries.

Hmm. That's unfortunate, because it seems difficult to construct a
test case that will exercise every feature in the system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Noah Misch" <noah(at)leadboat(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:42:04
Message-ID:	4F02F79C0200002500044296@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Um ... you're supposing that only DDL uses SnapshotNow, which is
>> wrong. I refer you to the parser, the planner, execution
>> functions for arrays, records, enums, any sort of relcache
>> reload, etc etc etc. Yes, some of that is masked by
>> backend-internal caching, some of the time, but it's folly to
>> just assume that there are no SnapshotNow scans during normal
>> queries.
>
> Hmm. That's unfortunate, because it seems difficult to construct
> a test case that will exercise every feature in the system.

Would the result of IsMVCCSnapshot() change for these snapshots? If
so, it might require work in the SSI code to avoid a performance hit
there. In early profiling and stepping through execution I noticed
that we had overhead in serializable transactions for the planner
checks for the actual values at the beginning or end of an index.
This went away when we avoided SSI work for reads using a non-MVCC
snapshot. If we're going to start using MVCC snapshots for such
things, we need some other way to avoid unnecessary work in this
area (and possibly others).

At a minimum, some comparative benchmarks at the serializable
isolation level would be in order when considering a patch like
this.

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 18:42:40
Message-ID:	23479.1325616160@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
>> Another point that requires some thought is that switching SnapshotNow
>> to be MVCC-based will presumably result in a noticeable increase in each
>> backend's rate of wanting to acquire snapshots.

BTW, I wonder if this couldn't be ameliorated by establishing some
ground rules about how up-to-date a snapshot really needs to be.
Arguably, it should be okay for successive SnapshotNow scans to use the
same snapshot as long as we have not acquired a new lock in between.
If not, reusing an old snap doesn't introduce any race condition that
wasn't there already.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 19:04:28
Message-ID:	CA+TgmoYf6gOBHiOqyBr_NZjxyrQUprGGxBt7kmzM24R8iv_EvQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 1:42 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Um ... you're supposing that only DDL uses SnapshotNow, which is
>>> wrong. I refer you to the parser, the planner, execution
>>> functions for arrays, records, enums, any sort of relcache
>>> reload, etc etc etc. Yes, some of that is masked by
>>> backend-internal caching, some of the time, but it's folly to
>>> just assume that there are no SnapshotNow scans during normal
>>> queries.
>>
>> Hmm. That's unfortunate, because it seems difficult to construct
>> a test case that will exercise every feature in the system.
>
> Would the result of IsMVCCSnapshot() change for these snapshots? If
> so, it might require work in the SSI code to avoid a performance hit
> there. In early profiling and stepping through execution I noticed
> that we had overhead in serializable transactions for the planner
> checks for the actual values at the beginning or end of an index.
> This went away when we avoided SSI work for reads using a non-MVCC
> snapshot. If we're going to start using MVCC snapshots for such
> things, we need some other way to avoid unnecessary work in this
> area (and possibly others).
>
> At a minimum, some comparative benchmarks at the serializable
> isolation level would be in order when considering a patch like
> this.

Ugh. Yeah, that sounds like a problem. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 19:16:35
Message-ID:	CA+Tgmoby9We8pXOTRzmCaAmj1aQ4=1eN+jjfH=5t4Tpj54DPXQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>>> Another point that requires some thought is that switching SnapshotNow
>>> to be MVCC-based will presumably result in a noticeable increase in each
>>> backend's rate of wanting to acquire snapshots.
>
> BTW, I wonder if this couldn't be ameliorated by establishing some
> ground rules about how up-to-date a snapshot really needs to be.
> Arguably, it should be okay for successive SnapshotNow scans to use the
> same snapshot as long as we have not acquired a new lock in between.
> If not, reusing an old snap doesn't introduce any race condition that
> wasn't there already.

Is that likely to help much? I think our usual pattern is to lock the
catalog, scan it, and then release the lock, so there will normally be
an AcceptInvalidationMessages() just before the scan. Or at least, I
think there will.

Another thought is that it should always be safe to reuse an old
snapshot if no transactions have committed or aborted since it was
taken (possibly we could narrow that to "no transactions have
committed since it was taken", but I think that might cause some
headaches with RecentGlobalXmin). I wonder if we could come up with a
cheap, hopefully lock-free method of determining whether or not that's
the case. For example, suppose we store the last XID to commit or
abort in shared memory. In GetSnapshotData(), we check whether that
value has changed (probably, after enforcing a memory barrier of some
kind) before acquiring the lock. If it has, we proceed normally, but
if not, we just reuse the results from the previous GetSnapshotData().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 19:22:28
Message-ID:	24313.1325618548@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, I wonder if this couldn't be ameliorated by establishing some
>> ground rules about how up-to-date a snapshot really needs to be.
>> Arguably, it should be okay for successive SnapshotNow scans to use the
>> same snapshot as long as we have not acquired a new lock in between.
>> If not, reusing an old snap doesn't introduce any race condition that
>> wasn't there already.

> Is that likely to help much? I think our usual pattern is to lock the
> catalog, scan it, and then release the lock, so there will normally be
> an AcceptInvalidationMessages() just before the scan. Or at least, I
> think there will.

Um, good point. Those locks aren't meant to avoid race conditions,
but the mechanism doesn't know that.

> Another thought is that it should always be safe to reuse an old
> snapshot if no transactions have committed or aborted since it was
> taken

Yeah, that might work better. And it'd be a win for all MVCC snaps,
not just the ones coming from promoted SnapshotNow ...

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 20:55:02
Message-ID:	CA+U5nMLCgTSr+j5VRxZKDSaFAVQxtVwZ9=rtaqfY_0AkF_K0oQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 6:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Another point that requires some thought is that switching SnapshotNow
>>> to be MVCC-based will presumably result in a noticeable increase in each
>>> backend's rate of wanting to acquire snapshots. Hence, more contention
>>> in GetSnapshotData can be expected. A single-threaded test case doesn't
>>> prove anything at all about what that might cost under load.
>
>> This is obviously true at some level, but I'm not sure that it really
>> matters. It's not that difficult to construct a test case where we
>> have lots of people concurrently reading a table, or reading many
>> tables, or writing a table, or writing many tables, but what kind of
>> realistic test case involves enough DDL for any of this to matter?
>
> Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
> I refer you to the parser, the planner, execution functions for arrays,
> records, enums, any sort of relcache reload, etc etc etc. Yes, some
> of that is masked by backend-internal caching, some of the time, but
> it's folly to just assume that there are no SnapshotNow scans during
> normal queries.
>
> None of this is necessarily grounds to reject a patch along the proposed
> lines. I'm just asking for some benchmarking effort to establish what
> the costs might be, rather than naively hoping they are negligible.

All of which is reasonable doubt.

So far, all I'm saying is that on a couple of heavy duty cases that
I've run, I haven't noticed a problem. That is sufficient for me to
push forwards with a patch that does very close to what we want, so we
can judge acceptability with wider tests. I'm not saying it will be
acceptable a priori, only that it is worth trying because the benefits
are high. The number of call points are also high.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 20:57:56
Message-ID:	CA+U5nMKPYF96t85fcEFHbEFSfMqCkMFo7rmNdvG7-YzOmW0guQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:

> If there are many call sites, maybe it'd be a good idea to use a
> semantic patcher tool such as Coccinelle instead of doing it one by one.

Thanks for the suggestion, regrettably I've already made those changes.

After examining the call sites, I identified 35 that might need
changing. Of those, about 30 were changed to use systable_beginscan,
while a few others use declared snapshots instead. So not a great
effort and worth doing the by-hand inspection.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-01-03 21:07:13
Message-ID:	1325624774-sup-4853@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Simon Riggs's message of mar ene 03 17:57:56 -0300 2012:
>
> On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>
> > If there are many call sites, maybe it'd be a good idea to use a
> > semantic patcher tool such as Coccinelle instead of doing it one by one.
>
> Thanks for the suggestion, regrettably I've already made those changes.
>
> After examining the call sites, I identified 35 that might need
> changing. Of those, about 30 were changed to use systable_beginscan,
> while a few others use declared snapshots instead. So not a great
> effort and worth doing the by-hand inspection.

Yeah, for 35 changes I wouldn't have gone all the length required to
learn the new tool either. If they had been 200, OTOH ...

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-08-17 01:11:11
Message-ID:	20120817011111.GN30286@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Was this resolved? (Sorry to be bugging everyone.)

---------------------------------------------------------------------------

On Tue, Nov 29, 2011 at 11:42:10PM -0500, Robert Haas wrote:
> On Tue, Nov 29, 2011 at 11:50 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Just confirming we decided not to persue this.
>
> Doesn't sound like it.
>
> I've been thinking a lot about the more general problem here - namely,
> that allowing catalog changes without an access exclusive lock is
> unsafe - and trying to come up with a good solution, because I'd
> really like to see us make this work. It strikes me that there are
> really two separate problems here.
>
> 1. If you are scanning a system catalog using SnapshotNow, and a
> commit happens at just the right moment, you might see two versions of
> the row or zero rather than one. You'll see two if you scan the old
> row version, then the concurrent transaction commits, and then you
> scan the new one. You'll see zero if you scan the new row (ignoring
> it, since it isn't committed yet) and then the concurrent transaction
> commits, and then you scan the old row. On a related note, if you
> scan several interrelated catalogs (e.g. when building a relcache
> entry) and a transaction that has made matching modifications to
> multiple catalogs commits midway through, you might see the old
> version of the data from one table and the new version of the data
> from some other table. Using AccessExclusiveLock fixes the problem
> because we guarantee that anybody who wants to read those rows first
> takes some kind of lock, which they won't be able to do while the
> updater holds AccessExclusiveLock.
>
> 2. Other backends may have data in the relcache or catcaches which
> won't get invalidated until they do AcceptInvalidationMessages().
> That will always happen no later than the next time they lock the
> relation, so if you are holding AccessExclusiveLock then you should be
> OK: no one else holds any lock, and they'll have to go get one before
> doing anything interesting. But if you are holding any weaker lock,
> there's nothing to force AcceptInvalidationMessages to happen before
> you reuse those cache entries.
>
> In both cases, name lookups are an exception: we don't know what OID
> to lock until we've done the name lookup.
>
> Random ideas for solving the first problem:
>
> 1-A. Take a fresh MVCC snapshot for each catalog scan. I think we've
> rejected this before because of the cost involved, but maybe with
> Pavan's patch and some of the other improvements that are on the way
> we could make this cheap enough to be acceptable.
> 1-B. Don't allow transactions that have modified system catalog X to
> commit while we're scanning system catalog X; make the scans take some
> kind of shared lock and commits an exclusive lock. This seems awfully
> bad for concurrency, though, not to mention the overhead of taking and
> releasing all those locks even when nothing conflicts.
> 1-C. Wrap a retry loop around every place where we scan a system
> catalog. If a transaction that has written rows in catalog X commits
> while we're scanning catalog X, discard the results of the first scan
> and declare a do-over (or maybe something more coarse-grained, or at
> the other extreme even more fine-grained). If we're doing several
> interrelated scans then the retry loop must surround the entire unit,
> and enforce a do-over of the whole operation if commits occur in any
> of the catalogs before the scan completes.
>
> Unfortunately, any of these seem likely to require a Lot of Work,
> probably more than can be done in time for 9.2. However, perhaps it
> would be feasible to hone in on a limited subset of the cases (e.g.
> name lookups, which are mainly done in only a handful of places, and
> represent an existing bug) and implement the necessary code changes
> just for those cases. Then we could generalize that pattern to other
> cases as time and energy permit.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ALTER TABLE lock strength reduction patch is unsafe
Date:	2012-08-17 19:45:34
Message-ID:	CA+TgmoZsoCVbnySJnLEimGSLfM6MY8Q9xr2HvTgd-hRbdP+VeQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Aug 16, 2012 at 9:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Was this resolved? (Sorry to be bugging everyone.)

Nope. I've got some ideas, but not enough round tuits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company