Thoughts about bug #3883

Lists: pgsql-hackerspgsql-patches
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Thoughts about bug #3883
Date: 2008-01-21 23:01:04
Message-ID: 21534.1200956464@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Steven Flatt's report in this thread:
http://archives.postgresql.org/pgsql-bugs/2008-01/msg00138.php
exposes two more-or-less-independent flaws.

One problem is that we allow operations like TRUNCATE on tables that are
open in the current backend. This poses a risk of strange behavior,
such as

regression=# create table foo as select x from generate_series(1,1000) x;
SELECT
regression=# begin;
BEGIN
regression=# declare c cursor for select * from foo;
DECLARE CURSOR
regression=# fetch 10 from c;
x
----
1
2
3
4
5
6
7
8
9
10
(10 rows)

regression=# truncate foo;
TRUNCATE TABLE
regression=# fetch 10 from c;
x
----
11
12
13
14
15
16
17
18
19
20
(10 rows)

regression=# fetch all from c;
ERROR: could not read block 1 of relation 1663/133283/156727: read only 0 of 8192 bytes

It's not too consistent that we could still read rows from c until we
needed to fetch the next page of the table. For more complex queries
involving indexscans, I'm afraid the behavior could be even more
bizarre.

What I propose we do about this is put the same check into TRUNCATE,
CLUSTER, and REINDEX that is already in ALTER TABLE, namely that we
reject the command if the current transaction is already holding
the table open.

The issue Steven directly complained of is a potential for undetected
deadlock via LockBufferForCleanup. Ordinarily, buffer-level locks don't
pose a deadlock risk because we don't hold one while trying to acquire
another (except in UPDATE, which uses an ordering rule to avoid the
risk). The problem with LockBufferForCleanup is that it can be blocked
by a mere pin, which another backend could well hold while trying to
acquire a lock that will be blocked by VACUUM.

There are a couple of migitating factors: first, patching TRUNCATE et al
as suggested above will prevent the immediate case, and second, as of
8.3 this isn't a problem for autovacuum because of the facility for
kicking autovacuum off a table if it's blocking someone else's lock
request. Still, undetected deadlocks are unpleasant, so it'd be nice
to have some way to recognize the situation if we do get into it.
I have no idea about a reasonable way to do that though. Getting the
heavyweight lock manager involved in buffer accesses seems right out on
performance grounds.

Comments, ideas?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Thoughts about bug #3883
Date: 2008-01-22 15:42:39
Message-ID: 20080122154239.GC10897@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:

> What I propose we do about this is put the same check into TRUNCATE,
> CLUSTER, and REINDEX that is already in ALTER TABLE, namely that we
> reject the command if the current transaction is already holding
> the table open.

+1.

> The issue Steven directly complained of is a potential for undetected
> deadlock via LockBufferForCleanup. Ordinarily, buffer-level locks don't
> pose a deadlock risk because we don't hold one while trying to acquire
> another (except in UPDATE, which uses an ordering rule to avoid the
> risk). The problem with LockBufferForCleanup is that it can be blocked
> by a mere pin, which another backend could well hold while trying to
> acquire a lock that will be blocked by VACUUM.

Seems like a hard problem.

I wonder if we can invoke the deadlock checker in LockBufferForCleanup.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Thoughts about bug #3883
Date: 2008-01-25 18:49:54
Message-ID: 3393.1201286994@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I wrote:
> The issue Steven directly complained of is a potential for undetected
> deadlock via LockBufferForCleanup. Ordinarily, buffer-level locks don't
> pose a deadlock risk because we don't hold one while trying to acquire
> another (except in UPDATE, which uses an ordering rule to avoid the
> risk). The problem with LockBufferForCleanup is that it can be blocked
> by a mere pin, which another backend could well hold while trying to
> acquire a lock that will be blocked by VACUUM.

> There are a couple of migitating factors: first, patching TRUNCATE et al
> as suggested above will prevent the immediate case, and second, as of
> 8.3 this isn't a problem for autovacuum because of the facility for
> kicking autovacuum off a table if it's blocking someone else's lock
> request.

I did some experimentation and was dismayed to find that that last
statement isn't true --- at least on HPUX, the attempted SIGINT doesn't
cause autovacuum to cancel. Some debugging and manual-reading led me to
the conclusion that a signal marked SA_RESTART is serviced but does not
cause semop() to return with EINTR; it just goes right back to waiting,
so we never get to the CHECK_FOR_INTERRUPTS call in PGSemaphoreLock.
Furthermore this appears to be the behavior expected by the Single Unix
Spec. Cancellation does work on Linux, whose man page for semop() says

semop() is never automatically restarted after being interrupted by a
signal handler, regardless of the setting of the SA_RESTART flag when
establishing a signal handler.

But there is no such statement in the SUS. Mac OS X seems to behave
like Linux, though this seems directly contrary to what it says in
its man pages. That may mean that other BSDen do likewise.

If HPUX is the only platform that behaves like this, it might be that
we don't care enough to fix it, but I suspect that we might have a
problem on some other platforms too.

PGSemaphoreLock is called in three places: LWLockAcquire, which isn't
a problem because we don't want to accept cancel interrupts there
anyway; ProcWaitForSignal, which is the problem case; and ProcSleep
(ie, heavyweight-lock acquisition), which survives this case because it
sets lockAwaited, which is tested inside the interrupt service routine
via LockWaitCancel. The simplest fix seems to be to invent an
additional flag variable "signalAwaited" which is set/cleared by
ProcWaitForSignal and checked by LockWaitCancel. This would make
cancelling out of a ProcWaitForSignal call exactly analogous to
cancelling out of a heavyweight-lock acquisition.

My first inclination is to fix this in HEAD but not back branches,
because HEAD is the only branch where we really care that much about
autovacuum responding to SIGINT. On the other hand, the undetected
deadlock is real enough in back branches, and maybe we should back-patch
it so that DBAs can get a manually issued signal to work. Thoughts?

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts about bug #3883
Date: 2008-01-25 20:52:28
Message-ID: 87abmt631v.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> The simplest fix seems to be to invent an additional flag variable
> "signalAwaited" which is set/cleared by ProcWaitForSignal and checked by
> LockWaitCancel. This would make cancelling out of a ProcWaitForSignal call
> exactly analogous to cancelling out of a heavyweight-lock acquisition.

Is that the flag that is an assertion that no cleanup is needed? Or is that
something else?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Thoughts about bug #3883
Date: 2008-01-25 21:16:25
Message-ID: 19175.1201295785@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> The simplest fix seems to be to invent an additional flag variable
>> "signalAwaited" which is set/cleared by ProcWaitForSignal and checked by
>> LockWaitCancel. This would make cancelling out of a ProcWaitForSignal call
>> exactly analogous to cancelling out of a heavyweight-lock acquisition.

> Is that the flag that is an assertion that no cleanup is needed? Or is that
> something else?

No, the problem is merely to get LockWaitCancel to return "true" so that
StatementCancelHandler will go ahead with the immediate interrupt. No
new cleanup is needed other than resetting the new flag.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Thoughts about bug #3883
Date: 2008-01-25 23:00:41
Message-ID: 25179.1201302041@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I wrote:
> No, the problem is merely to get LockWaitCancel to return "true" so that
> StatementCancelHandler will go ahead with the immediate interrupt. No
> new cleanup is needed other than resetting the new flag.

Actually there's a better way to do it: we can remove a little bit of
complexity instead of adding more. The basic problem is that
StatementCancelHandler wants to tell the difference between waiting for
client input (which there is no use for it to interrupt) and other
states in which ImmediateInterruptOK is set. ProcWaitForSignal() is
falling on the wrong side of the classification --- and I argue that any
other newly added interruptable wait would too. We should reverse the
sense of the test so that it's "not in client input" instead of "in lock
wait". At the time that code was written, there wasn't any conveniently
cheap way to check for client input state, so we kluged up
LockWaitCancel to detect the other case. But now that we have the
DoingCommandRead flag, it's easy to check that. This lets us remove
LockWaitCancel's return value (which was always a bit untidy, since all
but one of its callers ignored the result), ending up with exactly
parallel code in die() and StatementCancelHandler(). Seems cleaner than
before.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 10.6 KB