Re: Assertion failure on hot standby

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Assertion failure on hot standby
Date: 2010-11-24 04:27:33
Message-ID: AANLkTi=NvzerYbWH1ObX48px5euYSUup7+eEhqYkyzM0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php

When I did unusual operations (e.g., suspend bgwriter by gdb,
pgbench -i and issue txid_current many times) on the master
in order to try to reproduce the above HS error, I encountered
the following assertion error.

Since I compiled the standby postgres with WAL_DEBUG and
ran it with wal_debug = on, all the replayed WAL records were
logged.

------------------------
sby LOG: REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid
23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09
sby LOG: REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid
23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09
sby LOG: consistent recovery state reached at 0/134C0510
sby LOG: REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid
23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09
sby LOG: REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0;
len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409
sby LOG: REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0;
len 20: Standby - running xacts: nextXid 23256 latestCompletedXid
23255 oldestRunningXid 23256
TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
"twophase.c", Line: 1209)
sby LOG: database system is ready to accept read only connections
sby LOG: startup process (PID 32666) was terminated by signal 6: Aborted
sby LOG: terminating any other active server processes
------------------------

Does anyone know what the cause of the problem is?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-25 07:59:48
Message-ID: AANLkTikGGKPFPBGSVxF2C2oPsMcopchZjUjLU129vrg8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 24, 2010 at 1:27 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg01303.php
>
> When I did unusual operations (e.g., suspend bgwriter by gdb,
> pgbench -i and issue txid_current many times) on the master
> in order to try to reproduce the above HS error, I encountered
> the following assertion error.
>
> Since I compiled the standby postgres with WAL_DEBUG and
> ran it with wal_debug = on, all the replayed WAL records were
> logged.
>
> ------------------------
> sby LOG:  REDO @ 0/134C0490; LSN 0/134C04D0: prev 0/134C0450; xid
> 23253; len 32: Transaction - commit: 2010-11-24 12:15:02.315634+09
> sby LOG:  REDO @ 0/134C04D0; LSN 0/134C0510: prev 0/134C0490; xid
> 23254; len 32: Transaction - commit: 2010-11-24 12:15:02.325252+09
> sby LOG:  consistent recovery state reached at 0/134C0510
> sby LOG:  REDO @ 0/134C0510; LSN 0/134C0550: prev 0/134C04D0; xid
> 23255; len 32: Transaction - commit: 2010-11-24 12:15:09.224343+09
> sby LOG:  REDO @ 0/134C0550; LSN 0/134C0580: prev 0/134C0510; xid 0;
> len 16: Standby - AccessExclusive locks: xid 0 db 11910 rel 16409
> sby LOG:  REDO @ 0/134C0580; LSN 0/134C05B8: prev 0/134C0550; xid 0;
> len 20: Standby -  running xacts: nextXid 23256 latestCompletedXid
> 23255 oldestRunningXid 23256
> TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:
> "twophase.c", Line: 1209)
> sby LOG:  database system is ready to accept read only connections
> sby LOG:  startup process (PID 32666) was terminated by signal 6: Aborted
> sby LOG:  terminating any other active server processes
> ------------------------
>
> Does anyone know what the cause of the problem is?

I was able to reproduce this problem. This happens because CHECKPOINT
can write the WAL record indicating that the transaction with XID = 0 has taken
the AccessExclusive lock. This WAL record causes that assertion failure in the
standby.

Here is the procedure to reproduce the problem:

-------------------
1. Execute "DROP TABLE" and suspend the execution before calling

RemoveRelations -> LockRelationOid -> LockAcquire ->
LockAcquireExtended -> LogAccessExclusiveLock

by, for example, using gdb.

2. While "DROP TABLE" is being suspended, execute CHECKPOINT.
This CHECKPOINT will generate the above-mentioned WAL record.
-------------------

To solve the problem, ISTM that XID should be assigned before the information
about AccessExclusive lock becomes visible to another process. Or CHECKPOINT
(i.e., GetRunningTransactionLocks) should ignore the locks with XID = 0.

Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-25 22:40:56
Message-ID: 1290724856.14888.2151.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2010-11-25 at 16:59 +0900, Fujii Masao wrote:
> To solve the problem, ISTM that XID should be assigned before the
> information about AccessExclusive lock becomes visible to another
> process. Or CHECKPOINT (i.e., GetRunningTransactionLocks) should
> ignore the locks with XID = 0.

First, thanks for pursuing this. I realise I made the mistake of
assuming there was just one bug; I see that the bug Heikki was
discussing is a separate issue.

As to solutions, it cannot be acceptable to ignore some locks just
because an xid has not been assigned.

If I understand you correctly, it seems possible to generate an
AccessExclusiveLock before an xid is assigned, so that its possible to
log that situation before the transaction assigns an xid slightly later.
So there's a narrow window where we can generate a lock WAL record with
xid 0. The sensible resolution is to ensure that all
AccessExclusiveLocks have an xid assigned prior to them registering
their proclock.

That would mean running GetCurrentTransactionId() inside LockAcquire()

if (lockmode >= AccessExclusiveLock &&
locktag->locktag_type == LOCKTAG_RELATION &&
!RecoveryInProgress())
(void) GetCurrentTransactionId();

Any objections to that fix?

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 02:19:34
Message-ID: AANLkTimqzkgOrZ7bEcVK9GePZx+S1Zo5+6mq=CQqq8=d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 7:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> As to solutions, it cannot be acceptable to ignore some locks just
> because an xid has not been assigned.

Even if GetRunningTransactionLocks ignores such a lock, it's eventually
WAL-logged by LogAccessExclusiveLock, isn't it?

> If I understand you correctly, it seems possible to generate an
> AccessExclusiveLock before an xid is assigned, so that its possible to
> log that situation before the transaction assigns an xid slightly later.
> So there's a narrow window where we can generate a lock WAL record with
> xid 0.

Right.

> The sensible resolution is to ensure that all
> AccessExclusiveLocks have an xid assigned prior to them registering
> their proclock.
>
> That would mean running GetCurrentTransactionId() inside LockAcquire()
>
> if (lockmode >= AccessExclusiveLock &&
>    locktag->locktag_type == LOCKTAG_RELATION &&
>    !RecoveryInProgress())
>        (void) GetCurrentTransactionId();

s/GetCurrentTransactionId/GetTopTransactionId?

> Any objections to that fix?

Or can we call LogAccessExclusiveLock before registering the lock?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 06:11:42
Message-ID: 28739.1290751902@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> That would mean running GetCurrentTransactionId() inside LockAcquire()

> if (lockmode >= AccessExclusiveLock &&
> locktag->locktag_type == LOCKTAG_RELATION &&
> !RecoveryInProgress())
> (void) GetCurrentTransactionId();

> Any objections to that fix?

Could we have a wal level test in there too please? It's pretty awful
in any case...

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 08:44:22
Message-ID: 1290761062.14888.2736.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-11-26 at 11:19 +0900, Fujii Masao wrote:

> On Fri, Nov 26, 2010 at 7:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > As to solutions, it cannot be acceptable to ignore some locks just
> > because an xid has not been assigned.
>
> Even if GetRunningTransactionLocks ignores such a lock, it's eventually
> WAL-logged by LogAccessExclusiveLock, isn't it?

If it were true always, I would much prefer your solution.

Assuming that would then cause a race condition between the logging of
the RunningXactsData and the lock, which wouldn't move us forwards.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 12:32:18
Message-ID: AANLkTinSGpu=m-inxS6+R7yAfGCjLiWn_-Rgn7i-jVZt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> That would mean running GetCurrentTransactionId() inside LockAcquire()
>
>> if (lockmode >= AccessExclusiveLock &&
>>     locktag->locktag_type == LOCKTAG_RELATION &&
>>     !RecoveryInProgress())
>>       (void) GetCurrentTransactionId();
>
>> Any objections to that fix?
>
> Could we have a wal level test in there too please?  It's pretty awful
> in any case...

+1.

Incidentally, I haven't been able to wrap my head around why we need
to propagate AccessExclusiveLocks to the standby in the first place.
Can someone explain?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 12:41:58
Message-ID: 201011261341.59331.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 26 November 2010 13:32:18 Robert Haas wrote:
> On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> >> That would mean running GetCurrentTransactionId() inside LockAcquire()
> >>
> >> if (lockmode >= AccessExclusiveLock &&
> >> locktag->locktag_type == LOCKTAG_RELATION &&
> >> !RecoveryInProgress())
> >> (void) GetCurrentTransactionId();
> >>
> >> Any objections to that fix?
> >
> > Could we have a wal level test in there too please? It's pretty awful
> > in any case...
> Incidentally, I haven't been able to wrap my head around why we need
> to propagate AccessExclusiveLocks to the standby in the first place.
> Can someone explain?
To make the standby stop applying WAL when a local transaction on the standby
uses an object.
E.g. dropping a table on the master need the standby top stop applying wal (or
kill the local client using the table).
How would you want to protect against something like that otherwise?

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 15:28:38
Message-ID: AANLkTikqGESwOdCMT8UXjuVBYjASXsK54KW3iy-yk-xe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 7:41 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Friday 26 November 2010 13:32:18 Robert Haas wrote:
>> On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> >> That would mean running GetCurrentTransactionId() inside LockAcquire()
>> >>
>> >> if (lockmode >= AccessExclusiveLock &&
>> >>     locktag->locktag_type == LOCKTAG_RELATION &&
>> >>     !RecoveryInProgress())
>> >>       (void) GetCurrentTransactionId();
>> >>
>> >> Any objections to that fix?
>> >
>> > Could we have a wal level test in there too please?  It's pretty awful
>> > in any case...
>> Incidentally, I haven't been able to wrap my head around why we need
>> to propagate AccessExclusiveLocks to the standby in the first place.
>> Can someone explain?
> To make the standby stop applying WAL when a local transaction on the standby
> uses an object.
> E.g. dropping a table on the master need the standby top stop applying wal (or
> kill the local client using the table).
> How would you want to protect against something like that otherwise?

Hmm. But it seems like that it would be enough to log any exclusive
locks held at commit time, rather than logging them as they're
acquired. By then, the XID will be assigned (if you need it - if you
don't then you probably don't need to XLOG it anyway) and you avoid
holding the lock for more than a moment on the standby.

But it seems like an even better idea would be to actually XLOG the
operations that are problematic specifically. Because, for example,
if a user session on the master does LOCK TABLE ... IN ACCESS
EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or
am I confused?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 15:53:29
Message-ID: 4CEFD7F9.20608@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.11.2010 17:28, Robert Haas wrote:
> On Fri, Nov 26, 2010 at 7:41 AM, Andres Freund<andres(at)anarazel(dot)de> wrote:
>> On Friday 26 November 2010 13:32:18 Robert Haas wrote:
>>> On Fri, Nov 26, 2010 at 1:11 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Simon Riggs<simon(at)2ndQuadrant(dot)com> writes:
>>>>> That would mean running GetCurrentTransactionId() inside LockAcquire()
>>>>>
>>>>> if (lockmode>= AccessExclusiveLock&&
>>>>> locktag->locktag_type == LOCKTAG_RELATION&&
>>>>> !RecoveryInProgress())
>>>>> (void) GetCurrentTransactionId();
>>>>>
>>>>> Any objections to that fix?
>>>>
>>>> Could we have a wal level test in there too please? It's pretty awful
>>>> in any case...
>>> Incidentally, I haven't been able to wrap my head around why we need
>>> to propagate AccessExclusiveLocks to the standby in the first place.
>>> Can someone explain?
>> To make the standby stop applying WAL when a local transaction on the standby
>> uses an object.
>> E.g. dropping a table on the master need the standby top stop applying wal (or
>> kill the local client using the table).
>> How would you want to protect against something like that otherwise?
>
> Hmm. But it seems like that it would be enough to log any exclusive
> locks held at commit time, rather than logging them as they're
> acquired. By then, the XID will be assigned (if you need it - if you
> don't then you probably don't need to XLOG it anyway) and you avoid
> holding the lock for more than a moment on the standby.
>
> But it seems like an even better idea would be to actually XLOG the
> operations that are problematic specifically. Because, for example,
> if a user session on the master does LOCK TABLE ... IN ACCESS
> EXCLUSIVE MODE, AFAICS there's no reason for the standby to care. Or
> am I confused?

Let's approach this from a different direction:

If you have operation A in the master that currently acquires an
AccessExclusiveLock on a table, do you think it's safe for another
transaction to peek at the table at the same time? Say, run a heap scan
simultaneously. If yes, why did you take an AccessExclusiveLock in the
first place. If not, then surely it's not safe to have a heap scan
running against the table in the standby either. The read-only query in
the standby sees the same actions as a read-only query on the master
would see.

You can only take AccessShareLocks on the standby, and the only locks
that conflict with an AccessShareLock is the AccessExclusiveLock. (Hmm,
looking at the code, we also allow RowShareLock and RowExclusiveLock in
the standby. You can't actually insert/update/delete tuples or set xmax
as SELECT FOR SHARE does on standby, though, so why do we allow that? )

As a concrete example, VACUUM acquires an AccessExclusiveLock when it
wants to truncate the relation. A sequential scan running against the
table in the standby will get upset, if the startup process replays a
truncation record on the table without warning.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 16:11:22
Message-ID: 1290787882.2981.35.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-11-26 at 17:53 +0200, Heikki Linnakangas wrote:
> Hmm,
> looking at the code, we also allow RowShareLock and RowExclusiveLock
> in
> the standby. You can't actually insert/update/delete tuples or set
> xmax
> as SELECT FOR SHARE does on standby, though, so why do we allow that?

It was considered sensible to allow PREPARE on DML on a standby, which
required those lock levels. Archives...

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 18:17:57
Message-ID: AANLkTi=fWbsNsCY5FrY1ACqDhrb6EQtz7fvUUV1YwvZ8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> Incidentally, I haven't been able to wrap my head around why we need
>>>> to propagate AccessExclusiveLocks to the standby in the first place.
>>>> Can someone explain?
>>>
>>> To make the standby stop applying WAL when a local transaction on the
>>> standby
>>> uses an object.
>>> E.g. dropping a table on the master need the standby top stop applying
>>> wal (or
>>> kill the local client using the table).
>>> How would you want to protect against something like that otherwise?
>>
>> Hmm.  But it seems like that it would be enough to log any exclusive
>> locks held at commit time, rather than logging them as they're
>> acquired.  By then, the XID will be assigned (if you need it - if you
>> don't then you probably don't need to XLOG it anyway) and you avoid
>> holding the lock for more than a moment on the standby.
>>
>> But it seems like an even better idea would be to actually XLOG the
>> operations that are problematic specifically.  Because, for example,
>> if a user session on the master does LOCK TABLE ... IN ACCESS
>> EXCLUSIVE MODE, AFAICS there's no reason for the standby to care.  Or
>> am I confused?
>
> Let's approach this from a different direction:
>
> If you have operation A in the master that currently acquires an
> AccessExclusiveLock on a table, do you think it's safe for another
> transaction to peek at the table at the same time?

Beep, time out. The notion of "at the same time" is extremely fuzzy
here. The operations on the master and slave are not simultaneous, or
anything close to it. Let's go back to the case of a dropped table.
Suppose that, on the master, someone begins a transaction, drops a
table, and heads out to lunch. Upon returning, they commit the
transaction. At what point does it became unsafe for readers on the
standby to be looking at the table? Surely, the whole time the guy is
out to lunch, readers on the standby are free to do whatever they
want. Only at the point when we actually remove the file does it
become a problem for somebody to be in the middle of using it.

In fact, you could apply the same logic to the master, if you were
willing to defer the removal of the actual physical file until all
transactions that were using it released their locks. The reason we
don't do that - aside from complexity - is that it would result in an
unpredictable and indefinite delay between issuing the DROP TABLE
command and OS-level storage reclamation. But in the standby
situation, there is *already* an unpredictable and indefinite delay.
The standby can fall behind in applying WAL, lose connectivity, have
replay paused, etc. You lose nothing by waiting until the last
possible moment to kick everyone out. (In fact, you gain something:
the standby is more usable.)

The problem here is not propagating operations from the master, but
making sure that actions performed by the startup process on the
standby are properly locked. In the case of dropping a relation, the
problem is that the startup process only knows which relfilenode it
needs to blow away, not which relation that relfilenode is associated
with. If the AccessShareLock were against the relfilenode rather than
the relation itself, the startup process would have no problem at all
generating a conflicting lock - it would simply lock each relfilenode
before dropping it, without any additional XLOG information at all.

> As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants
> to truncate the relation. A sequential scan running against the table in the
> standby will get upset, if the startup process replays a truncation record
> on the table without warning.

This case is similar. xl_smgr_truncate has only a relfilenode number,
not a relation OID, so there's no way for the startup process to
generate a conflicting lock request itself. But if the standby
backends locked the relfilenode, or if the xl_smgr_truncate WAL record
included the relation OID, it would be simple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 19:06:18
Message-ID: 4CF0052A.9030200@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.11.2010 20:17, Robert Haas wrote:
> On Fri, Nov 26, 2010 at 10:53 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If you have operation A in the master that currently acquires an
>> AccessExclusiveLock on a table, do you think it's safe for another
>> transaction to peek at the table at the same time?
>
> Beep, time out. The notion of "at the same time" is extremely fuzzy
> here. The operations on the master and slave are not simultaneous, or
> anything close to it.

In this context, "at the same time" means "at the same point in WAL".

>> As a concrete example, VACUUM acquires an AccessExclusiveLock when it wants
>> to truncate the relation. A sequential scan running against the table in the
>> standby will get upset, if the startup process replays a truncation record
>> on the table without warning.
>
> This case is similar. xl_smgr_truncate has only a relfilenode number,
> not a relation OID, so there's no way for the startup process to
> generate a conflicting lock request itself. But if the standby
> backends locked the relfilenode, or if the xl_smgr_truncate WAL record
> included the relation OID, it would be simple.

If you go down that path, you're going to spend a lot of time thinking
through every single case that uses an AccessExclusiveLock, ensuring
that the standby has enough information, and tinkering with the replay
code to acquire locks at the right moment. And gain what, exactly?

WAL-logging AccessExclusiveLocks is a bit ugly, but it beats having to
infer that information from other records hands down.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 21:55:45
Message-ID: AANLkTind_2ZDhuxdUC+yO7tiKHJ0UkRY6RY4BhL6+z6G@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> As a concrete example, VACUUM acquires an AccessExclusiveLock when it
>>> wants
>>> to truncate the relation. A sequential scan running against the table in
>>> the
>>> standby will get upset, if the startup process replays a truncation
>>> record
>>> on the table without warning.
>>
>> This case is similar.  xl_smgr_truncate has only a relfilenode number,
>> not a relation OID, so there's no way for the startup process to
>> generate a conflicting lock request itself.  But if the standby
>> backends locked the relfilenode, or if the xl_smgr_truncate WAL record
>> included the relation OID, it would be simple.
>
> If you go down that path, you're going to spend a lot of time thinking
> through every single case that uses an AccessExclusiveLock, ensuring that
> the standby has enough information, and tinkering with the replay code to
> acquire locks at the right moment. And gain what, exactly?

Well, fewer useless locks on the standby, for one thing, in all
likelihood, and less WAL traffic. We probably don't have much of a
choice but to put in Simon's suggested fix (with a wal_level check
added) for 9.0. But I suspect we're going to end up revisiting this
down the road. Beyond the usability issues, the current approach
sounds brittle as heck to me. All somebody has to do is introduce a
mechanism that drops or rewrites a relation file without an access
exclusive lock, and this whole approach snaps right off (e.g. CLUSTER
CONCURRENTLY, taking only an EXCLUSIVE lock, with some kind of garbage
collection for the old files left behind; or an on-line
table-compaction operation that tries to systematically move tuples to
lower CTIDs by - for example - writing the new tuple, waiting until
all scans in progress at the time of the write have finished, and then
pruning the old tuples).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-26 23:35:32
Message-ID: 5702.1290814532@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If you go down that path, you're going to spend a lot of time thinking
>> through every single case that uses an AccessExclusiveLock, ensuring that
>> the standby has enough information, and tinkering with the replay code to
>> acquire locks at the right moment. And gain what, exactly?

> Well, fewer useless locks on the standby, for one thing, in all
> likelihood, and less WAL traffic.

I think it's not only useless from a performance standpoint, but
probably actually dangerous, to not take AccessExclusiveLock on the
standby when it's taken on the master. If you try to delay taking the
lock, then locks will be taken in a different order on master and
standby, which is quite likely to lead to deadlock situations.

Speaking of which, is there any code in there to ensure that a deadlock
in the standby is resolved by killing HS queries and not the replay
process? Because deadlocks are certainly going to be possible no matter
what.

> All somebody has to do is introduce a
> mechanism that drops or rewrites a relation file without an access
> exclusive lock, and this whole approach snaps right off

... as would queries on the master, so that's not ever happening.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-27 00:02:27
Message-ID: AANLkTi=ytwfhJg2yXobstkn4PQiDQ=xCr4fs6vPz9LWo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> If you go down that path, you're going to spend a lot of time thinking
>>> through every single case that uses an AccessExclusiveLock, ensuring that
>>> the standby has enough information, and tinkering with the replay code to
>>> acquire locks at the right moment. And gain what, exactly?
>
>> Well, fewer useless locks on the standby, for one thing, in all
>> likelihood, and less WAL traffic.
>
> I think it's not only useless from a performance standpoint, but
> probably actually dangerous, to not take AccessExclusiveLock on the
> standby when it's taken on the master.  If you try to delay taking the
> lock, then locks will be taken in a different order on master and
> standby, which is quite likely to lead to deadlock situations.

As far as I can see, that's complete nonsense. Deadlocks between what
and what? To get a deadlock on the standby, you have to have a cycle
in the lock-wait graph; and since recovery is single-threaded, all
locks from the master are held by the startup process. The only
possible cycle is between the startup process and some HS backend; and
why should we assume that the HS backend will acquire locks in the
same order as the transactions running on the master rather than any
other order? Perhaps that could be accomplished by very careful
application design, but in normal usage it doesn't sound very likely.

In fact, now that I think about it, what I'm proposing would actually
substantially REDUCE the risk of deadlock on the standby, because the
master would only ever need to lock a backing file long enough to drop
or truncate it, whereas under the present system the startup process
might need to hold many locks at once. It'd help reduce the chance of
lock table overflow, too.

> Speaking of which, is there any code in there to ensure that a deadlock
> in the standby is resolved by killing HS queries and not the replay
> process?  Because deadlocks are certainly going to be possible no matter
> what.

I believe the place to look is ResolveRecoveryConflictWithLock().

>> All somebody has to do is introduce a
>> mechanism that drops or rewrites a relation file without an access
>> exclusive lock, and this whole approach snaps right off
>
> ... as would queries on the master, so that's not ever happening.

There might be a problem with what I wrote in the part you didn't
quote here, but if there is, an explanation would be a lot more
helpful than a categorical statement unsupported by any argument.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-27 00:27:09
Message-ID: 6794.1290817629@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Nov 26, 2010 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think it's not only useless from a performance standpoint, but
>> probably actually dangerous, to not take AccessExclusiveLock on the
>> standby when it's taken on the master.

> As far as I can see, that's complete nonsense. Deadlocks between what
> and what?

master locks table A, then table B. Some transaction on the standby
locks table A, then table B (perhaps only with AccessShareLock).
This will work reliably only as long as we don't change the order in
which replay acquires locks.

Now, if the standby process takes its locks in the other order, then
it's at risk anyway. But coding that works reliably against the master
should not randomly fail on the slave.

> In fact, now that I think about it, what I'm proposing would actually
> substantially REDUCE the risk of deadlock on the standby, because the
> master would only ever need to lock a backing file long enough to drop
> or truncate it, whereas under the present system the startup process
> might need to hold many locks at once.

Now you're the one spouting nonsense. Consider a master transaction
that does

begin;
lock table foo;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
alter table foo --- some rewriting table operation;
commit;

On the master, no other transaction can see the intermediate states.
We don't want that property to disappear on the slave.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-27 10:05:39
Message-ID: 1290852339.2981.1022.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-11-26 at 18:35 -0500, Tom Lane wrote:
> Speaking of which, is there any code in there to ensure that a
> deadlock
> in the standby is resolved by killing HS queries and not the replay
> process? Because deadlocks are certainly going to be possible no
> matter
> what.

Locks cause query conflicts, so any attempt to take a lock that is
blocked will be resolved before a deadlock takes place. So that
situation does not arise in the startup process.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-27 11:00:07
Message-ID: AANLkTikfvLBOy2jmQNOkaPFic5ND7W4D6Mmesa--Y2qt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 7:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In fact, now that I think about it, what I'm proposing would actually
>> substantially REDUCE the risk of deadlock on the standby, because the
>> master would only ever need to lock a backing file long enough to drop
>> or truncate it, whereas under the present system the startup process
>> might need to hold many locks at once.
>
> Now you're the one spouting nonsense.  Consider a master transaction
> that does
>
>        begin;
>        lock table foo;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        alter table foo --- some rewriting table operation;
>        commit;
>
> On the master, no other transaction can see the intermediate states.
> We don't want that property to disappear on the slave.

And why would it? Which states are visible is something that gets
controlled by MVCC, not by the order in which we remove backing files.

It's possible that to make what I'm describing work correctly, you
need to acquire the locks and then drop all the backing files, rather
than interleaving those operations. I'm not sure about that. But
acquiring them at the same point in WAL that master did seems quite
pointless - AFAICS, you lose nothing by postponing until commit time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-29 01:14:23
Message-ID: 1290993263.4634.3142.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-11-26 at 01:11 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > That would mean running GetCurrentTransactionId() inside LockAcquire()
>
> > if (lockmode >= AccessExclusiveLock &&
> > locktag->locktag_type == LOCKTAG_RELATION &&
> > !RecoveryInProgress())
> > (void) GetCurrentTransactionId();
>
> > Any objections to that fix?
>
> Could we have a wal level test in there too please? It's pretty awful
> in any case...

Slightly neater version of same idea applied to resolve this.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services