Quick Links

Hot Standby: too many KnownAssignedXids

Lists:	pgsql-hackers

From:	Joachim Wieland <joe(at)mcknight(dot)de>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Hot Standby: too many KnownAssignedXids
Date:	2010-11-19 21:46:04
Message-ID:	AANLkTi=Z0DS9zDk=8ooOYUOwMCcXm5FwpkHffMX6fFcN@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I am seeing the following here on 9.0.1 on Linux x86-64:

LOG: redo starts at 1F8/FC00E978
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23

and this is the complete history:

postgres was running as HS in foreground, Ctrl-C'ed it for a restart.

LOG: received fast shutdown request
LOG: aborting any active transactions
FATAL: terminating walreceiver process due to administrator command
FATAL: terminating connection due to administrator command
LOG: shutting down
LOG: database system is shut down

Started it up again:

$ postgres -D /db/
LOG: database system was shut down in recovery at 2010-11-19 14:36:30 EST
LOG: entering standby mode
cp: cannot stat `/archive/00000001000001F900000001': No such file or directory
cp: cannot stat `/archive/00000001000001F8000000FC': No such file or directory
LOG: redo starts at 1F8/FC00E978
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG: startup process (PID 30052) exited with exit code 1
LOG: terminating any other active server processes

(copied the log files over...)

./postgres -D /db/

LOG: database system was interrupted while in recovery at log time
2010-11-19 14:36:12 EST
HINT: If this has occurred more than once some data might be
corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: restored log file "00000001000001F900000001" from archive
LOG: restored log file "00000001000001F8000000FC" from archive
LOG: redo starts at 1F8/FC00E978
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG: startup process (PID 31581) exited with exit code 1
LOG: terminating any other active server processes

Changing the line in the source code to give some more output gives me:

FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray->maxKnownAssignedXids: 6890

I still have the server, if you want me to debug anything or send a
patch against 9.0.1 that gives more output, just let me know.

Joachim

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Joachim Wieland <joe(at)mcknight(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-22 04:48:24
Message-ID:	AANLkTimJNHXy9T9CsRQruOW+dkke_GmiHA+btGCy2+HO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 20, 2010 at 6:46 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> I still have the server, if you want me to debug anything or send a
> patch against 9.0.1 that gives more output, just let me know.

Per previous Simon's comment, the following information would be useful.
http://archives.postgresql.org/pgsql-general/2010-10/msg00154.php

--------------
If you suspect a bug in Hot Standby, please set
trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening
--------------

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Joachim Wieland <joe(at)mcknight(dot)de>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-22 20:49:09
Message-ID:	AANLkTinSKsm-vqPTwKjDMoZ7=Jn1OXCz0M1MohEfoHgA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Nov 21, 2010 at 11:48 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> --------------
> If you suspect a bug in Hot Standby, please set
> trace_recovery_messages = DEBUG2
> in postgresql.conf and repeat the action
>
> Always useful to know
> * max_connections
> * current number of sessions
> * whether we have two phase commits happening
> --------------

The trace_recovery_messages parameter does not give more output...

max_connections is set to 100

there have been no sessions on the standby itself, but a few on the
primary database, I don't know how much but probably not more than 10.
The sessions there were doing quite some load however, among them
slony synchronization, since the hot standby master database was
actually a slony replica.

max_prepared_transactions has not been changed from its default value of 0.

Joachim

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Joachim Wieland <joe(at)mcknight(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-23 07:45:25
Message-ID:	4CEB7115.3030409@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19.11.2010 23:46, Joachim Wieland wrote:
> FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
> pArray->maxKnownAssignedXids: 6890

Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is a
lot of data, but it might give a hint what's going on.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Joachim Wieland <joe(at)mcknight(dot)de>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-24 04:56:56
Message-ID:	AANLkTikyUTTUTzvN5rqrme2mWZBPstV=P8tAmwR-y_jW@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 19.11.2010 23:46, Joachim Wieland wrote:
>>
>> FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
>> pArray->maxKnownAssignedXids: 6890
>
> Hmm, that's a lot of entries in KnownAssignedXids.
>
> Can you recompile with WAL_DEBUG, and run the recovery again with
> wal_debug=on ? That will print all the replayed WAL records, which is a lot
> of data, but it might give a hint what's going on.

Sure, but this gives me only one more line:

[...]
LOG: redo starts at 1F8/FC00E978
LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG: startup process (PID 4587) exited with exit code 1
LOG: terminating any other active server processes

Joachim

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Joachim Wieland <joe(at)mcknight(dot)de>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-24 10:48:25
Message-ID:	4CECED79.6030101@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24.11.2010 06:56, Joachim Wieland wrote:
> On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 19.11.2010 23:46, Joachim Wieland wrote:
>>>
>>> FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
>>> pArray->maxKnownAssignedXids: 6890
>>
>> Hmm, that's a lot of entries in KnownAssignedXids.
>>
>> Can you recompile with WAL_DEBUG, and run the recovery again with
>> wal_debug=on ? That will print all the replayed WAL records, which is a lot
>> of data, but it might give a hint what's going on.
>
> Sure, but this gives me only one more line:
>
> [...]
> LOG: redo starts at 1F8/FC00E978
> LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
> 385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
> 3829898/23
> FATAL: too many KnownAssignedXids
> CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
> LOG: startup process (PID 4587) exited with exit code 1
> LOG: terminating any other active server processes

Thanks, I can reproduce this now. This happens when you have a wide gap
between the oldest still active xid and the latest xid.

When recovery starts, we fetch the oldestActiveXid from the checkpoint
record. Let's say that it's 100. We then start replaying WAL records
from the Redo pointer, and the first record (heap insert in your case)
contains an Xid that's much larger than 100, say 10000. We call
RecordKnownAssignedXids() to make note that all xids between that range
are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed
at commit and abort records, and the special xid-assignment record to
handle the case of a lot of subtransactions. We initialize the array
from the running-xacts record that's written at a checkpoint. That
mechanism fails in this case because the heap insert record is seen
before the running-xacts record, causing all those xids in the range
100-10000 to be considered in-progress. The running-xacts record that
comes later would prune them, but we don't have enough slots to hold
them until that.

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
the xids we see during WAL replay in private memory instead of putting
them in the KnownAssignedXids array until we see the running-xacts record.

To reproduce this, I did this in the master:

postgres=# CREATE FUNCTION insertfunc(n integer) RETURNS VOID AS $$
declare
i integer;
begin
FOR i IN 1..n LOOP
BEGIN
INSERT INTO foo VALUES (1);
EXCEPTION WHEN division_by_zero THEN RAISE NOTICE 'divbyzero';
END;
END LOOP;
end;
$$ LANGUAGE plpgsql;
postgres=# SELECT insertfunc(100000000);

After letting that run for a while, so that a couple of checkpoints have
occurred, kill the master and start standby to recover that from
archive. After it has replayed all the WAL, stop the standby and restart it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-24 11:38:03
Message-ID:	4CECF91B.3040105@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24.11.2010 12:48, Heikki Linnakangas wrote:
> On 24.11.2010 06:56, Joachim Wieland wrote:
>> On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> On 19.11.2010 23:46, Joachim Wieland wrote:
>>>>
>>>> FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
>>>> pArray->maxKnownAssignedXids: 6890
>>>
>>> Hmm, that's a lot of entries in KnownAssignedXids.
>>>
>>> Can you recompile with WAL_DEBUG, and run the recovery again with
>>> wal_debug=on ? That will print all the replayed WAL records, which is
>>> a lot
>>> of data, but it might give a hint what's going on.
>>
>> Sure, but this gives me only one more line:
>>
>> [...]
>> LOG: redo starts at 1F8/FC00E978
>> LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
>> 385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
>> 3829898/23
>> FATAL: too many KnownAssignedXids
>> CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
>> LOG: startup process (PID 4587) exited with exit code 1
>> LOG: terminating any other active server processes
>
> Thanks, I can reproduce this now. This happens when you have a wide gap
> between the oldest still active xid and the latest xid.
>
> When recovery starts, we fetch the oldestActiveXid from the checkpoint
> record. Let's say that it's 100. We then start replaying WAL records
> from the Redo pointer, and the first record (heap insert in your case)
> contains an Xid that's much larger than 100, say 10000. We call
> RecordKnownAssignedXids() to make note that all xids between that range
> are in-progress, but there isn't enough room in the array for that.
>
> We normally get away with a smallish array because the array is trimmed
> at commit and abort records, and the special xid-assignment record to
> handle the case of a lot of subtransactions. We initialize the array
> from the running-xacts record that's written at a checkpoint. That
> mechanism fails in this case because the heap insert record is seen
> before the running-xacts record, causing all those xids in the range
> 100-10000 to be considered in-progress. The running-xacts record that
> comes later would prune them, but we don't have enough slots to hold
> them until that.
>
> Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
> the xids we see during WAL replay in private memory instead of putting
> them in the KnownAssignedXids array until we see the running-xacts record.

Looking closer at RecordKnownAssignedTransactionIds(), there's a related
much more serious bug there too. When latestObservedXid is initialized
to the oldest still-running xid, oldestActiveXid, at WAL recovery, we
zero the CLOG starting from the oldestActiveXid. That will zap away the
clog bits of any old transactions that had already committed before the
checkpoint started, but were younger than the oldest still running
transaction. The transactions will be lost :-(.

It's dangerous to initialize latestObservedXid to anything to an older
value. The idea of keeping the seen xids in a temporary list private to
the startup process until the running-xacts record would solve that
problem too. ProcArrayInitRecoveryInfo() would not be needed anymore,
the KnownAssignedXids tracking would start at the first running-xacts
record (or shutdown checkpoint) we see, not any sooner than that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-24 11:39:47
Message-ID:	4CECF983.8060409@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24.11.2010 13:38, Heikki Linnakangas wrote:
> It's dangerous to initialize latestObservedXid to anything to an older
> value.

older value than the nextXid-1 from the checkpoint record, I meant to say.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-11-24 15:19:47
Message-ID:	1290611987.5412.22474.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
> When recovery starts, we fetch the oldestActiveXid from the checkpoint
> record. Let's say that it's 100. We then start replaying WAL records
> from the Redo pointer, and the first record (heap insert in your case)
> contains an Xid that's much larger than 100, say 10000. We call
> RecordKnownAssignedXids() to make note that all xids between that
> range are in-progress, but there isn't enough room in the array for
> that.

Agreed.

> Hmm. I'm not sure off the top of my head how to fix that. Perhaps
stash
> the xids we see during WAL replay in private memory instead of
> putting
> them in the KnownAssignedXids array until we see the running-xacts
> record.

Moving LogStandbySnapshot() earlier will help but won't solve it fully.

Will think.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-01 18:51:19
Message-ID:	4CF69927.2070202@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24.11.2010 12:48, Heikki Linnakangas wrote:
> When recovery starts, we fetch the oldestActiveXid from the checkpoint
> record. Let's say that it's 100. We then start replaying WAL records
> from the Redo pointer, and the first record (heap insert in your case)
> contains an Xid that's much larger than 100, say 10000. We call
> RecordKnownAssignedXids() to make note that all xids between that range
> are in-progress, but there isn't enough room in the array for that.
>
> We normally get away with a smallish array because the array is trimmed
> at commit and abort records, and the special xid-assignment record to
> handle the case of a lot of subtransactions. We initialize the array
> from the running-xacts record that's written at a checkpoint. That
> mechanism fails in this case because the heap insert record is seen
> before the running-xacts record, causing all those xids in the range
> 100-10000 to be considered in-progress. The running-xacts record that
> comes later would prune them, but we don't have enough slots to hold
> them until that.
>
> Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
> the xids we see during WAL replay in private memory instead of putting
> them in the KnownAssignedXids array until we see the running-xacts record.

So, here's a patch using that approach.

Another approach would be to revisit the way the running-xacts snapshot
is taken. Currently, we first take a snapshot, and then WAL-log it.
There is a small window between the steps where backends can begin/end
transactions, and recovery has to deal with that. When this was
designed, there was long discussion on whether we should instead grab
WALInsertLock and ProcArrayLock at the same time, to ensure that the
running-xacts snapshot represents an up-to-date situation at the point
in WAL where it's inserted.

We didn't want to do that because both locks can be heavily contended.
But maybe we should after all. It would make the recovery code simpler.

If we want to get fancy, we wouldn't necessarily need to hold both locks
for the whole duration. We could first grab ProcArrayLock and construct
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
finally write the WAL record and release WALInsertLock. But that would
require small changes to XLogInsert.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
knownassignedxids-fix-1.patch	text/x-diff	10.8 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-02 08:39:31
Message-ID:	4CF75B43.5030101@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01.12.2010 20:51, Heikki Linnakangas wrote:
> Another approach would be to revisit the way the running-xacts snapshot
> is taken. Currently, we first take a snapshot, and then WAL-log it.
> There is a small window between the steps where backends can begin/end
> transactions, and recovery has to deal with that. When this was
> designed, there was long discussion on whether we should instead grab
> WALInsertLock and ProcArrayLock at the same time, to ensure that the
> running-xacts snapshot represents an up-to-date situation at the point
> in WAL where it's inserted.
>
> We didn't want to do that because both locks can be heavily contended.
> But maybe we should after all. It would make the recovery code simpler.
>
> If we want to get fancy, we wouldn't necessarily need to hold both locks
> for the whole duration. We could first grab ProcArrayLock and construct
> the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
> finally write the WAL record and release WALInsertLock. But that would
> require small changes to XLogInsert.

I took a look at that approach. We don't actually need to hold
ProcArrayLock while the WAL-record is written, we need to hold
XidGenLock. I believe that's less severe than holding the ProcArrayLock
as there's already precedence for writing a WAL record while holding
that: we do that when we advance to a new clog page and write a
zero-clog-page record.

So this is what we should do IMHO.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
knownassignedxids-fix-with-locking-1.patch	text/x-diff	14.2 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-02 09:02:14
Message-ID:	1291280534.2006.882.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2010-12-02 at 10:39 +0200, Heikki Linnakangas wrote:
> On 01.12.2010 20:51, Heikki Linnakangas wrote:
> > Another approach would be to revisit the way the running-xacts snapshot
> > is taken. Currently, we first take a snapshot, and then WAL-log it.
> > There is a small window between the steps where backends can begin/end
> > transactions, and recovery has to deal with that. When this was
> > designed, there was long discussion on whether we should instead grab
> > WALInsertLock and ProcArrayLock at the same time, to ensure that the
> > running-xacts snapshot represents an up-to-date situation at the point
> > in WAL where it's inserted.
> >
> > We didn't want to do that because both locks can be heavily contended.
> > But maybe we should after all. It would make the recovery code simpler.
> >
> > If we want to get fancy, we wouldn't necessarily need to hold both locks
> > for the whole duration. We could first grab ProcArrayLock and construct
> > the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
> > finally write the WAL record and release WALInsertLock. But that would
> > require small changes to XLogInsert.
>
> I took a look at that approach. We don't actually need to hold
> ProcArrayLock while the WAL-record is written, we need to hold
> XidGenLock. I believe that's less severe than holding the ProcArrayLock
> as there's already precedence for writing a WAL record while holding
> that: we do that when we advance to a new clog page and write a
> zero-clog-page record.
>
> So this is what we should do IMHO.

Oh, thanks for looking at this. I've been looking at this also and as we
might expect had a slightly different design.

First, your assessment of the locking above is better than mine. I agree
with your analysis so we should do it that way. The locking issue was
the reason I haven't patched this yet so I'm glad you've improved this.

In terms of the rest of the patch, it seems we have different designs, I
think I have a much simpler, less invasive solution:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the "grab the lock"
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-02 10:41:39
Message-ID:	4CF777E3.2090701@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 02.12.2010 11:02, Simon Riggs wrote:
> The cause of the issue is that replay starts at one LSN and there is a
> delay until the RunningXacts WAL record occurs. If there was no delay,
> there would be no issue at all. In CreateCheckpoint() we start by
> grabbing the WAInsertLock and later recording that pointer as part of
> the checkpoint record. My proposal is to replace the "grab the lock"
> code with the insert of the RunningXacts WAL record (when wal_level
> set), so that recovery always starts with that record type.

Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-02 11:25:02
Message-ID:	1291289102.2006.1060.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
> On 02.12.2010 11:02, Simon Riggs wrote:
> > The cause of the issue is that replay starts at one LSN and there is a
> > delay until the RunningXacts WAL record occurs. If there was no delay,
> > there would be no issue at all. In CreateCheckpoint() we start by
> > grabbing the WAInsertLock and later recording that pointer as part of
> > the checkpoint record. My proposal is to replace the "grab the lock"
> > code with the insert of the RunningXacts WAL record (when wal_level
> > set), so that recovery always starts with that record type.
>
> Oh, interesting idea. But AFAICS closing the gap between acquiring the
> running-xacts snapshot and writing it to the log is sufficient, I don't
> see what moving the running-xacts record buys us. Does it allow some
> further simplifications somewhere?

Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.

Earlier on this thread, we discussed:

On Wed, 2010-11-24 at 15:19 +0000, Simon Riggs wrote:
> On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
> > When recovery starts, we fetch the oldestActiveXid from the checkpoint
> > record. Let's say that it's 100. We then start replaying WAL records
> > from the Redo pointer, and the first record (heap insert in your case)
> > contains an Xid that's much larger than 100, say 10000. We call
> > RecordKnownAssignedXids() to make note that all xids between that
> > range are in-progress, but there isn't enough room in the array for
> > that.
>
> Agreed.

The current code fails because of the gap between the redo pointer and
the XLOG_RUNNING_XACTS WAL record. If there is no gap, there is no
problem.

So my preferred solution would:
* Log XLOG_RUNNING_XACTS while holding XidGenLock, as you suggest
* Move logging to occur at the Redo pointer

That is a much smaller patch with a smaller footprint.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Joachim Wieland <joe(at)mcknight(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-02 11:31:42
Message-ID:	4CF7839E.6050505@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 02.12.2010 13:25, Simon Riggs wrote:
> On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
>> On 02.12.2010 11:02, Simon Riggs wrote:
>>> The cause of the issue is that replay starts at one LSN and there is a
>>> delay until the RunningXacts WAL record occurs. If there was no delay,
>>> there would be no issue at all. In CreateCheckpoint() we start by
>>> grabbing the WAInsertLock and later recording that pointer as part of
>>> the checkpoint record. My proposal is to replace the "grab the lock"
>>> code with the insert of the RunningXacts WAL record (when wal_level
>>> set), so that recovery always starts with that record type.
>>
>> Oh, interesting idea. But AFAICS closing the gap between acquiring the
>> running-xacts snapshot and writing it to the log is sufficient, I don't
>> see what moving the running-xacts record buys us. Does it allow some
>> further simplifications somewhere?
>
> Your patch is quite long and you do a lot more than just alter the
> locking. I don't think we need those changes at all and especially would
> not wish to backpatch that.

Most of the changes to procarray.c were about removing code that's no
longer necessary when we close the gap between acquiring and writing the
running-xacts WAL record. You can leave it as it is as a historical
curiosity, but I'd prefer to simplify it, given that we now know that it
doesn't actually work correctly if the gap is not closed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-07 08:42:22
Message-ID:	4CFDF36E.6050102@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 02.12.2010 12:31, Heikki Linnakangas wrote:
> On 02.12.2010 13:25, Simon Riggs wrote:
>> On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
>>> On 02.12.2010 11:02, Simon Riggs wrote:
>>>> The cause of the issue is that replay starts at one LSN and there is a
>>>> delay until the RunningXacts WAL record occurs. If there was no delay,
>>>> there would be no issue at all. In CreateCheckpoint() we start by
>>>> grabbing the WAInsertLock and later recording that pointer as part of
>>>> the checkpoint record. My proposal is to replace the "grab the lock"
>>>> code with the insert of the RunningXacts WAL record (when wal_level
>>>> set), so that recovery always starts with that record type.
>>>
>>> Oh, interesting idea. But AFAICS closing the gap between acquiring the
>>> running-xacts snapshot and writing it to the log is sufficient, I don't
>>> see what moving the running-xacts record buys us. Does it allow some
>>> further simplifications somewhere?
>>
>> Your patch is quite long and you do a lot more than just alter the
>> locking. I don't think we need those changes at all and especially would
>> not wish to backpatch that.
>
> Most of the changes to procarray.c were about removing code that's no
> longer necessary when we close the gap between acquiring and writing the
> running-xacts WAL record. You can leave it as it is as a historical
> curiosity, but I'd prefer to simplify it, given that we now know that it
> doesn't actually work correctly if the gap is not closed.

Ok, I've committed this patch now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Joachim Wieland <joe(at)mcknight(dot)de>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby: too many KnownAssignedXids
Date:	2010-12-15 19:54:22
Message-ID:	AANLkTi=ZGrSVXuyxL_6Fp=gfG6LBSb4pCtXXEp85bNnD@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Dec 7, 2010 at 3:42 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Ok, I've committed this patch now.

I can confirm that I could continue replaying the logfiles on the
standby host with this patch.

Thanks a lot,
Joachim