Re: XMin Hot Standby Feedback patch

Lists: pgsql-hackers
From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: XMin Hot Standby Feedback patch
Date: 2011-02-12 22:11:21
Message-ID: AANLkTi=58iQOX1Loj28=A0XQrfYizXRtttbbv4YW0ELP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is another bit of the syncrep patch split out.

I will revisit the replication timeout one Real Soon, I promise -- but
I have a couple things to do today that may delay that until the
evening.

https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1

Context diff supplied here.

Note that this information is not exposed via catalog in the original
syncrep patch, and is not here. Do we want that kind of reporting?

--
fdr

Attachment Content-Type Size
Hot-standby-feedback-patch.patch text/x-patch 9.2 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 16:42:42
Message-ID: 1297788162.1747.21432.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
> This is another bit of the syncrep patch split out.
>
> I will revisit the replication timeout one Real Soon, I promise -- but
> I have a couple things to do today that may delay that until the
> evening.
>
> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>
> Context diff supplied here.

Greg just tipped me off to this thread a few hours ago. I saw your other
work on timeouts which looks good.

I've reworked this feature myself, and its roughly the same thing you
have posted, so I will just add on to this thread. The major change from
my earlier patch is that the logic around setting xmin on the master is
considerably tighter, and correctly uses locking.

Patch attached, no docs yet, but the patch is clear.

I'm looking to commit this in next 24 hours barring objections and/or
test failures.

> Note that this information is not exposed via catalog in the original
> syncrep patch, and is not here. Do we want that kind of reporting?

Probably, but its a small change and will conflict with other work for
now.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
hs_feedback.v11.patch text/x-patch 6.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 16:49:48
Message-ID: 4D5AAEAC.6060909@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.2011 18:42, Simon Riggs wrote:
> On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
>> This is another bit of the syncrep patch split out.
>>
>> I will revisit the replication timeout one Real Soon, I promise -- but
>> I have a couple things to do today that may delay that until the
>> evening.
>>
>> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>>
>> Context diff supplied here.
>
> Greg just tipped me off to this thread a few hours ago. I saw your other
> work on timeouts which looks good.
>
> I've reworked this feature myself, and its roughly the same thing you
> have posted, so I will just add on to this thread. The major change from
> my earlier patch is that the logic around setting xmin on the master is
> considerably tighter, and correctly uses locking.

It would be wise to also transmit the epoch in addition to xmin, to
avoid confusion if the standby is > 2 billion transactions behind.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 16:52:18
Message-ID: AANLkTikO-t_bitd-87RCEAq1JwDRN_4nNKiXM=tTap+i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 11:49 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 15.02.2011 18:42, Simon Riggs wrote:
>>
>> On Sat, 2011-02-12 at 14:11 -0800, Daniel Farina wrote:
>>>
>>> This is another bit of the syncrep patch split out.
>>>
>>> I will revisit the replication timeout one Real Soon, I promise -- but
>>> I have a couple things to do today that may delay that until the
>>> evening.
>>>
>>>
>>> https://github.com/fdr/postgres/commit/ad3ce9ac62f0e128d7d1fd20d47184f867056af1
>>>
>>> Context diff supplied here.
>>
>> Greg just tipped me off to this thread a few hours ago. I saw your other
>> work on timeouts which looks good.
>>
>> I've reworked this feature myself, and its roughly the same thing you
>> have posted, so I will just add on to this thread. The major change from
>> my earlier patch is that the logic around setting xmin on the master is
>> considerably tighter, and correctly uses locking.
>
> It would be wise to also transmit the epoch in addition to xmin, to avoid
> confusion if the standby is > 2 billion transactions behind.

That case is probably hopelessly broken anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 16:54:22
Message-ID: 4D5AAFBE.7040403@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.2011 18:52, Robert Haas wrote:
> On Tue, Feb 15, 2011 at 11:49 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> It would be wise to also transmit the epoch in addition to xmin, to avoid
>> confusion if the standby is> 2 billion transactions behind.
>
> That case is probably hopelessly broken anyway.

I don't expect the feedback to do anything too useful in case of xid
wraparound, but at least the master would recognize the situation and
know to ignore the bogus xmin from that standby.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 16:59:35
Message-ID: 1297789175.1747.21529.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-02-15 at 18:49 +0200, Heikki Linnakangas wrote:

> It would be wise to also transmit the epoch in addition to xmin, to
> avoid confusion if the standby is > 2 billion transactions behind.

Yes, good idea, thanks.

That has to be the record for the fastest patch review. ;-)

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 17:15:44
Message-ID: AANLkTin96M5GbEyTkqdgeJcDWq2EhXc_HWNmFvhzax59@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 11:42 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Patch attached, no docs yet, but the patch is clear.
>
> I'm looking to commit this in next 24 hours barring objections and/or
> test failures.

Looks pretty good to me, though I haven't tested it. I like some of
the safety valves you put in there, but I don't understand this part:

+ /*
+ * Feedback from standby should move us
forwards, but not too far.
+ * Avoid grabbing XidGenLock here in case of
waits, so use
+ * a heuristic instead.
+ */

What's XIDGenLock got to do with it?

On another disk, I think that those warning messages are a bad idea.
That could fill up someone's disk really quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 17:20:16
Message-ID: AANLkTimORH36U089tgiJFMz=Oc-L3zzyHBL=u+ZeyP0X@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 12:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On another disk, I think that those warning messages are a bad idea.
> That could fill up someone's disk really quickly.

On another disk? What the heck am I talking about?

On another point? On another note? Anyway, you get the idea... hopefully.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 17:52:14
Message-ID: 1297792334.1747.21829.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-02-15 at 12:20 -0500, Robert Haas wrote:
> On Tue, Feb 15, 2011 at 12:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On another disk, I think that those warning messages are a bad idea.
> > That could fill up someone's disk really quickly.
>
> On another disk? What the heck am I talking about?
>
> On another point? On another note? Anyway, you get the idea... hopefully.

Yeh. I quite liked it as a metaphorical turn of phrase.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 21:15:18
Message-ID: 1297804518.1747.22927.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
> Looks pretty good to me, though I haven't tested it. I like some of
> the safety valves you put in there, but I don't understand this part

Reworked logic covering all feedback, plus tests, plus docs.

Last comments before commit please.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
hs_feedback.v12.patch text/x-patch 11.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-15 22:46:09
Message-ID: AANLkTikROUuWYQOLXLW33ETwnywK6Nz_nJcf3LmVBtTe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 4:15 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
>> Looks pretty good to me, though I haven't tested it.  I like some of
>> the safety valves you put in there, but I don't understand this part
>
> Reworked logic covering all feedback, plus tests, plus docs.
>
> Last comments before commit please.

What happens if someone has hot_standby_feedback on and then turns it
off? I think in XLogWalRcvSendReply() you need

if (hot_standby_feedback)
{
stuff
}
else
{
reply_message.xmin = InvaidXID;
reply_message.epoch = 0; /* or something */
}

Also this part looks kludgy to me:

+ GetNextXidAndEpoch(&nextXid, &nextEpoch);
+ if (nextXid < reply_message.xmin)
+ nextEpoch--;

How about introducing a GetOldestXminAndEpoch function instead?

Would it make sense to avoid grabbing the ProcArrayLock except when we
truly need to update MyProc->xmin? ProcessStandbyReplyMessage gets
called a lot...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-16 02:25:30
Message-ID: AANLkTikumNrqEYpfOvT95FWbKFs=nObcM8EEqnZ6jfqr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 16, 2011 at 6:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
>> Looks pretty good to me, though I haven't tested it.  I like some of
>> the safety valves you put in there, but I don't understand this part
>
> Reworked logic covering all feedback, plus tests, plus docs.
>
> Last comments before commit please.

When I started the standby with hot_standby = off and hot_standby_feedback = on,
I got the following assertion error.

-----------------
sby LOG: streaming replication successfully connected to primary
TRAP: FailedAssertion("!(((result) >= ((TransactionId) 3)))", File:
"procarray.c", Line: 1027)
act LOG: unexpected EOF on standby connection
sby LOG: WAL receiver process (PID 17572) was terminated by signal 6: Aborted
sby LOG: terminating any other active server processes
-----------------

vacuum_defer_cleanup_age on the *standby* should not affect the
feedback xid.

VACUUM TABLE on the *primary* doesn't use the feedback xid at all.
Is this intentional?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XMin Hot Standby Feedback patch
Date: 2011-02-16 07:27:59
Message-ID: 1297841279.1747.27059.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-02-16 at 11:25 +0900, Fujii Masao wrote:
> On Wed, Feb 16, 2011 at 6:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Tue, 2011-02-15 at 12:15 -0500, Robert Haas wrote:
> >> Looks pretty good to me, though I haven't tested it. I like some of
> >> the safety valves you put in there, but I don't understand this part
> >
> > Reworked logic covering all feedback, plus tests, plus docs.
> >
> > Last comments before commit please.
>
> When I started the standby with hot_standby = off and hot_standby_feedback = on,
> I got the following assertion error.
>
> -----------------
> sby LOG: streaming replication successfully connected to primary
> TRAP: FailedAssertion("!(((result) >= ((TransactionId) 3)))", File:
> "procarray.c", Line: 1027)
> act LOG: unexpected EOF on standby connection
> sby LOG: WAL receiver process (PID 17572) was terminated by signal 6: Aborted
> sby LOG: terminating any other active server processes
> -----------------

Thanks

> vacuum_defer_cleanup_age on the *standby* should not affect the
> feedback xid.

Not sure, will think some more.

> VACUUM TABLE on the *primary* doesn't use the feedback xid at all.
> Is this intentional?

Yes, I was in the middle of fixing that.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services