Quick Links

Re: Disabled features on Hot Standby

Lists:	pgsql-hackers

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Disabled features on Hot Standby
Date:	2012-01-13 13:27:38
Message-ID:	CA+U5nML3mwijmPqxA9gjU_LS7cgkyGhrsHpWMBfa53HJpv0UJA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The open items list for 9.2 says

"Index-only scans need to be disabled when running under Hot Standby"

There is no explanation, so please explain here so we can change it,
if possible.

Not sure its great policy to keep implementing things that don't work
on HS, while not giving a proper chance for other people to fix that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 14:03:49
Message-ID:	CA+TgmoZT_=x6eLxRus83mwY5+4rMBo4_bvwBMTk3Z0ytKFrR6g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The open items list for 9.2 says
>
> "Index-only scans need to be disabled when running under Hot Standby"
>
> There is no explanation, so please explain here so we can change it,
> if possible.
>
> Not sure its great policy to keep implementing things that don't work
> on HS, while not giving a proper chance for other people to fix that.

Well, I agree that it's not a great policy to implement lots of things
that don't work on Hot Standby, but this particular issue is an old
one.

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01422.php

Since the xmin horizon on the standby may precede the xmin horizon on
the master, we can't assume that a page is all-visible on the standby
just because it's all-visible on the master. The crash-safe
visibililty map code is in some ways a step forward for Hot Standby,
because in previous releases, we didn't WAL-log setting visibility map
bits at all, and therefore a just-promoted Hot Standby server would
have no bits set apart from those which had been continuously set
since the base backup was taken, or those it happened to get by chance
in full page images. Even in old releases, that's not particularly
good, because it means that (1) sequential scans on a just-promoted
standby will be slower, since those can leverage PD_ALL_VISIBLE to
avoid retail MVCC checks on every tuple and (2) the first VACUUM on
each table on the standby after promotion will possibly need to scan a
lot more of the table than was being scanned by a typical VACUUM on
the old master, owing to the relevant bits in the visibillity map not
being set. 9.2 will be better in this regard, because the bits will
propagate from master to standby as they're set on the master, but
it's still not safe to rely on them until we've exited recovery.

I'm not sure I have a brilliant idea for how to fix this. We could
skip replay of any WAL record that sets a visibility map or
PD_ALL_VISIBLE bit unless the page is also all-visible under the
standby's xmin horizon, but we'd need some way to know that, and we'd
need to worry not only about the records emitted by vacuum that
explicitly set the bit, but also about any place where we send the
whole page and the bit goes along with it; we'd need to add code to go
in and clear the bit. That seems a bit complex and error-prone. Or,
we could try to lift this restriction in the special case when
hot_standby_feedback is set, though I have a feeling that's not really
robust - any time you lose the connection to the master, it'll lose
your xmin holdback and possibly mark some things all-visible that
really aren't on the standby, and then you reconnect and replay those
WAL records and bad things happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 15:17:14
Message-ID:	CA+U5nM+f_qdKiktgY-amu7SUYg6cWDsvnjocP3Dab9zH+vD3SQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Since the xmin horizon on the standby may precede the xmin horizon on
> the master

When hot_standby_feedback is turned on, the xmin of the standby is
provided to the master to ensure the situation you describe never
happens.

Surely that means the problem is solved, if we can confirm the setting
of the parameter. So this seems like a cleanup issues that can/should
be resolved.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 15:22:20
Message-ID:	CA+TgmoaxXV7k3oyy8iXCRJf+_XWQo08T=TcNg3f4mN9+Wi3CKg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> Since the xmin horizon on the standby may precede the xmin horizon on
>> the master
>
> When hot_standby_feedback is turned on, the xmin of the standby is
> provided to the master to ensure the situation you describe never
> happens.
>
> Surely that means the problem is solved, if we can confirm the setting
> of the parameter. So this seems like a cleanup issues that can/should
> be resolved.

How do you respond to the concerns I raised about that approach in the
last sentence of my previous email?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 15:31:13
Message-ID:	B59DA2F0-40BF-4FB3-8F4D-4A9B71F75930@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 13, 2012, at 8:03 AM, Robert Haas wrote:
> Or,
> we could try to lift this restriction in the special case when
> hot_standby_feedback is set, though I have a feeling that's not really
> robust - any time you lose the connection to the master, it'll lose
> your xmin holdback and possibly mark some things all-visible that
> really aren't on the standby, and then you reconnect and replay those
> WAL records and bad things happen.

Also, what happens if you started off with hot_standby_feedback turned off? New stuff won't just magically start working when you turn it on (or is that parameter settable only on restart?)

ISTM that hot standbys need some ability to store some "interim data" that is only needed by the hot standby while older transactions are running. IIRC we've seen other places where we have this problem too.

Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 15:36:32
Message-ID:	CA+TgmoaZo8OJjgrCKBucudzSRZxWxhMB6c8N6rrzP2rz7Zbu_w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 10:31 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?

I've thought about that... it's basically a rollback segment, which
for that matter might be judged superior to what we do on the master
(i.e. VACUUM). Needless to say, that would be just a teensy bit more
work than we're likely to have time for in 9.2. But maybe for 10.2...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 16:13:46
Message-ID:	CA+U5nML-W9rQkQyVFWBNEh8t_3fo2mKStKse0nUb5b_9ozJhSA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 3:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>>> Since the xmin horizon on the standby may precede the xmin horizon on
>>> the master
>>
>> When hot_standby_feedback is turned on, the xmin of the standby is
>> provided to the master to ensure the situation you describe never
>> happens.
>>
>> Surely that means the problem is solved, if we can confirm the setting
>> of the parameter. So this seems like a cleanup issues that can/should
>> be resolved.
>
> How do you respond to the concerns I raised about that approach in the
> last sentence of my previous email?

I think it should be you that comes up with a fix, not for me to
respond to your concerns about how hard it is. Many things that don't
fully work are rejected for that reason.

Having said that, I have input that seems to solve the problem.

Many WAL records have latestRemovedXid on them. We can use the same
idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the
latest vacrelstats->latestRemovedXid. That then creates a recovery
snapshot conflict that would cancel any query that might then see a
page of the vis map that was written when the xmin was later than on
the standby. If replication disconnects briefly and a vimap bit is
updated that would cause a problem, just as the same situation would
cause a problem because of other record types.

If problem solved then we can enable IndexOnlyScans on standby.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 17:08:05
Message-ID:	CA+Tgmobq22kF9LkGLsm2QqeRVGirr1sFWkXGqb=7BVjQ7Bm9ew@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think it should be you that comes up with a fix, not for me to
> respond to your concerns about how hard it is. Many things that don't
> fully work are rejected for that reason.

Well, I disagree. The fact that all-visible info can't be trusted in
standby mode is a problem that has existed since Hot Standby was
committed, and I don't feel obliged to fix it just because I was
involved in developing a new feature that happens to rely on
all-visible info. I'm sorry to butt heads with you on this one, but
this limitation has been long-known and discussed many times before on
pgsql-hackers, and I'm not going to drop everything and start working
on this just because you seem to think that I should.

> Having said that, I have input that seems to solve the problem.
>
> Many WAL records have latestRemovedXid on them. We can use the same
> idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the
> latest vacrelstats->latestRemovedXid. That then creates a recovery
> snapshot conflict that would cancel any query that might then see a
> page of the vis map that was written when the xmin was later than on
> the standby. If replication disconnects briefly and a vimap bit is
> updated that would cause a problem, just as the same situation would
> cause a problem because of other record types.

That could create a lot of recovery conflicts when
hot_standby_feedback=off, I think, but it might work when
hot_standby_feedback=on. I don't fully understand the
latestRemovedXid machinery, but I guess the idea would be to kill any
standby transaction whose proc->xmin precedes the oldest committed
xmin or xmax on the page. If hot_standby_feedback=on then there
shouldn't be any, except in the case where it's just been enabled or
the SR connection is bouncing.

Also, what happens if an all-visible bit gets set on the standby
through some other mechanism - e.g. restored from an FPI or
XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the
visibility map page itself, but we certainly do it for the heap pages.
So it might be that this infrastructure would (somewhat bizarrely)
trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm
hoping Heikki or Tom will comment on this thread, because I think
there are a bunch of subtle issues here and that we could easily screw
it up by trying to plow through the problem too hastily.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 17:18:40
Message-ID:	CA+U5nMLk+7vzGfAieR1XiHhbc1eqDwOTPsZLg_0pSY+7Z77iiw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Also, what happens if an all-visible bit gets set on the standby
> through some other mechanism - e.g. restored from an FPI or
> XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the
> visibility map page itself, but we certainly do it for the heap pages.
> So it might be that this infrastructure would (somewhat bizarrely)
> trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm
> hoping Heikki or Tom will comment on this thread, because I think
> there are a bunch of subtle issues here and that we could easily screw
> it up by trying to plow through the problem too hastily.

An FPI can't change the all visible flag. If it did, it would imply
that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE
record, or that we're doing crash recovery/inital startup and HS is
not yet enabled.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-13 22:37:22
Message-ID:	m2fwfjtejh.fsf@2ndQuadrant.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, I disagree. The fact that all-visible info can't be trusted in
> standby mode is a problem that has existed since Hot Standby was
> committed, and I don't feel obliged to fix it just because I was
> involved in developing a new feature that happens to rely on

I'm very sorry to read that, because I sure feel like that's exactly
what us non commiters are asked to be doing when cooking any patch.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 01:02:31
Message-ID:	20120114010231.GA26756@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 12:08:05PM -0500, Robert Haas wrote:
> On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > Many WAL records have latestRemovedXid on them. We can use the same
> > idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the
> > latest vacrelstats->latestRemovedXid. That then creates a recovery
> > snapshot conflict that would cancel any query that might then see a
> > page of the vis map that was written when the xmin was later than on
> > the standby. If replication disconnects briefly and a vimap bit is
> > updated that would cause a problem, just as the same situation would
> > cause a problem because of other record types.
>
> That could create a lot of recovery conflicts when
> hot_standby_feedback=off, I think, but it might work when
> hot_standby_feedback=on. I don't fully understand the
> latestRemovedXid machinery, but I guess the idea would be to kill any
> standby transaction whose proc->xmin precedes the oldest committed
> xmin or xmax on the page. If hot_standby_feedback=on then there
> shouldn't be any, except in the case where it's just been enabled or
> the SR connection is bouncing.

FWIW, Tom aired the same idea here:
http://archives.postgresql.org/message-id/27743.1291135210@sss.pgh.pa.us

While reviewing the crash-safe visibility map patch, I echoed the idea and
explained why the extra conflicts would be immaterial:
http://archives.postgresql.org/message-id/20110618133703.GA1100@tornado.leadboat.com

Simon spoke to the FPI side of the question. For heap pages, the
XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET
TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits
through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or
visibility map bits. When, someday, they do, we might emit a separate WAL
record to force the recovery conflict. However, CLUSTER/VACUUM FULL already
remove tuples still-visible to standby snapshots without provoking a recovery
conflict. (Again only with hot_standby_feedback=off.)

Thanks,
nm

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 05:27:08
Message-ID:	CA+TgmoY6mdyJQDe_kkJw34BaXS85iCESLapv_dqUSNBczg2A+A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 5:37 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, I disagree. The fact that all-visible info can't be trusted in
>> standby mode is a problem that has existed since Hot Standby was
>> committed, and I don't feel obliged to fix it just because I was
>> involved in developing a new feature that happens to rely on
>
> I'm very sorry to read that, because I sure feel like that's exactly
> what us non commiters are asked to be doing when cooking any patch.

There is obviously room for disagreement over what should or should
not be within the scope of any particular development project, and we
may not always agree on that. However, I do try to act in good faith
and apply, as much as I can, the same standard to other people's
patches as I do to my own. In general, when I ask for the scope of
something to be expanded, I try to limit myself to something that I
feel can be accomplished in a day or two of development and which are
closely related to the core of what the patch is about. There are
exceptions, of course, where I feel that more than that is needed, but
there are also many examples on file of me arguing for a narrower
scope - either that an already-written patch should be divided into
pieces so that the uncontroversial pieces can be committed, or that a
project need not include every element that anyone wants in order to
be acceptable. I try very hard to be sensitive to the fact that
non-committers also have stuff they want to get done, which is why I
am one of the most prolific committers of the work of non-committers,
even extending in numerous instances (most recently, today) to
expending rather large amounts of time rewriting stuff that neither I
nor my employer have a huge stake in to reach the quality of code that
I think is needed for commit, or to satisfy the objections of other
community members that I may not personally share.

In this particular case, I am not even the person who committed the
patch. Tom committed index-only scans well before I thought it was
ready for prime time, and then did a whole lot of work to improve it
afterwards. Even now, I believe there are significant issues
remaining. There is no EXPLAIN support (for which I supported a patch
today); the statistics might need work (as Magnus complained about in
response to my EXPLAIN patch); we don't yet support index-only quals
(i.e. even though you know you'll eventually need the heap tuple,
evaluate the quals you can using just the index tuple in the hopes of
avoiding some heap fetches); and I'm pretty worried that we'll run
across cases where too much of the heap is not-all-visible and we
either don't use index only scans or we use them but they don't
perform well - a problem which I suspect will require adjustment of
our vacuuming algorithm to avoid. With the exception of EXPLAIN
support which I think is merely an oversight, all of those issues,
including the problems in Hot Standby mode, remain because nobody
knows exactly what we ought to do to fix them. When somebody figures
it out, I predict they'll get fixed pretty quickly. Now, whether or
not that will be in time for 9.2, or whether I'll be the one to write
the code, I don't know. But in the meantime, there are really only
two choices here: we can understand that the feature we have has some
limitations, or we can revert it. Considering the amount of time that
was put into it, particular by Tom, I can't imagine that we really
want to revert it, but then again I'm relatively shocked that I'm
being asked, one day before final CommitFest starts, to drop
everything and work on a limitation that I thought had been
well-understood by everyone for three years and that it's not clear we
know how to lift. It may be that I didn't do a sufficiently good job
communicating that this limitation was bound to exist, and I apologize
for that. I did mention it on-list multiple times, as did Heikki, and
I thought it was understood, but apparently not. If people who didn't
object to committing the patch would have done so had they realized
this, and don't feel that I made it clear enough, I am sincerely
sorry. I would have made the same argument then that I am making now,
but at least we would have hashed it out one way or the other before
moving forward.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 05:42:02
Message-ID:	CA+Tgmob-ty1kNpxOB_aYhfxc-3k_h6NuMZEjiviooN-6e37T4w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Simon spoke to the FPI side of the question. For heap pages, the
> XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET
> TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits
> through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or
> visibility map bits. When, someday, they do, we might emit a separate WAL
> record to force the recovery conflict. However, CLUSTER/VACUUM FULL already
> remove tuples still-visible to standby snapshots without provoking a recovery
> conflict. (Again only with hot_standby_feedback=off.)

Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not?

Other than that, it seems like we might be converging on a workable
solution: if hot_standby_feedback=off, disable index-only scans for
snapshots taken during recovery; if hot_standby_feedback=on, generate
recovery conflicts when a snapshot's xmin precedes the youngest xmin
on a page marked all-visible, but allow the use of index-only scans,
and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my
head, I don't see a hole in that logic...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 05:44:06
Message-ID:	CA+TgmoYbAbbB2szn9OUXVrCUS6EGRqedVDWaxN0s7yHO19vnJQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 13, 2012 at 12:18 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Also, what happens if an all-visible bit gets set on the standby
>> through some other mechanism - e.g. restored from an FPI or
>> XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the
>> visibility map page itself, but we certainly do it for the heap pages.
>> So it might be that this infrastructure would (somewhat bizarrely)
>> trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm
>> hoping Heikki or Tom will comment on this thread, because I think
>> there are a bunch of subtle issues here and that we could easily screw
>> it up by trying to plow through the problem too hastily.
>
> An FPI can't change the all visible flag. If it did, it would imply
> that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE
> record, or that we're doing crash recovery/inital startup and HS is
> not yet enabled.

Interesting. I hadn't considered that point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 06:10:14
Message-ID:	20120114061014.GD26756@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 12:42:02AM -0500, Robert Haas wrote:
> On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Simon spoke to the FPI side of the question. ?For heap pages, the
> > XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET
> > TABLESPACE. ?For the last, we will have already logged any PD_ALL_VISIBLE bits
> > through normal channels. ?CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or
> > visibility map bits. ?When, someday, they do, we might emit a separate WAL
> > record to force the recovery conflict. ?However, CLUSTER/VACUUM FULL already
> > remove tuples still-visible to standby snapshots without provoking a recovery
> > conflict. ?(Again only with hot_standby_feedback=off.)
>
> Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not?

I suspect it goes back to 9.0, yes. I'm on the fence regarding whether to call
it a bug or an unimplemented feature. In any case, +1 for improving it.

> Other than that, it seems like we might be converging on a workable
> solution: if hot_standby_feedback=off, disable index-only scans for
> snapshots taken during recovery; if hot_standby_feedback=on, generate
> recovery conflicts when a snapshot's xmin precedes the youngest xmin
> on a page marked all-visible, but allow the use of index-only scans,
> and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my
> head, I don't see a hole in that logic...

I wouldn't check hot_standby_feedback. Rather, mirror what we do for
XLOG_HEAP2_CLEAN. Unconditionally add an xid to xl_heap_visible bearing the
youngest xmin on the page (alternately, some convenient upper bound thereof).
Have heap_xlog_visible() call ResolveRecoveryConflictWithSnapshot() on that
xid. Now, unconditionally trust PD_ALL_VISIBLE and permit index-only scans.

The user's settings of hot_standby_feedback, vacuum_defer_cleanup_age,
max_standby_streaming_delay and max_standby_archive_delay will drive the
consequential trade-off: nothing, query cancellation, or recovery delay.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 08:08:29
Message-ID:	CA+U5nMJ9zkfpgAFDG8FguHPk-eap0oM+PCEr6q75hmUeZXYwnw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 1:02 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:

>> Also, what happens if an all-visible bit gets set on the standby
>> through some other mechanism - e.g. restored from an FPI or
>> XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the
>> visibility map page itself, but we certainly do it for the heap pages.
>> So it might be that this infrastructure would (somewhat bizarrely)
>> trust the visibility map bits but not the PD_ALL_VISIBLE bits.
>
> Simon spoke to the FPI side of the question. For heap pages, the
> XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET
> TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits
> through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or
> visibility map bits. When, someday, they do, we might emit a separate WAL
> record to force the recovery conflict. However, CLUSTER/VACUUM FULL already
> remove tuples still-visible to standby snapshots without provoking a recovery
> conflict. (Again only with hot_standby_feedback=off.)

If that were the case it would be a bug.

CLUSTER/VACUUM FULL emit an AccessExclusiveLock record that would
conflict with any current lock holders, so should be fine on that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 08:23:03
Message-ID:	CA+U5nMJih0PgNXqZkf1R-6qbsdAC9Hgn6EYzoba=NeP2Ri+xVA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 5:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

Please use the logic described earlier. Using the existing mechanisms
and infrastructure is all we need here.

We want latestRemovedXid, not a new version of it via xmins. Keep
index-only scans enabled all the time and let conflicts be generated.
There are already mechanisms in place to handle conflicts, so avoiding
them another way is not required or desired.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 11:17:19
Message-ID:	20120114111719.GA1081@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 08:08:29AM +0000, Simon Riggs wrote:
> On Sat, Jan 14, 2012 at 1:02 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > However, CLUSTER/VACUUM FULL already
> > remove tuples still-visible to standby snapshots without provoking a recovery
> > conflict. ?(Again only with hot_standby_feedback=off.)
>
> If that were the case it would be a bug.
>
> CLUSTER/VACUUM FULL emit an AccessExclusiveLock record that would
> conflict with any current lock holders, so should be fine on that.

I speak of this sequence (M = master connection, S = standby connection):

M: CREATE TABLE t AS SELECT * FROM generate_series(1,1000) t(n);
S: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 0;
M: DELETE FROM t WHERE n <= 10;
M: VACUUM FULL t;
S: SELECT count(*) FROM t; -- 990, should be 1000

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 11:36:52
Message-ID:	CA+U5nM+AajL=MYEYq6LZJT9-wnfxZA71CeCKiS_PP1PU2ZgwXA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 11:17 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Jan 14, 2012 at 08:08:29AM +0000, Simon Riggs wrote:
>> On Sat, Jan 14, 2012 at 1:02 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > However, CLUSTER/VACUUM FULL already
>> > remove tuples still-visible to standby snapshots without provoking a recovery
>> > conflict. ?(Again only with hot_standby_feedback=off.)
>>
>> If that were the case it would be a bug.
>>
>> CLUSTER/VACUUM FULL emit an AccessExclusiveLock record that would
>> conflict with any current lock holders, so should be fine on that.
>
> I speak of this sequence (M = master connection, S = standby connection):
>
> M: CREATE TABLE t AS SELECT * FROM generate_series(1,1000) t(n);
> S: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 0;
> M: DELETE FROM t WHERE n <= 10;
> M: VACUUM FULL t;
> S: SELECT count(*) FROM t; -- 990, should be 1000

OK, so we need to emit a heap_xlog_cleanup_info() record at the end of
cluster to conflict with anybody that doesn't yet have a lock but has
a snapshot that can see tuples the cluster implicitly removed. Will
do.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 11:44:17
Message-ID:	m2y5taqzji.fsf@2ndQuadrant.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> With the exception of EXPLAIN
> support which I think is merely an oversight, all of those issues,
> including the problems in Hot Standby mode, remain because nobody
> knows exactly what we ought to do to fix them. When somebody figures
> it out, I predict they'll get fixed pretty quickly.

So this problem goes into the 9.2 Open Items list, right? It looks not
tied to this particular commit fest. Also, with so many people involved,
I think there shouldn't be any finger pointing here.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Disabled features on Hot Standby
Date:	2012-01-14 14:55:35
Message-ID:	CA+TgmobRWe8JT1KJEoQDBh402tKB963+xXmzptD4QHzyo=N0pg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 14, 2012 at 6:44 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> With the exception of EXPLAIN
>> support which I think is merely an oversight, all of those issues,
>> including the problems in Hot Standby mode, remain because nobody
>> knows exactly what we ought to do to fix them. When somebody figures
>> it out, I predict they'll get fixed pretty quickly.
>
> So this problem goes into the 9.2 Open Items list, right? It looks not
> tied to this particular commit fest. Also, with so many people involved,
> I think there shouldn't be any finger pointing here.

Well, I wouldn't personally be horrified if this didn't get fixed for
9.2, but since you and Simon and Noah seem to feel strongly about it,
I'm inclined to say we should go ahead and fix it, so sure. Actually,
come to think of it, it really is on the open items list already: "fix
it so index-only scans work on the standby" is merely a more ambitious
version of "disable index-only scans on the standby", so it pretty
much works out to the same thing.

Furthermore, if we unconditionally generate recovery conflicts as
Simon is proposing, this sounds like it might be pretty darn simple.
I was in the midst of thinking about how to make the locking work if
we changed behavior when changing hot_standby_feedback, but if we
don't even need to go there so much the better. I'm happy to accept
Simon/Noah's judgement that the rate of conflicts will be acceptable;
I haven't worked with Hot Standby enough myself to have an educated
opinion on that topic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company