Hot Standby query cancellation and Streaming Replication integration

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 08:33:35
Message-ID: 4B87875F.2010601@2ndquadrant.com
Lists: pgsql-hackers

I'm happy to see we've crossed the point where the worst of the Hot
Standby and Streaming Replication issues are sorted out. A look at the
to-do lists at http://wiki.postgresql.org/wiki/Hot_Standby_TODO and
http://wiki.postgresql.org/wiki/Streaming_Replication shows no Must-fix
items and 5 Serious Issues left for Hot Standby; there are 9 Streaming
Replication items there, which aren't as clearly prioritized yet.
Correct me if I'm wrong here, but those read to me like tweaks and
polishing rather than major architecture issues at this point, so I
believe that code is in the same position as HS: some serious issues, but
no really terrible parts.

The job Simon asked me to take on starting last week is figuring out which
of the listed HS "Serious Issues" might be promoted into must-fix items
after seeing how easy they were to encounter. There are a number of HS
tunables that interact with one another, and depending on your priorities
there are a few ways you can try to optimize the configuration for what I
expect to be common use cases for this feature. I've written a blog entry at
http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html
that tries to explain all that background clearly and relate the
implementation details to how I expect DBAs will perceive them. That
was a bit much to also cover here, and it has a broader audience than
just this list.

Attached is a tar file with some test case demo scripts that demonstrate
the worst of the problems here IMHO. A README in there outlines how to
set the problem demo up (presuming you've already gotten a HS pair
going). What this does is execute the following sequence continuously
on the master:

UPDATE pgbench_tellers SET tbalance = tbalance + <delta> WHERE tid = <tid>;
  (repeated several times)
VACUUM pgbench_tellers;

Meanwhile, on the standby, the following long query runs on a few
sessions at once, again looping constantly:

SELECT sum(abalance) FROM pgbench_accounts;

It took a bit of testing to get the database scale and iteration times
right so that the issue is easy to encounter on my system; I hope it shows
up easily enough for others with the values used. (I have a similar
work-in-progress demo that tries to trigger the b-tree deletion problem
too; I'll follow up on that once the storm of messages about this topic
dies down, as I think this one is a prerequisite for it anyway.)

I'm not sure what you might be expecting from the above combination, but
what actually happens is that many of the SELECT statements on the table
*that isn't even being updated* are canceled. You see this in the logs:

LOG: restored log file "0000000100000000000000A5" from archive
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.
STATEMENT: SELECT sum(abalance) FROM pgbench_accounts;

Basically, every time a WAL segment appears that wipes out a tuple that
SELECT expects should still be visible, because the dead row left behind
by the update has been vacuumed away, the query is canceled. This
happens all the time the way I've set this up, and I don't feel like
this is a contrived demo. Having a long-running query on the standby
while things get updated and then periodically autovacuumed on the
primary is going to be extremely common in the sorts of production
systems I expect want HS the most.

Now, as explained in the blog entry and in the documentation, there are
all sorts of ways you can work around this issue by tweaking parameters
or doing fun tricks with dblink. You can prioritize any two of keeping
the standby current, letting long-running queries execute on the
standby, and keeping xid advances on the master moving forward as fast
as possible. But you can't get all three at once. The choices
available are really about the best you can do given a system that's
basically the old warm-standby approach, improved with adding just Hot
Standby to the mix. Sure, you might make the conflict resolution a bit
smarter or make the UI for setting the parameters more friendly, and
there's already been plenty of argument and patching over all of that.
I don't want to belittle that work because it's been important to make
HS a useful standalone feature, but I feel like that's all triage rather
than looking for the most robust fix possible.
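
To make the dblink trick concrete, here's a rough sketch of one way to do
it (the connection string is a placeholder, and using a serializable
transaction to pin the master's xmin is my assumption about how you'd hold
the snapshot open; the functions are the standard contrib/dblink ones):

-- Before starting the long report, from any session that can reach the primary:
SELECT dblink_connect('xmin_hold', 'host=primary dbname=pgbench');
SELECT dblink_exec('xmin_hold', 'BEGIN ISOLATION LEVEL SERIALIZABLE');
-- Taking a snapshot inside that transaction keeps the primary from
-- vacuuming away row versions the standby query may still need
SELECT * FROM dblink('xmin_hold', 'SELECT 1') AS t(ok int);
-- ... run the long query on the standby ...
SELECT dblink_exec('xmin_hold', 'COMMIT');
SELECT dblink_disconnect('xmin_hold');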

If you're running a system that also is using Streaming Replication,
there is a much better approach possible. This idea has been floating
around for a while and I am not taking credit for inventing it (too busy
tonight to dig into the archives to figure out exactly when this popped
up initially and who deserves credit for it). I'm just pointing out
that now is the time where it's actually possible to implement. The HS
TODO already includes the following action item, to resolve a serious
issue you can run into (that itself would be great to eliminate):

"Requires keep-alives with timestamps to be added to sync rep feature"

If those keep-alives flowed in both directions, and included both
timestamps *and* xid visibility information, the master could easily be
configured to hold open xid snapshots needed for long running queries on
the standby when that was necessary. I might be missing an
implementation detail here, but from a high level it seems like you
could make the walsender on the master publish the xmin the standby
reports back as a bit of ProcArray data. Then the master would only
advance its cleanup horizon past the point the standby says it no longer
needs visibility behind.

This is a much more elegant solution than any of the hacks available so
far. It would turn Hot Standby + Streaming Replication into a system
that stepped out of the way of the worst of the technical limitations of
HS alone. The master would easily and automatically avoid advancing
past where the queries running on the standby needed visibility back to,
essentially the same way cleanup is blocked during a long-running query
on the primary--except with the actual main query work offloaded to the
standby, the idea all along.
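
That analogy is easy to see on a standalone primary today: any open
snapshot quietly holds back cleanup the same way. A minimal illustration
(table names are just the pgbench ones from the demo):

-- Session 1 on the primary: hold a snapshot open
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM pgbench_accounts;   -- snapshot taken here

-- Session 2 on the primary, while session 1 is still open:
UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = 1;
VACUUM VERBOSE pgbench_tellers;   -- reports dead row versions it cannot remove yet

-- Session 1 finally commits, and a later VACUUM can reclaim those rows
COMMIT;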

I don't know how difficult the keepalive feature was expected to be, and
there's certainly plenty of potential landmines in this whole xid export
idea. How to handle situations where the standby goes away for a while,
such as a network outage, so that it doesn't block the master from ever
cleaning up dead tuples is a concern. I wouldn't expect that to be too
serious of a blocker, given that if the standby isn't talking it
probably isn't running queries you need to worry about canceling
either. Not sure where else this can fall down, and unfortunately I
don't know nearly enough about the SR code to help myself with
implementing this feature. (I think Simon is in a similar
position--it's just not what we've been staring at the last few months).

But I do know that the current Hot Standby implementation is going to be
frustrating to configure correctly for people. If it's possible to make
most of that go away just by doing some final integration between it and
Streaming Replication that just wasn't practical to accomplish until
now, I think it's worth considering how to make that happen before the
final 9.0 release.

I really hope this discussion can stay focused on whether and how it's
possible to improve this area, with the goal being to deliver a product
everyone can be proud of, with the full feature set that makes this next
release a killer one. The features that have all managed to get into
this release already are fantastic, everyone who contributed should be
proud of that progress, and it's encouraging that the alpha4 date was
nailed. It would be easy to descend into finger-pointing over why
exactly this particular problem is only getting more visibility now, or
into schedule-oriented commentary suggesting it must be ignored because
it's too late to do anything about it. I hope everyone appreciates that
wandering that way will not help make PostgreSQL 9.0 a better release.
This issue is so easy to encounter, and looks so bad when it happens,
that I feel it could easily lead to an embarrassing situation for the
community if something isn't done about it before release.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

Attachment: hs-demo.tar.gz (application/x-gzip, 1.5 KB)

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 12:10:07
Message-ID: 407d949e1002260410o1660bbaya01da5923b615983@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled.  You see this in the logs:

Well, I proposed that the default should be to wait forever when
applying WAL logs that conflict with a query, precisely because I
think the expectation is that things will "just work" and queries will not
fail unpredictably. Perhaps in your test a larger max_standby_delay
would have prevented the cancellations, but then as soon as you try a
query which lasts longer you would have to raise it again. There's no
safe value which will be right for everyone.

> If you're running a system that also is using Streaming Replication, there
> is a much better approach possible.

So I think one of the main advantages of a log shipping system over
the trigger-based systems is precisely that it doesn't require the
master to do anything it wasn't doing already. There's nothing the
slave can do which can interfere with the master's normal operation.

This independence is really a huge feature. It means you can allow
users on the slave that you would never let near the master. The
master can continue running production query traffic while users run
all kinds of crazy queries on the slave and drive it into the ground
and the master will continue on blithely unaware that anything's
changed.

In the model you describe any long-lived queries on the slave cause
tables in the master to bloat with dead records.

I think this model is on the roadmap but it's not appropriate for
everyone and I think one of the benefits of having delayed it is that
it forces us to get the independent model right before throwing in
extra complications. It would be too easy to rely on the slave
feedback as an answer for hard questions about usability if we had it
and just ignore the question of what to do when it's not the right
solution for the user.

--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 14:10:26
Message-ID: 4B87D652.6050008@enterprisedb.com
Lists: pgsql-hackers

Greg Smith wrote:
> Attached is a tar file with some test case demo scripts that demonstrate
> the worst of the problems here IMHO.

Thanks for that! We've been discussing this for ages, so it's nice to
have a concrete example.

> I don't want to belittle that work because it's been important to make
> HS a useful standalone feature, but I feel like that's all triage rather
> than looking for the most robust fix possible.

Ideally the standby would stash away the old pages or tuples somewhere
so that it can still access them even after replaying the WAL records
that remove them from the main storage. I realize that's not going to
happen any time soon because it's hard to do, but that would really be
the most robust fix possible.

> I don't know how difficult the keepalive feature was expected to be, and
> there's certainly plenty of potential landmines in this whole xid export
> idea.

One such landmine is that the keepalives need to flow from client to
server while the WAL records are flowing from server to client. We'll
have to crack that problem for synchronous replication too, but I think
that alone is a big enough problem to make this 9.1 material.

> How to handle situations where the standby goes away for a while,
> such as a network outage, so that it doesn't block the master from ever
> cleaning up dead tuples is a concern.

Yeah, that's another issue that needs to be dealt with. You'd probably
need some kind of a configurable escape valve in the master, to let it
ignore a standby's snapshot once it gets too old.

> But I do know that the current Hot Standby implementation is going to be
> frustrating to configure correctly for people.

Perhaps others who are not as deep into the code as I am will have a
better view on this, but I seriously don't think that's such a big
issue. I think the max_standby_delay setting is quite intuitive and easy
to explain. Sure, it would be better if there was no tradeoff between
killing queries and stalling recovery, but I don't think it'll be that
hard to understand the tradeoff.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton <dev(at)archonet(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 14:17:23
Message-ID: 4B87D7F3.1060603@archonet.com
Lists: pgsql-hackers

On 26/02/10 08:33, Greg Smith wrote:
> There are a number of HS
> tunables that interact with one another, and depending your priorities a
> few ways you can try to optimize the configuration for what I expect to
> be common use cases for this feature.

> I've written a blog entry at
> http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html
> that tries to explain all that background clearly,

It did too. Thanks for the nice summary people can be pointed at.

> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled. You see this in the logs:

Hmm - this I'd already figured out for myself. It's just occurred to me
that this could well be the case between databases too. Database A gets
vacuumed, B gets its queries kicked off on the standby. Granted lots of
people just have the one main DB, but even so...

> LOG: restored log file "0000000100000000000000A5" from archive
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be
> removed.
> STATEMENT: SELECT sum(abalance) FROM pgbench_accounts;
>
> Basically, every time a WAL segment appears that wipes out a tuple that
> SELECT expects should still be visible, because the dead row left behind
> by the update has been vacuumed away, the query is canceled. This
> happens all the time the way I've set this up, and I don't feel like
> this is a contrived demo. Having a long-running query on the standby
> while things get updated and then periodically autovacuumed on the
> primary is going to be extremely common in the sorts of production
> systems I expect want HS the most.

I can see pretty much everyone wanting HS+SR. Thousands of small DBs running
on VMs, for a start. Free mostly-live backup? Got to be a winner.

Dumb non-hacker question: why do we cancel all transactions rather than
just those with "ACCESS SHARE" on the vacuumed table in question? Is it
the simple fact that we don't know what table this particular section of
WAL affects, or is it the complexity of tracking all this info?

> If you're running a system that also is using Streaming Replication,
> there is a much better approach possible.

> "Requires keep-alives with timestamps to be added to sync rep feature"
>
> If those keep-alives flowed in both directions, and included both
> timestamps *and* xid visibility information, the master could easily be
> configured to hold open xid snapshots needed for long running queries on
> the standby when that was necessary.

Presumably meaning we need *another* config setting to prevent excessive
bloat on a heavily updated table on the master.

--
Richard Huxton
Archonet Ltd


From: Richard Huxton <dev(at)archonet(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 14:24:30
Message-ID: 4B87D99E.6030608@archonet.com
Lists: pgsql-hackers

On 26/02/10 14:10, Heikki Linnakangas wrote:
>
> Ideally the standby would stash away the old pages or tuples somewhere
> so that it can still access them even after replaying the WAL records
> that remove them from the main storage. I realize that's not going to
> happen any time soon because it's hard to do, but that would really be
> the most robust fix possible.

Something like snapshotting a filesystem, so updates continue while
you're still looking at a static version.

--
Richard Huxton
Archonet Ltd


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 14:45:07
Message-ID: 4B87DE73.8000702@enterprisedb.com
Lists: pgsql-hackers

Richard Huxton wrote:
> On 26/02/10 08:33, Greg Smith wrote:
>> I'm not sure what you might be expecting from the above combination, but
>> what actually happens is that many of the SELECT statements on the table
>> *that isn't even being updated* are canceled. You see this in the logs:
>
> Hmm - this I'd already figured out for myself. It's just occurred to me
> that this could well be the case between databases too. Database A gets
> vacuumed, B gets its queries kicked off on the standby.

No, it's per-database already. Only queries in the same database are
canceled.

> Dumb non-hacker question: why do we cancel all transactions rather than
> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
> the simple fact that we don't know what table this particular section of
> WAL affects, or is it the complexity of tracking all this info?

The problem is that even if transaction X doesn't have an (access share)
lock on the vacuumed table at the moment, it might take one in the
future. Simon proposed mechanisms for storing the information about
vacuumed tables in shared memory, so that if X takes the lock later on
it will get canceled at that point, but that's 9.1 material.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton <dev(at)archonet(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 14:57:16
Message-ID: 4B87E14C.9080709@archonet.com
Lists: pgsql-hackers

On 26/02/10 14:45, Heikki Linnakangas wrote:
> Richard Huxton wrote:
>> On 26/02/10 08:33, Greg Smith wrote:
>>> I'm not sure what you might be expecting from the above combination, but
>>> what actually happens is that many of the SELECT statements on the table
>>> *that isn't even being updated* are canceled. You see this in the logs:
>>
>> Hmm - this I'd already figured out for myself. It's just occurred to me
>> that this could well be the case between databases too. Database A gets
>> vacuumed, B gets its queries kicked off on the standby.
>
> No, it's per-database already. Only queries in the same database are
> canceled.

That's a relief.

>> Dumb non-hacker question: why do we cancel all transactions rather than
>> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
>> the simple fact that we don't know what table this particular section of
>> WAL affects, or is it the complexity of tracking all this info?
>
> The problem is that even if transaction X doesn't have an (access share)
> lock on the vacuumed table at the moment, it might take one in the
> future. Simon proposed mechanisms for storing the information about
> vacuumed tables in shared memory, so that if X takes the lock later on
> it will get canceled at that point, but that's 9.1 material.

I see - we'd need to age the list of vacuumed tables too, so when the
oldest transactions complete the correct flags get cleared.

Can we not wait to cancel the transaction until *any* new lock is
attempted though? That should protect all the single-statement
long-running transactions that are already underway. Aggregates etc.

--
Richard Huxton
Archonet Ltd


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 15:21:03
Message-ID: 4B87E6DF.2030501@enterprisedb.com
Lists: pgsql-hackers

Richard Huxton wrote:
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

Hmm, that's an interesting thought. You'll still need to somehow tell
the victim backend "you have to fail if you try to acquire any more
locks", but a single per-backend flag in the procarray would suffice.

You could also clear the flag whenever you free the last snapshot in the
transaction (ie. between each query in read committed mode).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton <dev(at)archonet(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 16:43:24
Message-ID: 4B87FA2C.9030508@archonet.com
Lists: pgsql-hackers

Replying to my own post - first sign of madness...

Let's see if I've got the concepts clear here, and hopefully my thinking
it through will help others reading the archives.

There are two queues:
1. Cleanup on the master
2. Replay on the slave

Running write queries on the master adds to both queues.
Running (read-only) queries on the slave prevents you removing from both
queues.

There are two interesting measurements of "age"/"size":
1. Oldest item in / length of queue (knowable)
2. How long will it take to clear the queue (estimable at best)

You'd like to know #2 to keep up with your workload. Unfortunately, you
can't know it for certain unless you have control over new incoming queries
(on both master and slave).

You might want four separate GUCs for the two measurements on the two
queues. We currently have two that (sort of) match #1 "Oldest item"
(vacuum_defer_cleanup_age, max_standby_delay).
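
For reference, those two knobs sit on opposite ends of the link; something
like the following (the values are purely illustrative, and the exact units
and "-1 means wait forever" behaviour of max_standby_delay are my reading of
the current patch, so treat them as assumptions):

# postgresql.conf on the master: keep roughly this many transactions'
# worth of dead rows around before vacuum may clean them up
vacuum_defer_cleanup_age = 10000

# postgresql.conf on the standby: how long WAL replay may wait for
# conflicting queries before cancelling them
max_standby_delay = 30    # seconds; -1 would be the "wait forever"
                          # behaviour Greg Stark argues for upthread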

Delaying replay on a slave has no effect on the master. If a slave falls
too far behind it's responsible for catch-up (via normal WAL archives).

There is no point in delaying cleanup on the master unless it's going to
help one or more slaves. In fact, you don't want to start delaying
cleanup until you have to, otherwise you're wasting your delay time.
This seems to be the case with vacuum_defer_cleanup_age: if I have a
heavily-updated table and I defer vacuuming, then by the time any given
query starts on the slave the deferral window is going to be half used up
already.

There's also no point in deferring cleanup on the master if the standby
is already waiting on a conflict that will cause its queries to be
cancelled anyway. Not only won't it help, but it might make things worse
since transactions will be cancelled, the conflict will be replayed and
(presumably) queries will be re-submitted only to be cancelled again.

This is what Greg Smith's discussion of the keep-alives was about.
Giving the master enough information to be smarter about cleanup (and
making the conflicts more fine-grained).

The situation with deferring on one or both ends of process just gets
more complicated with multiple slaves. There's all sorts of unpleasant
feedback loops I can envisage there.

For the case of a single slave being used to run long reporting queries,
the ideal scenario would be the following. The master starts deferring
vacuum activity just before the query starts. When that times out, the
slave will receive the cleanup info, refuse to replay it and start its
own delay. This gives you a total available query time of:

  natural time between vacuums + vacuum delay + WAL transfer time + standby delay

I can think of five useful things we should be doing (and might be
already - don't know).

1. On the master, deduce whether the slave is already waiting on a
query. If so, don't bother delaying cleanup. Clearly you don't want to
be signalling hundreds of times a second though. Does the slave pause
fetching via streaming replication if replay is blocked on a query?
Could we signal "half-way to max-age" or some such?

2. Perhaps simpler than trying to make the master smarter, just allow
SET this_transaction_is_probably_a_long_one=true on the slave. That (a)
clears the queue on the slave and (b) sends the signal to the master
which then starts deferring vacuum.

3. Do a burst of cleanup activity on the master after blocking. This
should concentrate conflicts together when they reach the slave. Perhaps
vacuum_defer_cleanup_age should be vacuum_deferred_queue_size and
measure the amount of work to do, rather than the max age of the oldest
cleanup (if I've understood correctly).

4. Do a burst of replay on the slave after blocking. Perhaps every time
it cancels a transaction it should replay at least half the queued WAL
before letting new transactions start. Or perhaps it replays any vacuum
activity it comes across and then stops. That should sync with #2
assuming the slave doesn't lag the master too much.

5. I've been mixing "defer" and "delay", as do the docs. We should
probably settle on one or the other. I think defer conveys the meaning
more precisely, but what about non-native English speakers?

--
Richard Huxton
Archonet Ltd


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 17:45:33
Message-ID: 603c8f071002260945t82f6f3dt67ceca34ba9f9287@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 10:21 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Richard Huxton wrote:
>> Can we not wait to cancel the transaction until *any* new lock is
>> attempted though? That should protect all the single-statement
>> long-running transactions that are already underway. Aggregates etc.
>
> Hmm, that's an interesting thought. You'll still need to somehow tell
> the victim backend "you have to fail if you try to acquire any more
> locks", but a single per-backend flag in the procarray would suffice.
>
> You could also clear the flag whenever you free the last snapshot in the
> transaction (ie. between each query in read committed mode).

Wow, that seems like it would help a lot. Although I'm not 100% sure
I follow all the details of how this works.

...Robert


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 17:46:15
Message-ID: 407d949e1002260946i69eb58eej82415f9a6b5a7854@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton <dev(at)archonet(dot)com> wrote:
> Let's see if I've got the concepts clear here, and hopefully my thinking it
> through will help others reading the archives.
>
> There are two queues:

I don't see two queues. I only see the one queue of operations which
have been executed on the master but not replayed yet on the slave.
Every write operation on the master enqueues an operation to it and
every operation replayed on the slave dequeues from it. Only a subset
of operations create conflicts with concurrent transactions on the
slave, namely vacuums and a few similar operations (HOT pruning and
btree index pruning).

There's no question we need to make sure users have good tools to
monitor this queue and are aware of these tools. You can query each
slave for its currently replayed log position and hopefully you can
find out how long it's been delayed (ie, if it's looking at a log
record and waiting for a conflict to clear how long ago that log
record was generated). You can also find out what the log position is
on the master.
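
Assuming the function names in the current tree hold, that monitoring boils
down to a couple of simple queries:

-- On the standby: last WAL location received and last location replayed
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();

-- On the master: current WAL location, to compare against the standby
SELECT pg_current_xlog_location();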

--
greg


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 18:47:09
Message-ID: 201002261847.o1QIl9Z29565@momjian.us
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> > How to handle situations where the standby goes away for a while,
> > such as a network outage, so that it doesn't block the master from ever
> > cleaning up dead tuples is a concern.
>
> Yeah, that's another issue that needs to be dealt with. You'd probably
> need some kind of a configurable escape valve in the master, to let it
> ignore a standby's snapshot once it gets too old.
>
> > But I do know that the current Hot Standby implementation is going to be
> > frustrating to configure correctly for people.
>
> Perhaps others who are not as deep into the code as I am will have a
> better view on this, but I seriously don't think that's such a big
> issue. I think the max_standby_delay setting is quite intuitive and easy
> to explain. Sure, it would be better if there was no tradeoff between
> killing queries and stalling recovery, but I don't think it'll be that
> hard to understand the tradeoff.

Let's look at the five documented cases of query conflict (from our manual):

1 Access Exclusive Locks from primary node, including both explicit
LOCK commands and various DDL actions

2 Dropping tablespaces on the primary while standby queries are
using those tablespaces for temporary work files (work_mem
overflow)

3 Dropping databases on the primary while users are connected to
that database on the standby.

4 The standby waiting longer than max_standby_delay to acquire a
buffer cleanup lock.

5 Early cleanup of data still visible to the current query's
snapshot

We might have a solution to #1 by only cancelling queries that try to
take locks.

#2 and #3 seem like rare occurrences.

#4 can be controlled by max_standby_delay, where a large value only
delays playback during crash recovery --- again, a rare occurrence.

#5 could be handled by using vacuum_defer_cleanup_age on the master.

Why is vacuum_defer_cleanup_age not listed in postgresql.conf?

In summary, I think passing snapshots to the master is not something
possible for 9.0, and ideally we will never need to add that feature.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 18:53:16
Message-ID: 1916.1267210396@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> In the model you describe any long-lived queries on the slave cause
> tables in the master to bloat with dead records.

Yup, same as they would do on the master.

> I think this model is on the roadmap but it's not appropriate for
> everyone and I think one of the benefits of having delayed it is that
> it forces us to get the independent model right before throwing in
> extra complications. It would be too easy to rely on the slave
> feedback as an answer for hard questions about usability if we had it
> and just ignore the question of what to do when it's not the right
> solution for the user.

I'm going to make an unvarnished assertion here. I believe that the
notion of synchronizing the WAL stream against slave queries is
fundamentally wrong and we will never be able to make it work.
The information needed isn't available in the log stream and can't be
made available without very large additions (and consequent performance
penalties). As we start getting actual beta testing we are going to
uncover all sorts of missed cases that are not going to be fixable
without piling additional ugly kluges on top of the ones Simon has
already crammed into the system. Performance and reliability will both
suffer.

I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing. It will probably be several
months before everyone is forced to accept that, which is why 9.0 is
not going to ship this year.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 19:01:30
Message-ID: 201002261901.o1QJ1UY04265@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> I'm going to make an unvarnished assertion here. I believe that the
> notion of synchronizing the WAL stream against slave queries is
> fundamentally wrong and we will never be able to make it work.
> The information needed isn't available in the log stream and can't be
> made available without very large additions (and consequent performance
> penalties). As we start getting actual beta testing we are going to
> uncover all sorts of missed cases that are not going to be fixable
> without piling additional ugly kluges on top of the ones Simon has
> already crammed into the system. Performance and reliability will both
> suffer.
>
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Wow, can I have some varnish with that. :-O

You are right that we need to go down the road a bit before we know what
we need for 9.0 or 9.1.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 19:02:12
Message-ID: 4B881AB4.4090303@agliodbs.com
Lists: pgsql-hackers

On 2/26/10 6:57 AM, Richard Huxton wrote:
>
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

I like this approach. Is it fragile in some non-obvious way?

--Josh Berkus


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 19:06:51
Message-ID: 4B881BCB.1090000@agliodbs.com
Lists: pgsql-hackers

On 2/26/10 10:53 AM, Tom Lane wrote:
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

I don't think that publishing visibility info back to the master ... and
subsequently burdening the master substantially for each additional
slave ... are what most users want. Certainly for use cases like NTT's,
it is, but not for most of our users.

In fact, I seem to remember specifically discussing the approach of
trying to publish snapshots back to the master, and rejecting it on this
list during the development of SR.

Does anyone know how Oracle solves these issues? Does their structure
(separate rollback log) make it easier for them?

--Josh Berkus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 19:16:40
Message-ID: 2425.1267211800@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 2/26/10 10:53 AM, Tom Lane wrote:
>> I think that what we are going to have to do before we can ship 9.0
>> is rip all of that stuff out and replace it with the sort of closed-loop
>> synchronization Greg Smith is pushing. It will probably be several
>> months before everyone is forced to accept that, which is why 9.0 is
>> not going to ship this year.

> I don't think that publishing visibility info back to the master ... and
> subsequently burdening the master substantially for each additional
> slave ... are what most users want.

I don't see a "substantial additional burden" there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin in its PGPROC entry on the master.

I don't doubt that this approach will have its own gotchas that we
find as we get into it. But it looks soluble. I have no faith in
either the correctness or the usability of the approach currently
being pursued.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:02:08
Message-ID: 4B8828C0.7030501@agliodbs.com
Lists: pgsql-hackers


> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

If the main purpose of the slave is long-running queries, though, this
could cause a lot of bloat on the master. That's a special case, but a
reason why we would want to preserve the stop replication functionality.

> I don't doubt that this approach will have its own gotchas that we
> find as we get into it. But it looks soluble. I have no faith in
> either the correctness or the usability of the approach currently
> being pursued.

So, why not start working on it now, instead of arguing about it? It'll
be easy to prove the approach once we have some test code.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:04:23
Message-ID: 4B882947.4050800@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing. It will probably be several
>>> months before everyone is forced to accept that, which is why 9.0 is
>>> not going to ship this year.
>
>> I don't think that publishing visibility info back to the master ... and
>> subsequently burdening the master substantially for each additional
>> slave ... are what most users want.
>
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

The additional burden comes from the old snapshot effect. It makes it
unusable for offloading reporting queries, for example. In general, it
is a very good architectural property that the master is not affected by
what happens in a standby, and a closed-loop synchronization would break
that.

I don't actually understand how tight synchronization on its own would
solve the problem. What if the connection to the master is lost? Do you
kill all queries in the standby before reconnecting?

One way to think about this is to first consider a simple stop-and-go
system. Clearly the database must be consistent at any point in the WAL
sequence, if recovery was stopped and the database started up. So it is
always safe to pause recovery and run a read-only query against the
database as it is at that point in time (this assumes that the index
"cleanup" operations are not required for consistent query results BTW).
After the read-only transaction is finished, you can continue recovery.

The next step up is to relax that so that you allow replay of those WAL
records that are known to not cause trouble to the read-only queries.
For example, heap_insert records are very innocent, they only add rows
with a yet-uncommitted xmin.

Things get more complex when you allow the replay of commit records; all
the known-assigned-xids tracking is related to that, so that
transactions that are not committed when a snapshot is taken in the
standby are considered uncommitted by that snapshot even after the
commit record is later replayed. If that feels too fragile, there might
be other methods to achieve that. One I once pondered is to not track
all in-progress transactions in shared memory like we do now, but only
OldestXmin. When a backend wants to take a snapshot in the slave, it
memcpy()s clog from OldestXmin to the latest committed XID, and includes
it in the snapshot. The visibility checks use the copy instead of the
actual clog, so they see the situation as it was when the snapshot was
taken. To keep track of the OldestXmin in the slave, the master can emit
that as a WAL record every now and then; it's ok if it lags behind.

Then there's the WAL record types that remove data that might still be
required by the read-only transactions. This includes vacuum and index
deletion records.

If you really think the current approach is unworkable, I'd suggest that
we fall back to a stop-and-go system, where you either let the recovery
progress or allow queries to run, but not both at the same time. But
FWIW I don't think the situation is that grave.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:10:11
Message-ID: 407d949e1002261210v3793df96qb29bc5b9bf403535@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.

And when we want to support cascading slaves?

Or when you want to bring up a new slave and it suddenly starts
advertising a new xmin that's older than the current oldestxmin?

But in any case if I were running a reporting database I would want it
to just stop replaying logs for a few hours while my big batch report
runs, not cause the master to be unable to vacuum any dead records for
hours. That defeats much of the purpose of running the queries on the
slave.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:21:05
Message-ID: 3604.1267215665@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I don't actually understand how tight synchronization on its own would
> solve the problem. What if the connection to the master is lost? Do you
> kill all queries in the standby before reconnecting?

Sure. So what? They'd have been killed if they individually lost
connections to the master (or the slave), too.

> [ assorted analysis based on WAL contents ]

The problem is all the interactions that are not reflected (historically
anyway) in WAL. We already know about btree page reclamation interlocks
and relcache init files. How many others are there, and how messy and
expensive is it going to be to deal with them?

> If you really think the current approach is unworkable, I'd suggest that
> we fall back to a stop-and-go system, where you either let the recovery
> to progress or allow queries to run, but not both at the same time. But
> FWIW I don't think the situation is that grave.

I might be wrong. I hope for the sake of the project schedule that I am
wrong. But I'm afraid that we will spend several months beavering away
to try to make the current approach solid and user-friendly, and
eventually conclude that it's a dead end. It would be prudent to have
a Plan B; and it looks to me like closed-loop synchronization is the
best Plan B. Putting off all thought about it for the next release
cycle seems like a recipe for a scheduling disaster.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:22:56
Message-ID: 1267215776.11463.13.camel@jd-desktop.unknown.charter.com
Lists: pgsql-hackers

On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
> > I don't see a "substantial additional burden" there. What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master.
>
> If the main purpose of the slave is long-running queries, though, this
> could cause a lot of bloat on the master. That's a special case, but a
> reason why we would want to preserve the stop replication functionality.
>

Do we really think that users using the slave to run long-running
queries is a special case? One of the number one things I can see this
being used for is reporting....

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:25:33
Message-ID: 20100226202533.GF7843@oak.highrise.ca
Lists: pgsql-hackers

* Greg Stark <gsstark(at)mit(dot)edu> [100226 15:10]:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I don't see a "substantial additional burden" there.  What I would
> > imagine is needed is that the slave transmits a single number back
> > --- its current oldest xmin --- and the walsender process publishes
> > that number as its transaction xmin in its PGPROC entry on the master.
>
> And when we want to support cascading slaves?
>
> Or when you want to bring up a new slave and it suddenly starts
> advertising a new xmin that's older than the current oldestxmin?
>
> But in any case if I were running a reporting database I would want it
> to just stop replaying logs for a few hours while my big batch report
> runs, not cause the master to be unable to vacuum any dead records for
> hours. That defeats much of the purpose of running the queries on the
> slave.

*I* would be quite happy having the stop-and-go and the closed-loop be
the only 2 modes of operation, and I'd even be quite happy if they were
both limited to separate methods:

1) Running SR - then you are forced to use a closed-loop
2) Running HS from a backup/archive - forced to use stop-n-go

#1 still needs to deal with a slave "disappearing" and not advancing xmin
for a period (TCP timeout)?

I'll note that until SR does synchronous streaming rep (which will
likely require some closed-loop plan to allow the slave to be hot), I
want situation #2, and hopefully the knob to control how long it allows
a "stop" before going again can be a HUP'able knob so I can change it
occasionally without taking the server down...

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 20:30:01
Message-ID: 3834.1267216201@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't see a "substantial additional burden" there. What I would
>> imagine is needed is that the slave transmits a single number back
>> --- its current oldest xmin --- and the walsender process publishes
>> that number as its transaction xmin in its PGPROC entry on the master.

> And when we want to support cascading slaves?

So? Fits right in. The walsender on the first-level slave is
advertising an xmin from the second-level one, which will be included in
what's passed back up to the master.

> Or when you want to bring up a new slave and it suddenly starts
> advertising a new xmin that's older than the current oldestxmin?

How's it going to do that, when it has no queries at the instant
of startup?

> But in any case if I were running a reporting database I would want it
> to just stop replaying logs for a few hours while my big batch report
> runs, not cause the master to be unable to vacuum any dead records for
> hours. That defeats much of the purpose of running the queries on the
> slave.

Well, as Heikki said, a stop-and-go WAL management approach could deal
with that use-case. What I'm concerned about here is the complexity,
reliability, maintainability of trying to interlock WAL application with
slave queries in any sort of fine-grained fashion.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:06:10
Message-ID: 4B8837C2.8030509@agliodbs.com
Lists: pgsql-hackers


> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case. What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

This sounds a bit brute-force, but what about simply having some form of
automatic query retry on the slave?

--Josh Berkus


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:11:32
Message-ID: 407d949e1002261311y25ab9384u8c3cf70c1daae082@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> How's it going to do that, when it has no queries at the instant
> of startup?
>

Why shouldn't it have any queries at walreceiver startup? It has any
xlog segments that were copied from the master and any it can find in
the archive, so it could easily reach a consistent point long before it
needs to connect to the master. If you really want to protect your
master from any additional overhead you don't currently need to
configure a streaming connection at all, you can just use the file
shipping interface.
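
In recovery.conf terms, the file-shipping-only setup is just the usual
standby configuration with no streaming connection configured; roughly
(paths and connection string are placeholders):

# recovery.conf on the standby
standby_mode = 'on'
restore_command = 'cp /path/to/archive/%f "%p"'
# leave primary_conninfo unset so no connection back to the master is opened
#primary_conninfo = 'host=master user=replication'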

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:19:39
Message-ID: 5108.1267219179@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Why shouldn't it have any queries at walreceiver startup? It has any
> xlog segments that were copied from the master and any it can find in
> the archive, it could easily reach a consistent point long before it
> needs to connect to the master. If you really want to protect your
> master from any additional overhead you don't currently need to
> configure a streaming connection at all, you can just use the file
> shipping interface.

There's *definitely* not going to be enough information in the WAL
stream coming from a master that doesn't think it has HS slaves.
We can't afford to record all that extra stuff in installations for
which it's just useless overhead. BTW, has anyone made any attempt
to measure the performance hit that the patch in its current form is
creating via added WAL entries?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:22:47
Message-ID: 407d949e1002261322u68716eddl6fe33d5985ca620f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There's *definitely* not going to be enough information in the WAL
> stream coming from a master that doesn't think it has HS slaves.
> We can't afford to record all that extra stuff in installations for
> which it's just useless overhead.  BTW, has anyone made any attempt
> to measure the performance hit that the patch in its current form is
> creating via added WAL entries?

What extra entries?

--
greg


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:39:12
Message-ID: m2vddju16n.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case. What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

Some admin functions for Hot Standby were removed from the patch to ease
its integration; there was a pause() and resume() feature.

I think that offering this explicit control to the user would allow them
to choose between an HA setup and a reporting setup easily enough: just
pause the replay when running the reporting, then resume it to get fresh
data again. If you don't pause, any query can get killed; replay is the
priority.
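
For illustration, a minimal sketch of what that could look like from
psql on the standby (the function names here are just placeholders,
since the actual functions were removed from the patch):

    -- hold off WAL replay while the reporting run is going
    SELECT pg_recovery_pause();
    -- ... run the long reports on the standby ...
    -- let replay continue and catch up again
    SELECT pg_recovery_resume();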

Now as far as the feedback loop is concerned, I guess the pause()
function would cause the slave to stop publishing any xmin in the
master's procarray so that it's free to vacuum and archive whatever it
wants to.

Should the slave accumulate too much lag, it will resume from the
archive rather than live from the SR link.

How much does that help?

Regards,
--
dim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:44:20
Message-ID: 5617.1267220660@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead. BTW, has anyone made any attempt
>> to measure the performance hit that the patch in its current form is
>> creating via added WAL entries?

> What extra entries?

Locks, just for starters. I haven't read enough of the code yet to know
what else Simon added. In the past it's not been necessary to record
any transient information in WAL, but now we'll have to.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 21:48:38
Message-ID: 201002262148.o1QLmch23807@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Well, as Heikki said, a stop-and-go WAL management approach could deal
> > with that use-case. What I'm concerned about here is the complexity,
> > reliability, maintainability of trying to interlock WAL application with
> > slave queries in any sort of fine-grained fashion.
>
> Some admin functions for Hot Standby were removed from the path to ease
> its integration, there was a pause() and resume() feature.
>
> I think that offering this explicit control to the user would allow them
> to choose between HA setup and reporting setup easily enough: just pause
> the replay when running the reporting, resume it to get fresh data
> again. If you don't pause, any query can get killed, replay is the
> priority.

Doesn't the system already adjust the delay based on the length of slave
transactions, e.g. max_standby_delay? It seems there is no need for a
user switch --- just set max_standby_delay really high.
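
For example, a sketch of the standby's postgresql.conf (the value is
purely illustrative):

    max_standby_delay = 8h    # or -1 to wait forever for standby queries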

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 22:11:28
Message-ID: m2iq9jtzov.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay. It seems there is no need for a
> user switch --- just max_standby_delay really high.

Well, that GUC looks like it allows you to set a compromise between HA
and reporting, not to say "do not ever give priority to the replay while
I'm running my reports". At least that's how I understand it.

The feedback loop might get expensive on the master server when running
reporting queries on the slave, unless you can "pause" it explicitly, I
think. I don't see how the system will guess that you're running a
reporting server rather than an HA node, and max_standby_delay is just a
way to tell the standby to please be nice in case of abuse.

Regards,
--
dim


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 22:20:37
Message-ID: 201002262220.o1QMKbD00176@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Doesn't the system already adjust the delay based on the length of slave
> > transactions, e.g. max_standby_delay. It seems there is no need for a
> > user switch --- just max_standby_delay really high.
>
> Well that GUC looks like it allows to set a compromise between HA and
> reporting, not to say "do not ever give the priority to the replay while
> I'm running my reports". At least that's how I understand it.

Well, if you set it high, it effectively is that.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 23:33:57
Message-ID: 4B885A65.6050109@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
>

That is exactly the core idea I was trying to suggest in my rambling
message. Just that small additional bit of information transmitted and
published to the master via that route, and it's possible to optimize
this problem in a way not available now. And it's a way that I believe
will feel more natural to some users who may not be well served by any
of the existing tuning possibilities.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-26 23:56:55
Message-ID: 4B885FC7.5050209@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay. It seems there is no need for a
> user switch --- just max_standby_delay really high.
>

The first issue is that you're basically saying "I don't care about high
availability anymore" when you increase max_standby_delay to a high
value. Want to offload an 8-hour-long batch report every day to the
standby? You can do it with max_standby_delay=8 hours. But the day
your master crashes 7 hours into that, you're in for a long wait before
your standby is available while it replays all the queued up segments.
Your 'hot standby' has actually turned into the old form of 'cold
standby' just when you need it to be responsive.

This is also the reason why the whole "pause recovery" idea is a
fruitless path to wander down. The whole point of this feature is that
people have a secondary server available for high-availability, *first
and foremost*, but they'd like it to do something more interesting than
leaving it idle all the time. The idea that you can hold off on applying
standby updates for long enough to run seriously long reports is
completely at odds with the idea of high-availability.

The second major problem is that the day the report actually takes 8.1
hours instead, because somebody else ran another report that slowed you
down a little, you're screwed if that's something you depend on being
available--the query gets canceled only *after* wasting 8 hours of
reporting resource time.

max_standby_delay is IMHO only useful for allowing non-real-time web-app
style uses of HS (think "Facebook status updates"), where you say "I'm
OK giving people slightly out-of-date info sometimes if it lets me split
the query load over two systems". Set max_standby_delay to a few
seconds or maybe a minute, enough time to service a typical user query,
make your app tolerate the occasional failed query and only send big
ones to the master, and you've just scaled up all the small ones.
Distributed queries with "eventual consistency" on all nodes are where
many of the web app designs are going, and this feature is a reasonable
match for that use case.
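
To make that concrete, a sketch of that kind of setup (the value is
only illustrative):

    max_standby_delay = 30s   # short queries finish; long conflicting ones get canceled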

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 00:11:36
Message-ID: 4B886338.3020502@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> 5 Early cleanup of data still visible to the current query's
> snapshot
>
> #5 could be handled by using vacuum_defer_cleanup_age on the master.
> Why is vacuum_defer_cleanup_age not listed in postgresql.conf?
>

I noticed that myself and fired off a corrective patch to Simon
yesterday; he said it was intentional, but I'm not sure why that is yet.
We'll sort that out.

You are correct that my suggestion is targeting primarily #5 on this
list. There are two problems with the possible solutions using that
parameter though:

-vacuum_defer_cleanup_age is set in a unit that people cannot be
expected to work in--transaction ids. The UI is essentially useless,
and there's no obvious way to make a better one. The best you can do
will still be really fragile (see the sketch below).

-If you increase vacuum_defer_cleanup_age, it's active all the time.
You're basically making every single transaction that could be cleaned
up pay for the fact that a query *might* be running on the standby that
needs that cleanup to be deferred.
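
A sketch of what that looks like in the master's postgresql.conf (the
number is purely illustrative, which is exactly the problem--you have to
guess how many transactions your reports will span):

    vacuum_defer_cleanup_age = 1000000   # measured in transaction ids, not time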

You can think of the idea of passing an xmin back from the standby as
being like an auto-tuning vacuum_defer_cleanup_age. It's 0 when no
standby queries are running, but grows in size to match longer ones. And
you don't have to know anything to set it correctly; just toggle
on the proposed "feedback xid from the standby" feature and you're safe.

Expecting that anyone will ever set vacuum_defer_cleanup_age correctly
in the field in its current form is pretty unreasonable, I think. Since
there's no timestamp-based memory of past xid activity, it's difficult
to convert it to a time-based setting instead, yet a unit of time is
what I think people would actually like to set this in.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 00:21:11
Message-ID: 4B886577.4020808@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> That is exactly the core idea I was trying to suggest in my rambling
> message. Just that small additional bit of information transmitted and
> published to the master via that route, and it's possible to optimize
> this problem in a way not available now. And it's a way that I believe
> will feel more natural to some users who may not be well served by any
> of the existing tuning possibilities.

Well, if both you and Tom think it would be relatively easy (or at least
easier than continuing to pursue query cancel troubleshooting), then
please start coding it. It was always a possible approach, we just
collectively thought that query cancel would be easier.

--Josh Berkus


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 00:36:57
Message-ID: 201002270036.o1R0avG03960@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Bruce Momjian wrote:
> > Doesn't the system already adjust the delay based on the length of slave
> > transactions, e.g. max_standby_delay. It seems there is no need for a
> > user switch --- just max_standby_delay really high.
> >
>
> The first issue is that you're basically saying "I don't care about high
> availability anymore" when you increase max_standby_delay to a high
> value. Want to offload an 8 hour long batch report every day to the
> standby? You can do it with max_standby_delay=8 hours. But the day
> your master crashes 7 hours into that, you're in for a long wait before
> your standby is available while it replays all the queued up segments.
> Your 'hot standby' has actually turned into the old form of 'cold
> standby' just when you need it to be responsive.

Well, I think the choice is either you delay vacuum on the master for 8
hours, or you pile up 8 hours of WAL files on the slave, delaying their
application and making recovery much slower. It is not clear to me which
option a user would prefer, because the bloat on the master might be
permanent.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 00:43:48
Message-ID: 407d949e1002261643s57344d25ja20848439786fd67@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> This is also the reason why the whole "pause recovery" idea is a fruitless
> path to wander down.  The whole point of this feature is that people have a
> secondary server available for high-availability, *first and foremost*, but
> they'd like it to do something more interesting that leave it idle all the
> time.  The idea that you can hold off on applying standby updates for long
> enough to run seriously long reports is completely at odds with the idea of
> high-availability.

Well you can go sit in the same corner as Simon with your high
availability servers.

I want my ability to run large batch queries without any performance
or reliability impact on the primary server.

You can have one or the other, but you can't get both. If you set
max_standby_delay low, then you get your high availability server; if
you set it high, you get a useful report server.

If you build sync replication, which we don't have today and which will
open another huge can of usability worms when we haven't even finished
bottling the two we've already opened, then you lose the lack of impact
on the primary. Suddenly the queries you run on the slaves cause your
production database to bloat. Plus you have extra network connections
which take resources on your master and have to be kept up at all
times, or you lose your slaves.

I think the design constraint of not allowing any upstream data flow
is actually very valuable. Eventually we'll have it for sync
replication but it's much better that we've built things incrementally
and can be sure that nothing really depends on it for basic
functionality. This is what allows us to know that the slave imposes
no reliability impact on the master. It's what allows us to know that
everything will work identically regardless of whether you have a
walreceiver running or are running off archived log files.

Remember, I wanted to entirely abstract away the walreceiver and allow
multiple WAL communication methods. I think it would make more sense
to use something like Spread to distribute the logs so the master only
has to send them once and as many slaves as you want can pick them up.
The current architecture doesn't scale very well if you want to have
hundreds of slaves for one master.

--
greg


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 00:54:57
Message-ID: 201002270054.o1R0svG06834@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> You can think of the idea of passing an xmin back from the standby as
> being like an auto-tuning vacuum_defer_cleanup_age. It's 0 when no
> standby queries are running, but grows in size to match longer ones. And
> you don't have to have to know anything to set it correctly; just toggle
> on the proposed "feedback xid from the standby" feature and you're safe.

Yes, there is no question there is value in passing the xid back to the
master. My point is that it is like your blog entry:

http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html

you can't have all the options:

1. aggressive vacuum on master
2. fast recovery of slave
3. no query cancel on slave

Again, pick two. Passing the xid back to the master gives you #2 and
#3, and that might be good for some people, but not others. Do we have
any idea which options most administrators would want? If we get xid
passback to the master, do we keep the other configuration options?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 01:53:13
Message-ID: 4B887B09.5080008@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> Well you can go sit in the same corner as Simon with your high
> availability servers.
>
> I want my ability to run large batch queries without any performance
> or reliability impact on the primary server.
>

Thank you for combining a small personal attack with a selfish
commentary about how yours is the only valid viewpoint. That saves me a
lot of trouble replying to your messages; I can just ignore them instead
if this is how you're going to act.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 02:30:56
Message-ID: 407d949e1002261830qeea9c30y9dfe65407b382198@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 27, 2010 at 1:53 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Greg Stark wrote:
>>
>> Well you can go sit in the same corner as Simon with your high
>> availability servers.
>>
>> I want my ability to run large batch queries without any performance
>> or reliability impact on the primary server.
>>
>
> Thank you for combining a small personal attack with a selfish commentary
> about how yours is the only valid viewpoint.  Saves me a lot of trouble
> replying to your messages, can just ignore them instead if this is how
> you're going to act.

Eh? That's not what I meant at all. Actually it's kind of the exact
opposite of what I meant.

What I meant was that your description of "High Availability first
and foremost" is only one possible use case. Simon in the past
expressed the same single-minded focus on that use case. It's a
perfectly valid use case, and I would probably agree that if we had to
choose just one, it would be the most important.

But we don't have to choose just one. There are other valid use cases,
such as load balancing and isolating your large batch queries from
your production systems. I don't want us to throw out all these other
use cases because we treat high availability as the only use case
we're interested in.

--
greg


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 02:43:58
Message-ID: 4B8886EE.8010407@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Well, I think the choice is either you delay vacuum on the master for 8
> hours or pile up 8 hours of WAL files on the slave, and delay
> application, and make recovery much slower. It is not clear to me which
> option a user would prefer because the bloat on the master might be
> permanent.
>

But if you're running the 8 hour report on the master right now, aren't
you already exposed to a similar pile of bloat issues while it's going?
If I have the choice between "sometimes queries will get canceled" vs.
"sometimes the master will experience the same long-running transaction
bloat issues as in earlier versions even if the query runs on the
standby", I feel like leaning toward the latter at least leads to a
problem people are used to.

This falls into the principle of least astonishment category to me.
Testing the final design for how transactions get canceled here led me
to some really unexpected situations, and the downside for a mistake is
"your query is lost". Had I instead discovered that sometimes
long-running transactions on the standby can ripple back to cause a
maintenance slowdown on the master, that's not great. But it would not
have been so surprising, and it won't result in lost query results.

I think people will expect that their queries might get canceled because
of things like DDL changes. And the existing knobs allow inserting some
slack for things like locks sometimes taking a little while to acquire.
What I don't think people will see coming is that a routine update on an
unrelated table is going to kill a query they might have been waiting
hours for the result of, just because that update crossed an autovacuum
threshold for the other table and introduced a dead row cleanup.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 02:52:12
Message-ID: 4B8888DC.5060401@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> One such landmine is that the keepalives need to flow from client to
> server while the WAL records are flowing from server to client. We'll
> have to crack that problem for synchronous replication too, but I think
> that alone is a big enough problem to make this 9.1 material.
>

This seems to be the real sticking point then, given that the
xmin/PGPROC side on the master seems logically straightforward. For
some reason I thought the sync rep feature had the reverse message flow
already going, and that some other sort of limitation just made it
impractical to merge into the main codebase this early. My hope was
that just this particular part could get cherry-picked out of there, and
that it might even have been thought about already in that context given
the known HS keepalive "serious issue". If there was a solution or
partial solution in progress to that floating around, my thought was
that just piggybacking this extra xid info on top of it would be easy
enough.

If there's not already a standby to primary communications backchannel
implementation available that can be harvested from that work, your
suggestion that this may not be feasible at all for 9.0 seems like a
more serious concern than I had thought it was going to be.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 02:56:01
Message-ID: 201002270256.o1R2u1u10065@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Heikki Linnakangas wrote:
> > One such landmine is that the keepalives need to flow from client to
> > server while the WAL records are flowing from server to client. We'll
> > have to crack that problem for synchronous replication too, but I think
> > that alone is a big enough problem to make this 9.1 material.
> >
>
> This seems to be the real sticking point then, given that the
> xmin/PGPROC side on the master seems logically straightforward. For
> some reason I thought the sync rep feature had the reverse message flow
> already going, and that some other sort of limitation just made it
> impractical to merge into the main codebase this early. My hope was
> that just this particular part could get cherry-picked out of there, and
> that it might even have been thought about already in that context given
> the known HS keepalive "serious issue". If there was a solution or
> partial solution in progress to that floating around, my thought was
> that just piggybacking this extra xid info on top of it would be easy
> enough.
>
> If there's not already a standby to primary communications backchannel
> implementation available that can be harvested from that work, your
> suggestion that this may not be feasible at all for 9.0 seems like a
> more serious concern than I had thought it was going to be.

I suspect the master could connect to the slave to pull an xid.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 03:11:49
Message-ID: 4B888D75.8030006@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> Eh? That's not what I meant at all. Actually it's kind of the exact
> opposite of what I meant.
>

Sorry about that--I think we just hit one of those language usage drift
bits of confusion. "Sit in the corner" has a very negative tone to it
in US English and I interpreted your message badly as a result. A
Google search for images using that phrase will quickly show you what I
mean.

> What I meant was that your description of the "High Availability first
> and foremost" is only one possible use case. Simon in the past
> expressed the same single-minded focus on that use case. It's a
> perfectly valid use case and I would probably agree if we had to
> choose just one it would be the most important.
>

Sure, there are certainly others, and as much flexibility here as
possible is a good thing. What I was suggesting is that if the only
good way to handle long-running queries has no choice but to sacrifice
high-availability, which is the situation if max_standby_delay is the
approach you use, then the most obvious users for this feature are not
being well served. I would guess a large portion of the users looking
forward to Hot Standby are in the "have an underutilized
high-availability standby I'd like to use for offloading long-running
reports" camp, and if there is no way to serve them well this feature is
missing the mark a bit.

You really can't do any better without better master/standby integration
though, and as pointed out a couple of times here, that was considered
and just not followed through on yet. I'm increasingly concerned that
nothing else will really do.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 03:40:57
Message-ID: 407d949e1002261940n43c521d6i1bb3220997886e09@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 27, 2010 at 2:43 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
> But if you're running the 8 hour report on the master right now, aren't you
> already exposed to a similar pile of bloat issues while it's going?  If I
> have the choice between "sometimes queries will get canceled" vs. "sometimes
> the master will experience the same long-running transaction bloat issues as
> in earlier versions even if the query runs on the standby", I feel like
> leaning toward the latter at least leads to a problem people are used to.

If they move from running these batch queries on the master to running
them on the slave then sure, the situation will be no worse than
before.

But if they move from having a plain old PITR warm standby to having
one they can run queries on they might well assume that the big
advantage of having the standby to play with is precisely that they
can do things there that they have never been able to do on the master
previously without causing damage.

I agree that having queries randomly and unpredictably canceled is
pretty awful. My argument was that max_standby_delay should default to
infinity on the basis that any other value has to be picked based on
actual workloads and SLAs.

My feeling is that there are probably only two types of configurations
that make sense: an HA replica with a low max_standby_delay, or a
reporting replica with a high max_standby_delay. Any attempt to
combine the two on the same system will only work if you know your
application well and can make compromises with both.

I would also like to be able to handle load balancing read-only
queries but that will be really tricky. You want up-to-date data and
you want to be able to run moderately complex queries. That kind of
workload may well require synchronous replication to really work
properly.

--
greg


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 03:59:41
Message-ID: e48fc22928533daea739c14514b42bfb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg(at)2ndquadrant(dot)com>
wrote:
>> This is also the reason why the whole "pause recovery" idea is a
>> fruitless
>> path to wander down.  The whole point of this feature is that people
>> have a
>> secondary server available for high-availability, *first and foremost*,
>> but
>> they'd like it to do something more interesting that leave it idle all
>> the
>> time.  The idea that you can hold off on applying standby updates for
>> long
>> enough to run seriously long reports is completely at odds with the
idea
>> of
>> high-availability.

> I want my ability to run large batch queries without any performance
> or reliability impact on the primary server.

+1

I can use any number of other technologies for high availability.

Joshua D. Drake


--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 04:02:24
Message-ID: 407d949e1002262002s6c0a0600pb98ed7c21cd09b7d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 9:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>
>> What extra entries?
>
> Locks, just for starters.  I haven't read enough of the code yet to know
> what else Simon added.  In the past it's not been necessary to record
> any transient information in WAL, but now we'll have to.

Haven't we been writing locks to the WAL since two-phase commit?

--
greg


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 04:31:12
Message-ID: 4B88A010.6000100@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark(at)mit(dot)edu> wrote:
>
>> I want my ability to run large batch queries without any performance
>> or reliability impact on the primary server.
>>
>
> +1
>
> I can use any number of other technologies for high availability.
>

Remove "must be an instant-on failover at the same time" from the
requirements and you don't even need 9.0 to handle that, this has been a
straightforward to solve problem since 8.2. It's the combination of HA
and queries that make things hard to do.

If you just want batch queries on another system without being concerned
about HA at the same time, the first option is to just fork the base
backup and WAL segment delivery to another server and run queries there.

Some simple filesystem snapshot techniques will also suffice to handle
it all on the same standby. Stop warm standby recovery, snapshot,
trigger the server, run your batch job; once finished, roll back to the
snapshot, grab the latest segment files, and resume standby catchup.
Even the lame Linux LVM snapshot features can handle that job--one of my
coworkers has the whole thing scripted, it comes up so often.

And if you have to go live because there's a failover, you're back to
the same "cold standby" situation a large max_standby_delay puts you at,
so it's not even very different from what you're going to get in 9.0 if
this is your priority mix. The new version is just lowering the
operational complexity involved.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 04:38:38
Message-ID: 4B88A1CE.1070906@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> But if they move from having a plain old PITR warm standby to having
> one they can run queries on they might well assume that the big
> advantage of having the standby to play with is precisely that they
> can do things there that they have never been able to do on the master
> previously without causing damage.
>

Just not having the actual query running on the master is such a
reduction in damage that I think it's delivering the essence of what
people are looking for regardless. That it might be possible in some
cases to additionally avoid the overhead that comes along with any
long-running query is a nice bonus, and it's great the design allows for
that possibility. But if that's only possible with risk, heavy
tweaking, and possibly some hacks, I'm not sure that's making the right
trade-offs for everyone.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 04:53:48
Message-ID: 20100227045348.GK7843@oak.highrise.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Greg Smith <greg(at)2ndquadrant(dot)com> [100226 23:39]:

> Just not having the actual query running on the master is such a
> reduction in damage that I think it's delivering the essence of what
> people are looking for regardless. That it might be possible in some
> cases to additionally avoid the overhead that comes along with any
> long-running query is a nice bonus, and it's great the design allows for
> that possibility. But if that's only possible with risk, heavy
> tweaking, and possibly some hacks, I'm not sure that's making the right
> trade-offs for everyone.

Would we (ya, the royal we) be willing to say that if you want the
benefit of removing the MVCC overhead of long-running queries you need
to run PITR backup/archive recovery, and if you want SR, you get a
closed-loop master-follows-slave-xmin behaviour?

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 05:08:35
Message-ID: 4B88A8D3.2070406@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Aidan Van Dyk wrote:
> Would we (ya, the royal we) be willing to say that if you want the
> benifit of removing the MVCC overhead of long-running queries you need
> to run PITR backup/archive recovery, and if you want SR, you get a
> closed-loop master-follows-save-xmin behaviour?
>

To turn that question around a little, I think it's reasonable to say
that closed-loop master-follows-slave-xmin behavior is only practical to
consider implementing with SR--and even there, it should be optional
rather than required until there's more field experience on the whole
thing. Whether it's the default or not could take a bit of debate to
sort out too.

If you think of it in those terms, the idea that "you need to run PITR
backup/archive recovery" to not get that behavior isn't an important
distinction anymore. If you run SR with the option enabled you could
get it; with any other setup, you won't.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 06:53:24
Message-ID: 4B88C164.6070205@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead. BTW, has anyone made any attempt
>> to measure the performance hit that the patch in its current form is
>> creating via added WAL entries?
>
> What extra entries?

* An xact-assignment record is written every PGPROC_MAX_CACHED_SUBXIDS
(= 64) subtransaction ids assigned to a single top-level transaction.

* A running-xacts record is written at every online checkpoint

* A btree-reuse-page record is written whenever a dead b-tree page is
recycled

* A vacuum cleanup-info record is written once per VACUUM of a table

* A standby-lock record is written for each AccessExclusiveLock acquired.

Am I missing something?

I doubt any of these are noticeable, though I don't think anyone has
measured it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 06:59:10
Message-ID: 4B88C2BE.5070004@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
>> That is exactly the core idea I was trying to suggest in my rambling
>> message. Just that small additional bit of information transmitted and
>> published to the master via that route, and it's possible to optimize
>> this problem in a way not available now. And it's a way that I believe
>> will feel more natural to some users who may not be well served by any
>> of the existing tuning possibilities.
>
> Well, if both you and Tom think it would be relatively easy (or at least
> easier that continuing to pursue query cancel troubleshooting), then
> please start coding it. It was always a possible approach, we just
> collectively thought that query cancel would be easier.

You still need query cancels. A feedback loop just makes them happen
less frequently.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 09:07:32
Message-ID: 4B88E0D4.3040506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Doesn't the system already adjust the delay based on the length of slave
>> transactions, e.g. max_standby_delay. It seems there is no need for a
>> user switch --- just max_standby_delay really high.
>
> Well that GUC looks like it allows to set a compromise between HA and
> reporting, not to say "do not ever give the priority to the replay while
> I'm running my reports". At least that's how I understand it.

max_standby_delay=-1 does that. The documentation needs to be updated to
reflect that; it currently says:

> There is no wait-forever setting because of the potential for deadlock which that setting would introduce. This parameter can only be set in the postgresql.conf file or on the server command line.

but that is false: -1 means wait forever. Simon removed that option at
one point, but it was later put back and apparently the documentation
was never updated.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 09:33:34
Message-ID: 4B88E6EE.2000604@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Dimitri Fontaine wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> Doesn't the system already adjust the delay based on the length of slave
>>> transactions, e.g. max_standby_delay. It seems there is no need for a
>>> user switch --- just max_standby_delay really high.
>> Well that GUC looks like it allows to set a compromise between HA and
>> reporting, not to say "do not ever give the priority to the replay while
>> I'm running my reports". At least that's how I understand it.
>
> max_standby_delay=-1 does that. The documentation needs to be updated to
> reflect that, it currently says:
>
>> There is no wait-forever setting because of the potential for deadlock which that setting would introduce. This parameter can only be set in the postgresql.conf file or on the server command line.
>
> but that is false, -1 means wait forever. Simon removed that option at
> one point, but it was later put back and apparently the documentation
> was never updated.

I've put back the mention of max_standby_delay=-1 option in the docs.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 14:34:08
Message-ID: 201002271434.o1REY8m11296@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Joshua D. Drake wrote:
> > On Sat, 27 Feb 2010 00:43:48 +0000, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> >
> >> I want my ability to run large batch queries without any performance
> >> or reliability impact on the primary server.
> >>
> >
> > +1
> >
> > I can use any number of other technologies for high availability.
> >
>
> Remove "must be an instant-on failover at the same time" from the
> requirements and you don't even need 9.0 to handle that, this has been a
> straightforward to solve problem since 8.2. It's the combination of HA
> and queries that make things hard to do.
>
> If you just want batch queries on another system without being concerned
> about HA at the same time, the first option is to just fork the base
> backup and WAL segment delivery to another server and run queries there.

That is a lot of administrative overhead. It is hard to say it is
equivalent to HS.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: gsstark(at)mit(dot)edu, greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 14:56:51
Message-ID: 20100227.235651.77060715.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Oh, 9.0 will not be shipped in 2010? Do you guys all share Tom's
opinion?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 20:31:06
Message-ID: 4B89810A.3030604@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg,

> If you think of it in those terms, the idea that "you need to run PITR
> backup/archive recovery" to not get that behavior isn't an important
> distinction anymore. If you run SR with the option enabled you could
> get it, any other setup and you won't.

+1.

I always expected that we'd get this kind of behavior with "synch" in
9.1. I can see that there are two desired modes of behaviour depending
on what the replication config is:

1) Master full-speed, degraded operation on slaves: this is the current
max_standby_delay approach. It has the advantage of supporting possibly
hundreds of slaves, and certainly dozens.

2) Master burdened, full operation on slaves: this is the
publish-xmin-back-to-master approach, which IIRC the core team first
discussed at pgCon 2008 before Simon started work, and which you and Tom
seem to think can be done soon.

I can see people wanting to use either mode depending on their use-case.
Or, for that matter, using both modes with different slaves.

Now that I think about it, the xmin thing really doesn't seem
conceptually difficult. If the slave just opens a 2nd, special query
connection back to the master and publishes its oldest xmin there, as
far as the master is concerned, it's just another query backend.
Could it be that easy?

Also, I'm serious about what I suggested earlier for "delay" mode. We
should have an option for cancelled queries to be immediately retried,
if that's feasible. It would turn something which is now a major
application design issue (lots of cancelled queries) into just degraded
performance.

Overall, though, I'd say that getting 9.0 out the door relatively
on-time is more important than getting it perfect. "Release early,
release often" isn't just a mantra; it's a very good idea if you want
your project to keep improving and not bog down and fall apart.

--Josh Berkus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 22:09:59
Message-ID: 4B899837.7070807@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:

> Now that I think about it, the xmin thing really doesn't seem
> conceptually difficult. If the slave just opens a 2nd, special query
> connection back to the master and publishes its oldest xmin there, as
> far as the master is concerned, it's just another query backend.
> Could it be that easy?
>

Something just like that is in fact already suggested as a workaround in
the Hot Standby manual:

"The first option is to connect to the primary server and keep a query
active for as long as needed to run queries on the standby. This
guarantees that a WAL cleanup record is never generated and query
conflicts do not occur, as described above. This could be done using
contrib/dblink and pg_sleep(), or via other mechanisms."

And the idea of doing it mainly in client land has its attractions.
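
As a rough, untested sketch of what that client-land dance might look like
(the connection string is a placeholder, and contrib/dblink has to be
installed on the standby side):

-- Hold a snapshot open on the master for up to an hour; dblink_send_query
-- dispatches the statement and returns immediately:
SELECT dblink_connect('master', 'host=master.example.com dbname=pgbench');
SELECT dblink_send_query('master', 'SELECT pg_sleep(3600)');

-- Run the long report on the standby:
SELECT sum(abalance) FROM pgbench_accounts;

-- Finished; release the master's snapshot:
SELECT dblink_cancel_query('master');
SELECT dblink_disconnect('master');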

The main reason I wandered toward asking about it in the context of SR
is that there's already this open "Standby delay on idle system" issue
with Hot Standby, and the suggested resolution for that problem involves
publishing keep-alive data with timestamps over SR. While all these
problems and potential solutions have been floating around for a long
time, as you pointed out, the little flash of insight I had here was
that it's possible to bundle these two problems together with a combined
keep-alive timestamp+xmin message that goes in both directions. That
removes one serious Hot Standby issue altogether, and adds an additional
conflict avoidance mechanism for people who want to enable it, all with
something that needs to get done sooner or later anyway for sync rep.

The part I still don't have good visibility on is how much of the
necessary SR infrastructure needed to support this communications
channel is already available in some form. I had thought the walsender
on the master was already receiving messages sometimes from the
walreceiver on the standby, but I'm getting the impression from Heikki's
comments that this is not the case at all yet.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 22:26:22
Message-ID: 4B899C0E.9080303@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> The part I still don't have good visibility on is how much of the
> necessary SR infrastructure needed to support this communications
> channel is already available in some form. I had thought the walsender
> on the master was already receiving messages sometimes from the
> walreceiver on the standby, but I'm getting the impression from Heikki's
> comments that this is not the case at all yet.

I don't think asking for a 2nd connection back from the standby to the
master would be horrible for 9.0. I think it would be quite reasonable,
actually; even with 2 connections per slave, you could still have quite
a few slaves before the # of connections bogged down the master.

--Josh


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 22:28:49
Message-ID: 4B899CA1.4080400@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Thank you for combining a small personal attack with a selfish
> commentary about how yours is the only valid viewpoint. Saves me a lot
> of trouble replying to your messages, can just ignore them instead if
> this is how you're going to act.

Hey, take it easy! I read Stark's post as tongue-in-cheek, which I
think it was.

Though, Stark, if you're going to be flip, I'd suggest using a smiley
next time.

--Josh


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-27 23:49:09
Message-ID: 4B89AF75.8070000@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Hey, take it easy! I read Stark's post as tongue-in-cheek, which I
> think it was.
>

Yeah, I didn't get that. We've already exchanged mutual off-list
apologies for the misunderstanding in both directions, I stopped just
short of sending flowers.

I did kick off this discussion with noting a clear preference this not
wander into any personal finger-pointing. And I am far too displeased
with the technical situation here to have much of a sense of humor left
about it either.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 01:00:52
Message-ID: 603c8f071002271700t1bffd309r8b8838189d4c827e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 1:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> In the model you describe any long-lived queries on the slave cause
>> tables in the master to bloat with dead records.
>
> Yup, same as they would do on the master.
>
>> I think this model is on the roadmap but it's not appropriate for
>> everyone and I think one of the benefits of having delayed it is that
>> it forces us to get the independent model right before throwing in
>> extra complications. It would be too easy to rely on the slave
>> feedback as an answer for hard questions about usability if we had it
>> and just ignore the question of what to do when it's not the right
>> solution for the user.
>
> I'm going to make an unvarnished assertion here.  I believe that the
> notion of synchronizing the WAL stream against slave queries is
> fundamentally wrong and we will never be able to make it work.
> The information needed isn't available in the log stream and can't be
> made available without very large additions (and consequent performance
> penalties).  As we start getting actual beta testing we are going to
> uncover all sorts of missed cases that are not going to be fixable
> without piling additional ugly kluges on top of the ones Simon has
> already crammed into the system.  Performance and reliability will both
> suffer.
>
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing.  It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

Somewhat unusually for me, I haven't been able to keep up with my
email over the last few days, so I'm weighing in on this one a bit
late. It seems to me that if we're forced to pass the xmin from the
slave back to the master, that would be a huge step backward in terms
of both scalability and performance, so I really hope it doesn't come
to that. I wish I understood better exactly what you mean by "the
notion of synchronizing the WAL stream against slave queries" and why
you don't think it will work. Can you elaborate?

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 02:14:28
Message-ID: 201002280214.o1S2ES717980@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Josh Berkus wrote:
>
> > Now that I think about it, the xmin thing really doesn't seem
> > conceptually difficult. If the slave just opens a 2nd, special query
> > connection back to the master and publishes its oldest xmin there, as
> > far as the master is concerned, it's just another query backend.
> > Could it be that easy?
> >
>
> Something just like that is in fact already suggested as a workaround in
> the Hot Standby manual:
>
> "The first option is to connect to the primary server and keep a query
> active for as long as needed to run queries on the standby. This
> guarantees that a WAL cleanup record is never generated and query
> conflicts do not occur, as described above. This could be done using
> contrib/dblink and pg_sleep(), or via other mechanisms."

I am unclear how you would easily advance the snapshot as each query
completes on the slave.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 05:28:17
Message-ID: 4B89FEF1.9000009@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
>> "The first option is to connect to the primary server and keep a query
>> active for as long as needed to run queries on the standby. This
>> guarantees that a WAL cleanup record is never generated and query
>> conflicts do not occur, as described above. This could be done using
>> contrib/dblink and pg_sleep(), or via other mechanisms."
>>
>
> I am unclear how you would easily advance the snapshot as each query
> completes on the slave.
>

The idea of the workaround is that if you have a single long-running
query to execute, and you want to make sure it doesn't get canceled
because of a vacuum cleanup, you just have it connect back to the master
to keep an open snapshot the whole time. That's basically the same idea
that vacuum_defer_cleanup_age implements, except you don't have to
calculate a value--you just hold open the snapshot to do it.

When that query ended, its snapshot would be removed, and then the
master would advance to whatever the next latest one is. Nothing
fancier than that. The only similarity is that if you made every query
that happened on the standby do that, it would effectively be the same
behavior I'm suggesting could be available via the standby->master xmin
publication.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 06:01:49
Message-ID: 407d949e1002272201r7704541at97e7bf6623ec662e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> The idea of the workaround is that if you have a single long-running query
> to execute, and you want to make sure it doesn't get canceled because of a
> vacuum cleanup, you just have it connect back to the master to keep an open
> snapshot the whole time.  That's basically the same idea that
> vacuum_defer_cleanup_age implements, except you don't have to calculate a
> value--you just hold open the snapshot to do it.

This sounds like it would require a separate connection for each
client on the replica. That would be a pretty big burden for the
master.

Also, I'm not sure this actually works. When your client makes this
additional connection to the master it's connecting at some
transaction in the future from the slave's point of view. The master
could have already vacuumed away some record which the snapshot the
client gets on the slave will have in view. Even if you defer taking
the snapshot on the slave until after connecting to the master it's
still possibly "in the past" compared to the xmin on the master.

I think to make this work you would have to connect to the master,
establish a snapshot, then fetch pg_current_xlog_location(), then poll
the slave and wait until it reaches that same position -- and only
then perform your query taking care to establish a fresh snapshot for
it such as by starting a new transaction on the slave. That's a lot of
effort to go to.
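
Roughly, and only as an untested sketch (the WAL location shown is made up,
and the client has to do the location comparison itself):

-- Connection to the master: pin a snapshot for the whole transaction and
-- note where WAL currently ends; keep this open until the report is done.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT pg_current_xlog_location();       -- say it returns '0/A5000020'

-- Connection to the slave: poll until replay has passed that location,
-- then take a fresh snapshot for the real query.
SELECT pg_last_xlog_replay_location();
BEGIN;
SELECT sum(abalance) FROM pgbench_accounts;
COMMIT;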

Still it's a handy practical trick even if it isn't 100% guaranteed to
work. But I don't think it provides the basis for something we can
bake in.

--
greg


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 06:07:20
Message-ID: 4B8A0818.5000100@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> It seems to me that if we're forced to pass the xmin from the
> slave back to the master, that would be a huge step backward in terms
> of both scalability and performance, so I really hope it doesn't come
> to that.

Not forced to--have the option of. There are obviously workloads where
you wouldn't want this. At the same time, I think there are some pretty
common ones people are going to expect HS+SR to work on transparently
where this would obviously be the preferred trade-off to make, were it
available as one of the options. The test case I put together shows an
intentionally pathological but not completely unrealistic example of
such a workload.

> I wish I understood better exactly what you mean by "the
> notion of synchronizing the WAL stream against slave queries" and why
> you don't think it will work. Can you elaborate?
>

There's this constant WAL stream coming in from the master to the
slave. Each time the slave is about to apply a change from that stream,
it considers "will this disrupt one of the queries I'm already
executing?". If so, it has to make a decision about what to do; that's
where the synchronization problem comes from.

The current two options are "delay applying the change", at which point
the master and standby will drift out of sync until the query ends and
it can catch back up, or "cancel the query". There are tunables for
each of these, and they all seem to work fine (albeit without too much
testing in the field yet). My concern is that the tunable that tries to
implement the other thing you might want to optimize for--"avoid letting
the master generate WAL entries that are the most likely ones to
conflict"--just isn't very usable in its current form.

Tom and I don't see completely eye to eye on this, in that I'm not so
sure the current behaviors are "fundamentally wrong and we will never be
able to make [them] work". If that's really the case, you may not ever
get the scalability/performance results you're hoping for from this
release, and really we're all screwed if those are the only approaches
available.

What I am sure of is that a SR-based xmin passing approach is simpler,
easier to explain, more robust for some common workloads, and less
likely to give surprised "wow, I didn't think *that* would cancel my
standby query" reports from the field than any way you can configure Hot
Standby alone right now. And since I never like to bet against Tom's
gut feel, having it around as a "plan B" in case he's right about an
overwhelming round of bug reports piling up against the
max_standby_delay etc. logic doesn't hurt either.

I spent a little time today seeing if there was any interesting code I
might steal from the early "synchrep" branch at
http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary ,
but sadly when I tried to rebase that against the master to separate out
just the parts unique to it the merge conflicts were overwhelming. I
hate getting beaten by merge bitrot even when Git is helping.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 06:18:00
Message-ID: 4B8A0A98.8060107@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
>> The idea of the workaround is that if you have a single long-running query
>> to execute, and you want to make sure it doesn't get canceled because of a
>> vacuum cleanup, you just have it connect back to the master to keep an open
>> snapshot the whole time.
> Also, I'm not sure this actually works. When your client makes this
> additional connection to the master it's connecting at some
> transaction in the future from the slave's point of view. The master
> could have already vacuumed away some record which the snapshot the
> client gets on the slave will have in view.

Right, and there was an additional comment in the docs alluding to some
sleep time on the master that intends to try and improve things. If you
knew how long archive_timeout was you could sleep longer than that to
increase your odds of avoiding an ugly spot. But there are
race conditions galore possible here, particularly if your archiver or
standby catchup is backlogged.

> Still it's a handy practical trick even if it isn't 100% guaranteed to
> work. But I don't think it provides the basis for something we can
> bake in.
>

Agreed on both counts, which is why it's in the current docs as a
workaround people can consider, but not what I've been advocating as the
right way to proceed.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 13:54:48
Message-ID: 407d949e1002280554s5d4e1fe3y29ab64e27dd4de8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 6:07 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Not forced to--have the option of.  There are obviously workloads where you
> wouldn't want this.  At the same time, I think there are some pretty common
> ones people are going to expect HS+SR to work on transparently where this
> would obviously be the preferred trade-off to make, were it available as one
> of the options.  The test case I put together shows an intentionally
> pathological but not completely unrealistic example of such a workload.

Well if we're forced to eventually have both then it kind of takes the
wind out of Tom's arguments. We had better get both features working
so it becomes only a question of which is worth doing first and which
can be held off. Since there aren't any actual bugs in evidence for
the current setup and we already have it that's a pretty easy
decision.

> What I am sure of is that a SR-based xmin passing approach is simpler,
> easier to explain, more robust for some common workloads, and less likely to
> give surprised "wow, I didn't think *that* would cancel my standby query"
> reports from the field

Really? I think we get lots of surprised wows from the field from the
idea that a long-running read-only query can cause your database to
bloat. I think the only reason that's obvious to us is that we've been
grappling with that problem for so long.

> And since I never like to bet against Tom's gut feel, having it
> around as a "plan B" in case he's right about an overwhelming round of bug
> reports piling up against the max_standby_delay etc. logic doesn't hurt
> either.

Agreed. Though I think it'll be bad in that case even if we have a
plan B. It'll mean no file-based log shipping replicas and no
guarantee that what you run on the standby can't affect the master --
which is a pretty nice guarantee. It'll also mean it'll be much more
fragile against network interruptions.

--
greg


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 15:56:28
Message-ID: dc7b844e1002280756m53f6371g8d66440c52ea2ca0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 2:54 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> Really? I think we get lots of surprised wows from the field from the
> idea that a long-running read-only query can cause your database to
> bloat. I think the only reason that's obvious to us is that we've been
> grappling with that problem for so long.

It seems to me that the scenario that you are looking at is one where
people run different queries with and without HS, i.e. that they will
run longer read-only queries than now once they have HS. I don't think
that is the case. If it isn't you cannot really speak of a master
"bloat".

Instead, I assume that most people who will grab 9.0 and use HS+SR do
already have a database with a certain query profile. Now with HS+SR
they will try to put the most costly and longest read-only queries to
the standby but in the end will run the same number of queries with
the same overall complexity.

Now let's take a look at both scenarios from the administrators' point of view:

1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have fewer
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their queries
on the standby might fail and they will start thinking that this HS+SR
feature is not as convincing as they thought it was... Next step for
them is to take the documentation and study it for a few days to learn
all about vacuum, different delays, transaction ids and age parameters
and experiment a few weeks until no more queries fail - for a while...
But they can never be sure... In the end they might also modify the
parameters in the wrong direction or overshoot because of lack of time
to experiment and lose another important property without noticing
(like being as close as possible to the master).

2) On the other hand if we could ship 9.0 with the xmin-propagation
feature, people would still see a better performance and have a hot
standby system but this time without query cancellations. Again: the
read-only queries that will be processed by the HS in the future are
being processed by the master today anyway, so why should it get
worse? The first impression will be that it just works nicely out of
the box, is easy to set up and has no negative effect (query
cancellation) that has not already shown up before (vacuum lag).

I guess that most people will just run fine with this setup and never
get to know about the internals. Of course we should still offer an
expert mode where you can turn all kinds of knobs and where you can
avoid the vacuum dependency but it would be nice if this could be the
expert mode only. Tuning this is highly installation specific and you
need to have a deep understanding of how PostgreSQL and HS work
internally and what you actually want to achieve...

> Agreed. Though I think it'll be bad in that case even if we have a
> plan B. It'll mean no file-based log shipping replicas and no
> guarantee that what you run on the standby can't affect the master --
> which is a pretty nice guarantee. It'll also mean it'll be much more
> fragile against network interruptions.

Regarding the network interruptions... in reality if you have network
interruptions of several minutes between your primary and your
standby, you have worse problems anyway... If the standby does not
renew its xmin for n seconds, log a message and just go on...

Joachim


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 19:47:38
Message-ID: 4B8AC85A.6090807@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub. And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.

I also think, based on the discussion and Greg's test case, that we
could do two things which would make the shortcomings of delay-based SR
a vastly better experience for users:

1) Automated retry of cancelled queries on the slave. I have no idea
how hard this would be to implement, but it makes the difference between
writing lots of exception-handling code for slave connections
(unacceptable) and just slow response times on the slave (acceptable).

2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
user to configure the master to not vacuum records less than, say, 5
minutes dead, then that would again offer the choice to the user of
slightly degraded performance on the master (acceptable) vs. lots of
query cancel (unacceptable). I'm going to test Greg's case with
vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit.

Why do I say that "lots of query cancel" is "unacceptable"? For the
simple reason that one cannot run the same application code against an
HS+SR cluster with lots of query cancel as one runs against a standalone
database. And if that's true, then the main advantage of HS+SR over
Slony and Londiste is gone. MySQL took great pains to make sure that
you could run the same code against replicated MySQL as standalone, and
that was based on having a fairly intimate relationship with their users
(at the time, anyway).

Another thing to keep in mind in these discussions is the
inexpensiveness of servers today. This means that, if slaves have poor
performance, that's OK; one can always spin up more slaves. But if each
slave imposes a large burden on the master, then that limits your
scalability.

--Josh Berkus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 20:21:23
Message-ID: 4B8AD043.4070208@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joachim Wieland wrote:
> Instead, I assume that most people who will grab 9.0 and use HS+SR do
> already have a database with a certain query profile. Now with HS+SR
> they will try to put the most costly and longest read-only queries to
> the standby but in the end will run the same number of queries with
> the same overall complexity.
>

This is a nice summary of the primary use-case I am trying to optimize
usability for, because I know for a fact there's a stack of pent-up
requests for exactly this form of improvement from existing warm standby
users. And your subsequent discussion of how administrators will react
in each of the possible configurations here matches my own concerns. I
would highly recommend anyone who feels this is not a critical feature
to fix carefully read Joachim's message from an advocacy perspective;
that's a better user-oriented prediction than mine of exactly how this
is going to play out in the field post-release.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 20:48:51
Message-ID: 4B8AD6B3.1000300@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> First, from the nature of the arguments, we need to eventually have both
> versions of SR: delay-based and xmin-pub. And it would be fantastic if
> Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
> ready as well.
>

As I see it, the main technical obstacle here is that a subset of a
feature already on the SR roadmap needs to get built earlier than
expected to pull this off. I don't know about Tom, but I have no
expectation it's possible for me to get up to speed on that code fast
enough to contribute anything there. I expect the thing I'd be most
productive at as far as moving the release forward is to continue
testing this pair of features looking for rough edges, which is what I
have planned for the next month.

I'm not even close to finished with generating test cases specifically
probing for bad behavior suspected after a look at the implementation
details--this is just what I came up with in my first week of that.
Count me in for more testing, but out for significant development here.
It's not what I've got my time allocated for because it's not where I
think I'll be most productive.

> 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots of
> query cancel (unacceptable). I'm going to test Greg's case with
> vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit.
>

I've been down that road and it leads quickly to the following
question: "how can I tell how old in time-based units an xid is?" If
there were an easy answer to that question, vacuum_defer_cleanup_age
would already be set in time units. It's the obvious UI to want, it's
just not obvious how to build it internally. Maybe I missed something,
but my guess is that vacuum_defer_cleanup_age is already as good as it's
going to get.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 20:54:52
Message-ID: m2fx4lgjxf.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots of
> query cancel (unacceptable). I'm going to test Greg's case with
> vacuum_defer_cleanup_age used fairly liberally to see if this approach has
> merit.

I think that to associate any time based interval notion with the XID
flow, you need a ticker. We already took the txid and txid_snapshot
types and functions from Skytools, which took them from Slony.

Maybe we could consider borrowing pgqd, the C version of the ticker, for
being able to specify in human time how long a dead transaction is
allowed to remain in the heap?

http://github.com/markokr/skytools-dev/tree/master/sql/ticker/
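
Even before borrowing the ticker itself, a crude sketch of the idea is easy
to imagine (the table and column names here are made up for illustration,
not pgqd's actual schema):

-- One row per minute, inserted on the master by a ticker or a plain cron
-- job (txid_current() consumes an XID itself, but one per tick is noise):
CREATE TABLE xid_tick (
    tick_time timestamptz PRIMARY KEY,
    tick_xid  bigint NOT NULL
);
INSERT INTO xid_tick SELECT now(), txid_current();

-- "How many XIDs is five minutes?" then becomes:
SELECT max(tick_xid) - min(tick_xid)
  FROM xid_tick
 WHERE tick_time > now() - interval '5 minutes';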

Regards,
--
dim


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 21:42:00
Message-ID: dc7b844e1002281342j3ff45bf0q56c148fb980947e3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> 1) Automated retry of cancelled queries on the slave.  I have no idea
> how hard this would be to implement, but it makes the difference between
> writing lots of exception-handling code for slave connections
> (unacceptable) to just slow response times on the slave (acceptable).

We're not only canceling queries, we are effectively canceling
transactions. It seems quite impossible to repeat all queries from a
transaction that has started in the past. One query might depend on or
include the result of a previous query, and as the data we see now has
changed since then, the client might now want to execute a different
query when it gets a different result out of a previous query...

And even if it was possible, how often would you retry? You still have
no guarantee that your query succeeds the second time. I'd claim that
if a query failed once, chances are even higher that it fails again
than that it succeeds the second time. Moreover if you continue to
repeat the query and if queries come in at a certain rate, you need to
process more and more queries on the slave which will not really help
other queries to finish in time nor will it be beneficial for the
throughput of the system as a whole...

I fully agree with what you say about user expectations: We need to
assume that many programs are not prepared for failures of "simple"
read-only queries because in the past they have always worked...

> Another thing to keep in mind in these discussions is the
> inexpensiveness of servers today. This means that, if slaves have poor
> performance, that's OK; one can always spin up more slaves. But if each
> slave imposes a large burden on the master, then that limits your
> scalability.

The burden of the xmin-publication feature is not the number of
slaves, it's just the longest-running queries, whichever slave they
are on. So your argument applies to both cases... To minimize the burden
on the master, get additional slaves so that you can run your most
expensive queries in a shorter time :-)

Joachim


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 21:58:51
Message-ID: 1267394331.20134.3114.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-02-26 at 16:44 -0500, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> There's *definitely* not going to be enough information in the WAL
> >> stream coming from a master that doesn't think it has HS slaves.
> >> We can't afford to record all that extra stuff in installations for
> >> which it's just useless overhead. BTW, has anyone made any attempt
> >> to measure the performance hit that the patch in its current form is
> >> creating via added WAL entries?
>
> > What extra entries?
>
> Locks, just for starters. I haven't read enough of the code yet to know
> what else Simon added. In the past it's not been necessary to record
> any transient information in WAL, but now we'll have to.

There is room for technical confusion here, so I'll just add some info.

There was/is potential for a performance hit because of the volume of
additional WAL *and* the processing overhead from that additional WAL.
As Heikki points out these turn out to be minimal, though this has been
by careful design.

There is also potential for a performance hit because incoming cleanup
records may conflict with currently executing queries. If we knew for
certain that the master was not sending any cleanup records that would
affect current standby queries, we could avoid that overhead altogether.
That's a good reason for integrating a solution.

AccessExclusiveLock lock records are nothing at all to do with that.
They exist simply to prevent obvious correctness issues such as somebody
reading a file while it is being deleted.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 22:15:18
Message-ID: 1267395318.20134.3178.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-02-26 at 03:33 -0500, Greg Smith wrote:

> I really hope this discussion can stay focused on if and how it's
> possible to improve this area, with the goal being to deliver a product
> everyone can be proud of with the full feature set that makes this next
> release a killer one. The features that have managed to all get into
> this release already are fantastic, everyone who contributed should be
> proud of that progress, and it's encouraging that the alpha4 date was
> nailed. It would be easy to descend into finger-pointing for why
> exactly this particular problem is only getting more visibility now, or
> into schedule-oriented commentary suggesting it must be ignored because
> it's too late to do anything about it. I hope everyone appreciates
> wandering that way will not help make PostgreSQL 9.0 a better release.
> This issue is so easy to encounter, and looks so bad when it happens,
> that I feel it could easily lead to an embarrassing situation for the
> community if something isn't done about it before release.

Thanks Greg. It's a great relief for me to hear someone else say this
and to watch a discussion about this important issue unfold.

--
Simon Riggs www.2ndQuadrant.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-02-28 22:38:28
Message-ID: 4B8AF064.3060809@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg, Joachim,

> As I see it, the main technical obstacle here is that a subset of a
> feature already on the SR roadmap needs to get built earlier than
> expected to pull this off. I don't know about Tom, but I have no
> expectation it's possible for me to get up to speed on that code fast
> enough to contribute anything there. I expect the thing I'd be most
> productive at as far as moving the release forward is to continue
> testing this pair of features looking for rough edges, which is what I
> have planned for the next month.

That's OK with me. I thought you were saying that xmin-pub was going to
be easier than expected. Per my other e-mails, I think that we should
be shooting for "good enough, on time" for 9.0., rather than "perfect".
We can't ever get to "perfect" if we don't release software.

Quite frankly, simply telling people that "long-running queries on the
slave tend not to be effective, wait for 9.1" is a possibility. If you
consider the limitations and silent failures associated with MySQL
replication, let alone the issues with other Postgres solutions or the
replication of some of the nosql databases, "no long-running queries" is
a positively straightforward restriction.

HS+SR is still a tremendous improvement over the options available
previously. We never thought it was going to work for everyone
everywhere, and shouldn't let our project's OCD tendencies run away from us.

> I've been down that road and it leads quickly to the following
> question: "how can I tell how old in time-based units an xid is?" If
> there were an easy answer to that question, vacuum_defer_cleanup_age
> would already be set in time units. It's the obvious UI to want, it's
> just not obvious how to build it internally. Maybe I missed something,
> but my guess is that vacuum_defer_cleanup_age is already as good as it's
> going to get.

Well, we could throw this on the user if we could get them some
information on how to calculate that number. For example, some way for
them to calculate the number of XIDs per minute via a query, and then
set vacuum_defer_cleanup_age appropriately on the master. Sure, it's
clunky, but we've already warned people that 9.0 will be clunky and hard
to administer. And it's no worse than setting max_fsm_pages used to be.
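
Even a really crude sampling query might be enough for that; as an untested
sketch, run this twice a few minutes apart and difference the xid values:

-- txid_current() consumes an XID itself, but one per sample hardly matters:
SELECT now() AS sampled_at, txid_current() AS current_xid;
-- (xid delta) / (minutes between samples) gives XIDs per minute; multiply
-- by the minutes of headroom wanted and use that for vacuum_defer_cleanup_age.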

However, first we need to test that setting vacuum_defer_cleanup_age
actually benefits query cancel issues.

> We're not only canceling queries, we are effectively canceling
> transactions. It seems quite impossible to repeat all queries from a
> transaction that has started in the past. One query might depend on or
> include the result of a previous query, and as the data we see now has
> changed since then, the client might now want to execute a different
> query when it gets a different result out of a previous query...

Sure, except that I don't expect people to be using explicit
transactions as much on the slaves, since they are read-only anyway and
can't even create temp tables. So having the retry logic skip queries
inside an explicit transaction would be an OK option.

> And even if it was possible, how often would you retry? You still have
> no guarantee that your query succeeds the second time. I'd claim that
> if a query failed once, chances are even higher that it fails again
> than that it succeeds the second time. Moreover if you continue to
> repeat the query and if queries come in at a certain rate, you need to
> process more and more queries on the slave which will not really help
> other queries to finish in time nor will it be beneficial for the
> throughput of the system as a whole...

Well, we'd need to have a limited number of retries, which means a GUC
in recovery.conf:

query_cancel_retry = #

This might default to, say, 2.

However, I'd still like to hear from someone with the requisite
technical knowledge whether capturing and retrying the current query in
a query cancel is even possible.

--Josh Berkus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 03:00:09
Message-ID: 4B8B2DB9.2040106@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Well, we could throw this on the user if we could get them some
> information on how to calculate that number. For example, some way for
> them to calculate the number of XIDs per minute via a query, and then
> set vacuum_defer_cleanup_age appropriately on the master. Sure, it's
> clunky, but we've already warned people that 9.0 will be clunky and hard
> to administer. And it's no worse than setting FSM_pages used to be.
>
> However, first we need to test that setting vacuum_defer_cleanup_age
> actually benefits query cancel issues.
>

Proving that setting works as expected is already on my test case grid;
it seems fine in my limited testing so far. I've started looking into ways
to monitor XID churn that would help set it better. I'll take care of
providing all that in my next test case update. My intent here is to
take the ideas outlined in my "Hot Standby Tradeoffs" blog post and turn
that into a new documentation section making it more clear where the
problem steps are, regardless of what else happens here. And I need
some concrete example of XID burn rate measurement to finish that job.

The main problem with setting vacuum_defer_cleanup_age high isn't
showing it works, it's a pretty simple bit of code. It's when you
recognize that it penalizes all cleanup all the time, whether or not the
standby is actually executing a long-running query, that you note
the second level of pain in increasing it. Returning to the idea of
"how is this different from a site already in production?", it may very
well be the case that a site that sets vacuum_defer_cleanup_age high
enough to support off-peak batch reporting cannot tolerate how that will
impact vacuums during their peak time of day. The XID export
implementation sidesteps that issue by only making the vacuum delay
increase when queries that require it are running, turning this back
into a standard "what's the best time of day to run my big reports?"
issue that people understand how to cope with already.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 03:12:22
Message-ID: 603c8f071002281912x3b34c2d5va2459fe18dd4c49a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Greg, Joachim,
>
>> As I see it, the main technical obstacle here is that a subset of a
>> feature already on the SR roadmap needs to get built earlier than
>> expected to pull this off.  I don't know about Tom, but I have no
>> expectation it's possible for me to get up to speed on that code fast
>> enough to contribute anything there.  I expect the thing I'd be most
>> productive at as far as moving the release forward is to continue
>> testing this pair of features looking for rough edges, which is what I
>> have planned for the next month.
>
> That's OK with me.  I thought you were saying that xmin-pub was going to
> be easier than expected.  Per my other e-mails, I think that we should
> be shooting for "good enough, on time" for 9.0, rather than "perfect".
>  We can't ever get to "perfect" if we don't release software.

I agree. It seems to me that the right long term fix for the problem
of query cancellations on the slave is going to be to give the slave
the ability to save multiple versions of relation pages where
necessary so that older snapshots can continue to be used even after
the conflicting WAL has been applied. However, I'm pretty sure that's
going to be a very difficult project which is unlikely to be coded by
anyone any time soon, let alone merged. Until then, we're going to
force people to pick from a fairly unappealing menu of options:
postpone WAL replay for long periods of time, cancel queries (perhaps
even seemingly unrelated to what changed on the master), bloat the
master. All of those options are seriously unpleasant.

I think, though, that we have to think of this as being like the
Windows port, or maybe even more significant than that, as an
architectural change. I think it is going to take several releases
for this feature to be well-understood and stable and have all the
options we'd like it to have. It wouldn't surprise me if we get to
10.0 before we really have truly seamless replication. I don't expect
Slony or Londiste or any of the other solutions that are out there now
to get kicked to the curb by PG 9.0. Still, a journey of a thousand
miles begins with the first step. Simon and many others have put a
great deal of time and energy into getting us to the point where we
are now, and if we let the fact that we haven't reached our ultimate
goal keep us from putting what we have out there in front of our
customers, I think we're going to regret that.

I think the thing to do is to reposition our PR around these features.
We should maybe even go so far as to call them "beta" or
"experimental". We shouldn't tell people - this is going to be
totally awesome. We should tell people - this is a big improvement,
and it's still got some pretty significant limitations, but it's good
stuff and it's going in a good direction. Overhyping what we have
today is not going to be good for the project, and I'm frankly quite
afraid that nothing we can possibly code between now and the release
is going to measure up to what people are hoping for. We need to set
our own expectations, and those of our customers, at a level at which
they can be met.

> Quite frankly, simply telling people that "long-running queries on the
> slave tend not to be effective, wait for 9.1" is a possibility.

Yep.

> HS+SR is still a tremendous improvement over the options available
> previously.  We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.

Yep.

> However, I'd still like to hear from someone with the requisite
> technical knowledge whether capturing and retrying the current query in
> a query cancel is even possible.

I'm not sure who you want to hear from here, but I think that's a dead end.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 17:50:52
Message-ID: 4B8BFE7C.4070901@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/28/10 7:00 PM, Greg Smith wrote:
> The main problem with setting vacuum_defer_cleanup_age high isn't
> showing it works, it's a pretty simple bit of code. It's when you
> recognize that it penalizes all cleanup all the time, whether or not the
> standby is actually executing a long-running query or not, that you note
> the second level of pain in increasing it. Returning to the idea of
> "how is this different from a site already in production?", it may very
> well be the case that a site that sets vacuum_defer_cleanup_age high
> enough to support off-peak batch reporting cannot tolerate how that will
> impact vacuums during their peak time of day. The XID export
> implementation sidesteps that issue by only making the vacuum delay
> increase when queries that require it are running, turning this back
> into a standard "what's the best time of day to run my big reports?"
> issue that people understand how to cope with already.

I don't think that defer_cleanup_age is a long-term solution. But we
need *a* solution which does not involve delaying 9.0.

And I think we can measure bloat in a pgbench test, no? When I get a
chance, I'll run one for a couple hours and see the difference that
cleanup_age makes.

--Josh Berkus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 18:26:09
Message-ID: 4B8C06C1.50006@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> And I think we can measure bloat in a pgbench test, no? When I get a
> chance, I'll run one for a couple hours and see the difference that
> cleanup_age makes.
>

The test case I attached at the start of this thread runs just the
UPDATE to the tellers table. Running something similar that focuses
just on UPDATEs to the pgbench_accounts table, without the rest of the
steps done by the standard test, is the fastest route to bloat. The
standard test will do it too, it just does a lot of extra stuff that
doesn't impact results (SELECT, INSERT), so it wastes some resources
compared to a targeted bloater script.
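
For example, a custom script along these lines (untested, and the file name
is arbitrary), run with something like "pgbench -n -s <scale> -f bloat.sql
-c 8 -T 600", keeps all of the churn focused on pgbench_accounts:

\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom delta -5000 5000
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;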

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 18:46:29
Message-ID: 407d949e1003011046v525543eao4e677edfa8253722@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I don't think that defer_cleanup_age is a long-term solution.  But we
> need *a* solution which does not involve delaying 9.0.

So I think the primary solution currently is to raise max_standby_delay.

However there is a concern with max_standby_delay. If you set it to,
say, 300s, then run a 300s query on the slave which causes the slave
to fall 299s behind. Now you start a new query on the slave -- it gets
a snapshot based on the point in time that the slave is currently at.
If it hits a conflict it will only have 1s to finish before the
conflict causes the query to be cancelled.

In short in the current setup I think there is no safe value of
max_standby_delay which will prevent query cancellations short of -1. If
the slave has a constant stream of queries and always has at least one
concurrent query running then it's possible that the slave will run
continuously max_standby_delay-epsilon behind the master and cancel
queries left and right, regardless of how large max_standby_delay is.

To resolve this I think you would have to introduce some chance for
the slave to catch up. Something like refusing to use a snapshot older
than max_standby_delay/2 and instead wait until the existing queries
finish and the slave gets a chance to catch up and see a more recent
snapshot. The problem is that this would result in very unpredictable
and variable response times from the slave. A single long-lived query
could cause replay to pause for a big chunk of max_standby_delay and
prevent any new query from starting.

Does anyone see any way to guarantee that the slave gets a chance to
replay and new snapshots will become visible without freezing out new
queries for extended periods of time?

--
greg


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 19:21:48
Message-ID: 4B8C13CC.7030908@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.

Completely aside from that, how many users are going to be happy with a
slave server which is constantly 5 minutes behind?

--Josh Berkus


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 19:32:24
Message-ID: 407d949e1003011132y430066f0sea04a0b03fe6fcfc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Completely aside from that, how many users are going to be happy with a
> slave server which is constantly 5 minutes behind?
>

Uhm, well all the ones who are happy with our current warm standby
setup for one?

And all the ones who are looking for a standby reporting server rather
than a high availability DR site.

For what it's worth Oracle has an option to have your standby
intentionally hold back n minutes behind and I've seen that set to 5
minutes.

--
greg


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 19:35:37
Message-ID: 4B8C1709.20007@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Completely aside from that, how many users are going to be happy with a
>> slave server which is constantly 5 minutes behind?
>>
>
> Uhm, well all the ones who are happy with our current warm standby
> setup for one?
>
> And all the ones who are looking for a standby reporting server rather
> than a high availability DR site.
>
> For what it's worth Oracle has an option to have your standby
> intentionally hold back n minutes behind and I've seen that set to 5
> minutes.

yeah a lot of people are doing that intentionally...

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 19:43:13
Message-ID: 22493.1267472593@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Greg Stark wrote:
>> For what it's worth Oracle has an option to have your standby
>> intentionally hold back n minutes behind and I've seen that set to 5
>> minutes.

> yeah a lot of people are doing that intentionally...

It's the old DBA screwup safety valve ... drop the main accounts table,
you have five minutes to stop replication before it's dropped on the
standby. Speaking of which, does the current HS+SR code have a
provision to force the standby to stop tracking WAL and come up live,
even when there's more WAL available? Because that's what you'd need
in order for such a thing to be helpful in that scenario.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 20:04:41
Message-ID: 4B8C1DD9.5000903@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/1/10 11:43 AM, Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> Greg Stark wrote:
>>> For what it's worth Oracle has an option to have your standby
>>> intentionally hold back n minutes behind and I've seen that set to 5
>>> minutes.
>
>> yeah a lot of people are doing that intentionally...
>
> It's the old DBA screwup safety valve ... drop the main accounts table,
> you have five minutes to stop replication before it's dropped on the
> standby. Speaking of which, does the current HS+SR code have a
> provision to force the standby to stop tracking WAL and come up live,
> even when there's more WAL available? Because that's what you'd need
> in order for such a thing to be helpful in that scenario.

the "fast" recovery option should do this. You'd need some fast
reaction times, though.

However, this leaves aside Greg's point about snapshot age and
successive queries; does anyone dispute his analysis? Simon?

--Josh Berkus


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 22:32:01
Message-ID: 4B8C4061.80905@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/28/10 7:12 PM, Robert Haas wrote:
>> However, I'd still like to hear from someone with the requisite
>> > technical knowledge whether capturing and retrying the current query in
>> > a query cancel is even possible.
>
> I'm not sure who you want to hear from here, but I think that's a dead end.

"dead end" as in "too hard to implement"? Or for some other reason?

It's undeniable that auto-retry would be better from a user's
perspective than a user-visible cancel. So if it's *reasonable* to
implement, I think we should be working on it. I'm also very puzzled as
to why nobody else wants to even discuss it; it's like some weird blackout.

--Josh Berkus


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-01 22:52:50
Message-ID: 4B8BF0E2020000250002F760@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> It's undeniable that auto-retry would be better from a user's
> perspective than a user-visible cancel. So if it's *reasonable*
> to implement, I think we should be working on it. I'm also very
> puzzled as to why nobody else wants to even discuss it; it's like
> some weird blackout.

Well, at least for serializable transactions past the first
statement, you'd need to have the complete *logic* for the
transaction in order to do a retry. Not that this is a bad idea --
our application framework does this automatically -- but unless you
only support this for a transaction which is wrapped up as a
function, I don't see how the database itself could handle it. It
might be *possible* to do it outside of a single-function
transaction in a read committed transaction, but you'd have to be
careful about locks. I remember suggesting automatic query retry
(rather than continuing in a mixed-snapshot mode) for update
conflicts in read committed mode and Tom had objections; you might
want to check the archives for that.

-Kevin


From: Greg Stark <stark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 00:46:22
Message-ID: 407d949e1003011646t48b41b44sc5fc610cacae34b9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh, nobody is talking about it because it doesn't make sense. You
could only retry if it was the first query in the transaction, and only
if you could prove there were no side-effects outside the database, and
then you would have no reason to think the retry would be any more
likely to work.

greg

On 1 Mar 2010 22:32, "Josh Berkus" <josh(at)agliodbs(dot)com> wrote:

On 2/28/10 7:12 PM, Robert Haas wrote:
>> However, I'd still like to hear from someone with the requ...
"dead end" as in "too hard to implement"? Or for some other reason?

It's undeniable that auto-retry would be better from a user's
perspective than a user-visible cancel. So if it's *reasonable* to
implement, I think we should be working on it. I'm also very puzzled as
to why nobody else wants to even discuss it; it's like some wierd blackout.

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
To make changes to your subs...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 01:04:11
Message-ID: 28781.1267491851@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> josh, nobody is talking about it because it doesn't make sense. you could
> only retry if it was the first query in the transaction and only if you
> could prove there were no side-effects outside the database and then you
> would have no reason to think the retry would be any more likely to work.

But it's hot standby, so there are no data-modifying transactions.
Volatile functions could be a problem, though. A bigger problem is
we might have already shipped partial query results to the client.

I agree it ain't easy, but it might not be completely out of the
question. Definitely won't be happening for 9.0 though.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 01:34:53
Message-ID: 4B8C6B3D.1060107@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> However, this leaves aside Greg's point about snapshot age and
> successive queries; does anyone dispute his analysis? Simon?
>

There's already a note on the Hot Standby TODO about unexpectedly bad
max_standby_delay behavior being possible on an idle system, with no
suggested resolution for it besides better SR integration. The issue
Greg Stark has noted is another variation on that theme. It's already
on my list of theorized pathological but as yet undemonstrated concerns
that Simon and I identified, a list I'm working through by creating test
cases to prove or disprove each one. I'm done with "it's possible..."
talk at this point, so as not to spook anyone unnecessarily, and am only
raising things I can show concrete examples of in action. White box
testing at some point does require pausing one's investigation of what's
in the box and getting on with the actual testing instead.

The only real spot where my opinion diverges here is that I have yet to
find any situation where 'max_standby_delay=-1' makes any sense to me.
When I try running my test cases with that setting, the whole system
just reacts far too strangely. My first patch here is probably going to
be adding more visibility into the situation where queries are blocking
replication forever, because the times I find myself asking "why is
the system hung right now?" are when that happens, and it's not obvious
to an admin what's going on.

Also, the idea that a long running query on the standby could cause an
unbounded delay in replication is so foreign to my sensibilities that I
don't ever include it in the list of useful solutions to the problems
I'm worried about. The option is there, and I'm not disputing that it
makes sense for some people because there seems to be some demand for
it; I just can't see how it fits into any of the use-cases I'm
concerned about.

I haven't said anything about query retry mainly because I can't imagine
any way it's possible to build it in time for this release, so whether
it's eventually feasible or not doesn't enter into what I'm worried
about right now. In any case, I would prioritize that behind work on
preventing the most common situations that cause cancellations in the
first place, until those are handled so well that retry is the most
effective improvement left to consider.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 01:50:28
Message-ID: 603c8f071003011750n66f3efbex9a5b1f96ee053688@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 1, 2010 at 5:32 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 2/28/10 7:12 PM, Robert Haas wrote:
>>> However, I'd still like to hear from someone with the requisite
>>> > technical knowledge whether capturing and retrying the current query in
>>> > a query cancel is even possible.
>>
>> I'm not sure who you want to hear from here, but I think that's a dead end.
>
> "dead end" as in "too hard to implement"?  Or for some other reason?

I think it's probably too hard to implement for the extremely limited
set of circumstances in which it can work. See the other responses
for some of the problems. There are others, too. Suppose that the
plan for some particular query is to read a table with a hundred
million records, sort it, and then do whatever with the results.
After reading the first 99 million records, the transaction is
cancelled and we have to start over. Maybe someone will say, fine, no
problem - but it's certainly going to be user-visible. Especially if
we retry more than once.

I think we should focus our efforts initially on reducing the
frequency of spurious cancels. What we're essentially trying to do
here is refute the proposition "the WAL record I just replayed might
change the result of this query". It's possibly equivalent to the
halting problem (and certainly impossibly hard) to refute this
proposition in every case where it is in fact false, but it sounds
like what we have in place right now doesn't come close to doing as
well as can be done.

I just read through the current documentation and it doesn't really
seem to explain very much about how HS decides which queries to kill.
Can someone try to flesh that out a bit? It also uses the term
"buffer cleanup lock", which doesn't seem to be used anywhere else in
the documentation (though it does appear in the source tree, including
README.HOT).

...Robert


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 02:08:27
Message-ID: 20100302020827.GF10960@oak.highrise.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [100301 20:04]:
> Greg Stark <stark(at)mit(dot)edu> writes:
> > josh, nobody is talking about it because it doesn't make sense. you could
> > only retry if it was the first query in the transaction and only if you
> > could prove there were no side-effects outside the database and then you
> > would have no reason to think the retry would be any more likely to work.
>
> But it's hot standby, so there are no data-modifying transactions.
> Volatile functions could be a problem, though. A bigger problem is
> we might have already shipped partial query results to the client.

But, since we know it's a slave and that the reason the query was
cancelled was that it has a backlog of updates to apply, it's very
likely that the data the earlier parts of the transaction saw would
now be different...

And then you have no idea if just blindly replaying all statements of
the transaction successively is a good idea...

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 04:54:04
Message-ID: 201003020454.o224s4601113@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joachim Wieland wrote:
> 1) With the current implementation they will see better performance on
> the master and more aggressive vacuum (!), since they have less
> long-running queries now on the master and autovacuum can kick in and
> clean up with less delay than before. On the other hand their queries
> on the standby might fail and they will start thinking that this HS+SR
> feature is not as convincing as they thought it was... Next step for
> them is to take the documentation and study it for a few days to learn
> all about vacuum, different delays, transaction ids and age parameters
> and experiment a few weeks until no more queries fail - for a while...
> But they can never be sure... In the end they might also modify the
> parameters in the wrong direction or overshoot because of lack of time
> to experiment and lose another important property without noticing
> (like being as close as possible to the master).

I assumed they would set max_standby_delay = -1 and be happy.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 04:56:45
Message-ID: 201003020456.o224ujI01477@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> HS+SR is still a tremendous improvement over the options available
> previously. We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away from us.

OCD (Obsessive-Compulsive Disorder) --- good one. :-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 05:50:08
Message-ID: 4B8CA710.8020601@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Joachim Wieland wrote:
>
>> 1) With the current implementation they will see better performance on
>> the master and more aggressive vacuum (!), since they have less
>> long-running queries now on the master and autovacuum can kick in and
>> clean up with less delay than before. On the other hand their queries
>> on the standby might fail and they will start thinking that this HS+SR
>> feature is not as convincing as they thought it was...
>>
>
> I assumed they would set max_standby_delay = -1 and be happy.
>

The admin in this situation might be happy until the first time the
primary fails and a failover is forced, at which point there is an
unbounded amount of recovery data to apply that was stuck waiting behind
whatever long-running queries were active. I don't know if you've ever
watched what happens to a pre-8.2 cold standby when you start it up with
hundreds or thousands of backed up WAL files to process before the
server can start, but it's not a fast process. I watched a production
8.1 standby get >4000 files behind once due to an archive_command bug,
and it's not something I'd like to ever chew my nails off to again. If
your goal was HA and you're trying to bring up the standby, the server
is down the whole time that's going on.

This is why no admin who prioritizes HA would consider
'max_standby_delay = -1' a reasonable setting, and those are the sort of
users Joachim's example was discussing. It only takes one rogue query
that runs for a long time to put the standby so far behind that it's
useless for HA purposes. And you also have to ask yourself "if recovery
is halted while waiting for this query to run, how stale is the data on
the standby getting?". That's true for any large setting for this
parameter, but using -1 for the unlimited setting also gives the maximum
possible potential for such staleness.

'max_standby_delay = -1' is really only a reasonable idea if you are
absolutely certain all queries are going to be short, which we can't
dismiss as an unfounded use case, so it has value. I would expect you'd
also have to combine it with a matching, reasonable statement_timeout to
enforce that expectation and make the situation safer.
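
A rough sketch of that combination on the standby (parameter names as
discussed in this thread; the values are purely illustrative, not
recommendations):

max_standby_delay = -1       # never cancel standby queries; recovery waits instead
statement_timeout = '60s'    # cap query run time so replay can't fall behind forever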

In any of the "offload batch queries to the failover standby"
situations, it's unlikely an unlimited value for this setting will be
practical. Perhaps you set max_standby_delay to some number of hours,
to match your expected worst-case query run time and reduce the chance
of cancellation. Not putting a limit on it at all is something no DBA
with healthy paranoia is going to be happy with in an HA environment,
given that unbounded staleness and unbounded recovery time are then
both possible. A failed long-running query is much less risky than
either of those.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 06:21:10
Message-ID: 4B8CAE56.7010503@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> I just read through the current documentation and it doesn't really
> seem to explain very much about how HS decides which queries to kill.
> Can someone try to flesh that out a bit?

I believe it just launches into a mass killing spree once things like
max_standby_delay expire. I want to confirm that via testing (it can be
simulated with a mix of long and short running pgbench queries) and then
intend to update the docs to clarify.

> It also uses the term
> "buffer cleanup lock", which doesn't seem to be used anywhere else in
> the documentation (though it does appear in the source tree, including
> README.HOT).
>

This loose end was already noted in my last docs update. I wrote an
initial description, but Bruce and I decided to leave it out until
something more thorough could be put together. This is also on my docs
cleanup list; I will get to it somewhere along the beta timeline.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 18:30:15
Message-ID: 201003021830.o22IUFW28509@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> > I assumed they would set max_standby_delay = -1 and be happy.
> >
>
> The admin in this situation might be happy until the first time the
> primary fails and a failover is forced, at which point there is an
> unbounded amount of recovery data to apply that was stuck waiting behind
> whatever long-running queries were active. I don't know if you've ever
> watched what happens to a pre-8.2 cold standby when you start it up with
> hundreds or thousands of backed up WAL files to process before the
> server can start, but it's not a fast process. I watched a production
> 8.1 standby get >4000 files behind once due to an archive_command bug,
> and it's not something I'd like to ever chew my nails off to again. If
> your goal was HA and you're trying to bring up the standby, the server
> is down the whole time that's going on.
>
> This is why no admin who prioritizes HA would consider
> 'max_standby_delay = -1' a reasonable setting, and those are the sort of
> users Joachim's example was discussing. Only takes one rogue query that
> runs for a long time to make the standby so far behind it's useless for
> HA purposes. And you also have to ask yourself "if recovery is halted
> while waiting for this query to run, how stale is the data on the
> standby getting?". That's true for any large setting for this
> parameter, but using -1 for the unlimited setting also gives the maximum
> possible potential for such staleness.
>
> 'max_standby_delay = -1' is really only a reasonable idea if you are
> absolutely certain all queries are going to be short, which we can't
> dismiss as an unfounded use case so it has value. I would expect you
> have to also combine it with a matching reasonable statement_timeout to
> enforce that expectation to make that situation safer.

Well, as you stated in your blog, you are going to have one of these
downsides:

o master bloat
o delayed recovery
o cancelled queries

Right now you can't choose "master bloat", but you can choose the other
two. I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.

Our documentation should probably just come out and state that clearly.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 18:34:33
Message-ID: 201003021834.o22IYX529089@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> > 'max_standby_delay = -1' is really only a reasonable idea if you are
> > absolutely certain all queries are going to be short, which we can't
> > dismiss as an unfounded use case so it has value. I would expect you
> > have to also combine it with a matching reasonable statement_timeout to
> > enforce that expectation to make that situation safer.
>
> Well, as you stated in your blog, you are going to have one of these
> downsides:
>
> o master bloat
> o delayed recovery
> o cancelled queries
>
> Right now you can't choose "master bloat", but you can choose the other
> two. I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.

I was wrong. You can choose "master bloat" with
vacuum_defer_cleanup_age, but only crudely because it is measured in
xids and the master defers no matter what queries are running on the
slave, and there is still the possibility for query cancel for long
queries.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 19:03:14
Message-ID: 4B8D60F2.2040301@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/2/10 10:30 AM, Bruce Momjian wrote:
> Right now you can't choose "master bloat", but you can choose the other
> two. I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.

Actually, if vacuum_defer_cleanup_age can be used, "master bloat" is an
option. Hopefully I'll get some time for serious testing this weekend.

--Josh Berkus


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 20:11:00
Message-ID: 4B8D70D4.7010004@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
>> Right now you can't choose "master bloat", but you can choose the other
>> two. I think that is acceptable for 9.0, assuming the other two don't
>> have the problems that Tom foresees.
>>
>
> I was wrong. You can choose "master bloat" with
> vacuum_defer_cleanup_age, but only crudely because it is measured in
> xids and the master defers no matter what queries are running on the
> slave...

I'm OK with you finding the situation acceptable, so long as it's an
informed decision. From how you're writing about this, I'm comfortable
that you (and everybody else still involved here) have absorbed the
issues enough that we're all talking about the same thing now. Since
there are a couple of ugly user-space hacks possible for prioritizing
"master bloat", and nobody is stepping up to work on resolving this via
my suggestion involving better SR integration, it seems to me the heated
discussion of code changes has come to a resolution of sorts that I (and
Simon, I just checked) can live with. Sounds like we have three action
paths here:

-Tom already said he was planning a tour through the HS/SR code, I
wanted that to happen with him aware of this issue.
-Josh will continue doing his testing, also better informed about this
particular soft spot.
-I'll continue test-case construction for the problems here there are
still concerns about (pathologic max_standby_delay and b-tree split
issues being the top two on that list), and keep sharing particularly
interesting ones here to help everyone else's testing.

If it turns out any of those paths leads to a must-fix problem that
doesn't have an acceptable solution, at least the idea of this as a
"plan B" is both documented and more widely understood than when I
started ringing this particular bell.

I just updated the Open Items list:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items to officially
put myself on the hook for the following HS related documentation items
that have come up recently, aiming to get them all wrapped up in time
before or during early beta:

-Update Hot Standby documentation: clearly explain relationships between
the 3 major setup trade-offs, "buffer cleanup lock", notes on which
queries are killed once max_standby_delay is reached, measuring XID
churn on master for setting vacuum_defer_cleanup_age
-Clean up archive_command docs related to the recent "/bin/true" addition.
Given that's where I expect people who run into the recently added
pg_stop_backup warning message will end up, noting its value for
escaping from that particular case might be useful too.

To finish airing my personal 9.0 TODO list now that I've gone this far,
I'm also still working on completing the following patches, initial
versions of which have already been submitted; I was close to finishing
both before getting side-tracked onto this larger issue:

-pgbench > 4000 scale bug fix:
http://archives.postgresql.org/message-id/4B621BA3.7090306@2ndquadrant.com
-Improving the logging/error reporting/no timestamp issues in pg_standby
re-raised recently by Selena:
http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7492@mail.gmail.com

If nobody else claims them as something they're working on first, I
suspect I'll then move on to building some of the archiver UI
improvements discussed most recently as part of the "pg_stop_backup does
not complete" thread, despite Heikki having crushed my dreams of a
simple solution to those by pointing out the shared memory limitation
involved.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 20:57:39
Message-ID: 1267563459.21887.55.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-03-01 at 12:04 -0800, Josh Berkus wrote:

> does anyone dispute his analysis? Simon?

No dispute. I think I've discussed this before.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 20:58:03
Message-ID: 1267563483.21887.56.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-03-01 at 14:43 -0500, Tom Lane wrote:
> Speaking of which, does the current HS+SR code have a
> provision to force the standby to stop tracking WAL and come up live,
> even when there's more WAL available?

Yes, trigger file.
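
For reference, a minimal sketch of a standby recovery.conf with a
trigger file configured (the path and connection string are just
placeholders):

standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replication'
trigger_file = '/tmp/pg_failover_trigger'

Creating that file tells the startup process to end standby mode and
bring the server up.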

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 21:00:24
Message-ID: 1267563624.21887.58.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2010-02-28 at 16:56 +0100, Joachim Wieland wrote:

> Now let's take a look at both scenarios from the administrators' point
> of view:

Well argued, agree with all of your points.

--
Simon Riggs www.2ndQuadrant.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 21:36:52
Message-ID: 201003022136.o22LaqT29573@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> > I don't think that defer_cleanup_age is a long-term solution.  But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.
>
> In short in the current setup I think there is no safe value of
> max_standby_age which will prevent query cancellations short of -1. If
> the slave has a constant stream of queries and always has at least one
> concurrent query running then it's possible that the slave will run
> continuously max_standby_age-epsilon behind the master and cancel
> queries left and right, regardless of how large max_standby_age is.
>
> To resolve this I think you would have to introduce some chance for
> the slave to catch up. Something like refusing to use a snapshot older
> than max_standby_age/2 and instead wait until the existing queries
> finish and the slave gets a chance to catch up and see a more recent
> snapshot. The problem is that this would result in very unpredictable
> and variable response times from the slave. A single long-lived query
> could cause replay to pause for a big chunk of max_standby_age and
> prevent any new query from starting.

That is a good point. I have added the attached documentation patch to
mention that max_standby_delay increases the master/slave inconsistency,
and not to use it for xid-keepalive connections.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Attachment Content-Type Size
/rtmp/diff text/x-diff 2.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 23:39:10
Message-ID: 201003022339.o22NdAi12894@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> > I don't think that defer_cleanup_age is a long-term solution.  But we
> > need *a* solution which does not involve delaying 9.0.
>
> So I think the primary solution currently is to raise max_standby_age.
>
> However there is a concern with max_standby_age. If you set it to,
> say, 300s. Then run a 300s query on the slave which causes the slave
> to fall 299s behind. Now you start a new query on the slave -- it gets
> a snapshot based on the point in time that the slave is currently at.
> If it hits a conflict it will only have 1s to finish before the
> conflict causes the query to be cancelled.
>
> In short in the current setup I think there is no safe value of
> max_standby_age which will prevent query cancellations short of -1. If
> the slave has a constant stream of queries and always has at least one
> concurrent query running then it's possible that the slave will run
> continuously max_standby_age-epsilon behind the master and cancel
> queries left and right, regardless of how large max_standby_age is.

This is sobering. I have added the attached documentation so at least
this odd behavior is documented.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Attachment Content-Type Size
/rtmp/diff text/x-diff 2.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-02 23:44:16
Message-ID: 201003022344.o22NiGP13389@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Bruce Momjian wrote:
> >> Right now you can't choose "master bloat", but you can choose the other
> >> two. I think that is acceptable for 9.0, assuming the other two don't
> >> have the problems that Tom foresees.
> >>
> >
> > I was wrong. You can choose "master bloat" with
> > vacuum_defer_cleanup_age, but only crudely because it is measured in
> > xids and the master defers no matter what queries are running on the
> > slave...
>
> OK with you finding the situation acceptable, so long as it's an
> informed decision. From how you're writing about this, I'm comfortable

Well, consider that in November we were not even sure SR or HS would be
in 9.0. We got them both, so if it is a little kludgy in 9.0, that's OK.
We are much farther along than I ever expected.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-10 06:29:19
Message-ID: 4B973C3F.9070501@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than wal_standby_delay, or maybe in combination with it.

As a first test, I set up a deliberately pathological situation with
pgbench and a wal_standby_delay of 1 second. This allowed me to trigger
query cancel on a relatively simple reporting query; in fact, to make it
impossible to complete.

Then I increased vacuum_defer_cleanup_age to 100000, which represents
about 5 minutes of transactions on the test system. This eliminated all
query cancels for the reporting query, which takes an average of 10s.
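
For anyone trying to translate a target grace period into a
vacuum_defer_cleanup_age value, here's a rough sketch of one way to
estimate XID churn on the master (the 60 second interval is just an
example; txid_current() does the measuring):

-- run on the master, twice, a known interval apart
SELECT txid_current();
-- ... wait 60 seconds ...
SELECT txid_current();
-- (second value - first value) / 60 = xids consumed per second;
-- multiply that by the grace period you want, in seconds, to get a
-- vacuum_defer_cleanup_age in the right ballpark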

Next is a database bloat test, but I'll need to do that on a system with
more free space than my laptop.

--Josh Berkus


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-10 11:38:12
Message-ID: 407d949e1003100338h764c33dfice0ccefb68b2f726@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 10, 2010 at 6:29 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Then I increased vacuum_defer_cleanup_age to 100000, which represents
> about 5 minutes of transactions on the test system.  This eliminated all
> query cancels for the reporting query, which takes an average of 10s.
>
> Next is a database bloat test, but I'll need to do that on a system with
> more free space than my laptop.

Note that this will be heavily dependent on the use case. If you have
one of those counter records that keeps being updated and gets cleaned
up by HOT whenever the page fills up, then you need to allow HOT to
clean it up before it overflows the page, or else it'll bloat the table
and require a real vacuum. I think that means that a
vacuum_defer_cleanup of up to about 100 or so (it depends on the width
of your counter record) might be reasonable as a general suggestion,
but anything higher will depend on understanding the specific system.
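
As an illustration of the kind of counter record I mean (the table and
numbers here are made up):

-- a narrow, heavily-updated row; HOT normally prunes the dead versions
-- within the page, but only once nothing can still see them
CREATE TABLE page_hits (page_id int PRIMARY KEY, hits bigint NOT NULL);
INSERT INTO page_hits VALUES (42, 0);
UPDATE page_hits SET hits = hits + 1 WHERE page_id = 42;

-- with vacuum_defer_cleanup_age = N, those dead versions stick around
-- for roughly N more transactions, so once the page fills up new row
-- versions spill onto other pages and the table bloats until a real
-- VACUUM can clean up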

Another use case that might surprise people who are accustomed to the
current behaviour is massive updates. This is the main really pessimal
use case left in Postgres -- ideally they wouldn't bloat the table at
all but currently they double the size of the table. People may be
accustomed to the idea that they can then run vacuum and that will
limit the bloat to 50%, assuming they have no (other) long-lived
transactions. With vacuum_defer_cleanup that will no longer be true.
It will be as if you always have a query lasting n transactions
running in your system.

--
greg


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-10 18:59:50
Message-ID: 4B97EC26.1090208@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3/10/10 3:38 AM, Greg Stark wrote:
> I think that means that a
> vacuum_defer_cleanup of up to about 100 or so (it depends on the width
> of your counter record) might be reasonable as a general suggestion
> but anything higher will depend on understanding the specific system.

100 wouldn't be useful at all. It would increase bloat without doing
anything about query cancel except on a very lightly used system.

> With vacuum_defer_cleanup that will no longer be true.
> It will be as if you always have a query lasting n transactions in
> your system at all times.

Yep, but until we get XID-publish-to-master working in 9.1, I think it's
probably the best we can do. At least it's no *worse* than having a
long-running query on the master at all times.

--Josh Berkus


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-11 00:51:29
Message-ID: 3f0b79eb1003101651t3ffb4a1cp8dfe40dc8481ef5e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I've been playing with vacuum_defer_cleanup_age in reference to the
> query cancel problem.  It really seems to me that this is the way
> forward in terms of dealing with query cancel for normal operation
> rather than wal_standby_delay, or maybe in combination with it.

Why isn't vacuum_defer_cleanup_age listed in postgresql.conf.sample?
I tried to test the effect of it, but was unable to find it in the
conf file.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-11 01:39:37
Message-ID: 201003110139.o2B1dba20349@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> > I've been playing with vacuum_defer_cleanup_age in reference to the
> > query cancel problem.  It really seems to me that this is the way
> > forward in terms of dealing with query cancel for normal operation
> > rather than wal_standby_delay, or maybe in combination with it.
>
> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample?
> Though I also tried to test the effect of it, I was unable to find it
> in the conf file.

I asked about that last week and for some reason Simon didn't want it
added and Greg Smith was going to get this corrected. Greg?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date: 2010-03-11 05:58:35
Message-ID: 4B98868B.7070800@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Why isn't vacuum_defer_cleanup_age listed on postgresql.conf.sample?
> Though I also tried to test the effect of it, I was unable to find it
> in the conf file.

Using it has some bugs we need to clean up, apparently.

--Josh Berkus