Quick Links

Hot Standby 0.2.1

Lists:	pgsql-hackers

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Hot Standby 0.2.1
Date:	2009-09-15 21:41:59
Message-ID:	1253050920.27962.60.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

OK, here is the latest version of the Hot Standby patchset. This is
about version 30+ by now, but we should regard this as 0.2.1
Patch against CVS HEAD (now): clean apply, compile, no known bugs.

OVERVIEW

You can download PDF versions of the fine manual is here
http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
http://wiki.postgresql.org/images/0/01/Hot_Standby_Recovery_Functions.pdf

Also available via the project Wiki, which is here
http://wiki.postgresql.org/wiki/Hot_Standby

Patch should be attached to this email. If problems, get this and other
versions from Wiki please. No offlist comments, questions etc please.

PATCH VERSIONING & STATUS

0 meaning its not fully released yet
2 meaning this is a major new re-write
1 meaning this is the first release

Patch is still in testing and will be for next few days at least.
Released now only because I promised to do so.

Is this ready for commit? Yes, it is in the shape I want it to be in,
but also, No, I can't say it's been through a wide enough range of tests
as yet to be considered immediately ready for commit.

Further bug fixing and minor cosmetic development will take place via my
GIT repo, uploaded soon. Patch included here to meet deadlines and code
inclusion. BSD.

PATCH SUMMARY

* 76 files changed, 5160 insertions(+), 59 deletions(-), 1251 mods(!)
* 7 files with more than 100 lines changed
procarray.c (1200+ additions)
xlog.c (600+ additions)
xact.c (500+ additions)
inval.c (650 additions)
lock.c
heapam.c
nbtxlog.c
* 29 files with 10 or fewer lines changed
* Applies cleanly to CVS HEAD as of now

CHANGES

The rough changes since version 1 series of patches.

* Full documentation included. Many, but not all nuances of SGML tagging
followed, but sufficient aspects there to allow for proofreading before
we do final changes. Some undocumented functions now correctly
documented. Recovery functions now split into user and control functions
in docs to make it clearer.

* Starting conditions in GetRunningTransactionData() are now much
stricter and holds more lwlocks. There are few cases where any not-found
xids are allowed during xid processing, so code is more robust. Please
check for race conditions.

* GetRunningTransactionData() now handles initialisation of
AccessExclusiveLocks correctly. Locks are counted in a low-contention
approach that avoids taking holding lock partition locks, if possible.

* Start-up conditions now recoded to allow faster start in cases where
many subtransactions are present. Recovery connections are only enabled
when the snapshot is valid.

* max_connections needs to be correctly set or HS will not allow
connections. Once snapshots are enabled they will continue to be
available always.

* RecordKnownAssignedTransactions() now contains a test for xid
wraparound threat which invokes conflict processing should that occur.

* Boolean states now clarified and corrected. Hot Standby can be turned
off completely if not required or if problems effect production. That
causes many changes but there is no change in the intention of those
sections of code.

* UnobservedXids processing follows Heikki's proposal, but has been
renamed to KnownAssignedXids. It has also been modularised and
completely re-written using a hash table approach. So far it has been
much more stable than the previous sorted array coding, which I am happy
to see in the shredder for all the problems it caused. Fully detailed
comments all through.

* All record types now respect max_standby_delay.

* All deferred conflict processing has been removed - conflict
processing itself is still enabled.

* A few other functions have been renamed and/or moved around to
rationalise their exact purpose/position within their modules.

* Prepared transactions holding AccessExclusiveLocks at the end of
recovery are now handled.

* Hash indexes are now safely handled. That was removed at request, but
we need it to avoid silent data loss for queries near those types of
index.

* Hint bits are now set, in appropriate circumstances.

* Flat file logic removed

* Large swathes of unused code removed.

* All code comments addressed and/or re-explained in more detail

* CHECKPOINT now works during recovery but performs restartpoint
instead.

* Tweaked max_standby_delay code to avoid long duration waits. Added
dynamic function to control delay during recovery. Added code for stats
collection and ps display. Set default to sensible production values.

RECENT BUGS

* Found and fixed missing relcache init file invalidation

* Found and fixed more serious VACUUM FULL-related weirdness <sigh>

* Recently discovered bug has resulted in changes in
XidInMVCCSnapshot(). Heikki's earlier approach did not correctly allow
for the maximum size of a snapshot. The simplicity of Heikki's proposal
is good, but hid a flaw in where snapshots would put their xids. I've
fully solved the problem though I expect further discussion.

I've looked through every change and verified it, but fixing all the
bugs means there's areas of new code added in last few days. I accept
that any bugs herein are my responsibility.

--
Simon Riggs www.2ndQuadrant.com

Attachment	Content-Type	Size
hot_standby.riggs.v0.2.1.patch.bz2	application/x-bzip	70.8 KB

From:	David Fetter <david(at)fetter(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-15 22:05:49
Message-ID:	20090915220549.GM19673@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 15, 2009 at 10:41:59PM +0100, Simon Riggs wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1 Patch
> against CVS HEAD (now): clean apply, compile, no known bugs.

Kudos!!!!

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Bernd Helmle <mailings(at)oopsware(dot)de>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-16 08:38:24
Message-ID:	1CB7C3A5B12627A310D79F45@[172.26.14.62]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

--On 15. September 2009 22:41:59 +0100 Simon Riggs <simon(at)2ndQuadrant(dot)com>
wrote:

> http://wiki.postgresql.org/images/0/01/Hot_Standby_Recovery_Functions.pdf

This doesn't work for me, it seems the correct link is

<http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf>
?

--
Thanks

Bernd

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-16 22:05:17
Message-ID:	4AB1611D.4030305@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

All,

Now that Simon has submitted this, can some of the heavy-hitters here
review it? Heikki?

Nobody's name is next to it.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-16 23:32:01
Message-ID:	603c8f070909161632x7cc838d7ic29ba4931d7fac48@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Now that Simon has submitted this, can some of the heavy-hitters here
> review it? Heikki?
>
> Nobody's name is next to it.

I don't think anyone is planning to ignore this patch, but it wasn't
included in the first round of round-robin reviewing assignments
because it wasn't submitted until the following day, after the
announced deadline for submissions had already passed. So most people
are probably busy with with some other patch at the moment, but that's
a temporary phenomenon. This is a pretty small CommitFest, so there
shouldn't be any shortage of reviewers, though Heikki's time may be
stretched a little thin, since Streaming Replication is also in the
queue, and he is working on index-only scans. That's really for him
to comment on, though.

...Robert

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-17 06:54:03
Message-ID:	4AB1DD0B.1080300@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Now that Simon has submitted this, can some of the heavy-hitters here
>> review it? Heikki?
>>
>> Nobody's name is next to it.
>
> I don't think anyone is planning to ignore this patch, but it wasn't
> included in the first round of round-robin reviewing assignments
> because it wasn't submitted until the following day, after the
> announced deadline for submissions had already passed. So most people
> are probably busy with with some other patch at the moment, but that's
> a temporary phenomenon.

Right, I've added myself as reviewer now.

> This is a pretty small CommitFest, so there
> shouldn't be any shortage of reviewers, though Heikki's time may be
> stretched a little thin, since Streaming Replication is also in the
> queue, and he is working on index-only scans. That's really for him
> to comment on, though.

I'm going to put the index-only scans aside for now to focus on hot
standby and streaming replication. Both are big patches, so there's
plenty of work in those two alone, and not only for me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-17 07:54:46
Message-ID:	1253174086.9666.137.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 09:54 +0300, Heikki Linnakangas wrote:

> > This is a pretty small CommitFest, so there
> > shouldn't be any shortage of reviewers, though Heikki's time may be
> > stretched a little thin, since Streaming Replication is also in the
> > queue, and he is working on index-only scans. That's really for him
> > to comment on, though.
>
> I'm going to put the index-only scans aside for now to focus on hot
> standby and streaming replication. Both are big patches, so there's
> plenty of work in those two alone, and not only for me.

That's very good of you, thanks.

It was already clear to a few people that your time would bottleneck
trying to review both at the same time. I personally wasn't expecting
you to jump into immediate action on HS. We have time.

--
Simon Riggs www.2ndQuadrant.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-17 23:01:47
Message-ID:	603c8f070909171601u5b9bcfc7q899eda37ef88535f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 17, 2009 at 2:54 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Robert Haas wrote:
>> On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Now that Simon has submitted this, can some of the heavy-hitters here
>>> review it? Heikki?
>>>
>>> Nobody's name is next to it.
>>
>> I don't think anyone is planning to ignore this patch, but it wasn't
>> included in the first round of round-robin reviewing assignments
>> because it wasn't submitted until the following day, after the
>> announced deadline for submissions had already passed. So most people
>> are probably busy with with some other patch at the moment, but that's
>> a temporary phenomenon.
>
> Right, I've added myself as reviewer now.
>
>> This is a pretty small CommitFest, so there
>> shouldn't be any shortage of reviewers, though Heikki's time may be
>> stretched a little thin, since Streaming Replication is also in the
>> queue, and he is working on index-only scans. That's really for him
>> to comment on, though.
>
> I'm going to put the index-only scans aside for now to focus on hot
> standby and streaming replication. Both are big patches, so there's
> plenty of work in those two alone, and not only for me.

What is the best way to attack this? Should I keep reviewing
index-only scans so that you have feedback for when you get back to
it, or should I move on to something else? If something else, does it
make more sense for me to look at HS since I did a bit of work with a
previous version, or would it be better for me to just pick one of the
other patches from the CommitFest and work on that?

Also, stepping back from me personally, should we try to assign some
additional reviewers to these patches? Is there some way we can
divide up review tasks among multiple people so that we're not
repeating each others work?

Thoughts appreciated, from Heikki, Simon, or others.

...Robert

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 06:53:24
Message-ID:	1253256804.9666.310.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 19:01 -0400, Robert Haas wrote:
> >
> > I'm going to put the index-only scans aside for now to focus on hot
> > standby and streaming replication. Both are big patches, so there's
> > plenty of work in those two alone, and not only for me.
>
> What is the best way to attack this? Should I keep reviewing
> index-only scans so that you have feedback for when you get back to
> it, or should I move on to something else? If something else, does it
> make more sense for me to look at HS since I did a bit of work with a
> previous version, or would it be better for me to just pick one of the
> other patches from the CommitFest and work on that?
>
> Also, stepping back from me personally, should we try to assign some
> additional reviewers to these patches? Is there some way we can
> divide up review tasks among multiple people so that we're not
> repeating each others work?
>
> Thoughts appreciated, from Heikki, Simon, or others.

I think this is a great opportunity to widen the pool of people
contributing to reviews.

I suggest the creation of a second group of people, performing
round-robin testing of patches. These people would be able to verify
* documentation matches implemented features (does it do what it says on
the tin?)
* usability of explicit features (do the features work well?)
* usability gap of unimplemented features (what else do we need?)
* are there any bugs?

These questions are often quickly answered for smaller patches, but HS's
scope mean that such a task properly executed could take a full week, if
not longer.

Second group of people are just as skilled Postgres people as reviewers,
in some cases more so, apart from they have less detailed knowledge of
the codebase. We have many such people and it would be good to encourage
them to perform thorough reviews rather than "tire kicking".

I'm not sure that Heikki needs additional reviewers. He now has
significant knowledge of the patch and is good at focusing on key
aspects of the internals. Other code reviewers are welcome, of course.

--
Simon Riggs www.2ndQuadrant.com

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 14:20:20
Message-ID:	m2zl8s8hcr.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Also, stepping back from me personally, should we try to assign some
> additional reviewers to these patches? Is there some way we can
> divide up review tasks among multiple people so that we're not
> repeating each others work?
>
> Thoughts appreciated, from Heikki, Simon, or others.

How about this proposal:
http://archives.postgresql.org/pgsql-hackers/2009-08/msg00764.php

Regards,
--
dim

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 14:23:33
Message-ID:	f67928030909180723q618ea60dvce7d9b1be269038c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>

Hi Simon,

Is there a reason that you remove the WAL_DEBUG shown below?

*************** begin:;
*** 899,923 ****
FIN_CRC32(rdata_crc);
record->xl_crc = rdata_crc;

- #ifdef WAL_DEBUG
- if (XLOG_DEBUG)
- {
- StringInfoData buf;
-
- initStringInfo(&buf);
- appendStringInfo(&buf, "INSERT @ %X/%X: ",
- RecPtr.xlogid,
RecPtr.xrecoff);
- xlog_outrec(&buf, record);
- if (rdata->data != NULL)
- {
- appendStringInfo(&buf, " - ");
- RmgrTable[record->xl_rmid].rm_desc(&buf,
record->xl_info, rdata->data);
- }
- elog(LOG, "%s", buf.data);
- pfree(buf.data);
- }
- #endif
-
/* Record begin of record in appropriate places */
ProcLastRecPtr = RecPtr;
Insert->PrevRecord = RecPtr;
--- 947,952 ----

Thanks,

Jeff

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 14:32:50
Message-ID:	1253284370.4155.29.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
> On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
>
> OK, here is the latest version of the Hot Standby patchset.
> This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known
> bugs.

> Is there a reason that you remove the WAL_DEBUG shown below?

WAL_DEBUG is not removed by the patch, though that section of code is
removed, as you observe. I recall an earlier bug report by
me/conversation on hackers about how that section of code was
irrecoverably broken, since it's calling an rmgr routine while not in
recovery and also assuming the data is fully accessible at that point,
which it is not.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 15:14:55
Message-ID:	24694.1253286895@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
>> Is there a reason that you remove the WAL_DEBUG shown below?

> WAL_DEBUG is not removed by the patch, though that section of code is
> removed, as you observe. I recall an earlier bug report by
> me/conversation on hackers about how that section of code was
> irrecoverably broken, since it's calling an rmgr routine while not in
> recovery and also assuming the data is fully accessible at that point,
> which it is not.

Wouldn't it be sufficient to remove the rm_desc() call? I agree
that that's broken, but the rest doesn't seem to be.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 15:43:04
Message-ID:	1253288584.4155.55.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-18 at 11:14 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
> >> Is there a reason that you remove the WAL_DEBUG shown below?
>
> > WAL_DEBUG is not removed by the patch, though that section of code is
> > removed, as you observe. I recall an earlier bug report by
> > me/conversation on hackers about how that section of code was
> > irrecoverably broken, since it's calling an rmgr routine while not in
> > recovery and also assuming the data is fully accessible at that point,
> > which it is not.
>
> Wouldn't it be sufficient to remove the rm_desc() call? I agree
> that that's broken, but the rest doesn't seem to be.

That would make sense also. Previous action just because that was
earlier consensus. Will change.

--
Simon Riggs www.2ndQuadrant.com

From:	Marcos Luis Ortiz Valmaseda <mlortiz(at)uci(dot)cu>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-18 22:20:57
Message-ID:	894242697.957381253312457118.JavaMail.root@ucimail4.uci.cu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I want to help on this area, but I need a mentor for this.
For example, Heikki will be a excellent mentor for me.

Following the theme, I think that we have to wide all questions for the process of the acceptance of a patch on the same way that you Simon.

We could write new requirements with all these ideas. Don´t you think?

Regards

"The hurry is enemy of the success: for that reason.......Be patient"

Ing. Marcos L. Ortiz Valmaseda
Línea Soporte y Despliegue
Centro de Tecnologías de Almacenamiento y Análisis de Datos (CENTALAD)

Linux User # 418229
PostgreSQL User
http://www.postgresql.org
http://www.planetpostgresql.org/
http://www.postgresql-es.org/

----- Mensaje original -----
De: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Para: "Robert Haas" <robertmhaas(at)gmail(dot)com>
CC: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Enviados: Jueves, 17 de Septiembre 2009 20:53:24 GMT -10:00 Hawai
Asunto: Re: [HACKERS] Hot Standby 0.2.1

I think this is a great opportunity to widen the pool of people
contributing to reviews.

These questions are often quickly answered for smaller patches, but HS's
scope mean that such a task properly executed could take a full week, if
not longer.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-21 10:50:09
Message-ID:	4AB75A61.6040505@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.

Thanks! Attached is some minor comment and fixes, and some dead code
removal. Also available in my git repository, branch 'hs-riggs'.

The documentation talks about setting and checking
default_transaction_read_only, but I think it doesn't say anything about
transaction_read_only, which I find odd. This in particular:

> Users will be able to tell whether their session is read-only by
> + issuing SHOW default_transaction_read_only

seems misleading, as you might have default_transaction_read_only=on,
but still be able to do "SET transaction_read_only", so the *session*
isn't necessarily read-only.

The only bug I've found is this that we seem to be missing conflict
resolution for GiST index tuples deleted by the kill_prior_tuples
mechanism. Unless I'm missing something, we need similar handling there
that we have in b-tree.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
hs-minor-fixes-1.patch	text/x-diff	17.5 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-21 13:01:52
Message-ID:	1253538112.4449.44.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:

> The only bug I've found

> is this that we seem to be missing conflict
> resolution for GiST index tuples deleted by the kill_prior_tuples
> mechanism. Unless I'm missing something, we need similar handling there
> that we have in b-tree.

OK, I agree with that. Straightforward change. Thanks very much.

I marked the comment to indicate that the handling for GIST and GIN
indexes looked dubious to me also. I had the earlier "it is safe"
comments but that was before we looked at the kill prior tuples issue.

Re-reading code for GIN also, I note that there isn't any further work
because we don't kill prior tuples ever. Also, there is no special
handling of the GIN b-tree posting tree because VACUUM scans that in
logical sequence, rather than the physical sequence in nbtree.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-21 13:23:01
Message-ID:	1253539381.4449.55.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
> The documentation talks about setting and checking
> default_transaction_read_only, but I think it doesn't say anything
> about
> transaction_read_only, which I find odd. This in particular:
>
> > Users will be able to tell whether their session is read-only by
> > + issuing SHOW default_transaction_read_only
>
> seems misleading, as you might have default_transaction_read_only=on,
> but still be able to do "SET transaction_read_only", so the *session*
> isn't necessarily read-only.

Yes, clearly missing a check there. Those two operations should be
blocked at higher level, using PreventCommandDuringRecovery() and I
confess that I thought they already were. Doesn't crash because of the
other checks in place, but gives wrong error message.

Thanks for penetration testing the patch.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-21 19:52:30
Message-ID:	1253562750.4449.98.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
> On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:

> > is this that we seem to be missing conflict
> > resolution for GiST index tuples deleted by the kill_prior_tuples
> > mechanism. Unless I'm missing something, we need similar handling there
> > that we have in b-tree.
>
> OK, I agree with that. Straightforward change. Thanks very much.
>
> I marked the comment to indicate that the handling for GIST and GIN
> indexes looked dubious to me also. I had the earlier "it is safe"
> comments but that was before we looked at the kill prior tuples issue.

ISTM I looked at this too quickly.

kill_prior_tuple is only ever set by these lines, after scan starts:

if (!scan->xactStartedInRecovery)
scan->kill_prior_tuple = scan->xs_hot_dead;

which is set in indexam.c, not within any particular am. So the coding,
as submitted, covers all index types, current and future.

AFAICS there is no bug, unless you have a test case or can explain
further?

Worth raising as a query because it forced me to re-check how GIST and
GIN work and am happy again now.

--
Simon Riggs www.2ndQuadrant.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-21 20:06:33
Message-ID:	603c8f070909211306g9f00e7axd174e6856f5bec9f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 21, 2009 at 9:01 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
>
>> The only bug I've found
>
> !

Yeah, wow.

...Robert

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 02:07:49
Message-ID:	200909220207.n8M27ot07020@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.

Wow, great! Simon has allowed us to pass a great milestone in Postgres
development.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 02:42:33
Message-ID:	f67928030909211942h41c330cak8201168100142a22@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> OK, here is the latest version of the Hot Standby patchset. This is
> about version 30+ by now, but we should regard this as 0.2.1
> Patch against CVS HEAD (now): clean apply, compile, no known bugs.
>
> OVERVIEW
>
> You can download PDF versions of the fine manual is here
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

From this doc:

"In recovery, transactions will not be permitted to take any lock
higher other than
AccessShareLock or AccessExclusiveLock. In addition, transactions may never
assign a TransactionId and may never write WAL. The LOCK TABLE command by
default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
the standby and requests a specific lock type other than AccessShareLock will be
rejected."

The first sentence seems to say that clients on the stand-by can take
ACCESS EXCLUSIVE, while the last sentence seems to say that they
cannot do so.

I did a little experiment on a hot standby instance. I expected that
either I would be denied the lock altogether, or the lock would cause
WAL replay to be paused until either I committed or was forcibly
canceled. But neither happened, I was granted the lock but WAL replay
continued anyway.

jjanes=# begin;
BEGIN
jjanes=# lock table pgbench_history in access exclusive mode;
LOCK TABLE
jjanes=# select count(*) from pgbench_history;
count
--------
519104
(1 row)

jjanes=# select count(*) from pgbench_history;
count
--------
527814
(1 row)

Is this the expected behavior?

Thanks,

Jeff

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 07:11:40
Message-ID:	4AB878AC.2000403@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
>> On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
>
>>> is this that we seem to be missing conflict
>>> resolution for GiST index tuples deleted by the kill_prior_tuples
>>> mechanism. Unless I'm missing something, we need similar handling there
>>> that we have in b-tree.
>> OK, I agree with that. Straightforward change. Thanks very much.
>>
>> I marked the comment to indicate that the handling for GIST and GIN
>> indexes looked dubious to me also. I had the earlier "it is safe"
>> comments but that was before we looked at the kill prior tuples issue.
>
> ISTM I looked at this too quickly.
>
> kill_prior_tuple is only ever set by these lines, after scan starts:
>
> if (!scan->xactStartedInRecovery)
> scan->kill_prior_tuple = scan->xs_hot_dead;
>
> which is set in indexam.c, not within any particular am. So the coding,
> as submitted, covers all index types, current and future.

That just stops more tuples from being killed in the standby. I was
thinking that we need similar conflict resolution in GiST that we do
with b-tree delete records, to stop killed tuples from being deleted
while they might still be needed in the standby.

But looking closer at GiST, it seems that GiST doesn't actually do that;
killed tuples are not removed at page splits, but only by VACUUM. So
that's not an issue after all.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 07:22:41
Message-ID:	1253604161.4449.138.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >
> > OK, here is the latest version of the Hot Standby patchset. This is
> > about version 30+ by now, but we should regard this as 0.2.1
> > Patch against CVS HEAD (now): clean apply, compile, no known bugs.
> >
> > OVERVIEW
> >
> > You can download PDF versions of the fine manual is here
> > http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
>
>
> >From this doc:
>
> "In recovery, transactions will not be permitted to take any lock
> higher other than
> AccessShareLock or AccessExclusiveLock. In addition, transactions may never
> assign a TransactionId and may never write WAL. The LOCK TABLE command by
> default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
> the standby and requests a specific lock type other than AccessShareLock will be
> rejected."
>
> The first sentence seems to say that clients on the stand-by can take
> ACCESS EXCLUSIVE, while the last sentence seems to say that they
> cannot do so.

You are right to pick up that discrepancy, as Heikki did also.

The root cause of that discrepancy is a change in patch behaviour
between January and now that I will use your post to highlight and
discuss, if you don't mind. (and yes, the docs need to be corrected)

Initially, it seemed that it was certain that a read-only backend could
not take an AccessExclusiveLock. On further thought, there is no
particular reason to block AccessExclusiveLocks themselves, just that
most things you might do while holding one are banned. But the lock
itself is fine. (Any challenge on that?)

AccessExclusiveLocks can be used to serialize the actions of other
backends. That is a very common use case, so my concern was that LOCK
TABLE would be a no-op unless we allowed AccessExclusiveLock, so the
patch does allow it.

> I did a little experiment on a hot standby instance. I expected that
> either I would be denied the lock altogether, or the lock would cause
> WAL replay to be paused until either I committed or was forcibly
> canceled. But neither happened, I was granted the lock but WAL replay
> continued anyway.
>
> jjanes=# begin;
> BEGIN
> jjanes=# lock table pgbench_history in access exclusive mode;
> LOCK TABLE
> jjanes=# select count(*) from pgbench_history;
> count
> --------
> 519104
> (1 row)
>
> jjanes=# select count(*) from pgbench_history;
> count
> --------
> 527814
> (1 row)
>
> Is this the expected behavior?

By me, yes. WAL replay does not require a table lock to progress. Any
changes are protected with block-level locks. It does acquire a table
lock and cancel conflicting queries when it is about to replay something
that would cause a query to explode, such as dropping a table, as
explained in docs.

So this is not a bug.

The explanation of how the above sequence of events occurs is that the
backend acquires AccessExclusiveLock - please check on other session in
pg_locks. WAL replay continues by the Startup process, inserting further
rows into the pgbench_history table as a series of transactions. The
second select takes a later snapshot than the first and so sees more
data than the first select, hence a larger count. (And I am pleased to
see that recovery is progressing quickly even while your queries run).

So not a bug, but just one of many possible behaviours we could enforce.
1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
2. Allow AccessExclusiveLocks but have them pause WAL replay
3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
no-op because it will not be able to serialize anything)

So the patch originally implemented (3) but now implements (1).

I would say that (2) is very undesirable because it puts WAL replay in
the control of non-superusers. That could mean LOCK TABLE implicitly
alters the high availability of the standby, and might even be used
maliciously to do that.

I'm open to views on whether we should use (1) or (3). Comments?

Implementing either is no problem and we have a straight choice. We may
even wish to review that again later from additional feedback.

(Jeff, you have also helped me understand that there is a bug in the way
serializable transactions are cancelled, which is easily corrected.
Thanks for that unexpected windfall, but it is otherwise unrelated to
your comments.)

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 08:04:15
Message-ID:	4AB884FF.1000207@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
>> jjanes=# begin;
>> BEGIN
>> jjanes=# lock table pgbench_history in access exclusive mode;
>> LOCK TABLE
>> jjanes=# select count(*) from pgbench_history;
>> count
>> --------
>> 519104
>> (1 row)
>>
>> jjanes=# select count(*) from pgbench_history;
>> count
>> --------
>> 527814
>> (1 row)
>>
>> Is this the expected behavior?
>
> By me, yes. WAL replay does not require a table lock to progress. Any
> changes are protected with block-level locks. It does acquire a table
> lock and cancel conflicting queries when it is about to replay something
> that would cause a query to explode, such as dropping a table, as
> explained in docs.

That is rather surprising. You can't get that result in a normal server,
which I think is enough of a reason to disallow it. If you do LOCK TABLE
ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
nose.

> So not a bug, but just one of many possible behaviours we could enforce.
> 1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
> 2. Allow AccessExclusiveLocks but have them pause WAL replay
> 3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
> no-op because it will not be able to serialize anything)
>
> So the patch originally implemented (3) but now implements (1).
>
> I would say that (2) is very undesirable because it puts WAL replay in
> the control of non-superusers. That could mean LOCK TABLE implicitly
> alters the high availability of the standby, and might even be used
> maliciously to do that.

I don't see a problem with (2) as long as the locker is kicked out after
max_standby_delay, like a long-running query. That's what I would
expect. I'm fine with (3) as well, but (1) does seem rather suprising
behavior.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 08:28:40
Message-ID:	1253608120.4449.160.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-22 at 11:04 +0300, Heikki Linnakangas wrote:
> >
> > By me, yes. WAL replay does not require a table lock to progress. Any
> > changes are protected with block-level locks. It does acquire a table
> > lock and cancel conflicting queries when it is about to replay something
> > that would cause a query to explode, such as dropping a table, as
> > explained in docs.
>
> That is rather surprising. You can't get that result in a normal server,
> which I think is enough of a reason to disallow it. If you do LOCK TABLE
> ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
> nose.

OK, "normality" is a reasonable argument against. So (1) is only a
partial implementation of serializing the table.

> > So not a bug, but just one of many possible behaviours we could enforce.
> > 1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
> > 2. Allow AccessExclusiveLocks but have them pause WAL replay
> > 3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
> > no-op because it will not be able to serialize anything)
> >
> > So the patch originally implemented (3) but now implements (1).
> >
> > I would say that (2) is very undesirable because it puts WAL replay in
> > the control of non-superusers. That could mean LOCK TABLE implicitly
> > alters the high availability of the standby, and might even be used
> > maliciously to do that.
>
> I don't see a problem with (2) as long as the locker is kicked out after
> max_standby_delay, like a long-running query. That's what I would
> expect. I'm fine with (3) as well, but (1) does seem rather suprising
> behavior.

(2) gives other problems because it would force us to check for
conflicting locks for each heap & index WAL record to ensure that the
lock was honoured. We could optimize that but it's still going to cost.

I'd rather leave things at (3) for now and wait for further feedback.
"Start strict, relax later".

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 09:53:59
Message-ID:	4AB89EB7.8010606@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

In testing, it looks like there's still something wrong with the
subtransaction handling. I created a test function to create a large
number of subtransactions:

CREATE LANGUAGE plpgsql;
CREATE TABLE bar (id int4);

CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
plpgsql AS $$
BEGIN
IF n <= 0 THEN RETURN; END IF;
INSERT INTO bar VALUES (n);
PERFORM subxids(n - 1);
RETURN;
EXCEPTION WHEN raise_exception THEN NULL; END;
$$;

And used that to created 100 nested subtransactions in the primary server:

SELECT subxids(100);

I got this in the standby:

...
LOG: REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
2659 2660 2661 2662 0
LOG: extend subtrans xid 2600 page 1 last_page 0
CONTEXT: rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
2658 2659 2660 2661 2662 0
TRAP: FailedAssertion("!(((newestXact) >= ((TransactionId) 3)))", File:
"subtrans.c", Line: 299)
LOG: startup process (PID 28594) was terminated by signal 6: Aborted
LOG: terminating any other active server processes

Apparently an InvalidXid is sneaking into the unreported xid array
collected in the primary.

I actually bumped into this while I was testing a simpler implementation
of the logic to collect the unreported xids in the primary. As the patch
stands, we keep track of how many of the childXids at each
subtransaction level have already been reported, but it occurs to me
that it would be a lot simpler to populate a separate array of
unreported xids on-the-fly, as xids are assigned. With the simpler
implementation I bumped into another issue (see below), which is why I
wanted to test if it worked without the simplification.

So with the simpler logic, I had this problem. When I do this in the
primary:

postgres=# SELECT subxids(10000);
ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth", after
ensuring the platform's stack depth limit is adequate.
CONTEXT: SQL statement "SELECT subxids( $1 - 1)"
PL/pgSQL function "subxids" line 4 at PERFORM
SQL statement "SELECT subxids( $1 - 1)"
...

I get this in the standby:

...
LOG: REDO @ 0/1000B6C4; LSN 0/1000B6F0: prev 0/1000B698; xid 4325; len
16: Transaction - abort: 2009-09-22 12:45:00.938243+03
LOG: record known xact 4325 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: remove KnownAssignedXid 4325
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: REDO @ 0/1000B6F0; LSN 0/1000B71C: prev 0/1000B6C4; xid 4324; len
16: Transaction - abort: 2009-09-22 12:45:00.938438+03
LOG: record known xact 4324 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: remove KnownAssignedXid 4324
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: REDO @ 0/1000B71C; LSN 0/1000B748: prev 0/1000B6F0; xid 4323; len
16: Transaction - abort: 2009-09-22 12:45:00.938632+03
LOG: record known xact 4323 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: 1 KnownAssignedXids 3619
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
FATAL: cannot remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: startup process (PID 31284) exited with exit code 1
LOG: terminating any other active server processes

It looks like the standby tries to remove XID 4323 from the
known-assigned hash table, but it's not there because it was removed and
set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I guess we
should just not throw an error in that case, but is there a way we could
catch that narrow case and still keep the check in
KnownAssignedXidsRemove()? It seems like the check might help catch
other bugs, so I'd rather keep it if possible.

I've pushed the simplified code to my git repository if you want to take
a look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 10:01:02
Message-ID:	1253613662.4449.191.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

> In testing, it looks like there's still something wrong with the
> subtransaction handling. I created a test function to create a large
> number of subtransactions:

OK, looking at this now. Thanks for the report.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 10:29:34
Message-ID:	1253615374.4449.204.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

> It looks like the standby tries to remove XID 4323 from the
> known-assigned hash table, but it's not there because it was removed
> and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
> guess we should just not throw an error in that case, but is there a
> way we could catch that narrow case and still keep the check in
> KnownAssignedXidsRemove()? It seems like the check might help catch
> other bugs, so I'd rather keep it if possible.

Yes, a certain test is important and I'm glad we've (almost) achieved
it.

ISTM we can say if not in KnownAssignedXids then check pg_subtrans. If
not in either then report error. May need to add info to the abort
record to check xtop also.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 13:29:13
Message-ID:	1253626153.4449.218.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
> In testing, it looks like there's still something wrong with the
> subtransaction handling. I created a test function to create a large
> number of subtransactions:
>
> CREATE LANGUAGE plpgsql;
> CREATE TABLE bar (id int4);
>
> CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
> plpgsql AS $$
> BEGIN
> IF n <= 0 THEN RETURN; END IF;
> INSERT INTO bar VALUES (n);
> PERFORM subxids(n - 1);
> RETURN;
> EXCEPTION WHEN raise_exception THEN NULL; END;
> $$;
>
> And used that to created 100 nested subtransactions in the primary server:
>
> SELECT subxids(100);
>
>
> I got this in the standby:
>
> ...
> LOG: REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
> 264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
> 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
> 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
> 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
> 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
> 2659 2660 2661 2662 0
> LOG: extend subtrans xid 2600 page 1 last_page 0
> CONTEXT: rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
> 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
> 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
> 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
> 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
> 2658 2659 2660 2661 2662 0
> TRAP: FailedAssertion("!(((newestXact) >= ((TransactionId) 3)))", File:
> "subtrans.c", Line: 299)
> LOG: startup process (PID 28594) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
>
> Apparently an InvalidXid is sneaking into the unreported xid array
> collected in the primary.
>
> I actually bumped into this while I was testing a simpler implementation
> of the logic to collect the unreported xids in the primary. As the patch
> stands, we keep track of how many of the childXids at each
> subtransaction level have already been reported, but it occurs to me
> that it would be a lot simpler to populate a separate array of
> unreported xids on-the-fly, as xids are assigned. With the simpler
> implementation I bumped into another issue (see below), which is why I
> wanted to test if it worked without the simplification.

The bug seems an off-by-one error and would seem easily corrected in any
case; there is no fundamental problem revealed.

I prefer your suggested approach, since it avoids that rather complex
looking code that did the grovelling thru the nested xact states.

I'll get back to you with more on this tomorrow.

--
Simon Riggs www.2ndQuadrant.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 17:02:19
Message-ID:	20090922170219.GE14436@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas escribió:
> Simon Riggs wrote:
> > On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
> >> jjanes=# begin;
> >> BEGIN
> >> jjanes=# lock table pgbench_history in access exclusive mode;
> >> LOCK TABLE
> >> jjanes=# select count(*) from pgbench_history;
> >> count
> >> --------
> >> 519104
> >> (1 row)
> >>
> >> jjanes=# select count(*) from pgbench_history;
> >> count
> >> --------
> >> 527814
> >> (1 row)
> >>
> >> Is this the expected behavior?
> >
> > By me, yes. WAL replay does not require a table lock to progress. Any
> > changes are protected with block-level locks. It does acquire a table
> > lock and cancel conflicting queries when it is about to replay something
> > that would cause a query to explode, such as dropping a table, as
> > explained in docs.
>
> That is rather surprising. You can't get that result in a normal server,
> which I think is enough of a reason to disallow it. If you do LOCK TABLE
> ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
> nose.

I think the fallout from that argument is that WAL replay should hold
table-level locks (otherwise the workaround to this problem is too
special-casey). I'm not sure about that. If I really wanted to get
consistent results, I'd use a serializable transaction.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-22 17:06:43
Message-ID:	200909221706.n8MH6hR01408@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anyone who is interested in how the hot standby behaves should read the
following excellent PDF Simon produced. It goes into great detail of
the slave's read-only transactions and how the standby behaves during
continuous slave recovery:

> You can download PDF versions of the fine manual is here
> http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

The function call docs are at:

> http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 08:13:45
Message-ID:	4AB9D8B9.9090804@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Looking at the way cache invalidations are handled in two-phase
transactions, it would be simpler if we store the shared cache
invalidation messages in the twophase state file header, like we store
deleted relations and subxids. This allows them to be copied to the
COMMIT_PREPARED WAL record, so that we don't need treat twophase commits
differently than regular ones in xact_redo_commit. As the patch stands,
the new
xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
mechanism is duplicated functionality with
AtPrepare_Inval/-PersistInvalidationMessage - both materialize the
pending shared invalidation messages so that they can be written to
disk. I did that in my git branch.

I wonder if we might have alignment issues with the
SharedInvalidationMessages being stored in WAL records, following the
subxids. All the data stored in that record have 4-byte alignment at the
moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
would have trouble. Probably not worth changing code, it's highly
unlikely that SharedInvalidationMessage will ever need 8-byte alignment,
but perhaps a comment somewhere would be in order.

I note that we don't emit RunningXacts after a shutdown checkpoint. So
if recovery starts at a shutdown checkpoint, we don't let read-only
backends in until the first online checkpoint. Could we treat a shutdown
checkpoint as a snapshot with no transactions running? Or do prepared
transactions screw that up?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 09:07:19
Message-ID:	4AB9E547.9040602@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an Assertion
into ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
void
ProcArrayDecrementNumHeldLocks(PGPROC *proc)
{
+ Assert(proc->numHeldLocks > 0);
proc->numHeldLocks--;
}

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Making matters worse, the primary server refuses to startup up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG: database system was interrupted while in recovery at 2009-09-23
11:56:15 EEST
HINT: This probably means that some data is corrupted and you will have
to use the last backup for recovery.
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/32000070
LOG: REDO @ 0/32000070; LSN 0/320000AC: prev 0/32000020; xid 0; len 32:
Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG: consistent recovery state reached
LOG: REDO @ 0/320000AC; LSN 0/320000CC: prev 0/32000070; xid 0; len 4:
XLOG - nextOid: 24600
LOG: REDO @ 0/320000CC; LSN 0/320000F4: prev 0/320000AC; xid 0; len 12:
Storage - file create: base/11562/16408
LOG: REDO @ 0/320000F4; LSN 0/3200011C: prev 0/320000CC; xid 4364; len
12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG: REDO @ 0/3200011C; LSN 0/320001D8: prev 0/320000F4; xid 4364; len
159: Heap - insert: rel 1663/11562/1259; tid 5/4
...
LOG: REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len
264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval
msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache
id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache
smgr relcache smgr relcache
TRAP: FailedAssertion("!(proc->numHeldLocks > 0)", File: "procarray.c",
Line: 1404)
LOG: startup process (PID 27430) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need be careful to avoid putting any extra work into the normal recovery
path. Otherwise bugs in hot standby related code can cause crash
recovery to fail.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 09:33:20
Message-ID:	1253698400.4449.268.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:

> Looking at the way cache invalidations are handled in two-phase
> transactions, it would be simpler if we store the shared cache
> invalidation messages in the twophase state file header, like we store
> deleted relations and subxids. This allows them to be copied to the
> COMMIT_PREPARED WAL record, so that we don't need treat twophase commits
> differently than regular ones in xact_redo_commit. As the patch stands,
> the new
> xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
> mechanism is duplicated functionality with
> AtPrepare_Inval/-PersistInvalidationMessage - both materialize the
> pending shared invalidation messages so that they can be written to
> disk. I did that in my git branch.

We could, but the prepared transaction path already contains special
case code anyway, so we aren't reducing number of test cases required.
This looks like a possible area for refactoring, but I don't see the
need for pre-factoring. i.e. don't rework before commit, rework after.

> I wonder if we might have alignment issues with the
> SharedInvalidationMessages being stored in WAL records, following the
> subxids. All the data stored in that record have 4-byte alignment at the
> moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
> would have trouble. Probably not worth changing code, it's highly
> unlikely that SharedInvalidationMessage will ever need 8-byte alignment,
> but perhaps a comment somewhere would be in order.

It's a possible source of bugs, but there are no issues there AFAICS.
The technique of multiple arrays on a WAL record wasn't invented by this
patch.

> I note that we don't emit RunningXacts after a shutdown checkpoint. So
> if recovery starts at a shutdown checkpoint, we don't let read-only
> backends in until the first online checkpoint. Could we treat a shutdown
> checkpoint as a snapshot with no transactions running? Or do prepared
> transactions screw that up?

We could, but I see no requirement for starting HS from a backup taken
on a shutdown database. It's just another special case to test and since
we already have significant number of important test cases I'd say add
this later.

That seems to have reflected all of your points on this post, though
thanks for the comments. I'm keen to reduce complexity in areas that
have caused bugs, but for stuff that is working I am tempted to leave
alone on such a big patch. Anything we can do to avoid re-testing
sections of code and/or reduce the number of tests required is going to
increase stability.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 09:34:41
Message-ID:	1253698481.4449.270.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
> it highlights that we
> need be careful to avoid putting any extra work into the normal
> recovery
> path. Otherwise bugs in hot standby related code can cause crash
> recovery to fail.

Excellent point. I will put in additional protective code there.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 10:23:00
Message-ID:	1253701380.4449.284.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
> seems to be broken

Agreed.

Patch withdrawn for correction and rework. Nothing serious, but not much
point doing further testing to all current issues resolved.

Tracking of issues raised and later solved via Wiki.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 11:36:02
Message-ID:	4ABA0822.1030905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
>> I note that we don't emit RunningXacts after a shutdown checkpoint. So
>> if recovery starts at a shutdown checkpoint, we don't let read-only
>> backends in until the first online checkpoint. Could we treat a shutdown
>> checkpoint as a snapshot with no transactions running? Or do prepared
>> transactions screw that up?
>
> We could, but I see no requirement for starting HS from a backup taken
> on a shutdown database. It's just another special case to test and since
> we already have significant number of important test cases I'd say add
> this later.

There's also a related issue that if a backend holding
AccessExclusiveLock crashes without writing an abort WAL record, the
lock is never released in the standby. We handle the expiration of xids
at replay of running-xacts records, but AFAICS we don't do that for locks.

It shouldn't be much code to clear those states at shutdown checkpoint,
just a few lines to call the right functions methinks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 14:45:26
Message-ID:	4ABA3486.1030609@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
>>> I note that we don't emit RunningXacts after a shutdown checkpoint. So
>>> if recovery starts at a shutdown checkpoint, we don't let read-only
>>> backends in until the first online checkpoint. Could we treat a shutdown
>>> checkpoint as a snapshot with no transactions running? Or do prepared
>>> transactions screw that up?
>> We could, but I see no requirement for starting HS from a backup taken
>> on a shutdown database. It's just another special case to test and since
>> we already have significant number of important test cases I'd say add
>> this later.
>
> There's also a related issue that if a backend holding
> AccessExclusiveLock crashes without writing an abort WAL record, the
> lock is never released in the standby. We handle the expiration of xids
> at replay of running-xacts records, but AFAICS we don't do that for locks.

Ah, scratch that, I now see that we do call
XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
clears all recovery locks. But doesn't that prematurely clear all locks
belonging to prepared transactions as well?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 15:44:13
Message-ID:	f67928030909230844o661e523cr5125631ee3c54a8a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 22, 2009 at 10:02 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Heikki Linnakangas escribió:
>> Simon Riggs wrote:
>> > On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
>> >> jjanes=# begin;
>> >> BEGIN
>> >> jjanes=# lock table pgbench_history in access exclusive mode;
>> >> LOCK TABLE
>> >> jjanes=# select count(*) from pgbench_history;
>> >> count
>> >> --------
>> >> 519104
>> >> (1 row)
>> >>
>> >> jjanes=# select count(*) from pgbench_history;
>> >> count
>> >> --------
>> >> 527814
>> >> (1 row)
>> >>
>> >> Is this the expected behavior?
>> >
>> > By me, yes. WAL replay does not require a table lock to progress. Any
>> > changes are protected with block-level locks. It does acquire a table
>> > lock and cancel conflicting queries when it is about to replay something
>> > that would cause a query to explode, such as dropping a table, as
>> > explained in docs.
>>
>> That is rather surprising. You can't get that result in a normal server,
>> which I think is enough of a reason to disallow it. If you do LOCK TABLE
>> ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
>> nose.

Right. I'd rather be denied the lock than have it granted to me but
not do the same thing it does on a primary---prevent all changes to
the locked table.

> I think the fallout from that argument is that WAL replay should hold
> table-level locks (otherwise the workaround to this problem is too
> special-casey). I'm not sure about that. If I really wanted to get
> consistent results, I'd use a serializable transaction.

Unfortunately, isolation level "serializable" is not truly
serializable. Usually it is good enough, but when it isn't good
enough and you need an explicit table lock (a very rare but not
nonexistent situation), I think it should either lock the table in the
manner it would do on the primary, or throw an error. I think that
silently changing the behavior between primary and standby is not a
good thing.

Cheers,

Jeff

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 16:03:04
Message-ID:	2476.1253721784@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> Unfortunately, isolation level "serializable" is not truly
> serializable. Usually it is good enough, but when it isn't good
> enough and you need an explicit table lock (a very rare but not
> nonexistent situation), I think it should either lock the table in the
> manner it would do on the primary, or throw an error. I think that
> silently changing the behavior between primary and standby is not a
> good thing.

+1 --- this proposal made me acutely uncomfortable, too.

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 16:07:26
Message-ID:	4ABA47BE.9020807@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
>> seems to be broken
>
> Agreed.

Looking at the relation lock stuff a bit more...

When someone tries to acquire an AccessExclusiveLock, but can't get it
immediately, we sleep while holding RecoveryInfoLock. That doesn't
affect normal queries, but it will in turn block any attempt to get a
running-xacts snapshot, and will thus block checkpoint from finishing.

It could take a very long time until you get an AccessExclusiveLock on a
busy table, so that seems pretty bad. I think we need to think a bit
harder about that locking.

The problem becomes a lot easier if we accept that it's OK to have a
lock included in the running-xacts snapshot and also appear in a
XLOG_RELATION_LOCK record later. The standby should handle that
gracefully already. If we just remove RecoveryInfoLock, that can happen,
but it still won't be possible for a lock to be missed out which is what
we really care about.

Rather than keep the numHeldLocks counters per-proc in proc array, I
think it would be simpler to have a single (or one per lock partition)
counter in shared memory in lock.c. It's just an optimization to make it
faster to find out that there is no loggable AccessExclusiveLocks in the
system, so it really rather belongs into the lock manager.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 16:07:58
Message-ID:	4ABA47DE.1070900@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon,

> Patch withdrawn for correction and rework. Nothing serious, but not much
> point doing further testing to all current issues resolved.

:-(

Good thing we went for 4 CFs.

Is there a GIT branch of Simon's current working version up somewhere?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 16:30:31
Message-ID:	4ABA4D27.1090703@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus wrote:
>> Patch withdrawn for correction and rework. Nothing serious, but not much
>> point doing further testing to all current issues resolved.
>
> :-(
>
> Good thing we went for 4 CFs.

I think we should try to hammer this in in this commitfest. None of the
issues found this far are too serious, nothing that requires major rewrites.

> Is there a GIT branch of Simon's current working version up somewhere?

I have my git repository at git.postgresql.org, branch 'hs-riggs'. I've
pushed a number of small changes there over Simon's patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-23 16:36:48
Message-ID:	4ABA4EA0.5070708@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki,

> I think we should try to hammer this in in this commitfest. None of the
> issues found this far are too serious, nothing that requires major rewrites.

It would certainly be valuable to get users testing it starting with
Alpha2 instead of waiting 2 months.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-24 10:33:30
Message-ID:	4ABB4AFA.4090605@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> The problem becomes a lot easier if we accept that it's OK to have a
> lock included in the running-xacts snapshot and also appear in a
> XLOG_RELATION_LOCK record later. The standby should handle that
> gracefully already. If we just remove RecoveryInfoLock, that can happen,
> but it still won't be possible for a lock to be missed out which is what
> we really care about.

I see the problem with that now. Without the lock, it's possible that
the XLOG_RELATION_LOCK WAL record is written before the
XLOG_RUNNING_XACTS record. If the lock is not included in the snapshot,
we want the lock WAL record to be after the snapshot record.

So i guess we'll need the RecoveryInfoLock. But we don't need to hold it
across the wait. I think it's enough to acquire it just before writing
the WAL record in LockAcquire. That will ensure that the WAL record
isn't written until the snapshot is completely finished.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 08:49:37
Message-ID:	4ABC8421.6000802@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Looking at the startup sequence now.

I see that you modified ExtendSUBTRANS so that it doesn't wipe out
previously set values if it's called with out-of-order xids. I guess
that works, although I think it can leave pages unzeroed if it's called
with a large enough gap between xids, so that the previous one was on
e.g page 10 and the next one on page 12. Page 11 would not be zeroed,
AFAICS. Not sure if such big gaps in the xid sequence are possible, but
seems so if you have a very large max_connections setting and a lot of
subtransactions.

Nevertheless, I'd feel better if we kept ExtendSUBTRANS unmodified, and
instead arranged things so that ExtendSUBTRANS is always called in
xid-order. We can call it from RecordKnownAssignedTransactionIds, which
handles the out-of-order problem for other purposes already.

I think we have similar problem with clog. ExtendCLOG is currently not
called during recovery, so isn't it possible for the problem with
subtransaction commits that's described in the comments in StartupCLOG
to arise during hot standby? Again, I think we should call ExtendCLOG()
in RecordKnownAssignedTransactionIds. RecordKnownAssignedTransactionIds
is the hot standby version of GetNewTransactionId(), so to speak.

If we go with that, I think we'll need to call StartupSUBTRANS/CLOG
earlier in the startup sequence too, when we establish the startup
snapshot and let backends in.

Thoughts?

Other stuff:

- I removed the new DBConsistentUpToLSN() function and moved its
functionality into XLogNeedsFlush(). Since XLogFlush() updates
minRecoveryPoint instead of flushing WAL during recovery, it makes sense
for XLogNeedsFlush() to correspondingly check if minRecoveryPoint needs
to be updated when in recovery. That's a better definition for the call
in bufmgr.c too.

- I went ahead with the changes with RecoveryInfoLock and tracking the
number of held AccessExclusive locks in lmgr.c instead of proc array.

Can we find a better name for "loggable locks"? It means locks that need
to be WAL logged when acquired, for hot standby, and "loggable lock"
doesn't sound right for that. "Loggable" implies that it can be logged,
but it really means that it *must* be logged.

Keep an eye on my git repository for latest changes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 09:56:37
Message-ID:	1253872597.4449.578.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-24 at 13:33 +0300, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > The problem becomes a lot easier if we accept that it's OK to have a
> > lock included in the running-xacts snapshot and also appear in a
> > XLOG_RELATION_LOCK record later. The standby should handle that
> > gracefully already. If we just remove RecoveryInfoLock, that can happen,
> > but it still won't be possible for a lock to be missed out which is what
> > we really care about.
>
> I see the problem with that now. Without the lock, it's possible that
> the XLOG_RELATION_LOCK WAL record is written before the
> XLOG_RUNNING_XACTS record. If the lock is not included in the snapshot,
> we want the lock WAL record to be after the snapshot record.
>
> So i guess we'll need the RecoveryInfoLock. But we don't need to hold it
> across the wait. I think it's enough to acquire it just before writing
> the WAL record in LockAcquire. That will ensure that the WAL record
> isn't written until the snapshot is completely finished.

I will think some more on that. I remember thinking there was a reason
why that wasn't enough, possibly to do with no-wait locks which I
remember caused me to change that code.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 09:58:46
Message-ID:	1253872726.4449.581.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:

> Rather than keep the numHeldLocks counters per-proc in proc array, I
> think it would be simpler to have a single (or one per lock partition)
> counter in shared memory in lock.c. It's just an optimization to make it
> faster to find out that there is no loggable AccessExclusiveLocks in the
> system, so it really rather belongs into the lock manager.

What lock would protect that value? The whole purpose is to avoid taking
the LockMgrLocks and to give something that is accessible by the locks
already held by GetRunningTransactionData().

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:04:44
Message-ID:	1253873084.4449.589.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> >> On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
> >>> I note that we don't emit RunningXacts after a shutdown checkpoint. So
> >>> if recovery starts at a shutdown checkpoint, we don't let read-only
> >>> backends in until the first online checkpoint. Could we treat a shutdown
> >>> checkpoint as a snapshot with no transactions running? Or do prepared
> >>> transactions screw that up?
> >> We could, but I see no requirement for starting HS from a backup taken
> >> on a shutdown database. It's just another special case to test and since
> >> we already have significant number of important test cases I'd say add
> >> this later.
> >
> > There's also a related issue that if a backend holding
> > AccessExclusiveLock crashes without writing an abort WAL record, the
> > lock is never released in the standby. We handle the expiration of xids
> > at replay of running-xacts records, but AFAICS we don't do that for locks.
>
> Ah, scratch that, I now see that we do call
> XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
> clears all recovery locks. But doesn't that prematurely clear all locks
> belonging to prepared transactions as well?

Much better to read your second post(s). :-)

Yes, you have found a(nother) issue. This was the first one that gave me
pause to think of the answer. The locks currently aren't tracked as to
whether they are 2PC or not, so we would need to store that info also so
that we can selectively release locks later.

Question: is it possible to do a fast shutdown when we have a prepared
transaction? Would it be better to take a different approach there for
prepared transactions? It seems strange to write a shutdown checkpoint
when the system isn't yet "clean".

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:14:53
Message-ID:	4ABC981D.6050700@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
>
>> Rather than keep the numHeldLocks counters per-proc in proc array, I
>> think it would be simpler to have a single (or one per lock partition)
>> counter in shared memory in lock.c. It's just an optimization to make it
>> faster to find out that there is no loggable AccessExclusiveLocks in the
>> system, so it really rather belongs into the lock manager.
>
> What lock would protect that value? The whole purpose is to avoid taking
> the LockMgrLocks and to give something that is accessible by the locks
> already held by GetRunningTransactionData().

The lock partition lock (so we really need one counter per partition, a
single counter would need additional locking). We're already holding
that in LockAcquire/LockRelease when we need to increment/decrement the
counter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:23:51
Message-ID:	4ABC9A37.2090008@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
>> XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
>> clears all recovery locks. But doesn't that prematurely clear all locks
>> belonging to prepared transactions as well?
>
> Much better to read your second post(s). :-)
>
> Yes, you have found a(nother) issue. This was the first one that gave me
> pause to think of the answer. The locks currently aren't tracked as to
> whether they are 2PC or not, so we would need to store that info also so
> that we can selectively release locks later.
>
> Question: is it possible to do a fast shutdown when we have a prepared
> transaction?

Yes.

> Would it be better to take a different approach there for
> prepared transactions? It seems strange to write a shutdown checkpoint
> when the system isn't yet "clean".

Hmm, I guess you could define prepared transactions as active backends
from the shutdown point of view, and wait for them to finish. I can see
one problem, though: Once you issue shutdown, fast or smart, we no
longer accept new connections. So you can't connect to issue the
ROLLBACK/COMMIT PREPARED anymore. Anyway, it would be a change from the
current behavior, so it would be better to cope with prepared
transactions in the standby.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:26:26
Message-ID:	1253874386.4449.602.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-25 at 11:49 +0300, Heikki Linnakangas wrote:
> Looking at the startup sequence now.
>
> I see that you modified ExtendSUBTRANS so that it doesn't wipe out
> previously set values if it's called with out-of-order xids. I guess
> that works, although I think it can leave pages unzeroed if it's called
> with a large enough gap between xids, so that the previous one was on
> e.g page 10 and the next one on page 12. Page 11 would not be zeroed,
> AFAICS. Not sure if such big gaps in the xid sequence are possible, but
> seems so if you have a very large max_connections setting and a lot of
> subtransactions.

Yeh, it happens and I've seen it - which is why the code is there.

> Nevertheless, I'd feel better if we kept ExtendSUBTRANS unmodified, and
> instead arranged things so that ExtendSUBTRANS is always called in
> xid-order. We can call it from RecordKnownAssignedTransactionIds, which
> handles the out-of-order problem for other purposes already.
>
> I think we have similar problem with clog. ExtendCLOG is currently not
> called during recovery, so isn't it possible for the problem with
> subtransaction commits that's described in the comments in StartupCLOG
> to arise during hot standby? Again, I think we should call ExtendCLOG()
> in RecordKnownAssignedTransactionIds. RecordKnownAssignedTransactionIds
> is the hot standby version of GetNewTransactionId(), so to speak.

OK. We record xids in sequence, so it is now a much more natural place
to do this. Zeroing them makes them dirty, unfortunately, but that is
another issue.

RecordKnownAssignedTransactionIds() only works once the snapshot has
been initialised. That could leave a gap, so we will need to run it
continuously if InHotStandby. Which may have knock-on changes also.

> If we go with that, I think we'll need to call StartupSUBTRANS/CLOG
> earlier in the startup sequence too, when we establish the startup
> snapshot and let backends in.

Yes, I think an earlier patch had that, but it turns out that there
really isn't anything for those functions to do. Really we could rename
those functions EndOfRecoverySUBTRANS and EndOfRecoveryCLOG to
illustrate their role better.

> - I removed the new DBConsistentUpToLSN() function and moved its
> functionality into XLogNeedsFlush(). Since XLogFlush() updates
> minRecoveryPoint instead of flushing WAL during recovery, it makes sense
> for XLogNeedsFlush() to correspondingly check if minRecoveryPoint needs
> to be updated when in recovery. That's a better definition for the call
> in bufmgr.c too.
>
> - I went ahead with the changes with RecoveryInfoLock and tracking the
> number of held AccessExclusive locks in lmgr.c instead of proc array.
>
> Can we find a better name for "loggable locks"? It means locks that need
> to be WAL logged when acquired, for hot standby, and "loggable lock"
> doesn't sound right for that. "Loggable" implies that it can be logged,
> but it really means that it *must* be logged.

The distinction of "loggable" is somewhat false since we rely on the
fact that only one person is holding it. So we may as well just call em
what they are: AccessExclusiveLocks.

> Keep an eye on my git repository for latest changes.

No, I'm not doing that. If you want to submit changes, please do so to
the list or just mention what needs work and I will do it. Having
multiple versions of a patch isn't helpful, as I have already said. I
have already been burned multiple times by accepting trial code and you
yourself have re-written particular parts multiple times. I am very,
very grateful for your reviews and detailed suggestions, so this comment
is only about coherency and not tripping each other up. (If you want to
"editorialize", it needs to be just prior to commit, but making changes
to a patch just prior to commit has historically shown to introduce bugs
where there weren't any before).

There's enough changes already to demand a full re-test, so everything
discovered so far plus testing is about 2 weeks work. I will commit
things onto git as agreed as I complete coding but that won't imply full
testing.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:28:08
Message-ID:	1253874488.4449.605.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-25 at 13:14 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
> >
> >> Rather than keep the numHeldLocks counters per-proc in proc array, I
> >> think it would be simpler to have a single (or one per lock partition)
> >> counter in shared memory in lock.c. It's just an optimization to make it
> >> faster to find out that there is no loggable AccessExclusiveLocks in the
> >> system, so it really rather belongs into the lock manager.
> >
> > What lock would protect that value? The whole purpose is to avoid taking
> > the LockMgrLocks and to give something that is accessible by the locks
> > already held by GetRunningTransactionData().
>
> The lock partition lock (so we really need one counter per partition, a
> single counter would need additional locking). We're already holding
> that in LockAcquire/LockRelease when we need to increment/decrement the
> counter.

Again: The whole purpose is to avoid taking those locks. Why would we
put something behind a lock we are trying to avoid taking?

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 10:32:13
Message-ID:	1253874733.4449.609.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-25 at 13:23 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
> >> XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
> >> clears all recovery locks. But doesn't that prematurely clear all locks
> >> belonging to prepared transactions as well?
> >
> > Much better to read your second post(s). :-)
> >
> > Yes, you have found a(nother) issue. This was the first one that gave me
> > pause to think of the answer. The locks currently aren't tracked as to
> > whether they are 2PC or not, so we would need to store that info also so
> > that we can selectively release locks later.
> >
> > Question: is it possible to do a fast shutdown when we have a prepared
> > transaction?
>
> Yes.
>
> > Would it be better to take a different approach there for
> > prepared transactions? It seems strange to write a shutdown checkpoint
> > when the system isn't yet "clean".
>
> Hmm, I guess you could define prepared transactions as active backends
> from the shutdown point of view, and wait for them to finish. I can see
> one problem, though: Once you issue shutdown, fast or smart, we no
> longer accept new connections. So you can't connect to issue the
> ROLLBACK/COMMIT PREPARED anymore. Anyway, it would be a change from the
> current behavior, so it would be better to cope with prepared
> transactions in the standby.

Definitely need to cope with them for Hot Standby. My point was general
one to say that behaviour is very non-useful for users with prepared
transactions. It just causes manual effort by a DBA each time the system
is shutdown.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 11:00:52
Message-ID:	4ABCA2E4.6000404@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> Definitely need to cope with them for Hot Standby. My point was general
> one to say that behaviour is very non-useful for users with prepared
> transactions. It just causes manual effort by a DBA each time the system
> is shutdown.

The transaction manager is supposed to reconnect and finish any in-doubt
transactions, eventually.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 12:50:52
Message-ID:	4ABCBCAC.2010000@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

In XidInMVCCSnapshot:
> if (snapshot->takenDuringRecovery)
> {
> /*
> * If the snapshot contains full subxact data, the fastest way to check
> * things is just to compare the given XID against both subxact XIDs and
> * top-level XIDs. If the snapshot overflowed, we have to use pg_subtrans
> * to convert a subxact XID to its parent XID, but then we need only look
> * at top-level XIDs not subxacts.
> */
...
> }
> else
> {
> int32 j;
>
> /*
> *
> * In recovery we store all xids in the subxact array because this
> * is by far the bigger array and we mostly don't know which xids
> * are top-level and which are subxacts. The xip array is empty.
> *
> * We start by searching subtrans, if we overflowed.
> */
...
> }

Hang on, isn't this 180 degrees backwards?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 17:25:35
Message-ID:	1253899535.4449.699.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-25 at 15:50 +0300, Heikki Linnakangas wrote:

> Hang on, isn't this 180 degrees backwards?

A witty riposte escapes me; yes, the if test is !correct.

What I find amazing is that it passed the test where I put it doing make
installcheck in an infinite loop for a long time. I guess that means the
regression tests hardly touch the concurrency code at all, which now I
think about it makes sense but I still find that very worrying. :-(

--
Simon Riggs www.2ndQuadrant.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 20:30:30
Message-ID:	20090925203030.GR3914@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:

> What I find amazing is that it passed the test where I put it doing make
> installcheck in an infinite loop for a long time. I guess that means the
> regression tests hardly touch the concurrency code at all, which now I
> think about it makes sense but I still find that very worrying. :-(

Try installcheck-parallel, which should be a bit better. It's probably
not yet good enough though because it always runs the same tests
concurrently.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-25 21:04:37
Message-ID:	1253912677.4449.718.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-25 at 16:30 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > What I find amazing is that it passed the test where I put it doing make
> > installcheck in an infinite loop for a long time. I guess that means the
> > regression tests hardly touch the concurrency code at all, which now I
> > think about it makes sense but I still find that very worrying. :-(
>
> Try installcheck-parallel, which should be a bit better. It's probably
> not yet good enough though because it always runs the same tests
> concurrently.

Good tip, thanks.

--
Simon Riggs www.2ndQuadrant.com

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-26 00:10:58
Message-ID:	4ABD5C12.2080209@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera wrote:
> Simon Riggs wrote:
>
>
>> What I find amazing is that it passed the test where I put it doing make
>> installcheck in an infinite loop for a long time. I guess that means the
>> regression tests hardly touch the concurrency code at all, which now I
>> think about it makes sense but I still find that very worrying. :-(
>>
>
> Try installcheck-parallel, which should be a bit better. It's probably
> not yet good enough though because it always runs the same tests
> concurrently.
>
>

It is also quite easy to set up your own schedule.

cheers

andrew

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-26 00:23:12
Message-ID:	20090926002312.GY3914@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan wrote:
>
> Alvaro Herrera wrote:
>
> >Try installcheck-parallel, which should be a bit better. It's probably
> >not yet good enough though because it always runs the same tests
> >concurrently.
>
> It is also quite easy to set up your own schedule.

... except you have to be careful with hidden test dependencies :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-27 11:59:27
Message-ID:	4ABF539F.8050305@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The locking in smgr_redo_commit and smgr_redo_abort doesn't look right.
First of all, smgr_redo_abort is not holding XidGenLock and
ProcArrayLock while modifying ShmemVariableCache->nextXid and
ShmemVariableCache->latestCompletedXid, respectively, like
smgr_redo_commit is. Attached patch fixes that.

But I think there's a little race condition in there anyway, as they
remove the finished xids from known-assigned-xids list by calling
ExpireTreeKnownAssignedTransactionIds, and
ShmemVariableCache->latestCompletedXid is updated only after that. We're
not holding ProcArrayLock between those two steps, like we do during
normal transaction commit. I'm not sure what kind of issues that can
lead to, but it seems like it can lead to broken snapshots if someone
calls GetSnapshotData() between those steps.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
0001-Make-locking-in-smgr_redo_abort-reflect-that-in-smgr.patch	text/x-diff	0 bytes

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-09-27 12:35:08
Message-ID:	4ABF5BFC.1060705@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

TransactionIdIsInProgress() doesn't consult the known-assigned-xids
structure. That's a problem: in the standby, TransactionIdIsInProgress()
can return false for a transaction that is still running in the master.
HeapTupleSatisfies* functions can incorrectly set HEAP_XMIN/XMAX_INVALID
hint bits for xids that commit later on.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-01 14:15:10
Message-ID:	1254406510.17864.221.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2009-09-27 at 15:35 +0300, Heikki Linnakangas wrote:
> TransactionIdIsInProgress() doesn't consult the known-assigned-xids
> structure. That's a problem: in the standby, TransactionIdIsInProgress()
> can return false for a transaction that is still running in the master.
> HeapTupleSatisfies* functions can incorrectly set HEAP_XMIN/XMAX_INVALID
> hint bits for xids that commit later on.

Agreed. Will fix.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-02 20:29:53
Message-ID:	1254515393.4691.45.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2009-09-27 at 14:59 +0300, Heikki Linnakangas wrote:
> The locking in smgr_redo_commit and smgr_redo_abort doesn't look right.
> First of all, smgr_redo_abort is not holding XidGenLock and
> ProcArrayLock while modifying ShmemVariableCache->nextXid and
> ShmemVariableCache->latestCompletedXid, respectively, like
> smgr_redo_commit is. Attached patch fixes that.
>
> But I think there's a little race condition in there anyway, as they
> remove the finished xids from known-assigned-xids list by calling
> ExpireTreeKnownAssignedTransactionIds, and
> ShmemVariableCache->latestCompletedXid is updated only after that. We're
> not holding ProcArrayLock between those two steps, like we do during
> normal transaction commit. I'm not sure what kind of issues that can
> lead to, but it seems like it can lead to broken snapshots if someone
> calls GetSnapshotData() between those steps.

I confess I didn't know what you were talking about when you wrote this
and was expecting you to have a coffee and retract. I realise now you
meant xact_redo_commit() rather than smgr_ and it makes sense at last.

I've just committed about half of your patch exactly, but not all.

I've avoided making the changes to
ShmemVariableCache->latestCompletedXid directly and moved the update to
ExpireTreeKnownAssignedTransactionIds() which already had access to
max_xid while holding ProcArrayLock. Hopefully that resolves your
comment about race condition.

Also, I noticed that you removed the line
TransactionIdAdvance(ShmemVariableCache->nextXid);
in xact_redo_abort(). That looks like an error to me, since this follows
neither the comment nor how it is coded in xact_redo_commit(). Let me
know if there was some other reason for that line removal and I'll make
the change again.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-07 13:52:04
Message-ID:	1254923524.26302.229.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
> It looks like the standby tries to remove XID 4323 from the
> known-assigned hash table, but it's not there because it was removed
> and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
> guess we should just not throw an error in that case, but is there a
> way we could catch that narrow case and still keep the check in
> KnownAssignedXidsRemove()? It seems like the check might help catch
> other bugs, so I'd rather keep it if possible.

Fix attached.

--
Simon Riggs www.2ndQuadrant.com

Attachment	Content-Type	Size
4c10a767f23d918cdde8366408d88b1c801ca1c9.patch	text/x-patch	735 bytes

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-07 14:29:07
Message-ID:	1254925747.26302.236.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
> we need be careful to avoid putting any extra work into the normal
> recovery path. Otherwise bugs in hot standby related code can cause
> crash recovery to fail.

Re-checked code and found a couple of additional places that needed
tests
if (InHotStandby)

--
Simon Riggs www.2ndQuadrant.com

Attachment	Content-Type	Size
db82165c0084a7e3f0c838c4b019be0b04703381.patch	text/x-patch	4.1 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 03:11:46
Message-ID:	4ACD5872.8000503@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
>> It looks like the standby tries to remove XID 4323 from the
>> known-assigned hash table, but it's not there because it was removed
>> and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
>> guess we should just not throw an error in that case, but is there a
>> way we could catch that narrow case and still keep the check in
>> KnownAssignedXidsRemove()? It seems like the check might help catch
>> other bugs, so I'd rather keep it if possible.
>
> Fix attached.

I fixed that on Friday already, in a slightly different manner. Do you
see a problem with that approach?

I've made public the version I'm working on. That's the version I'm
ultimately going to commit. It would be a lot more helpful if you
provided these patches over that version. Otherwise I have to refactor
them over that codebase, possibly introducing new bugs.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 03:19:46
Message-ID:	4ACD5A52.8000000@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> I've made public the version I'm working on. That's the version I'm
> ultimately going to commit. It would be a lot more helpful if you
> provided these patches over that version. Otherwise I have to refactor
> them over that codebase, possibly introducing new bugs.

Actually, it makes most sense if you push your changes directly to my
git repository. I just granted you write access to it. Please do these
changes against the version currently there, and push any fixes there
directly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 07:04:35
Message-ID:	1254985475.26302.286.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-10-07 at 23:19 -0400, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > I've made public the version I'm working on. That's the version I'm
> > ultimately going to commit. It would be a lot more helpful if you
> > provided these patches over that version. Otherwise I have to refactor
> > them over that codebase, possibly introducing new bugs.
>
> Actually, it makes most sense if you push your changes directly to my
> git repository. I just granted you write access to it. Please do these
> changes against the version currently there, and push any fixes there
> directly.

OK, that makes sense. Thank you.

I would still like you to make a clear statement that the contents of
that repository are BSD licenced open source contributions. I have a
contractual responsibility to ensure the code I write is licenced in
that way. I have already mentioned I'm not looking at it yet for that
reason, so I am unaware of any changes not posted to the list.

You have posted patches that I have said I don't agree with. My name is
going to be on this when it goes in, so I don't think it makes any sense
to force that commit to include changes I don't agree with. I cannot
prevent you making changes afterwards, nor would I wish to. I'd like you
to respond sensibly to comments on those. We should work together on a
consensus basis, especially since I know you have not fully tested your
changes (either). Your error rate might be lower than mine, but it is
non-zero.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 11:33:21
Message-ID:	4ACDCE01.2080003@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> I would still like you to make a clear statement that the contents of
> that repository are BSD licenced open source contributions.

Ok. All the content in the repository at
git://git.postgresql.org/git/users/heikki/postgres.git is released under
the same BSD license as PostgreSQL.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 11:49:47
Message-ID:	1255002587.26302.336.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-10-08 at 07:33 -0400, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I would still like you to make a clear statement that the contents of
> > that repository are BSD licenced open source contributions.
>
> Ok. All the content in the repository at
> git://git.postgresql.org/git/users/heikki/postgres.git is released under
> the same BSD license as PostgreSQL.

Thank you. I presume that the copyright is novated also.

I'll get cracking on some changes.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-08 23:53:14
Message-ID:	1255045994.6335.21.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-10-08 at 12:49 +0100, Simon Riggs wrote:

> I'll get cracking on some changes.

This will probably be next week now, just in case you're wondering when
I'll start adding patches.

--
Simon Riggs www.2ndQuadrant.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-09 17:21:53
Message-ID:	200910091721.n99HLrg00592@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> You have posted patches that I have said I don't agree with. My name is
> going to be on this when it goes in, so I don't think it makes any sense
> to force that commit to include changes I don't agree with. I cannot
> prevent you making changes afterwards, nor would I wish to. I'd like you
> to respond sensibly to comments on those. We should work together on a
> consensus basis, especially since I know you have not fully tested your
> changes (either). Your error rate might be lower than mine, but it is
> non-zero.

The commit message and release notes mention might have just Simon's
name, or multiple people.

The hot patch commit is going to have multiple people involved before it
is committed, so if Simon is worried that the patch will have ideas in
it he does not agree with, perhaps we can make sure the commit and
release note items include Heikki's name as well. Normally if a
committer makes signficant changes to a patch, the committer's name is
also added to the commmit message, and I suggest we do the same thing
here with hot standby.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-09 17:56:28
Message-ID:	CB623596-824E-42BB-B1DC-F13333FE90E9@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 9, 2009, at 1:21 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> Simon Riggs wrote:
>> You have posted patches that I have said I don't agree with. My
>> name is
>> going to be on this when it goes in, so I don't think it makes any
>> sense
>> to force that commit to include changes I don't agree with. I cannot
>> prevent you making changes afterwards, nor would I wish to. I'd
>> like you
>> to respond sensibly to comments on those. We should work together
>> on a
>> consensus basis, especially since I know you have not fully tested
>> your
>> changes (either). Your error rate might be lower than mine, but it is
>> non-zero.
>
> The commit message and release notes mention might have just Simon's
> name, or multiple people.
>
> The hot patch commit is going to have multiple people involved
> before it
> is committed, so if Simon is worried that the patch will have ideas in
> it he does not agree with, perhaps we can make sure the commit and
> release note items include Heikki's name as well. Normally if a
> committer makes signficant changes to a patch, the committer's name is
> also added to the commmit message, and I suggest we do the same thing
> here with hot standby.

I think this is a weakness of our current style of heavy-weight
commits. I don't have a great suggestion for fixing it, though. Even
if we move to git, a major feature like this has such a complex
development history that I'm queasy about slurping it in unsquashed.
But at least for simple features I think that there would be a value
in separating the patch author's work from the committer's adjustments.

I realize (now) that this would complicate the release note generation
process somewhat, based on our current process, and there might be
other downsides as well. All the same, I think it has enough value to
make it worth thinking about whether there's some way to make it work.

...Robert

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-09 18:05:51
Message-ID:	4ACF7B7F.3040307@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> But at least for simple features I think that there would be a value
> in separating the patch author's work from the committer's adjustments.
>
>

That is just going to make life harder for committers.

There are plenty of things with my name on them that are not exactly
what I submitted. I think that's true of just about everybody. Mostly
things changed hae improved, but not always. I don't think we should be
too proprietary about patches. As far as I'm concerned, credit goes to
the submitter and blame if any to the committer.

cheers

andrew

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-09 18:08:06
Message-ID:	1255111686.4298.1205.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-10-09 at 14:05 -0400, Andrew Dunstan wrote:
>
> Robert Haas wrote:
> > But at least for simple features I think that there would be a value
> > in separating the patch author's work from the committer's adjustments.
> >
> >
>
> That is just going to make life harder for committers.
>
> There are plenty of things with my name on them that are not exactly
> what I submitted. I think that's true of just about everybody. Mostly
> things changed hae improved, but not always. I don't think we should be
> too proprietary about patches. As far as I'm concerned, credit goes to
> the submitter and blame if any to the committer.

>
> cheers
>
> andrew
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hot Standby 0.2.1
Date:	2009-10-09 18:09:16
Message-ID:	200910091809.n99I9G715044@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan wrote:
>
>
> Robert Haas wrote:
> > But at least for simple features I think that there would be a value
> > in separating the patch author's work from the committer's adjustments.
> >
> >
>
> That is just going to make life harder for committers.
>
> There are plenty of things with my name on them that are not exactly
> what I submitted. I think that's true of just about everybody. Mostly
> things changed hae improved, but not always. I don't think we should be
> too proprietary about patches. As far as I'm concerned, credit goes to
> the submitter and blame if any to the committer.

Agreed.

Simon is right that if only his name is on the commit, there is an
assumption that the committer made no changes, or only cosmetic ones.
For hot standby, I think the committer is making significant changes
(that could lead to bugs) and hence the committer's name should be in
the commit. Sometimes we say "adjusted by" committer, but in this case
I think Heikki is doing more than adjustmants --- nothing wrong with
that --- it should just be documented in the commit message.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +