Small Bug in GetConflictingVirtualXIDs

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 03:02:37
Message-ID: 200912210402.37850.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Simon, Hi all,

HS currently does the following in GetConflictingVirtualXIDs

TransactionId pxmin = proc->xmin;

/*
* We ignore an invalid pxmin because this means that backend
* has no snapshot and cannot get another one while we hold exclusive lock.
*/
if (TransactionIdIsValid(pxmin) && !TransactionIdFollows(pxmin, limitXmin))
{
VirtualTransactionId vxid;

GET_VXID_FROM_PGPROC(vxid, *proc);
if (VirtualTransactionIdIsValid(vxid))
vxids[count++] = vxid;
}

The logic behind this seems fine except in the case of dropping a database.
There you very well might have a open connection without an open snapshot.

This leads to nice errors when youre connected to a database on the slave
without having a in progress transaction while dropping the database on the
primary:

CET FATAL: terminating connection due to administrator command
CET LOG: shutting down
CET LOG: restartpoint starting: shutdown immediate
CET FATAL: could not open file "base/16604/1259": No such file or directory
CET CONTEXT: writing block 5 of relation base/16604/1259
CET WARNING: could not write block 5 of base/16604/1259
CET DETAIL: Multiple failures --- write error might be permanent.
CET CONTEXT: writing block 5 of relation base/16604/1259

Should easily be solvable through an extra parameter for
GetConflictingVirtualXIDs. Want me to prepare a patch?

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 07:46:39
Message-ID: 1261381599.17644.1803.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2009-12-21 at 04:02 +0100, Andres Freund wrote:

> The logic behind this seems fine except in the case of dropping a database.
> There you very well might have a open connection without an open snapshot.

Yes, you're right, thanks for the report.

I re-arranged the logic there recently to reduce the number of backends
that would conflict, so it looks like I broke that case when I did
that.

> Should easily be solvable through an extra parameter for
> GetConflictingVirtualXIDs. Want me to prepare a patch?

Much appreciated, thanks.

--
Simon Riggs www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 15:25:34
Message-ID: 20091221152534.GB3913@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:

> The logic behind this seems fine except in the case of dropping a database.
> There you very well might have a open connection without an open snapshot.

Perhaps the simplest fix is to ensure that drop database gets a snapshot?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 15:38:07
Message-ID: 15331.1261409887@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Andres Freund wrote:
>> The logic behind this seems fine except in the case of dropping a database.
>> There you very well might have a open connection without an open snapshot.

> Perhaps the simplest fix is to ensure that drop database gets a snapshot?

I confess to not having followed the thread closely, but why is DROP
DATABASE special in this regard? Wouldn't we soon find ourselves
needing every utility command to take a snapshot?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 15:48:52
Message-ID: 1261410532.17644.3336.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2009-12-21 at 10:38 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Andres Freund wrote:
> >> The logic behind this seems fine except in the case of dropping a database.
> >> There you very well might have a open connection without an open snapshot.
>
> > Perhaps the simplest fix is to ensure that drop database gets a snapshot?
>
> I confess to not having followed the thread closely, but why is DROP
> DATABASE special in this regard? Wouldn't we soon find ourselves
> needing every utility command to take a snapshot?

Andres has worded this a little imprecisely, causing a confusion. In
cases regarding HS we need to be clear whether the interacting sessions
are on the master or on the standby to understand the reasons for poor
interactions.

What he means is that you can be connected to the standby without an
open snapshot (from the standby) at the point we replay a drop database
command that had been run on the master. That case currently causes the
bug, created by my recent change to GetConflictingVirtualXids().

Giving the drop database a snapshot is not the answer. I expect Andres
to be able to fix this with a simple patch that would not effect the
case of normal running.

--
Simon Riggs www.2ndQuadrant.com


From: Andres Freund <af(at)cybertec(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-21 16:04:58
Message-ID: 200912211704.59321.af@cybertec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday 21 December 2009 16:38:07 Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Andres Freund wrote:
> >> The logic behind this seems fine except in the case of dropping a
> >> database. There you very well might have a open connection without an
> >> open snapshot.
> > Perhaps the simplest fix is to ensure that drop database gets a snapshot?
> I confess to not having followed the thread closely, but why is DROP
> DATABASE special in this regard? Wouldn't we soon find ourselves
> needing every utility command to take a snapshot?
Because most other "entities" are locked when you access them. So on the
standby the AccessExlusive (generated on the master) will conflict with
whatever lock you currently have on that entity (on the slave).
There are no locks for an idle session though.

Andres


From: Andres Freund <af(at)cybertec(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-22 02:19:09
Message-ID: 200912220319.10264.af@cybertec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> Giving the drop database a snapshot is not the answer. I expect Andres
> to be able to fix this with a simple patch that would not effect the
> case of normal running.
Actually its less simply than I had thought at first - I don't think the code
ever handled that correctly.
I might be wrong there, my knowledge of the involved code is a bit sparse...
The whole conflict resolution builds on the concept of waiting for an VXid, but
an idle backend does not have a valid vxid. Thats correct, right?
Sure, the code should be modifyable to handle that code mostly transparently
(simply ignoring a invalid localTransactionId when the database id is valid),
but ...

I am inclined to just unconditionally kill the users of the database. Its not
like that would be an issue in production. I cant see a case where its
important to run a session to its end on a database which was dropped on the
master.
Opinions on that?

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <af(at)cybertec(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-22 10:42:30
Message-ID: 1261478551.7442.4332.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > Giving the drop database a snapshot is not the answer. I expect Andres
> > to be able to fix this with a simple patch that would not effect the
> > case of normal running.
> Actually its less simply than I had thought at first - I don't think the code
> ever handled that correctly.
> I might be wrong there, my knowledge of the involved code is a bit sparse...
> The whole conflict resolution builds on the concept of waiting for an VXid, but
> an idle backend does not have a valid vxid. Thats correct, right?

Yes, that's correct. I'll take this one back then.

> Sure, the code should be modifyable to handle that code mostly transparently
> (simply ignoring a invalid localTransactionId when the database id is valid),
> but ...
>
> I am inclined to just unconditionally kill the users of the database. Its not
> like that would be an issue in production. I cant see a case where its
> important to run a session to its end on a database which was dropped on the
> master.
> Opinions on that?

I don't see any mileage in making Startup process wait for an idle
session, so no real reason to wait for others either.

--
Simon Riggs www.2ndQuadrant.com


From: Andres Freund <af(at)cybertec(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-22 18:03:20
Message-ID: 200912221903.20624.af@cybertec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > Giving the drop database a snapshot is not the answer. I expect Andres
> > > to be able to fix this with a simple patch that would not effect the
> > > case of normal running.
> >
> > Actually its less simply than I had thought at first - I don't think the
> > code ever handled that correctly.
> > I might be wrong there, my knowledge of the involved code is a bit
> > sparse... The whole conflict resolution builds on the concept of waiting
> > for an VXid, but an idle backend does not have a valid vxid. Thats
> > correct, right?
>
> Yes, that's correct. I'll take this one back then.
So youre writing a fix or shall I?

Andres


From: Andres Freund <af(at)cybertec(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-27 19:11:26
Message-ID: 200912272011.27003.af@cybertec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > Giving the drop database a snapshot is not the answer. I expect Andres
> > > to be able to fix this with a simple patch that would not effect the
> > > case of normal running.
> >
> > Actually its less simply than I had thought at first - I don't think the
> > code ever handled that correctly.
> > I might be wrong there, my knowledge of the involved code is a bit
> > sparse... The whole conflict resolution builds on the concept of waiting
> > for an VXid, but an idle backend does not have a valid vxid. Thats
> > correct, right?
> I don't see any mileage in making Startup process wait for an idle
> session, so no real reason to wait for others either.
So here is a small patch implementing that behaviour.

Andres

Attachment Content-Type Size
0001-Fix-the-bug-in-HS-that-it-is-possible-to-stay-connec.patch text/x-patch 3.6 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <af(at)cybertec(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-27 20:04:43
Message-ID: 1261944283.18116.24.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
> On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> > On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > > Giving the drop database a snapshot is not the answer. I expect Andres
> > > > to be able to fix this with a simple patch that would not effect the
> > > > case of normal running.
> > >
> > > Actually its less simply than I had thought at first - I don't think the
> > > code ever handled that correctly.
> > > I might be wrong there, my knowledge of the involved code is a bit
> > > sparse... The whole conflict resolution builds on the concept of waiting
> > > for an VXid, but an idle backend does not have a valid vxid. Thats
> > > correct, right?
> > I don't see any mileage in making Startup process wait for an idle
> > session, so no real reason to wait for others either.
> So here is a small patch implementing that behaviour.

OK, thanks. I have also written something, as mentioned. Will review.

--
Simon Riggs www.2ndQuadrant.com


From: Andres Freund <af(at)cybertec(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2009-12-27 20:20:01
Message-ID: 200912272120.02171.af@cybertec.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday 27 December 2009 21:04:43 Simon Riggs wrote:
> On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
> > On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> > > On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > > > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > > > Giving the drop database a snapshot is not the answer. I expect
> > > > > Andres to be able to fix this with a simple patch that would not
> > > > > effect the case of normal running.
> > > >
> > > > Actually its less simply than I had thought at first - I don't think
> > > > the code ever handled that correctly.
> > > > I might be wrong there, my knowledge of the involved code is a bit
> > > > sparse... The whole conflict resolution builds on the concept of
> > > > waiting for an VXid, but an idle backend does not have a valid vxid.
> > > > Thats correct, right?
> > >
> > > I don't see any mileage in making Startup process wait for an idle
> > > session, so no real reason to wait for others either.
> >
> > So here is a small patch implementing that behaviour.
>
> OK, thanks. I have also written something, as mentioned. Will review.
Thats why I had asked in another mail ;-) But I have learned a bit more about
pg while writing that patch so its fine for me at least ;-)

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <af(at)cybertec(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2010-01-10 15:49:00
Message-ID: 1263138541.19367.142097.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
> On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> > On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > > Giving the drop database a snapshot is not the answer. I expect Andres
> > > > to be able to fix this with a simple patch that would not effect the
> > > > case of normal running.
> > >
> > > Actually its less simply than I had thought at first - I don't think the
> > > code ever handled that correctly.
> > > I might be wrong there, my knowledge of the involved code is a bit
> > > sparse... The whole conflict resolution builds on the concept of waiting
> > > for an VXid, but an idle backend does not have a valid vxid. Thats
> > > correct, right?
> > I don't see any mileage in making Startup process wait for an idle
> > session, so no real reason to wait for others either.
> So here is a small patch implementing that behaviour.

I've committed a slightly modified form of this patch.

It was an outstanding bug, so delaying fix at this stage not worth it. I
had in mind a slightly grander fix, but it's hardly a priority to polish
the chrome on this one.

Thanks for the bug report and fix.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <af(at)cybertec(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Small Bug in GetConflictingVirtualXIDs
Date: 2010-01-14 11:27:42
Message-ID: 1263468462.26654.19743.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2009-12-27 at 20:11 +0100, Andres Freund wrote:
> On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote:
> > On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote:
> > > On Monday 21 December 2009 16:48:52 Simon Riggs wrote:
> > > > Giving the drop database a snapshot is not the answer. I expect Andres
> > > > to be able to fix this with a simple patch that would not effect the
> > > > case of normal running.
> > >
> > > Actually its less simply than I had thought at first - I don't think the
> > > code ever handled that correctly.
> > > I might be wrong there, my knowledge of the involved code is a bit
> > > sparse... The whole conflict resolution builds on the concept of waiting
> > > for an VXid, but an idle backend does not have a valid vxid. Thats
> > > correct, right?
> > I don't see any mileage in making Startup process wait for an idle
> > session, so no real reason to wait for others either.
> So here is a small patch implementing that behaviour.

On further testing, I received a re-connection from an automatic session
retry. That shouldn't have happened, but it indicates the need for
locking around the conflict handler. I had understood that to be placed
elsewhere but that seems wrong now.

This is a low priority item, so looking for a quick fix to allow time on
other areas.

Any objections?

--
Simon Riggs www.2ndQuadrant.com

Attachment Content-Type Size
db_conflict_locking.patch text/x-patch 1.5 KB