Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases

Lists: pgsql-bugs
From: amatzinger(at)experts-exchange(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-02 23:23:43
Message-ID: E1SEqb9-0004Ou-Rl@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6571
Logged by: Alex Matzinger
Email address: amatzinger(at)experts-exchange(dot)com
PostgreSQL version: 9.1.2
Operating system: Freebsd 7.4
Description:

On a hot standby database, while the primary is being updated, Postgres will
randomly kill a process which is performing a "Select 1" command.

The error is this:
2012-04-02 13:36:13.269
PDT,"smxuser","smxprd1",39523,"127.0.0.1:57893",4f79ffad.9a63,1,"",2012-04-02
12:36:13 PDT,3/32,0,FATAL,40001,"terminating connection due to conflict with
recovery","User query might have needed to see row versions that must be
removed.","In a moment you should be able to reconnect to the database and
repeat your command.",,,,,,,""

We have 5 hot standby's set up, which all preform this SELECT 1, and
postgres kills them across all standby's.

There should never be a situation that SELECT 1 is in conflict with data, as
it it never using any table in the database.

We will be updating to 9.1.3 very soon, but i have not seen any
documentation that this issue is corrected, or even known.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: amatzinger(at)experts-exchange(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-03 06:49:43
Message-ID: 4F7A9D87.9060708@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 03.04.2012 02:23, amatzinger(at)experts-exchange(dot)com wrote:
> On a hot standby database, while the primary is being updated, Postgres will
> randomly kill a process which is performing a "Select 1" command.
>
> The error is this:
> 2012-04-02 13:36:13.269
> PDT,"smxuser","smxprd1",39523,"127.0.0.1:57893",4f79ffad.9a63,1,"",2012-04-02
> 12:36:13 PDT,3/32,0,FATAL,40001,"terminating connection due to conflict with
> recovery","User query might have needed to see row versions that must be
> removed.","In a moment you should be able to reconnect to the database and
> repeat your command.",,,,,,,""
>
> We have 5 hot standby's set up, which all preform this SELECT 1, and
> postgres kills them across all standby's.
>
> There should never be a situation that SELECT 1 is in conflict with data, as
> it it never using any table in the database.

The system doesn't make a difference between queries like "SELECT 1"
that don't access any tables, and those that do. Even if "SELECT 1"
doesn't access any tables, a subsequent statement in the same
transaction might.

I'm assuming that those "SELECT 1"s were issued in transactions that had
been open for a long time, because you shouldn't get recovery conflicts
with very short transactions, in practice anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Alex Matzinger <amatzinger(at)experts-exchange(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-03 16:07:44
Message-ID: 4F7B2050.9000507@experts-exchange.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The connection that is executing the SELECT 1 are generally open for 1-5
hours before they are killed. The specific connection only executes
SELECT 1. The transaction is simply BEGIN, and then SELECT 1's, no
other query is executed.

The updates we make to the primary database are quiet large, usually
several megabytes.

Alex

On 4/2/2012 11:49 PM, Heikki Linnakangas wrote:
> On 03.04.2012 02:23, amatzinger(at)experts-exchange(dot)com wrote:
>> On a hot standby database, while the primary is being updated,
>> Postgres will
>> randomly kill a process which is performing a "Select 1" command.
>>
>> The error is this:
>> 2012-04-02 13:36:13.269
>> PDT,"smxuser","smxprd1",39523,"127.0.0.1:57893",4f79ffad.9a63,1,"",2012-04-02
>>
>> 12:36:13 PDT,3/32,0,FATAL,40001,"terminating connection due to
>> conflict with
>> recovery","User query might have needed to see row versions that must be
>> removed.","In a moment you should be able to reconnect to the
>> database and
>> repeat your command.",,,,,,,""
>>
>> We have 5 hot standby's set up, which all preform this SELECT 1, and
>> postgres kills them across all standby's.
>>
>> There should never be a situation that SELECT 1 is in conflict with
>> data, as
>> it it never using any table in the database.
>
> The system doesn't make a difference between queries like "SELECT 1"
> that don't access any tables, and those that do. Even if "SELECT 1"
> doesn't access any tables, a subsequent statement in the same
> transaction might.
>
> I'm assuming that those "SELECT 1"s were issued in transactions that
> had been open for a long time, because you shouldn't get recovery
> conflicts with very short transactions, in practice anyway.
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Matzinger <amatzinger(at)experts-exchange(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-03 16:40:12
Message-ID: 9729.1333471212@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alex Matzinger <amatzinger(at)experts-exchange(dot)com> writes:
> The connection that is executing the SELECT 1 are generally open for 1-5
> hours before they are killed. The specific connection only executes
> SELECT 1. The transaction is simply BEGIN, and then SELECT 1's, no
> other query is executed.

Lose the "BEGIN" and it will probably work more nicely.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Alex Matzinger" <amatzinger(at)experts-exchange(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-03 16:41:38
Message-ID: 4F7AE1F20200002500046B1C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alex Matzinger <amatzinger(at)experts-exchange(dot)com> wrote:

> The connection that is executing the SELECT 1 are generally open
> for 1-5 hours before they are killed. The specific connection
> only executes SELECT 1. The transaction is simply BEGIN, and then
> SELECT 1's, no other query is executed.

Don't do that.

In particular, never put more into a single database transaction
than is required for integrity; and there is no apparent reason why
running a series of "SELECT 1" statements needs to be in a single
transaction. (It's not blindingly obvious why you would want to do
it in general, but I can imagine it possibly being useful for
monitoring purposes.)

-Kevin


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-03 16:50:38
Message-ID: 4F7B2A5E.8060004@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 04/03/12 9:40 AM, Tom Lane wrote:
> Alex Matzinger<amatzinger(at)experts-exchange(dot)com> writes:
>> > The connection that is executing the SELECT 1 are generally open for 1-5
>> > hours before they are killed. The specific connection only executes
>> > SELECT 1. The transaction is simply BEGIN, and then SELECT 1's, no
>> > other query is executed.
> Lose the "BEGIN" and it will probably work more nicely.

indeed,a 1-5 hour long transaction means VACUUM can't clean up anything
newer than the oldest active transaction, and thats not per database,
thats cluster-wide.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast


From: Alex Matzinger <amatzinger(at)experts-exchange(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6571: Postgres Kills 'Select 1' query randomly on hot standby databases
Date: 2012-04-05 16:38:04
Message-ID: 4F7DCA6C.5080806@experts-exchange.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Dropping the BEGIN has cleared up the issue. Thank you.

On 4/3/2012 9:50 AM, John R Pierce wrote:
> On 04/03/12 9:40 AM, Tom Lane wrote:
>> Alex Matzinger<amatzinger(at)experts-exchange(dot)com> writes:
>>> > The connection that is executing the SELECT 1 are generally open
>>> for 1-5
>>> > hours before they are killed. The specific connection only executes
>>> > SELECT 1. The transaction is simply BEGIN, and then SELECT 1's, no
>>> > other query is executed.
>> Lose the "BEGIN" and it will probably work more nicely.
>
>
> indeed,a 1-5 hour long transaction means VACUUM can't clean up
> anything newer than the oldest active transaction, and thats not per
> database, thats cluster-wide.
>
>
>