Cursor with hold emits the same row more than once across commits in 8.3.7

Lists: pgsql-bugspgsql-hackers
From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 06:15:38
Message-ID: 4A2DFE0A.6050502@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Short Desc: Cursor with hold emits the same row more than once across
commits in 8.3.7
Os : Debian Etch amd64 / Ubuntu Jaunty amd64
Pg : 8.3.7
Build options: Official package and also compiled from source with
--enable-integer-datetimes
Detailed Desc:

A construction of the form

DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj

loop
FETCH 1000 FROM cur
process 'em
COMMIT

results in some of the same rows being emitted more than once, altho the
final rowcount is correct (i.e some rows end up being never seen).

Originally discovered using a perl DBI program, and we wondered if the
version of DBD::Pg might be an issue, so a c library program was written
to test this - and it exhibits the problem too (see attached for schema
and program). The table rows are reasonably wide:

select attname,n_distinct,avg_width from pg_stats where tablename='obj';
attname | n_distinct | avg_width
-------------+------------+-----------
obj_id | -1 | 4
obj_link_id | 5 | 4
obj_fil | 13035 | 1188

which may be a factor(tuplestore issues?)... The table is reasonably
sizable (10000000 rows). I can attach the generation program for this
dataset if required.

regards

Mark

Attachment Content-Type Size
cursor-bug.tar.gz application/x-gzip 1.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 16:07:44
Message-ID: 20997.1244563664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> A construction of the form

> DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj

> loop
> FETCH 1000 FROM cur
> process 'em
> COMMIT

> results in some of the same rows being emitted more than once, altho the
> final rowcount is correct (i.e some rows end up being never seen).

I poked into this a bit, and it looks sort of nasty. Mark's immediate
complaint is a consequence of the synchronize_seqscan patch, but there
are other issues too. The problem comes from the fact that a WITH HOLD
cursor is initially treated the same as a regular cursor, ie, we just
fetch on demand. If it's still open at transaction commit, we do this:

ExecutorRewind();
fetch all the rows into a tuplestore;
advance the tuplestore past the number of rows previously fetched;

and then later transactions can fetch-on-demand from the tuplestore.

The reason for the bug is that with synchronize_seqscan on, a SeqScan
node that gets rewound does not necessarily restart from the same point
in the table that it initially started reading from. So the initial
fetch grabs 1000 rows, but then when we rewind, the first 1000 rows
loaded into the tuplestore may come from a different range of the table.

This does not only affect cursors WITH HOLD. Some paths in the
cursor MOVE logic also rely on ExecutorRewind to behave sanely.

We could probably fix this specific issue by refactoring things in such
a way that the seqscan start point is frozen on the first read and
re-used after rewinds.

However, it strikes me also that a cursor query containing volatile
functions is going to cause some similar issues --- you can't just
re-execute the query for "the same" rows and expect to get stable
results. What should we do about that?

The technically best solution is probably similar to what Materialize
nodes do, ie, read the query only once and be careful to stash rows
aside into a tuplestore as they are read. This would fix both issues
with one patch. The problem with that is that if the user doesn't
actually do any backwards fetching, you waste all the tuplestore
maintenance work.

Or we could just document that cursors containing volatile functions
don't behave stably if you try to read backwards; or try to enforce that
you can't do so.

The volatile-function issue has been there since the dawn of time, and
we've never had a complaint about it AFAIR. So maybe trying to "fix"
it isn't a good thing and we should just document the behavior. But
the syncscan instability is new and probably ought to be dealt with.

Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 17:37:07
Message-ID: 603c8f070906091037l3a748fc8m11ca40cb4b7fe8f7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 9, 2009 at 12:07 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The technically best solution is probably similar to what Materialize
> nodes do, ie, read the query only once and be careful to stash rows
> aside into a tuplestore as they are read.  This would fix both issues
> with one patch.  The problem with that is that if the user doesn't
> actually do any backwards fetching, you waste all the tuplestore
> maintenance work.

This seems like the only option that will produce correct answers, so
it gets my vote. How much is the performance penalty for
materializing the tuplestore? I'm inclined to think that whatever it
is, you just have to pay it if you ask for a WITH HOLD cursor.

I suppose in theory you could try to figure out whether
materialization is really necessary (let's see... no seqscans here and
no volatile functions... ok, so we can cheat...) but that seems
likely to lead to future bugs as the rules for which things are safe
change.

...Robert


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 17:38:51
Message-ID: 20090609173851.GG5938@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Robert Haas escribió:

> I suppose in theory you could try to figure out whether
> materialization is really necessary (let's see... no seqscans here and
> no volatile functions... ok, so we can cheat...) but that seems
> likely to lead to future bugs as the rules for which things are safe
> change.

Another thing we could do is disable syncscan if we see a WITH HOLD
cursor, but I guess it's not future-proof either.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 17:47:14
Message-ID: 23358.1244569634@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> This seems like the only option that will produce correct answers, so
> it gets my vote. How much is the performance penalty for
> materializing the tuplestore? I'm inclined to think that whatever it
> is, you just have to pay it if you ask for a WITH HOLD cursor.

I don't mind paying it for a WITH HOLD cursor, since by definition
you're asking for a more expensive behavior there. The thing that is
bothering me more is whether we want to pay a price for a *non* WITH
HOLD cursor. You can get instability for seqscan or volatile functions
if you just try MOVE BACKWARD ALL and re-read.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 17:51:25
Message-ID: 1244569885.21727.64.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, 2009-06-09 at 12:07 -0400, Tom Lane wrote:
> We could probably fix this specific issue by refactoring things in such
> a way that the seqscan start point is frozen on the first read and
> re-used after rewinds.

I don't know what you mean by "frozen" exactly, but the start point of a
synchronized scan is stored in shared memory; otherwise, it wouldn't
know where to stop.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 17:54:16
Message-ID: 1244570056.21727.66.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, 2009-06-09 at 10:51 -0700, Jeff Davis wrote:
> On Tue, 2009-06-09 at 12:07 -0400, Tom Lane wrote:
> > We could probably fix this specific issue by refactoring things in such
> > a way that the seqscan start point is frozen on the first read and
> > re-used after rewinds.
>
> I don't know what you mean by "frozen" exactly, but the start point of a
> synchronized scan is stored in shared memory; otherwise, it wouldn't
> know where to stop.
>

Correction: I didn't actually mean _shared_ memory there. It's just
backend-local memory.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 18:00:41
Message-ID: 603c8f070906091100v1e991c97y8e58e62153efac2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 9, 2009 at 1:47 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> This seems like the only option that will produce correct answers, so
>> it gets my vote.  How much is the performance penalty for
>> materializing the tuplestore?  I'm inclined to think that whatever it
>> is, you just have to pay it if you ask for a WITH HOLD cursor.
>
> I don't mind paying it for a WITH HOLD cursor, since by definition
> you're asking for a more expensive behavior there.  The thing that is
> bothering me more is whether we want to pay a price for a *non* WITH
> HOLD cursor.  You can get instability for seqscan or volatile functions
> if you just try MOVE BACKWARD ALL and re-read.

[ reads the fine manual ]

It seems like we need to materialize if you ask for WITH HOLD or
SCROLL. I guess the question is what to do if you haven't specified
either and then try to scroll anyway. The manual says that it may
fail, but it doesn't say that might seem to work but actually return
wrong answers.

...Robert


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Kirkwood" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, <pgsql-bugs(at)postgresql(dot)org>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 18:00:43
Message-ID: 4A2E5CFB020000250002777E@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> This seems like the only option that will produce correct answers,
>> so it gets my vote. How much is the performance penalty for
>> materializing the tuplestore? I'm inclined to think that whatever
>> it is, you just have to pay it if you ask for a WITH HOLD cursor.
>
> I don't mind paying it for a WITH HOLD cursor, since by definition
> you're asking for a more expensive behavior there. The thing that
> is bothering me more is whether we want to pay a price for a *non*
> WITH HOLD cursor. You can get instability for seqscan or volatile
> functions if you just try MOVE BACKWARD ALL and re-read.

I would expect to pay more for a scrollable cursor than non-
scrollable; and in fact, the fine manual says "Depending upon the
complexity of the query's execution plan, specifying SCROLL might
impose a performance penalty on the query's execution time." That
would tend to argue in favor of taking the time to produce correct
answers. It does raise a question, though, about another sentence in
the same paragraph: "The default is to allow scrolling in some cases;
this is not the same as specifying SCROLL." Either we make people pay
for this when they haven't specified SCROLL but PostgreSQL has
historically given it to them anyway, or we might break existing
applications.

-Kevin


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 18:15:56
Message-ID: 3073cc9b0906091115u1672ae42nd3f8146521721a3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 9, 2009 at 1:00 PM, Kevin
Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> the same paragraph: "The default is to allow scrolling in some cases;

"in some cases"... like in "but not all"... ?
this doesn't sound like a vow to me. if the user really wants
SCROLLing ability he should have been specified that way...
i say pay the price for WITH HOLD and SCROLL and don't allow scrolling
ability if SCROLL hasn't been specified

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Re: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 18:33:18
Message-ID: 25190.1244572398@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Tue, 2009-06-09 at 10:51 -0700, Jeff Davis wrote:
>> I don't know what you mean by "frozen" exactly, but the start point of a
>> synchronized scan is stored in shared memory; otherwise, it wouldn't
>> know where to stop.

> Correction: I didn't actually mean _shared_ memory there. It's just
> backend-local memory.

Well, wherever it's stored, it's a demonstrable fact that we're not
getting the same rows after ExecutorRewind(); and that we do get the
same rows out if we disable synchronize_seqscans in Mark's test case.
I haven't got round to identifying exactly what to change if we decide
to go for a narrow fix instead of storing the query results at a higher
level.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-10 17:16:19
Message-ID: 8133.1244654179@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Given that RC freeze is nearly upon us for 8.4, and that we need a
reasonably non-invasive fix for 8.3 anyway, I propose that for now
we just deal with the syncscan issue by tweaking heap_rescan so that
rs_startblock doesn't get changed. It looks like that's about a
three-line patch. The question of how cursors should behave with
respect to volatile functions can be documented and left for another
time.

regards, tom lane


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-10 22:33:32
Message-ID: 4A3034BC.2030906@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Given that RC freeze is nearly upon us for 8.4, and that we need a
> reasonably non-invasive fix for 8.3 anyway, I propose that for now
> we just deal with the syncscan issue by tweaking heap_rescan so that
> rs_startblock doesn't get changed. It looks like that's about a
> three-line patch. The question of how cursors should behave with
> respect to volatile functions can be documented and left for another
> time.
>

Sounds like a good approach.

Mark