Re: Cursor with hold emits the same row more than once across commits in 8.3.7

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cursor with hold emits the same row more than once across commits in 8.3.7
Date: 2009-06-09 16:07:44
Message-ID: 20997.1244563664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> A construction of the form

> DECLARE cur CURSOR WITH HOLD FOR SELECT * FROM obj

> loop
> FETCH 1000 FROM cur
> process 'em
> COMMIT

> results in some of the same rows being emitted more than once, altho the
> final rowcount is correct (i.e some rows end up being never seen).

I poked into this a bit, and it looks sort of nasty. Mark's immediate
complaint is a consequence of the synchronize_seqscan patch, but there
are other issues too. The problem comes from the fact that a WITH HOLD
cursor is initially treated the same as a regular cursor, ie, we just
fetch on demand. If it's still open at transaction commit, we do this:

ExecutorRewind();
fetch all the rows into a tuplestore;
advance the tuplestore past the number of rows previously fetched;

and then later transactions can fetch-on-demand from the tuplestore.

The reason for the bug is that with synchronize_seqscan on, a SeqScan
node that gets rewound does not necessarily restart from the same point
in the table that it initially started reading from. So the initial
fetch grabs 1000 rows, but then when we rewind, the first 1000 rows
loaded into the tuplestore may come from a different range of the table.

This does not only affect cursors WITH HOLD. Some paths in the
cursor MOVE logic also rely on ExecutorRewind to behave sanely.

We could probably fix this specific issue by refactoring things in such
a way that the seqscan start point is frozen on the first read and
re-used after rewinds.

However, it strikes me also that a cursor query containing volatile
functions is going to cause some similar issues --- you can't just
re-execute the query for "the same" rows and expect to get stable
results. What should we do about that?

The technically best solution is probably similar to what Materialize
nodes do, ie, read the query only once and be careful to stash rows
aside into a tuplestore as they are read. This would fix both issues
with one patch. The problem with that is that if the user doesn't
actually do any backwards fetching, you waste all the tuplestore
maintenance work.

Or we could just document that cursors containing volatile functions
don't behave stably if you try to read backwards; or try to enforce that
you can't do so.

The volatile-function issue has been there since the dawn of time, and
we've never had a complaint about it AFAIR. So maybe trying to "fix"
it isn't a good thing and we should just document the behavior. But
the syncscan instability is new and probably ought to be dealt with.

Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2009-06-09 17:37:07 Re: [HACKERS] Cursor with hold emits the same row more than once across commits in 8.3.7
Previous Message Tom Lane 2009-06-09 13:59:06 Re: Bug in pg_dump.exe/pg_restore (Version 8.4 beta 2)

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Weimer 2009-06-09 16:12:01 Re: Multicolumn index corruption on 8.4 beta 2
Previous Message Simon Riggs 2009-06-09 16:07:35 Re: Multicolumn index corruption on 8.4 beta 2