Re: First-draft release notes for next week's releases

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: First-draft release notes for next week's releases
Date: 2014-03-17 18:27:15
Message-ID: 20140317182715.GN16438@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-03-17 14:16:41 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> >> IIUC, this case only occurs when using the new-in-9.3 types of
> >> nonexclusive row locks. I'm willing to bet that the number of
> >> applications using those is negligible; so I think it's all right to not
> >> mention that case explicitly, as long as the wording doesn't say that
> >> foreign keys are the *only* cause (which I didn't).
>
> > I actually think the issue could also occur with row locks of other
> > severities (is that the correct term?).
>
> The commit log entry says
>
> We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
> WAL replay of a tuple-lock operation, which is incorrect when the tuple
> is already updated.
>
> Back-patch to 9.3. The clearing of both header elements was there
> previously, but since no update could be present on a tuple that was
> being locked, it was harmless.
>
> which I read to mean that the case can't occur with the types of row
> locks that were allowed pre-9.3.

That's not an unreasonable interpretation of the commit message, but I
don't think it's correct with respect to the code :(

> > but if I see correctly it's also triggerable if a backend waits for an
> > updating transaction to finish and follow_updates = true is passed to
> > heap_lock_tuple(). Which e.g. nodeLockRows.c does...
>
> That sounds backwards. nodeLockRows locks the latest tuple in the chain,
> so it can't be subject to this.

Hm, I don't see anything in the code preventing it, that's the
lock_tuple() before the EPQ stuff... in ExecLockRows():
foreach(lc, node->lr_arowMarks)
{
test = heap_lock_tuple(erm->relation, &tuple,
estate->es_output_cid,
lockmode, erm->noWait, true,
&buffer, &hufd);
ReleaseBuffer(buffer);
switch (test)
{
case HeapTupleSelfUpdated:
...

the true passed to heap_lock_tuple() is the follow_updates
parameter. And then in heap_lock_tuple():
if (require_sleep)
{
if (infomask & HEAP_XMAX_IS_MULTI)
{
...
/* if there are updates, follow the update chain */
if (follow_updates &&
!HEAP_XMAX_IS_LOCKED_ONLY(infomask))
{
HTSU_Result res;

res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
GetCurrentTransactionId(),
...
else
{
/* wait for regular transaction to end */
if (nowait)
{
if (!ConditionalXactLockTableWait(xwait))
ereport(ERROR,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
errmsg("could not obtain lock on row in relation \"%s\"",
RelationGetRelationName(relation))));
}
else
XactLockTableWait(xwait);

/* if there are updates, follow the update chain */
if (follow_updates &&
!HEAP_XMAX_IS_LOCKED_ONLY(infomask))
...
if (RelationNeedsWAL(relation))
{
xl_heap_lock xlrec;
XLogRecPtr recptr;
XLogRecData rdata[2];

xlrec.target.node = relation->rd_node;
xlrec.target.tid = tuple->t_self;
...

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-03-17 18:29:56 Re: First-draft release notes for next week's releases
Previous Message Merlin Moncure 2014-03-17 18:20:47 Re: Planner hints in Postgresql