Re: [HACKERS] logical decoding of two-phase transactions

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-04-05 08:45:49
Message-ID: b1c6ba51-436c-b10d-895f-3805d3caf4a5@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/5/18 8:50 AM, Nikhil Sontakke wrote:
> Hi Tomas,
>
>> 1) There's a race condition in LogicalLockTransaction. The code does
>> roughly this:
>>
>> if (!BecomeDecodeGroupMember(...))
>> ... bail out ...
>>
>> Assert(MyProc->decodeGroupLeader);
>> lwlock = LockHashPartitionLockByProc(MyProc->decodeGroupLeader);
>> ...
>>
>> but AFAICS there is no guarantee that the transaction does not commit
>> (or even abort) right after the become decode group member. In which
>> case LogicalDecodeRemoveTransaction might have already reset our pointer
>> to a leader to NULL. In which case the Assert() and lock will fail.
>>
>> I've initially thought this can be fixed by setting decodeLocked=true in
>> BecomeDecodeGroupMember, but that's not really true - that would fix the
>> race for aborts, but not commits. LogicalDecodeRemoveTransaction skips
>> the wait for commits entirely, and just resets the flags anyway.
>>
>> So this needs a different fix, I think. BecomeDecodeGroupMember also
>> needs the leader PGPROC pointer, but it does not have the issue because
>> it gets it as a parameter. I think the same thing would work for here
>> too - that is, use the AssignDecodeGroupLeader() result instead.
>>
>
> That's a good catch. One of the earlier patches had a check for this
> (it also had an ill-placed assert above though) which we removed as
> part of the ongoing review.
>
> Instead of doing the above, we can just re-check if the
> decodeGroupLeader pointer has become NULL and if so, re-assert that
> the leader has indeed gone away before returning false. I propose a
> diff like below.
>
> /*
>
> * If we were able to add ourself, then Abort processing will
>
> - * interlock with us.
>
> + * interlock with us. If the leader was done in the meanwhile
>
> + * it could have removed us and gone away as well.
>
> */
>
> - Assert(MyProc->decodeGroupLeader);
>
> + if (MyProc->decodeGroupLeader == NULL)
>
> + {
>
> + Assert(BackendXidGetProc(txn->xid) == NULL);
>
> + return false
>
> + }
>
>

Uh? Simply rechecking if MyProc->decodeGroupLeader is NULL obviously
does not fix the race condition - it might get NULL right after the
check. So we need to either lookup the PROC again (and then get the
associated lwlock), or hold some other type of lock.

>>
>> 3) I don't quite understand why BecomeDecodeGroupMember does the
>> cross-check using PID. In which case would it help?
>>
>
> When I wrote this support, I had written it with the intention of
> supporting both 2PC (in which case pid is 0) and in-progress regular
> transactions. That's why the presence of PID in these functions. The
> current use case is just for 2PC, so we could remove it.
>

Sure, but why do we need to cross-check the PID at all? I may be missing
something here, but I don't see what does this protect against?

>
>>
>> 5) ReorderBufferCommitInternal does elog(LOG) about interrupting the
>> decoding of aborted transaction only in one place. There are about three
>> other places where we check LogicalLockTransaction. Seems inconsistent.
>>
>
> Note that I have added it for the OPTIONAL test_decoding test cases
> (which AFAIK we don't plan to commit in that state) which demonstrate
> concurrent rollback interlocking with the lock/unlock APIs. The first
> ELOG was enough to catch the interaction. If we think these elogs
> should be present in the code, then yes, I can add it elsewhere as
> well as part of an earlier patch.
>

Ah, I see. Makes sense. I've been looking at the patch as a whole and
haven't realized it's part of this piece.

>
> Ok, I am looking at your provided patch and incorporating relevant
> changes from it. WIll submit a patch set soon.
>

OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-04-05 08:46:08 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Previous Message Heikki Linnakangas 2018-04-05 08:35:08 Re: chained transactions