Re: Relation cache invalidation on replica

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Relation cache invalidation on replica
Date: 2016-02-28 09:53:36
Message-ID: CANP8+jJCFYnBQy0VJpnv+EKxHDsH8-ABv4QsJ+VD=R1_mw=YEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27 February 2016 at 07:52, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru
> wrote:

> On 02/27/2016 04:16 AM, Simon Riggs wrote:
>
> On 27 February 2016 at 00:33, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> On 27 February 2016 at 00:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
>>> > The reason of the problem is that invalidation messages are not
>>> delivered to
>>> > replica after the end of concurrent create index.
>>> > Invalidation messages are included in xlog as part of transaction
>>> commit
>>> > record.
>>> > Concurrent index create is split into three transaction, last of which
>>> is
>>> > just performing inplace update of index tuple, marking it as valid and
>>> > invalidating cache. But as far as this transaction is not assigned
>>> XID, no
>>> > transaction record is created in WAL and send to replicas. As a result,
>>> > replica doesn't receive this invalidation messages.
>>>
>>> Ugh, that's a fairly ugly bug.
>>
>>
>> Looking now.
>>
>
> If the above is true, then the proposed fix wouldn't work either.
>
> No point in sending a cache invalidation message on the standby if you
> haven't also written WAL, since the catalog re-read would just see the old
> row.
>
> heap_inplace_update() does write WAL, which blows away the starting
> premise.
>
> So I'm not seeing this as an extant bug in an open source version of
> PostgreSQL, in my current understanding.
>
>
> Inplace update really creates record in WAL and this is why index is
> marked as valid at replica.
> But invalidation messages are sent only with transaction commit record and
> such record is not created in this case,
> because there is no assigned XID.
>
> This is a real bug which originally observed by one of our customers with
> different versions of Postgres (last one them have tried was 9.5.1).
> Then we reproduced it locally and determined the reason of the problem.
> Repro scenario is very simple: you just need to create large enough table
> (pgbench with scale factor 100 works well in my case)
> so that "create index concurrently" takes substantial amount of time. If,
> while this statement is in progress, you will execute some query at replica
> which
> uses this index, then it will cache state of relation without index. And
> after even when index is actually constructed, it will never be used in
> this backend (but other backends at replica will use it).
>

OK, so this is a fairly restricted bug. I was wondering how we'd missed it
for years. It wouldn't affect all backends, just those that accessed the
index before it was valid. New backends and restarts wouldn't be affected.

> I am not sure about the best way of fixing the problem.
> I have not tested Andreas proposal:
>
> if (nrels != 0 || nmsgs != 0 || RelcacheInitFileInval)
>
> if it actually fixes the problem.
> Assigning XID in heap_inplace_update definitely should work.
> It is better than forcing assignment XID in DefineIndex? I am not sure,
> because this problem seems to be related only with concurrent update
> (but may be I am wrong).
> At least not all inplace updates should cause catalog invalidation and so
> require XID assignment.
>

We have various proposals for fixing this, so on consideration here's what
I think we should do...

1. Ignore my first patch to always set an xid. Andres thought that this may
break something else could be true, so is not worth the risk.

2. Apply Konstantin's patch to fix this situation for the specific case
only.

3. Take Andres' idea and build that in as protection. We currently check
that nrels != 0 and throw an ERROR. We should do the same thing if there is
an invalidation event, so that we catch errors not just ignore them and
issue the commit anyway. This will check that there are no other cases in
other code.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
cache_inval_has_commit_rec.v1.patch application/octet-stream 730 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2016-02-28 12:14:14 Re: The plan for FDW-based sharding
Previous Message Yury Zhuravlev 2016-02-28 08:51:03 Re: [PATH] Correct negative/zero year in to_date/to_timestamp