Re: logical changeset generation v6

From: Steve Singer <steve(at)ssinger(dot)info>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: logical changeset generation v6
Date: 2013-09-28 17:25:09
Message-ID: BLU0-SMTP992D12C8DC135752910F19DC2A0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/27/2013 05:18 PM, Andres Freund wrote:
> Hi Steve,
>
> On 2013-09-27 17:06:59 -0400, Steve Singer wrote:
>>> I've determined that when in this test the walsender seems to be hitting
>>> this when it is decode the transactions that are behind the slonik
>>> commands to add tables to replication (set add table, set add sequence).
>>> This is before the SUBSCRIBE SET is submitted.
>>>
>>> I've also noticed something else that is strange (but might be unrelated).
>>> If I stop my slon process and restart it I get messages like:
>>>
>>> WARNING: Starting logical replication from 0/a9321360
>>> ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00
>>>
>>> Where 0/A9321360 was sent in the last packet my slon received from the
>>> walsender before the restart.
> Uh, that looks like I fumbled some comparison. Let me check.
>
>> I've further narrowed this down to something (or the combination of) what
>> the _disorder_replica.altertableaddTriggers(1);
>> stored function does. (or @SLONYNAMESPACE(at)(dot)altertableaddTriggers(int);
>>
>> Which is essentially
>> * Get an exclusive lock on sl_config_lock
>> * Get an exclusive lock on the user table in question
>> * create a trigger (the deny access trigger)
>> * create a truncate trigger
>> * create a deny truncate trigger
>>
>> I am not yet able to replicate the error by issuing the same SQL commands
>> from psql, but I must be missing something.
>>
>> I can replicate this when just using the test_decoding plugin.
> Thanks. That should get me started with debugging. Unless it's possibly
> fixed in the latest version, one bug fixed there might cause something
> like this if the moon stands exactly right?

The latest version has NOT fixed the problem.

Also, I was a bit inaccurate in my previous descriptions. To clarify:

1. I sometimes am getting that 'unexpected duplicate' error
2. The 'set add table ' which triggers those functions that create and
configure triggers is actually causing the walsender to hit the
following assertion
2 0x0000000000773d47 in ExceptionalCondition (
conditionName=conditionName(at)entry=0x8cf400 "!(ent->cmin ==
change->tuplecid.cmin)", errorType=errorType(at)entry=0x7ab830
"FailedAssertion",
fileName=fileName(at)entry=0x8cecc3 "reorderbuffer.c",
lineNumber=lineNumber(at)entry=1162) at assert.c:54
#3 0x0000000000665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610,
rb=<optimized out>) at reorderbuffer.c:1162
#4 ReorderBufferCommit (rb=0x1b6e4f8, xid=<optimized out>,
commit_lsn=3461001952, end_lsn=<optimized out>) at reorderbuffer.c:1285
#5 0x000000000065f0f7 in DecodeCommit (xid=<optimized out>,
nsubxacts=<optimized out>, sub_xids=<optimized out>, ninval_msgs=16,
msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928,
ctx=0x1adb928) at decode.c:477

I had added an assert(false) to the code where the 'unknown duplicate'
error was logged to make spotting this easier but yesterday I didn't
double check that I was hitting the assertion I added versus this other
one. I can't yet say if this is two unrelated issues or if I'd get to
the 'unknown duplicate' message immediately after.

> Greetings,
>
> Andres Freund
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2013-09-28 19:03:20 Re: record identical operator - Review
Previous Message Steve Singer 2013-09-28 16:55:44 Re: logical changeset generation v6.1