Lists: | pgsql-hackers |
---|
From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-15 02:47:01 |
Message-ID: | addb5506ffba0d3e6877a6348f29210e@xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Testing logical replication, with the following patches on top of
yesterday's master:
0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
0005-Skip-unnecessary-snapshot-builds.patch
Is applying that patch set is still correct?
It builds fine, but when I run the old pbench-over-logical-replication
test I get:
TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
"pgstat.c", Line: 828)
reliably (often within a minute).
The test itself does not fail, at least not that I saw (but I only ran a
few).
thanks,
Erik Rijkers
From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-16 08:46:21 |
Message-ID: | f0a17406e84a663ce46cc0f164bf5221@xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2017-04-15 04:47, Erik Rijkers wrote:
>
> 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
> 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
> 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
> 0005-Skip-unnecessary-snapshot-builds.patch
I am now using these newer patches:
https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
> It builds fine, but when I run the old pbench-over-logical-replication
> test I get:
>
> TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
> "pgstat.c", Line: 828)
To get that error:
--------------
#!/bin/sh
port1=6972 port2=6973 scale=25 clients=16 duration=60
echo "drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -qXp $port1 \
&& echo "drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -qXp $port2 \
&& pgbench -p $port1 -qis ${scale//_/} && echo "
alter table pgbench_history add column hid serial primary key;
" | psql -q1Xp $port1 \
&& pg_dump -F c -p $port1 \
--exclude-table-data=pgbench_history \
--exclude-table-data=pgbench_accounts \
--exclude-table-data=pgbench_branches \
--exclude-table-data=pgbench_tellers \
-t pgbench_history \
-t pgbench_accounts \
-t pgbench_branches \
-t pgbench_tellers \
| pg_restore -1 -p $port2 -d testdb
appname=pgbench_derail
echo "create publication pub1 for all tables;" | psql -p $port1 -aqtAX
echo "create subscription sub1 connection 'port=${port1}
application_name=${appname}' publication pub1 with (disabled);
alter subscription sub1 enable;
" | psql -p $port2 -aqtAX
echo "-- pgbench -p $port1 -c $clients -T $duration -n -- scale $scale
"
pgbench -p $port1 -c $clients -T $duration -n
--------------
Erik Rijkers
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Erik Rijkers <er(at)xs4all(dot)nl>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-16 18:41:25 |
Message-ID: | 20170416184125.r7ogozn2osae2yom@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
> On 2017-04-15 04:47, Erik Rijkers wrote:
> >
> > 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
> > 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
> > 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
> > 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
> > 0005-Skip-unnecessary-snapshot-builds.patch
>
> I am now using these newer patches:
> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>
> > It builds fine, but when I run the old pbench-over-logical-replication
> > test I get:
> >
> > TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
> > "pgstat.c", Line: 828)
>
>
> To get that error:
I presume this is the fault of
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
if you git revert that individual commit, do things work again?
- Andres
From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-16 20:11:12 |
Message-ID: | dfc1c75f-e2d7-bc59-23ca-bc9b9af2dbb6@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 16/04/17 20:41, Andres Freund wrote:
> On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
>> On 2017-04-15 04:47, Erik Rijkers wrote:
>>>
>>> 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>>> 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
>>> 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
>>> 0005-Skip-unnecessary-snapshot-builds.patch
>>
>> I am now using these newer patches:
>> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>>
>>> It builds fine, but when I run the old pbench-over-logical-replication
>>> test I get:
>>>
>>> TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
>>> "pgstat.c", Line: 828)
>>
>>
>> To get that error:
>
> I presume this is the fault of
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
> if you git revert that individual commit, do things work again?
>
> - Andres
>
Yeah it is, it needs to be fenced to happen only after commit, which is
not guaranteed at the point of code, we probably need to put the
pgstat_report_stat() inside the if above after the
CommitTransactionCommand() (that will make it report stats for changes
apply did to pg_subscription_rel after next transaction though) or put
it into it's own if that checks if tx was indeed committed (which is bit
ugly).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-17 07:30:58 |
Message-ID: | dfced5b5a3267e48cb8dbf8aebc6c963@xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2017-04-16 20:41, Andres Freund wrote:
> On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
>> On 2017-04-15 04:47, Erik Rijkers wrote:
>> >
>> > 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>> > 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>> > 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
>> > 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
>> > 0005-Skip-unnecessary-snapshot-builds.patch
>>
>> I am now using these newer patches:
>> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>>
>> > It builds fine, but when I run the old pbench-over-logical-replication
>> > test I get:
>> >
>> > TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
>> > "pgstat.c", Line: 828)
>>
>>
>> To get that error:
>
> I presume this is the fault of
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
> if you git revert that individual commit, do things work again?
>
Yes, compiled from 67c2def11d4 with the above 4 patches, it runs
flawlessly again. (flawlessly= a few hours without any error)
From: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
---|---|
To: | Erik Rijkers <er(at)xs4all(dot)nl> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-17 13:59:35 |
Message-ID: | 63CDF594-3680-4E05-BBB4-6A4CF86FEFAD@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> On 17 Apr 2017, at 10:30, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
> On 2017-04-16 20:41, Andres Freund wrote:
>> On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
>>> On 2017-04-15 04:47, Erik Rijkers wrote:
>>> >
>>> > 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>>> > 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>>> > 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
>>> > 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
>>> > 0005-Skip-unnecessary-snapshot-builds.patch
>>> I am now using these newer patches:
>>> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>>> > It builds fine, but when I run the old pbench-over-logical-replication
>>> > test I get:
>>> >
>>> > TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
>>> > "pgstat.c", Line: 828)
>>> To get that error:
>> I presume this is the fault of
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
>> if you git revert that individual commit, do things work again?
>
> Yes, compiled from 67c2def11d4 with the above 4 patches, it runs flawlessly again. (flawlessly= a few hours without any error)
>
I’ve reproduced failure, this happens under tablesync worker and putting
pgstat_report_stat() under the previous condition block should help.
However for me it took about an hour of running this script to catch original assert.
Can you check with that patch applied?
Attachment | Content-Type | Size |
---|---|---|
logical_worker.patch | application/octet-stream | 704 bytes |
unknown_filename | text/plain | 95 bytes |
From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-17 19:16:54 |
Message-ID: | 81b4087df137adee5745bb8f987c1fad@xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2017-04-17 15:59, Stas Kelvich wrote:
>> On 17 Apr 2017, at 10:30, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>>
>> On 2017-04-16 20:41, Andres Freund wrote:
>>> On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
>>>> On 2017-04-15 04:47, Erik Rijkers wrote:
>>>> >
>>>> > 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>>>> > 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>>>> > 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch +
>>>> > 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch +
>>>> > 0005-Skip-unnecessary-snapshot-builds.patch
>>>> I am now using these newer patches:
>>>> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>>>> > It builds fine, but when I run the old pbench-over-logical-replication
>>>> > test I get:
>>>> >
>>>> > TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
>>>> > "pgstat.c", Line: 828)
>>>> To get that error:
>>> I presume this is the fault of
>>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
>>> if you git revert that individual commit, do things work again?
>>
>> Yes, compiled from 67c2def11d4 with the above 4 patches, it runs
>> flawlessly again. (flawlessly= a few hours without any error)
>>
>
> I’ve reproduced failure, this happens under tablesync worker and
> putting
> pgstat_report_stat() under the previous condition block should help.
>
> However for me it took about an hour of running this script to catch
> original assert.
>
> Can you check with that patch applied?
Your patch on top of the 5 patches above seem to solve the matter too:
no problems after running for 2 hours (previously it failed within half
a minute).
Erik Rijkers
From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-04-20 18:58:17 |
Message-ID: | 7cbd2286-ae2c-7dd1-722d-7531388636b8@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 4/16/17 16:11, Petr Jelinek wrote:
> Yeah it is, it needs to be fenced to happen only after commit, which is
> not guaranteed at the point of code, we probably need to put the
> pgstat_report_stat() inside the if above after the
> CommitTransactionCommand() (that will make it report stats for changes
> apply did to pg_subscription_rel after next transaction though)
I think to avoid the latter, we should add more pgstat_report_stat()
calls, such as in process_syncing_tables_for_apply(). Basically every
code path that calls CommitTransactionCommand() should have one, no?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-02 18:43:28 |
Message-ID: | CA+TgmoYXdVPaUycSd-mC-4fZNFkGsUKb8DyYb1J6SV7C-ffgWA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 4/16/17 16:11, Petr Jelinek wrote:
>> Yeah it is, it needs to be fenced to happen only after commit, which is
>> not guaranteed at the point of code, we probably need to put the
>> pgstat_report_stat() inside the if above after the
>> CommitTransactionCommand() (that will make it report stats for changes
>> apply did to pg_subscription_rel after next transaction though)
>
> I think to avoid the latter, we should add more pgstat_report_stat()
> calls, such as in process_syncing_tables_for_apply(). Basically every
> code path that calls CommitTransactionCommand() should have one, no?
Is there anything left to be committed here?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-03 06:17:45 |
Message-ID: | 22cc402c-88eb-fa35-217f-0060db2c72f0@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 02/05/17 20:43, Robert Haas wrote:
> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> On 4/16/17 16:11, Petr Jelinek wrote:
>>> Yeah it is, it needs to be fenced to happen only after commit, which is
>>> not guaranteed at the point of code, we probably need to put the
>>> pgstat_report_stat() inside the if above after the
>>> CommitTransactionCommand() (that will make it report stats for changes
>>> apply did to pg_subscription_rel after next transaction though)
>>
>> I think to avoid the latter, we should add more pgstat_report_stat()
>> calls, such as in process_syncing_tables_for_apply(). Basically every
>> code path that calls CommitTransactionCommand() should have one, no?
>
> Is there anything left to be committed here?
>
Afaics the fix was not committed. Peter wanted more comprehensive fix
which didn't happen. I think something like attached should do the job.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
fix-statistics-reporting-in-logical-replication-work.patch | binary/octet-stream | 2.9 KB |
From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-03 11:23:19 |
Message-ID: | b2c7513379be13c0ac5d95648d78db48@xs4all.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2017-05-03 08:17, Petr Jelinek wrote:
> On 02/05/17 20:43, Robert Haas wrote:
>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>> code path that calls CommitTransactionCommand() should have one, no?
>>
>> Is there anything left to be committed here?
>>
>
> Afaics the fix was not committed. Peter wanted more comprehensive fix
> which didn't happen. I think something like attached should do the job.
I'm running my pgbench-over-logical-replication test in chunk of 15
minutes, wth different pgbench -c (num clients) and -s (scale) values.
With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
precise):
> fix-statistics-reporting-in-logical-replication-work.patch
logical replication is still often failing (as expected, I suppose; it
seems because of "inital snapshot too large") but indeed I do not see
the 'TRAP: FailedAssertion in pgstat.c' anymore.
(If there is any other configuration of patches worth testing please let
me know)
thanks
Erik Rijkers
From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Cc: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-05 07:11:42 |
Message-ID: | 20170505071142.GC843225@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, May 03, 2017 at 01:23:19PM +0200, Erik Rijkers wrote:
> On 2017-05-03 08:17, Petr Jelinek wrote:
> >On 02/05/17 20:43, Robert Haas wrote:
> >>On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>
> >>>code path that calls CommitTransactionCommand() should have one, no?
> >>
> >>Is there anything left to be committed here?
> >>
> >
> >Afaics the fix was not committed. Peter wanted more comprehensive fix
> >which didn't happen. I think something like attached should do the job.
>
> I'm running my pgbench-over-logical-replication test in chunk of 15 minutes,
> wth different pgbench -c (num clients) and -s (scale) values.
>
> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
> precise):
>
> >fix-statistics-reporting-in-logical-replication-work.patch
>
> logical replication is still often failing (as expected, I suppose; it seems
> because of "inital snapshot too large") but indeed I do not see the 'TRAP:
> FailedAssertion in pgstat.c' anymore.
>
> (If there is any other configuration of patches worth testing please let me
> know)
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Peter,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Erik Rijkers <er(at)xs4all(dot)nl> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-05 11:13:42 |
Message-ID: | db35e432-56b5-830f-5fc4-26d3ca47724f@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03/05/17 13:23, Erik Rijkers wrote:
> On 2017-05-03 08:17, Petr Jelinek wrote:
>> On 02/05/17 20:43, Robert Haas wrote:
>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>
>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>
>>> Is there anything left to be committed here?
>>>
>>
>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>> which didn't happen. I think something like attached should do the job.
>
> I'm running my pgbench-over-logical-replication test in chunk of 15
> minutes, wth different pgbench -c (num clients) and -s (scale) values.
>
> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
> precise):
>
>> fix-statistics-reporting-in-logical-replication-work.patch
>
> logical replication is still often failing (as expected, I suppose; it
> seems because of "inital snapshot too large") but indeed I do not see
Yes that's different thing that we've been discussing a bit in snapbuild
woes thread.
> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>
> (If there is any other configuration of patches worth testing please let
> me know)
>
Thanks, so the patch works.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-08 15:52:29 |
Message-ID: | CAD21AoDgLu532G-goNE9F0ppicNego+Wx7RKh_qCs0s_aO6VHQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 03/05/17 13:23, Erik Rijkers wrote:
>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>> On 02/05/17 20:43, Robert Haas wrote:
>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>
>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>
>>>> Is there anything left to be committed here?
>>>>
>>>
>>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>>> which didn't happen. I think something like attached should do the job.
>>
>> I'm running my pgbench-over-logical-replication test in chunk of 15
>> minutes, wth different pgbench -c (num clients) and -s (scale) values.
>>
>> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
>> precise):
>>
>>> fix-statistics-reporting-in-logical-replication-work.patch
>>
>> logical replication is still often failing (as expected, I suppose; it
>> seems because of "inital snapshot too large") but indeed I do not see
>
> Yes that's different thing that we've been discussing a bit in snapbuild
> woes thread.
>
>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>
>> (If there is any other configuration of patches worth testing please let
>> me know)
>>
>
> Thanks, so the patch works.
>
I think that we should commit the local transaction that did initial
data copy, and then report stat as well. Currently table sync worker
doesn't commit the local transaction in LogicalRepSyncTableStart
(maybe until apply commit record?) if its status is changed to
SUBREL_STATE_CATCHUP. That's why the table sync worker issues
assertion failure.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-08 16:12:58 |
Message-ID: | 781faf78-ef5c-b7a2-250b-44f964af292a@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 5/5/17 07:13, Petr Jelinek wrote:
> Yes that's different thing that we've been discussing a bit in snapbuild
> woes thread.
>
>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>
>> (If there is any other configuration of patches worth testing please let
>> me know)
>
> Thanks, so the patch works.
Committed, with s/xact_started/started_tx/, to match nearby code.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-08 16:26:36 |
Message-ID: | 5f656818-4800-62e7-7c28-b2f1b584e2cb@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 08/05/17 17:52, Masahiko Sawada wrote:
> On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>> On 03/05/17 13:23, Erik Rijkers wrote:
>>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>>> On 02/05/17 20:43, Robert Haas wrote:
>>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>>
>>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>>
>>>>> Is there anything left to be committed here?
>>>>>
>>>>
>>>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>>>> which didn't happen. I think something like attached should do the job.
>>>
>>> I'm running my pgbench-over-logical-replication test in chunk of 15
>>> minutes, wth different pgbench -c (num clients) and -s (scale) values.
>>>
>>> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
>>> precise):
>>>
>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>>
>>> logical replication is still often failing (as expected, I suppose; it
>>> seems because of "inital snapshot too large") but indeed I do not see
>>
>> Yes that's different thing that we've been discussing a bit in snapbuild
>> woes thread.
>>
>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>
>>> (If there is any other configuration of patches worth testing please let
>>> me know)
>>>
>>
>> Thanks, so the patch works.
>>
>
> I think that we should commit the local transaction that did initial
> data copy, and then report stat as well. Currently table sync worker
> doesn't commit the local transaction in LogicalRepSyncTableStart
> (maybe until apply commit record?) if its status is changed to
> SUBREL_STATE_CATCHUP. That's why the table sync worker issues
> assertion failure.
>
That would fix the assert as well yes, but it would also mean that if
the worker crashed between the initial copy and the end of catchup there
would be no way to restart it without manual intervention from user
since the synchronization position would be lost. Hence the fix I
proposed which does it differently and has the whole sync in a single
transaction.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-08 16:27:22 |
Message-ID: | ea2d722e-bead-98de-ab13-5b8e4501d724@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 08/05/17 18:12, Peter Eisentraut wrote:
> On 5/5/17 07:13, Petr Jelinek wrote:
>> Yes that's different thing that we've been discussing a bit in snapbuild
>> woes thread.
>>
>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>
>>> (If there is any other configuration of patches worth testing please let
>>> me know)
>>
>> Thanks, so the patch works.
>
> Committed, with s/xact_started/started_tx/, to match nearby code.
>
Thanks!
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
Cc: | Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication - TRAP: FailedAssertion in pgstat.c |
Date: | 2017-05-09 00:32:01 |
Message-ID: | CAD21AoB_p+okFK_tROGxG-P1xfSN2TVxwPZoY8gf9BXcahq-WQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, May 9, 2017 at 1:26 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 08/05/17 17:52, Masahiko Sawada wrote:
>> On Fri, May 5, 2017 at 8:13 PM, Petr Jelinek
>> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>>> On 03/05/17 13:23, Erik Rijkers wrote:
>>>> On 2017-05-03 08:17, Petr Jelinek wrote:
>>>>> On 02/05/17 20:43, Robert Haas wrote:
>>>>>> On Thu, Apr 20, 2017 at 2:58 PM, Peter Eisentraut
>>>>
>>>>>>> code path that calls CommitTransactionCommand() should have one, no?
>>>>>>
>>>>>> Is there anything left to be committed here?
>>>>>>
>>>>>
>>>>> Afaics the fix was not committed. Peter wanted more comprehensive fix
>>>>> which didn't happen. I think something like attached should do the job.
>>>>
>>>> I'm running my pgbench-over-logical-replication test in chunk of 15
>>>> minutes, wth different pgbench -c (num clients) and -s (scale) values.
>>>>
>>>> With this patch (and nothing else) on top of master (8f8b9be51fd7 to be
>>>> precise):
>>>>
>>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>>>
>>>> logical replication is still often failing (as expected, I suppose; it
>>>> seems because of "inital snapshot too large") but indeed I do not see
>>>
>>> Yes that's different thing that we've been discussing a bit in snapbuild
>>> woes thread.
>>>
>>>> the 'TRAP: FailedAssertion in pgstat.c' anymore.
>>>>
>>>> (If there is any other configuration of patches worth testing please let
>>>> me know)
>>>>
>>>
>>> Thanks, so the patch works.
>>>
>>
>> I think that we should commit the local transaction that did initial
>> data copy, and then report stat as well. Currently table sync worker
>> doesn't commit the local transaction in LogicalRepSyncTableStart
>> (maybe until apply commit record?) if its status is changed to
>> SUBREL_STATE_CATCHUP. That's why the table sync worker issues
>> assertion failure.
>>
>
> That would fix the assert as well yes, but it would also mean that if
> the worker crashed between the initial copy and the end of catchup there
> would be no way to restart it without manual intervention from user
> since the synchronization position would be lost. Hence the fix I
> proposed which does it differently and has the whole sync in a single
> transaction.
I understood that the data synchronization even including apply
logical record after changed to SUBREL_STATE_CATCHUP should be done in
a single transaction. Thank you for explanation.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center