Quick Links

Re: logical changeset generation v6.4

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: logical changeset generation v6.4
Date:	2013-10-21 15:26:47
Message-ID:	526547B7.5030009@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/21/2013 05:06 PM, Andres Freund wrote:
> On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote:
>> On 10/18/2013 08:50 PM, Andres Freund wrote:
>>> On 2013-10-18 08:11:29 -0400, Robert Haas wrote:
>> ...
>>>> 2. If that seems too complicated, how about just logging the whole old
>>>> tuple for version 1?
>>> I think that'd make the patch much less useful because it bloats WAL
>>> unnecessarily for the primary user (replication) of it. I'd rather go
>>> for primary keys only if that proves to be the contentious point.
>>>
>>> How about modifying the selection to go from:
>>> * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;
>>> * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname
>>> * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb)
>>> * primary key
>>> * candidate key with the smallest oid
>>>
>>> Including the candidate key will help people using changeset extration
>>> for auditing that do not have primary key. That really isn't an
>>> infrequent usecase.
>> As I understand it for a table with *no* unique index,
>> the "candidate key" is the full tuple, so if we get an UPDATE for
>> it then this should be replicated as
>> "UPDATE first row matching (NOT DISTINCT FROM) all columns"
>> which on replay side will be equivalent to
>> CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...'
> No, it's not a candidate key since it's not uniquely identifying a
> row. You can play tricks as you describe, but that still doesn't make
> the whole row a candidate key.
>
> But anyway, I suggest allowing for logging all columns above...
I the "all columns" option this ?

How about modifying the selection to go from:
* all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL;

for some reason I thought it to be option to either log or not log PK column ...

>
>> I know that this will slow down replication, as you can not use direct
>> index updates internally - at least not easily - but need to let postgreSQL
>> actually plan this, but such single row update is no faster on origin
>> either.
> That's not actually true. Consider somebody doing something like:
> UPDATE big_table_without_indexes SET column = ...;
> On the source side that's essentialy O(n). If you replicate on a
> row-by-row basis it will be O(n^2) on the replay side.
Probably more like O(n^2 / 2) but yes, this is what I meant with the
sentence
after that ;)

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ

In response to

Re: logical changeset generation v6.4 at 2013-10-21 15:06:02 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hannu Krosing	2013-10-21 15:46:05	Re: Commitfest II CLosed
Previous Message	Andres Freund	2013-10-21 15:20:26	Re: logical changeset generation v6.4