Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

Lists: pgsql-committerspgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
Date: 2014-07-06 14:01:11
Message-ID: E1X3n0B-0000Ie-2J@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Fix decoding of MULTI_INSERTs when rows other than the last are toasted.

When decoding the results of a HEAP2_MULTI_INSERT (currently only
generated by COPY FROM) toast columns for all but the last tuple
weren't replaced by their actual contents before being handed to the
output plugin. The reassembled toast datums where disregarded after
every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
for plain inserts, updates, deletes, but not multi inserts - there we
generate several REORDER_BUFFER_CHANGE_INSERTs for a single
xl_heap_multi_insert record.

To solve the problem add a clear_toast_afterwards boolean to
ReorderBufferChange's union member that's used by modifications. All
row changes but multi_inserts always set that to true, but
multi_insert sets it only for the last change generated.

Add a regression test covering decoding of multi_inserts - there was
none at all before.

Backpatch to 9.4 where logical decoding was introduced.

Bug found by Petr Jelinek.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/1b86c81d2d255d3fb665ddc77c2bc3dfd751a1df

Modified Files
--------------
contrib/test_decoding/expected/toast.out | 20 +++++++++++++++++++-
contrib/test_decoding/sql/toast.sql | 13 +++++++++++++
src/backend/replication/logical/decode.c | 10 ++++++++++
src/backend/replication/logical/reorderbuffer.c | 9 ++++++++-
src/include/replication/reorderbuffer.h | 4 ++++
5 files changed, 54 insertions(+), 2 deletions(-)


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-committers(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
Date: 2014-07-06 16:50:52
Message-ID: 20140706165052.GA28604@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Hi,

On 2014-07-06 14:01:11 +0000, Andres Freund wrote:
> Fix decoding of MULTI_INSERTs when rows other than the last are toasted.
>
> When decoding the results of a HEAP2_MULTI_INSERT (currently only
> generated by COPY FROM) toast columns for all but the last tuple
> weren't replaced by their actual contents before being handed to the
> output plugin. The reassembled toast datums where disregarded after
> every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
> for plain inserts, updates, deletes, but not multi inserts - there we
> generate several REORDER_BUFFER_CHANGE_INSERTs for a single
> xl_heap_multi_insert record.
>
> To solve the problem add a clear_toast_afterwards boolean to
> ReorderBufferChange's union member that's used by modifications. All
> row changes but multi_inserts always set that to true, but
> multi_insert sets it only for the last change generated.
>
> Add a regression test covering decoding of multi_inserts - there was
> none at all before.
>
> Backpatch to 9.4 where logical decoding was introduced.
>
> Bug found by Petr Jelinek.

Further testing unfortuantely shows that this isn't sufficient:

Commit 1b86c81d2d fixed the decoding of toasted columns for the rows
contained in one xl_heap_multi_insert record. But that's not actually
enough because heap_multi_insert() will actually first toast all
passed in rows and then emit several *_multi_insert records; one for
each page it fills with tuples.

I've attached a preliminary patch that:
Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in
xl_heap_multi_insert->flag denoting that this multi_insert record is
the last emitted by one heap_multi_insert() call. Then use that flag
in decode.c to only set clear_toast_afterwards in the right situation.

But since I am not exactly happy about that solution I'm wondering if
somebody can think of a better approach. Alternatives I though of
included:
* Add a boolean to xl_heap_multi_insert instead of adding the
XLOG_HEAP_LAST_MULTI_INSERT flag. I don't have an opinion either way
about that one.
* Only toast rows in heap_multi_insert for every page. That doesn't
really work without major reworks though as the loop over all the
tuples is done in a critical section.
* Don't clean up toast chunks for multi insert at all. Instead do so after
the next !multi insert record. I think that's out because of the
amount of multi_insert calls COPY can produce.

The patch also adds 200 lines of COPY FROM STDIN in the regression tests
to create a long COPY that does a multi insert with toast covering two
pages. Does anybody object to that?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Fix-decoding-of-consecutive-MULTI_INSERTs-emitted-by.patch text/x-patch 29.8 KB