Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-committers(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
Date: 2014-07-06 16:50:52
Message-ID: 20140706165052.GA28604@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Hi,

On 2014-07-06 14:01:11 +0000, Andres Freund wrote:
> Fix decoding of MULTI_INSERTs when rows other than the last are toasted.
>
> When decoding the results of a HEAP2_MULTI_INSERT (currently only
> generated by COPY FROM) toast columns for all but the last tuple
> weren't replaced by their actual contents before being handed to the
> output plugin. The reassembled toast datums where disregarded after
> every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
> for plain inserts, updates, deletes, but not multi inserts - there we
> generate several REORDER_BUFFER_CHANGE_INSERTs for a single
> xl_heap_multi_insert record.
>
> To solve the problem add a clear_toast_afterwards boolean to
> ReorderBufferChange's union member that's used by modifications. All
> row changes but multi_inserts always set that to true, but
> multi_insert sets it only for the last change generated.
>
> Add a regression test covering decoding of multi_inserts - there was
> none at all before.
>
> Backpatch to 9.4 where logical decoding was introduced.
>
> Bug found by Petr Jelinek.

Further testing unfortuantely shows that this isn't sufficient:

Commit 1b86c81d2d fixed the decoding of toasted columns for the rows
contained in one xl_heap_multi_insert record. But that's not actually
enough because heap_multi_insert() will actually first toast all
passed in rows and then emit several *_multi_insert records; one for
each page it fills with tuples.

I've attached a preliminary patch that:
Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in
xl_heap_multi_insert->flag denoting that this multi_insert record is
the last emitted by one heap_multi_insert() call. Then use that flag
in decode.c to only set clear_toast_afterwards in the right situation.

But since I am not exactly happy about that solution I'm wondering if
somebody can think of a better approach. Alternatives I though of
included:
* Add a boolean to xl_heap_multi_insert instead of adding the
XLOG_HEAP_LAST_MULTI_INSERT flag. I don't have an opinion either way
about that one.
* Only toast rows in heap_multi_insert for every page. That doesn't
really work without major reworks though as the loop over all the
tuples is done in a critical section.
* Don't clean up toast chunks for multi insert at all. Instead do so after
the next !multi insert record. I think that's out because of the
amount of multi_insert calls COPY can produce.

The patch also adds 200 lines of COPY FROM STDIN in the regression tests
to create a long COPY that does a multi insert with toast covering two
pages. Does anybody object to that?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Fix-decoding-of-consecutive-MULTI_INSERTs-emitted-by.patch text/x-patch 29.8 KB

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2014-07-06 18:57:09 pgsql: Remove swpb-based spinlock implementation for ARMv5 and earlier.
Previous Message Andres Freund 2014-07-06 14:01:13 pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2014-07-06 17:18:38 Re: add line number as prompt option to psql
Previous Message Tomas Vondra 2014-07-06 16:44:35 Re: tweaking NTUP_PER_BUCKET