COPY with hints, rebirth

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: COPY with hints, rebirth
Date: 2012-02-24 20:55:33
Message-ID: CA+U5nMJ8CdApEcm0+Ln0WXVx194k3wJHihb=-0xDHen1v0hhDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A long time ago, in a galaxy far away, we discussed ways to speed up
data loads/COPY.
http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php

In particular, the idea that we could mark tuples as committed while
we are still loading them, to avoid negative behaviour for the first
reader.

Simple patch to implement this is attached, together with test case.

Current behaviour is shown here
Run COPY and then... SELECT count(*) FROM table with no indexes
1st SELECT Time: 1518.571 ms <--- slowed dramatically by setting hint bits
2nd SELECT Time: 914.141 ms
3rd SELECT Time: 914.921 ms

With this patch I observed the following results
1st SELECT Time: 890.820 ms
2nd SELECT Time: 884.799 ms
3rd SELECT Time: 882.405 ms

What exactly does it do? Previously, we optimised COPY when it was
loading data into a newly created table or a freshly truncated table.
This patch extends that and actually sets the tuple header flag as
HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
code. The patch also adds some tests for corner cases that would make
that action break MVCC - though those cases are minor and typical data
loads will benefit fully from this.

In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
and adding current xid to snapshots. That is an invasive change that I
would wish to avoid at any time and explains the long delay in
tackling this. The way I've implemented it, is just as a short test
during XidInMVCCSnapshot() so that we trap the case when the xid ==
xmax and so would appear to be running. This is much less invasive and
just as performant as Tom's original suggestion.

Why do we need this now? Setting checksums on page requires us to
write WAL for hints, so the situation of the 1st SELECT after a load
would get somewhat worse when page_checksums are enabled, but we
already know there is a price. However, this is a situation we can
solve, and add value for all cases, not just when checksums enabled.
So I'm posting this as a separate patch rather than including that as
a tuning feature of the checksums patch.

Your input will be generously received,

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_nohints.v357.patch text/x-diff 4.8 KB
copytest.sql text/x-sql 358 bytes

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-25 18:24:37
Message-ID: 4F48D3050200002500045BC8@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> This patch extends that and actually sets the tuple header flag as
> HEAP_XMIN_COMMITTED during the load.

Fantastic!

So, without bulk-load conditions, a long-lived tuple in PostgreSQL
is written to disk at least five times[1]:

(1) The WAL record for the inserted tuple is written.
(2) The inserted tuple is written.
(3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written
in place some time after the inserting transaction's COMMIT.
(4) The WAL record for the "freeze" in write 5 is written.
(5) The xmin is set to frozen and the tuple is rewritten in place
some time after every other connection can see it.

Prior to your patch, bulk load omitted write 1. With your patch we
will also omit write 3.

Since you've just been looking at this area, do you have any
thoughts about writes 4 and 5 being rendered unnecessary by writing
bulk-loaded tuples with a frozen xmin, and having transactions with
a snapshot which doesn't include the bulk load's transaction just
not seeing the table? (Or am I just dreaming about those?)

-Kevin

[1] If you are archiving, it could be more.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-25 18:58:12
Message-ID: CA+U5nMLrEqkgQr4e+oOYaJ51fQRw5ijzG9s3hCiJ3nG3nR_K5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> This patch extends that and actually sets the tuple header flag as
>> HEAP_XMIN_COMMITTED during the load.
>
> Fantastic!
>
> So, without bulk-load conditions, a long-lived tuple in PostgreSQL
> is written to disk at least five times[1]:
>
> (1) The WAL record for the inserted tuple is written.
> (2) The inserted tuple is written.
> (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written
>    in place some time after the inserting transaction's COMMIT.
> (4) The WAL record for the "freeze" in write 5 is written.
> (5) The xmin is set to frozen and the tuple is rewritten in place
>    some time after every other connection can see it.
>
> Prior to your patch, bulk load omitted write 1.  With your patch we
> will also omit write 3.

Yes, well explained.

> Since you've just been looking at this area, do you have any
> thoughts about writes 4 and 5 being rendered unnecessary by writing
> bulk-loaded tuples with a frozen xmin, and having transactions with
> a snapshot which doesn't include the bulk load's transaction just
> not seeing the table?  (Or am I just dreaming about those?)

Setting straight to frozen breaks MVCC, unless/until we use MVCC for
catalog access because we can see the table immediately and then read
the contents as if they had always been there.

I think we could add that as an option on COPY, since "breaking MVCC"
in that way is only a bad thing if it happens accidentally without the
user's permission and knowledge - which they may wish to give in many
cases, such as reloading a database from a dump.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-26 19:16:46
Message-ID: 4F4A851E.3080501@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24.02.2012 22:55, Simon Riggs wrote:
> A long time ago, in a galaxy far away, we discussed ways to speed up
> data loads/COPY.
> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>
> In particular, the idea that we could mark tuples as committed while
> we are still loading them, to avoid negative behaviour for the first
> reader.
>
> Simple patch to implement this is attached, together with test case.
>
> ...
>
> What exactly does it do? Previously, we optimised COPY when it was
> loading data into a newly created table or a freshly truncated table.
> This patch extends that and actually sets the tuple header flag as
> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
> code. The patch also adds some tests for corner cases that would make
> that action break MVCC - though those cases are minor and typical data
> loads will benefit fully from this.

This doesn't work with subtransactions:

postgres=# create table a as select 1 as id;
SELECT 1
postgres=# copy a to '/tmp/a';
COPY 1
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
id
----
(0 rows)

The query should return the row copied in the same subtransaction.

> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
> and adding current xid to snapshots. That is an invasive change that I
> would wish to avoid at any time and explains the long delay in
> tackling this. The way I've implemented it, is just as a short test
> during XidInMVCCSnapshot() so that we trap the case when the xid ==
> xmax and so would appear to be running. This is much less invasive and
> just as performant as Tom's original suggestion.

TransactionIdIsCurrentTransactionId() can be fairly expensive if you
have a lot of subtransactions open...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-29 16:14:06
Message-ID: CA+U5nM+tu8E_rgG2-bhqWoGKhy7znQeH5WBJHfCsJRxT8LrH5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 25, 2012 at 6:58 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>
>>> This patch extends that and actually sets the tuple header flag as
>>> HEAP_XMIN_COMMITTED during the load.
>>
>> Fantastic!

> I think we could add that as an option on COPY, since "breaking MVCC"
> in that way is only a bad thing if it happens accidentally without the
> user's permission and knowledge - which they may wish to give in many
> cases, such as reloading a database from a dump.

I've coded up COPY FREEZE to do just that.

This version doesn't require any changes to transaction machinery.

But it is very effective at avoiding 4 out of the 5 writes you mention.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_freeze.v1.patch text/x-diff 8.3 KB

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-29 18:14:04
Message-ID: CAFNqd5Xffz2DxBDcpqY=Ok7=L2v46TOpi_nhtRHjhwr0rajoGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> But it is very effective at avoiding 4 out of the 5 writes you mention.

For the "common case," would we not want to have (1) [WAL] and (2)
[Writing the pre-frozen tuple]?

If we only write the tuple (2), and don't capture WAL, then the COPY
wouldn't be replicable, right?
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-02-29 21:19:17
Message-ID: CA+U5nMJSfsKPkJvCSD7VOB363qjRmDMon__oSeOTtuvUEZKHEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 29, 2012 at 6:14 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
> On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> But it is very effective at avoiding 4 out of the 5 writes you mention.
>
> For the "common case," would we not want to have (1) [WAL] and (2)
> [Writing the pre-frozen tuple]?
>
> If we only write the tuple (2), and don't capture WAL, then the COPY
> wouldn't be replicable, right?

Well, my answer is a question: how would you like it to work?

The way I coded it is that it will still write WAL if wal_level is
set, so it would be replicable. So it only works when writing to a
newly created table but is otherwise separate to whether WAL is
skipped.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-01 16:40:59
Message-ID: CA+U5nMLM5MVXmt-gW=kM_evyckxxFKNFSLaLLMvZ-TTs3MrkNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 24.02.2012 22:55, Simon Riggs wrote:
>>
>> A long time ago, in a galaxy far away, we discussed ways to speed up
>> data loads/COPY.
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php
>>
>> In particular, the idea that we could mark tuples as committed while
>> we are still loading them, to avoid negative behaviour for the first
>> reader.
>>
>> Simple patch to implement this is attached, together with test case.
>>
>>  ...
>>
>>
>> What exactly does it do? Previously, we optimised COPY when it was
>> loading data into a newly created table or a freshly truncated table.
>> This patch extends that and actually sets the tuple header flag as
>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>> code. The patch also adds some tests for corner cases that would make
>> that action break MVCC - though those cases are minor and typical data
>> loads will benefit fully from this.
>
>
> This doesn't work with subtransactions:
...
> The query should return the row copied in the same subtransaction.

Thanks for pointing that out.

New patch with corrected logic and test case attached.

>> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC()
>> and adding current xid to snapshots. That is an invasive change that I
>> would wish to avoid at any time and explains the long delay in
>> tackling this. The way I've implemented it, is just as a short test
>> during XidInMVCCSnapshot() so that we trap the case when the xid ==
>> xmax and so would appear to be running. This is much less invasive and
>> just as performant as Tom's original suggestion.
>
>
> TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a
> lot of subtransactions open...

I've put in something to avoid that cost for the common case - just a boolean.

This seems like the best plan rather than the explicit FREEZE option.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_nohints.v358.patch text/x-diff 8.8 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-01 20:49:23
Message-ID: 4F4FE0D3.1070509@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.03.2012 18:40, Simon Riggs wrote:
> On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 24.02.2012 22:55, Simon Riggs wrote:
>>>
>>> What exactly does it do? Previously, we optimised COPY when it was
>>> loading data into a newly created table or a freshly truncated table.
>>> This patch extends that and actually sets the tuple header flag as
>>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>>> code. The patch also adds some tests for corner cases that would make
>>> that action break MVCC - though those cases are minor and typical data
>>> loads will benefit fully from this.
>>
>> This doesn't work with subtransactions:
> ...
>> The query should return the row copied in the same subtransaction.
>
> Thanks for pointing that out.
>
> New patch with corrected logic and test case attached.

It's still broken:

-- create test table and file
create table a as select 1 as id;
copy a to '/tmp/a';

-- start test
postgres=# begin;
BEGIN
postgres=# truncate a;
TRUNCATE TABLE
postgres=# savepoint sp1;
SAVEPOINT
postgres=# copy a from '/tmp/a';
COPY 1
postgres=# select * from a;
id
----
1
(1 row)

postgres=# rollback to savepoint sp1;
ROLLBACK
postgres=# select * from a;
id
----
1
(1 row)

That last select should not have seen the tuple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-02 08:46:45
Message-ID: CA+U5nMKDRGcXn2iB=x8k4CvHZAgwMNBUkypzJ4SG70AvxBpAUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 01.03.2012 18:40, Simon Riggs wrote:
>>
>> On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>>
>>> On 24.02.2012 22:55, Simon Riggs wrote:
>>>>
>>>>
>>>> What exactly does it do? Previously, we optimised COPY when it was
>>>> loading data into a newly created table or a freshly truncated table.
>>>> This patch extends that and actually sets the tuple header flag as
>>>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of
>>>> code. The patch also adds some tests for corner cases that would make
>>>> that action break MVCC - though those cases are minor and typical data
>>>> loads will benefit fully from this.
>>>
>>>
>>> This doesn't work with subtransactions:
>>
>> ...
>>>
>>> The query should return the row copied in the same subtransaction.
>>
>>
>> Thanks for pointing that out.
>>
>> New patch with corrected logic and test case attached.
>
>
> It's still broken:

Agreed. Thanks for your detailed input.

Performance test results show that playing with XidInMVCCSnapshot()
causes a small but growing performance regression that I find
unacceptable, so while I might fix the above, I doubt if I can improve
this as well.

So this approach isn't the one...

The COPY FREEZE patch provides a way for the user to say explicitly
that they don't really care about these MVCC corner cases and as a
result allows us to avoid touching XidInMVCCSnapshot() at all. So
there is still a patch on the table.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-02 20:58:34
Message-ID: 20120302205834.GC29160@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote:
> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > It's still broken:

[BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

> So this approach isn't the one...
>
> The COPY FREEZE patch provides a way for the user to say explicitly
> that they don't really care about these MVCC corner cases and as a
> result allows us to avoid touching XidInMVCCSnapshot() at all. So
> there is still a patch on the table.

You can salvage the optimization by tightening its prerequisite: use it when
the current subtransaction or a child thereof created or truncated the table.
A parent subtransaction having done so is acceptable for the WAL avoidance
optimization but not for this.

A COPY FREEZE ignoring that stronger restriction would be an interesting
feature in its own right, but it brings up other problems. For example,
suppose you write a tuple, then fail while writing its index entries. The
tuple is already frozen and visible, but it lacks a full set of live index
entries. The subtransaction aborts, but the top transaction commits and makes
the situation permanent.

Incidentally, I contend that we should write frozen tuples to new/truncated
tables unconditionally. The current behavior of making old snapshots see the
table as empty violates atomicity at least as badly as letting those snapshots
see the future-snapshot contents. But Marti has a sound proposal that would
interact with your efforts here to avoid violating atomicity at all:
http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com

Thanks,
nm


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Noah Misch" <noah(at)leadboat(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-02 21:12:10
Message-ID: 4F50E34A0200002500045E65@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> wrote:

> Incidentally, I contend that we should write frozen tuples to
> new/truncated tables unconditionally.

+1

> The current behavior of making old snapshots see the table as
> empty violates atomicity at least as badly as letting those
> snapshots see the future-snapshot contents.

Right, there was no point where the table existed as empty at the
end of a transaction, so it is quite broken as things stand now.

> But Marti has a sound proposal that would interact with your
> efforts here to avoid violating atomicity at all

Well, getting it right is certainly better than moving from a slow
non-conforming behavior to a fast non-conforming behavior. ;-)

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-02 22:46:32
Message-ID: CA+U5nMJEizmJXXvsFW8tB+u8W6YKqgReEUJmsHVcxubCDL-SUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote:
>> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> > It's still broken:
>
> [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]
>
>> So this approach isn't the one...
>>
>> The COPY FREEZE patch provides a way for the user to say explicitly
>> that they don't really care about these MVCC corner cases and as a
>> result allows us to avoid touching XidInMVCCSnapshot() at all. So
>> there is still a patch on the table.
>
> You can salvage the optimization by tightening its prerequisite: use it when
> the current subtransaction or a child thereof created or truncated the table.
> A parent subtransaction having done so is acceptable for the WAL avoidance
> optimization but not for this.

That's already what it does. The problem is what happens after the COPY.

If we said "you can't see the rows you just loaded" it would be
problem over, but in my opinion that needs an explicit request from
the user to show they accept that.

> A COPY FREEZE ignoring that stronger restriction would be an interesting
> feature in its own right, but it brings up other problems.  For example,
> suppose you write a tuple, then fail while writing its index entries.  The
> tuple is already frozen and visible, but it lacks a full set of live index
> entries.  The subtransaction aborts, but the top transaction commits and makes
> the situation permanent.

The optimisation only works in the subtransaction that created the
relfilenode so that isn't an issue.

Thanks for your input.

> Incidentally, I contend that we should write frozen tuples to new/truncated
> tables unconditionally.  The current behavior of making old snapshots see the
> table as empty violates atomicity at least as badly as letting those snapshots
> see the future-snapshot contents.  But Marti has a sound proposal that would
> interact with your efforts here to avoid violating atomicity at all:
> http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com

Personally, that seems a pretty reasonable thing.

I like Marti's idea. At present, making his idea work could easily
mean checksums sink, so not sure whether to attempt to make that work
in detail.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Noah Misch" <noah(at)leadboat(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-02 23:15:47
Message-ID: 4F5100430200002500045E86@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> I like Marti's idea. At present, making his idea work could easily
> mean checksums sink, so not sure whether to attempt to make that
> work in detail.

For my part, improving bulk load performance and TRUNCATE
transactional semantics would trump checksums. If we have to defer
one or the other to a later release, I would prefer that we bump
checksums.

While I understand the desire for checksums, and understand others
have had situations where they would have helped, so far I have yet
to run into a situation where I feel they would have helped me. The
hint bit and freeze issues require ongoing attention and have real
performance consequences on a regular basis. And closing the window
for odd transactional semantics on TRUNCATE versus DELETE of all
rows in a table would be one less thing to worry about, in my world.

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-03 13:14:59
Message-ID: CA+U5nMLyZvLbWB8_Vmxy1yMhZGe5ytpOwOSBR+8NwURwWUh6wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote:
>> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> > It's still broken:
>
> [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]
>
>> So this approach isn't the one...
>>
>> The COPY FREEZE patch provides a way for the user to say explicitly
>> that they don't really care about these MVCC corner cases and as a
>> result allows us to avoid touching XidInMVCCSnapshot() at all. So
>> there is still a patch on the table.
>
> You can salvage the optimization by tightening its prerequisite: use it when
> the current subtransaction or a child thereof created or truncated the table.
> A parent subtransaction having done so is acceptable for the WAL avoidance
> optimization but not for this.

I misread your earlier comment. Yes, that will make this work correctly.

> Incidentally, I contend that we should write frozen tuples to new/truncated
> tables unconditionally.  The current behavior of making old snapshots see the
> table as empty violates atomicity at least as badly as letting those snapshots
> see the future-snapshot contents.  But Marti has a sound proposal that would
> interact with your efforts here to avoid violating atomicity at all:
> http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com

Thank you for bringing this to my attention.

This will make this optimisation work correctly without adding
anything to the main code path of XidInMVCCSnapshot() and without the
annoying FREEZE syntax. So this puts the patch squarely back on the
table.

I'll do another version of this later today designed to work with the
StrictMVCC patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-03 19:47:31
Message-ID: CA+U5nMKTiY8MBfD3n_nqeL6mmMdapX-bTi4N3MPPwpK2t7hP6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote:
>>> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> > It's still broken:
>>
>> [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]
>>
>>> So this approach isn't the one...
>>>
>>> The COPY FREEZE patch provides a way for the user to say explicitly
>>> that they don't really care about these MVCC corner cases and as a
>>> result allows us to avoid touching XidInMVCCSnapshot() at all. So
>>> there is still a patch on the table.
>>
>> You can salvage the optimization by tightening its prerequisite: use it when
>> the current subtransaction or a child thereof created or truncated the table.
>> A parent subtransaction having done so is acceptable for the WAL avoidance
>> optimization but not for this.
>
> I misread your earlier comment. Yes, that will make this work correctly.
>
>> Incidentally, I contend that we should write frozen tuples to new/truncated
>> tables unconditionally.  The current behavior of making old snapshots see the
>> table as empty violates atomicity at least as badly as letting those snapshots
>> see the future-snapshot contents.  But Marti has a sound proposal that would
>> interact with your efforts here to avoid violating atomicity at all:
>> http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com
>
> Thank you for bringing this to my attention.
>
> This will make this optimisation work correctly without adding
> anything to the main code path of XidInMVCCSnapshot() and without the
> annoying FREEZE syntax. So this puts the patch squarely back on the
> table.
>
> I'll do another version of this later today designed to work with the
> StrictMVCC patch.

OK, so latest version of patch handling the subtransaction issues
correctly. Thanks to Noah and Heikki for identifying them and hitting
me with the clue stick.

So this version of patch has negligible effect on mainline performance
though leaves newly loaded data visible in ways that would break MVCC.
So this patch relies on the safe_truncate.v2.patch posted on separate
thread.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_autofrozen.v359.patch text/x-diff 6.4 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY with hints, rebirth
Date: 2012-03-03 19:53:31
Message-ID: CA+U5nMKTi=ExfLsg3gTOnU9zg6iPZeXA0K-rqN0bXKerE7Yt-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> I like Marti's idea. At present, making his idea work could easily
>> mean checksums sink, so not sure whether to attempt to make that
>> work in detail.
>
> For my part, improving bulk load performance and TRUNCATE
> transactional semantics would trump checksums.  If we have to defer
> one or the other to a later release, I would prefer that we bump
> checksums.
>
> While I understand the desire for checksums, and understand others
> have had situations where they would have helped, so far I have yet
> to run into a situation where I feel they would have helped me.  The
> hint bit and freeze issues require ongoing attention and have real
> performance consequences on a regular basis.  And closing the window
> for odd transactional semantics on TRUNCATE versus DELETE of all
> rows in a table would be one less thing to worry about, in my world.

Since you supported this, I've invested the time to make it work. It
doesn't look like it needs to be either-or.

Please review the safe_truncate.v2.patch and
copy_autofrozen.v359.patch, copied here to assist testing and
inspection.

At present those patches handle only the TRUNCATE/COPY optimisation
but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc..

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
copy_autofrozen.v359.patch text/x-diff 6.4 KB
safe_truncate.v2.patch text/x-diff 17.0 KB