Re: TRUNCATE SERIALIZABLE and frozen COPY

Lists: pgsql-hackers
From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>,"Simon Riggs" <simon(at)2ndquadrant(dot)com>,"Marti Raudsepp" <marti(at)juffo(dot)org>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 15:34:06
Message-ID: 20121109153406.77880@gmx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure wrote:
> Kevin Grittner <kgrittn(at)mail(dot)com> wrote:
>> Robert Haas wrote:

>> It seems to me that the goal would be to make this semantically
>> idential to the behavior users would see if an unqualified DELETE
>> were run against the table rather than a TRUNCATE.
>
> but, triggers would not fire, right?

Right. Perhaps "identical" was too strong a word. I was referring to
the aspect under consideration here -- making it "serializable" in
line with other MVCC operations.

If we're not talking about making conflicts with other transactions
behave just the same as an unqualified DELETE from a user
perspective, I'm not sure what the goal is, exactly. Obviously we
would be keeping the guts of the implementation the same (swapping in
a new, empty heap).

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 15:46:13
Message-ID: CA+U5nM+a96+7DZCooZZ+TUzNn=n2fGwmszH5irLde=2DBcFruQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 November 2012 15:34, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:

> If we're not talking about making conflicts with other transactions
> behave just the same as an unqualified DELETE from a user
> perspective, I'm not sure what the goal is, exactly.

Reasonable question.

My goal is to allow COPY to load frozen tuples without causing MVCC violations.

Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable
perspective is a much bigger, and completely different goal, as well
as something I don't see as desirable anyway for at least 2 good
reasons, as explained. IMHO if people want MVCC/Serializable
semantics, use DELETE, possibly spending time to make unqualified
DELETE do some fancy TRUNCATE-like tricks with relfilenodes.

Forcing a tightly scoped proposal into a much wider one will just kill
this and leave it blocked.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 16:00:34
Message-ID: CA+U5nMLuLN6JW300wzrAcbmzLsDaP9b1BOvqsBKA6Fv1VXYPrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 November 2012 15:46, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> Altering TRUNCATE so it behaves perfectly from an MVCC/Serializable
> perspective is a much bigger, and completely different goal, as well
> as something I don't see as desirable anyway for at least 2 good
> reasons, as explained. IMHO if people want MVCC/Serializable
> semantics, use DELETE, possibly spending time to make unqualified
> DELETE do some fancy TRUNCATE-like tricks with relfilenodes.

We spent a lot of time in 9.2 making TRUNCATE/reload of a table "just
work", rather than implementing a REPLACE command.

ISTM strange to throw away all that effort, changing behaviour of
TRUNCATE and thus forcing the need for a REPLACE command after all.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 16:27:41
Message-ID: 12670.1352478461@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 9 November 2012 15:34, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:
>> If we're not talking about making conflicts with other transactions
>> behave just the same as an unqualified DELETE from a user
>> perspective, I'm not sure what the goal is, exactly.

> Reasonable question.

> My goal is to allow COPY to load frozen tuples without causing MVCC violations.

If that's the goal, I question why you're insisting on touching
TRUNCATE's behavior. We already have the principle that "TRUNCATE is
like DELETE except not concurrent-safe". Why not just invent a
non-concurrent-safe option to COPY that loads prefrozen tuples into a
new heap, and call it good? There will be visibility oddness from that
definition, sure, but AFAICS there will be visibility oddness from what
you're talking about too. You'll just have expended a very great deal
of effort to make the weirdness a bit different. Even if the TRUNCATE
part of it were perfectly clean, the "load prefrozen tuples" part won't
be --- so I'm not seeing the value of changing TRUNCATE.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 17:09:29
Message-ID: CA+TgmoYh_KXErp9eOejMV6EJJaczeZZcSj3kRtq=yg1SjiMidg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 9, 2012 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> My goal is to allow COPY to load frozen tuples without causing MVCC violations.
>
> If that's the goal, I question why you're insisting on touching
> TRUNCATE's behavior. We already have the principle that "TRUNCATE is
> like DELETE except not concurrent-safe". Why not just invent a
> non-concurrent-safe option to COPY that loads prefrozen tuples into a
> new heap, and call it good? There will be visibility oddness from that
> definition, sure, but AFAICS there will be visibility oddness from what
> you're talking about too. You'll just have expended a very great deal
> of effort to make the weirdness a bit different. Even if the TRUNCATE
> part of it were perfectly clean, the "load prefrozen tuples" part won't
> be --- so I'm not seeing the value of changing TRUNCATE.

I don't object to the idea of giving COPY a way to load prefrozen
tuples, but I think you might be missing the point here otherwise.
Right now, if you CREATE or TRUNCATE a table, copy a bunch of data
into it, and then commit, another transaction that took a snapshot
before your commit can subsequently look at that table and it will NOT
see your newly-loaded data. What it will see instead is an empty
table. This is, of course, wrong: it ought to fail with a
serialization error. It is very possible that the table has never
been empty at the conclusion of a completed transaction: it might have
contained data before the TRUNCATE, and it might again contain data by
the time the truncating transaction commits. Yet, we see it as empty,
which is not MVCC-compliant.

If we were to make COPY pre-freeze the data when the table was created
or truncated in the same transaction, it would alter the behavior in
this situation, and from an application perspective, only this
situation. Now, instead of seeing the table as empty, you'd see the
new contents. This is also not MVCC-compliant, and I guess the
concern when we have talked about this topic before is that changing
from wrong behavior to another, not-backward-compatible wrong behavior
might not be the friendliest thing to do. We could decide we don't
care and just break it. Or we could try to make it through a
serialization error, as Simon is proposing here, which seems like the
tidiest solution. Or we could keep the old heap around until there
are no more snapshots that can need it, which is a bit scary since
we'd be eating double disk-space in the meantime, but it would
certainly be useful to some users, I think.

Just having an option to preload frozen tuples dodges all of these
issues by throwing our hands up in the air, but it does have the
advantage of being more general. Even if we do that I'm not sure it
would be a bad thing to try to solve this issue in a somewhat more
principled way, but it would surely reduce the urgency.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 20:23:16
Message-ID: 17719.1352492596@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Just having an option to preload frozen tuples dodges all of these
> issues by throwing our hands up in the air, but it does have the
> advantage of being more general. Even if we do that I'm not sure it
> would be a bad thing to try to solve this issue in a somewhat more
> principled way, but it would surely reduce the urgency.

Yeah. ISTM the whole point of TRUNCATE is "I don't care about
serializability for this operation, give me efficiency instead".
So I see nothing wrong with a (non-default) option for COPY that
similarly trades away some semantic guarantees for efficiency's sake.
There are an awful lot of bulk-load scenarios where people will gladly
take that trade, and are not very interested in halfway points either.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-09 20:31:43
Message-ID: CA+U5nMJwyP17AC+5O+wFWvsbn8kkMCky95Jx+Rna7SvAteLGow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 November 2012 16:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 9 November 2012 15:34, Kevin Grittner <kgrittn(at)mail(dot)com> wrote:
>>> If we're not talking about making conflicts with other transactions
>>> behave just the same as an unqualified DELETE from a user
>>> perspective, I'm not sure what the goal is, exactly.
>
>> Reasonable question.
>
>> My goal is to allow COPY to load frozen tuples without causing MVCC violations.
>
> If that's the goal, I question why you're insisting on touching
> TRUNCATE's behavior. We already have the principle that "TRUNCATE is
> like DELETE except not concurrent-safe". Why not just invent a
> non-concurrent-safe option to COPY that loads prefrozen tuples into a
> new heap, and call it good? There will be visibility oddness from that
> definition, sure, but AFAICS there will be visibility oddness from what
> you're talking about too. You'll just have expended a very great deal
> of effort to make the weirdness a bit different. Even if the TRUNCATE
> part of it were perfectly clean, the "load prefrozen tuples" part won't
> be --- so I'm not seeing the value of changing TRUNCATE.

This is wonderful thought and I wish I'd thought of it. My digression
via truncate now annoys me.

Yes, there are objections and I've read what Robert has said. An
explicit new option is perfectly entitled to introduce new behaviour
and won't cause a problem with existing applications. I personally
don't care about serializable stuff here, and nor do most others. They
just want a way to load new data quickly. It is important to me that
we do things in well principled ways, but the task at hand is data
loading not perfect visibility guarantees.

So what we're talking about here is a new mode for COPY, that when
requested will pre-freeze tuples when loading into a newly
created/truncated table. If the table isn't newly created/truncated
then we'll just ignore it and continue. I see no need to throw an
error, since that will just cause annoying usability issues.

COPY FREEZE here we come, with extensive docs to explain the trade-off
the user is accepting.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-12 16:10:50
Message-ID: CA+TgmobgFLC+506vEyJH4WwFYHL6QcCv_XbX01nU=TrcOi4Thg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> So what we're talking about here is a new mode for COPY, that when
> requested will pre-freeze tuples when loading into a newly
> created/truncated table. If the table isn't newly created/truncated
> then we'll just ignore it and continue. I see no need to throw an
> error, since that will just cause annoying usability issues.

Actually, why not just have it work always? If people want to load
frozen tuples into a table that's not newly created/truncated, why not
let them? Sure, there could be MVCC violations, but as long as the
behavior is opt-in, who cares? I think it'd be useful to a lot of
people.

If we want to reduce (not eliminate) the potential MVCC issues, which
I think would be a good idea, we could take AccessExclusiveLock on the
table when COPY (FREEZE) is used. Someone using an old snapshot but
accessing the table for the first time after AEL is released could
still see MVCC anomalies, but at least it would rule out things
changing in mid-query, which is the case that I think would be most
problematic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-12 16:20:08
Message-ID: 26259.1352737208@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> So what we're talking about here is a new mode for COPY, that when
>> requested will pre-freeze tuples when loading into a newly
>> created/truncated table. If the table isn't newly created/truncated
>> then we'll just ignore it and continue. I see no need to throw an
>> error, since that will just cause annoying usability issues.

> Actually, why not just have it work always? If people want to load
> frozen tuples into a table that's not newly created/truncated, why not
> let them? Sure, there could be MVCC violations, but as long as the
> behavior is opt-in, who cares? I think it'd be useful to a lot of
> people.

I thought about that too, but there's a big problem. It wouldn't be
just MVCC that would be broken, but transactional integrity: if the
COPY fails partway through, the already-loaded rows still look valid.
The new-file requirement provides a way to roll them back.

I'm willing to have an option that compromises MVCC semantics
transiently, but giving up transactional integrity seems a bit much.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-12 16:22:23
Message-ID: CA+TgmoZvrLqXw7ZuvKgSjFtaQ5NDxwOCmUCB5p9gAMfpUBKOww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 12, 2012 at 11:20 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Nov 9, 2012 at 3:31 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> So what we're talking about here is a new mode for COPY, that when
>>> requested will pre-freeze tuples when loading into a newly
>>> created/truncated table. If the table isn't newly created/truncated
>>> then we'll just ignore it and continue. I see no need to throw an
>>> error, since that will just cause annoying usability issues.
>
>> Actually, why not just have it work always? If people want to load
>> frozen tuples into a table that's not newly created/truncated, why not
>> let them? Sure, there could be MVCC violations, but as long as the
>> behavior is opt-in, who cares? I think it'd be useful to a lot of
>> people.
>
> I thought about that too, but there's a big problem. It wouldn't be
> just MVCC that would be broken, but transactional integrity: if the
> COPY fails partway through, the already-loaded rows still look valid.
> The new-file requirement provides a way to roll them back.
>
> I'm willing to have an option that compromises MVCC semantics
> transiently, but giving up transactional integrity seems a bit much.

Hmm, good point. There might be some way around that, but figuring it
out is probably material for a separate patch.

But I guess that raises the question - should COPY (FREEZE) silently
ignore the option for not-new relfilenodes, or should it error out?
Simon proposed the former, but I'm wondering if the latter would be
better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)mail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRUNCATE SERIALIZABLE and frozen COPY
Date: 2012-11-12 19:28:47
Message-ID: CA+U5nML0QrMQp_HC1wAAunyO4mxYwErXC4q6QRAv_CYgQEN6KQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12 November 2012 16:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> But I guess that raises the question - should COPY (FREEZE) silently
> ignore the option for not-new relfilenodes, or should it error out?
> Simon proposed the former, but I'm wondering if the latter would be
> better.

It's got some complex pre-conditions, so having scripts fail because
you mis-specified FREEZE would be annoying.

The option indicates "I accept the potential MVCC violation", not "it
will always freeze".

If there is a better name...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services