Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 14:31:33
Message-ID: 1135261893.2964.502.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Having just optimized COPY to avoid writing WAL during the transaction
in which a table was first created, it seems worth considering whether
this should occur for INSERT, UPDATE and DELETE also.

It is fairly common to do data transformation using INSERT SELECTs and
UPDATEs. This is usually done with temporary tables however. (DELETE
would most efficiently be handled as an additional NOT clause on the
insert, so it is uncommonly used in this circumstance.)

However, CREATE TABLE AS SELECT (CTAS) does not allow inheritance, so a
new permanent partition has to be created using CREATE TABLE, followed
by an INSERT SELECT or COPY.

Is that sufficient reason to optimise INSERT SELECT and UPDATE also? Or
should I not bother? Or should I try to teach CTAS to use inheritance
(which sounds harder and has a few gotchas).

Currently, CTAS optimization requires a heap_sync during ExecEndPlan. It
would be easy enough to extend this so that it also works for INSERT,
UPDATE and DELETE.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 17:12:04
Message-ID: 23765.1135271524@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Currently, CTAS optimization requires a heap_sync during ExecEndPlan. It
> would be easy enough to extend this so that it also works for INSERT,
> UPDATE and DELETE.

If you tried to do it that way you'd break the system completely. Not
all updates go through the executor.

I think it's a bad idea anyway; you'd be adding overhead to the lowest
level routines in order to support a feature that would be very seldom
used, at least in comparison to the number of times those routines are
executed.

regards, tom lane


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 18:37:51
Message-ID: 20051222183751.GG72143@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 22, 2005 at 02:31:33PM +0000, Simon Riggs wrote:
> Having just optimized COPY to avoid writing WAL during the transaction
> in which a table was first created, it seems worth considering whether
> this should occur for INSERT, UPDATE and DELETE also.
>
> It is fairly common to do data transformation using INSERT SELECTs and
> UPDATEs. This is usually done with temporary tables however. (DELETE
> would most efficiently be handled as an additional NOT clause on the
> insert, so it is uncommonly used in this circumstance.)
>
> However, CREATE TABLE AS SELECT (CTAS) does not allow inheritance, so a
> new permanent partition has to be created using CREATE TABLE, followed
> by an INSERT SELECT or COPY.
>
> Is that sufficient reason to optimise INSERT SELECT and UPDATE also? Or
> should I not bother? Or should I try to teach CTAS to use inheritance
> (which sounds harder and has a few gotchas).
>
> Currently, CTAS optimization requires a heap_sync during ExecEndPlan. It
> would be easy enough to extend this so that it also works for INSERT,
> UPDATE and DELETE.

Well, both UPDATE and DELETE seem like pretty odd use cases to me;
typically I'd do any needed data manipulation during the INSERT SELECT.
But it doesn't make sense to me to set this up for INSERT and ignore
UPDATE and DELETE.

I do think this needs to be something that is made either completely
transparent or must be specifically enabled. As described, I believe
this would break PITR, so users should have to specifically request that
behavior (and they should probably get a WARNING message, too).

Though, what's interesting is that theoretically it should be possible
to do this and still protect PITR, by logging the statements (but not
the actual data) to WAL. This isn't very practical with WAL (though it
might be worth looking at storing a compressed version of what's being
fed in to COPY), but in this case if we end up in a recovery situation
the data that the insert is pulling from should exist in the database in
the same state, so it should be possible to re-create the table. There's
still an issue of how to handle the pages from the new table that will
end up in WAL on subsiquent transactions, since presumably they might be
identical, but someone might be able to come up with a clever solution
for that. In the meantime, breaking WAL recovery needs to be something
that users must specifically request, via something like UPDATE NOWAL.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 18:38:45
Message-ID: 20051222183845.GH72143@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

BTW, this should also probably be moved over to -hackers...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 20:10:54
Message-ID: 1135282254.2964.518.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-22 at 12:12 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Currently, CTAS optimization requires a heap_sync during ExecEndPlan. It
> > would be easy enough to extend this so that it also works for INSERT,
> > UPDATE and DELETE.
>
> If you tried to do it that way you'd break the system completely. Not
> all updates go through the executor.

Wow, didn't know that.

> I think it's a bad idea anyway; you'd be adding overhead to the lowest
> level routines in order to support a feature that would be very seldom
> used, at least in comparison to the number of times those routines are
> executed.

Agreed.

Maybe just INSERT SELECT then. That's easy enough to test for without
altering the main code path in the executor too much. If anybody is
going to say they want it?

Best Regards, Simon Riggs


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and DELETE?
Date: 2005-12-22 20:18:26
Message-ID: 20051222201826.GH21783@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 22, 2005 at 12:37:51PM -0600, Jim C. Nasby wrote:
> I do think this needs to be something that is made either completely
> transparent or must be specifically enabled. As described, I believe
> this would break PITR, so users should have to specifically request that
> behavior (and they should probably get a WARNING message, too).

This reminds me of a friend who used MSSQL that had replication going
that broke every time you did a certain statement. It may have been
SELECT INTO [1]. His main problem was that the replication would
stop working silently. We need to be waving red flags if we broke
someone's backup procedure.

Considering "WAL bypass" is code for "breaks PITR", I think we really
need to make sure people realise that running such a command breaks
their backups/replication/whatever people are doing.

[1] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/trblsql/tr_reslsyserr_1_1r94.asp

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-22 22:13:03
Message-ID: 1135289583.2964.536.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote:
> Considering "WAL bypass" is code for "breaks PITR"

No it isn't. All of the WAL bypass logic does *not* operate when PITR is
active. The WAL bypass logic is aimed at Data Warehouses, which
typically never operate in PITR mode for performance reasons, however
the choice is yours.

Best Regards, Simon Riggs


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-22 22:36:25
Message-ID: 20051222223625.GC6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote:
> > Considering "WAL bypass" is code for "breaks PITR"
>
> No it isn't. All of the WAL bypass logic does *not* operate when PITR is
> active. The WAL bypass logic is aimed at Data Warehouses, which
> typically never operate in PITR mode for performance reasons, however
> the choice is yours.

Eh? PITR mode is bad for performance? Maybe I missed something but I
wouldn't have thought PITR would degrade regular performance all that
badly. So long as it doesn't take 15 minutes or some such to move the
WAL to somewhere else (and I'm not sure that'd even slow things down..).
For a Data Warehouse, have you got a better way of doing backups such
that you don't lose at minimum most of a day's work? I'm not exactly a
big fan do doing a pg_dump every night either given that the database is
360GB. Much nicer to take a weekly dump of the database and then do
PITR for a week or two before taking another dump of the db.

I like the idea of making COPY go faster, but please don't break my
backup system while you're at it. I'm honestly kind of nervous about
what you mean by checking it PITR is active- how is that done, exactly?
Check if you have a script set to rotate the logs elsewhere? Or is it
checking if you're in the taking-a-full-database-backup stage? Or what?
What's the performance decrease when using PITR, and what's it from? Is
it just that COPY isn't as fast? Honestly, I could live with COPY being
not as fast as it could be if my backups work. :)

Sorry for sounding concerned but, well, backups are very important and
so is performance and I'm afraid either I've not read all the
documentation about the issues being discussed here or there isn't
enough out there to make sense of it all yet. :)

Thanks,

Stephen


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-22 23:52:38
Message-ID: dofe63$20vk$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote
>
> No it isn't. All of the WAL bypass logic does *not* operate when PITR is
> active. The WAL bypass logic is aimed at Data Warehouses, which
> typically never operate in PITR mode for performance reasons, however
> the choice is yours.
>

To make things, is it possible to add a GUC to let user disable *all* the
xlogs?

Regards,
Qingqing


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-23 00:05:10
Message-ID: dofett$eat$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> wrote
>
>
> To make things, is it possible to add a GUC to let user disable *all* the
> xlogs?
>

It may work like this:

BEGIN TRANSACTION WITHOUT XLOG; /* forbidden vacuum, PITR etc */
BEGIN
... /* no xlog during this peroid */
END; /* don't mark this transaction committed */
BEGIN
...
END;
END TRANSACTION DO COMMIT; /* at this time issue checkpiont && mark all
transactions committed */

So during this peroid, if any transaction failed, the only consequence is
add invisible garbage data. When everything is going well, then END
TRANSACTION DO COMMIT will mark these transaction permanate. Also, seems
there is no problem even with XLOG_NO_TRAN updates.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-23 00:29:29
Message-ID: 8748.1135297769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> BEGIN TRANSACTION WITHOUT XLOG; /* forbidden vacuum, PITR etc */

> So during this peroid, if any transaction failed, the only consequence is
> add invisible garbage data.

No, the likely consequence is irretrievable corruption of any table or
index page touched by the transaction. You're going to have a very hard
time selling this as a good idea.

regards, tom lane


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-23 00:53:04
Message-ID: dofhnu$28o3$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
> "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
>> BEGIN TRANSACTION WITHOUT XLOG; /* forbidden vacuum, PITR etc */
>
>> So during this peroid, if any transaction failed, the only consequence is
>> add invisible garbage data.
>
> No, the likely consequence is irretrievable corruption of any table or
> index page touched by the transaction.
>

I guess I know (at least part) of what you mean. This is because we rely on
replay all the xlog no matter it belongs to a committed transaction or not.
Why? Because a failed transaction is not totally useless since later
transaction may reply on some physical thing it creates - for example, a new
page and its links of a btree. So for heap, there is(95% sure) no such
problem. Our heap redo algorithm can automatically add empty pages to a
heap. For index, there are problems, but I suspect they are solvable by not
bypassing these records ... if this is not totally-nonstarter, I will
investigate details of how to do it.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-23 04:31:07
Message-ID: 15810.1135312267@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> I guess I know (at least part) of what you mean. This is because we rely on
> replay all the xlog no matter it belongs to a committed transaction or not.
> Why? Because a failed transaction is not totally useless since later
> transaction may reply on some physical thing it creates - for example, a new
> page and its links of a btree. So for heap, there is(95% sure) no such
> problem.

Torn pages (partial page write) are still a problem.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, bizgres-general <bizgres-general(at)pgfoundry(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-23 10:18:43
Message-ID: 1135333123.2964.589.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-22 at 17:36 -0500, Stephen Frost wrote:
> * Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> > On Thu, 2005-12-22 at 21:18 +0100, Martijn van Oosterhout wrote:
> > > Considering "WAL bypass" is code for "breaks PITR"
> >
> > No it isn't. All of the WAL bypass logic does *not* operate when PITR is
> > active. The WAL bypass logic is aimed at Data Warehouses, which
> > typically never operate in PITR mode for performance reasons, however
> > the choice is yours.

OK, thanks for saying all of that; you probably speak for many in
raising these concerns. I'll answer each bit as we come to it. Suffice
to say, your concerns are good and so are the answers:

> Eh? PITR mode is bad for performance? Maybe I missed something but I
> wouldn't have thought PITR would degrade regular performance all that
> badly.

PITR mode is *not* bad for performance. On a very heavily loaded
write-intensive test system, the general PITR overhead on regular
performance was around 1% - so almost negligible.

We have been discussing a number of optimizations to specific commands
that would allow them to avoid writing WAL and thus speed up their
performance. If archive_command is set then WAL will always be written;
if it is not set then these commands will (or could) go faster:

- CREATE TABLE AS SELECT (in 8.1)
- COPY LOCK (patch submitted)
- COPY in same transaction as CREATE TABLE (patch submitted)
- INSERT SELECT in same transaction as CREATE TABLE (this discussion)

(There are a number of other conditions also, such as there must be no
indexes on a table. All of which now documented with the patch)

> So long as it doesn't take 15 minutes or some such to move the
> WAL to somewhere else (and I'm not sure that'd even slow things down..).
> For a Data Warehouse, have you got a better way of doing backups such
> that you don't lose at minimum most of a day's work?

Yes. Don't just use the backup facilities on their own. Think about how
the architecture of your systems will work and see if there is a better
way when you look at very large systems.

> I'm not exactly a
> big fan do doing a pg_dump every night either given that the database is
> 360GB. Much nicer to take a weekly dump of the database and then do
> PITR for a week or two before taking another dump of the db.

e.g. Keep your reference data (low volume) in an Operational Data Store
(ODS) database, protected by archiving. Keep your main fact data (high
volume) in the Data Warehouse, but save the data in slices as you load
it, so that a recovery is simply a reload of the database: no PITR or
pg_dump required, so high performance data transformation and load work
is possible. This is a commonly used architectural design pattern.

> I like the idea of making COPY go faster, but please don't break my
> backup system while you're at it.

On a personal note, I would only add that I spent a long time working on
PITR and I would never design anything that would intentionally break it
(nor would patches be accepted that did that). That probably gives me
the confidence to approach designs that might look like I'm doing that,
but without actually straying over the edge.

> I'm honestly kind of nervous about
> what you mean by checking it PITR is active- how is that done, exactly?
> Check if you have a script set to rotate the logs elsewhere? Or is it
> checking if you're in the taking-a-full-database-backup stage? Or what?

Internally, we use XLogArchivingActive(). Externally this will be set
when the admin sets archive_command to a particular value.

My original preference was for a parameter called archive_mode= ON | OFF
which would allow us to more easily discuss this, but this does not
currently exist.

> What's the performance decrease when using PITR, and what's it from? Is
> it just that COPY isn't as fast? Honestly, I could live with COPY being
> not as fast as it could be if my backups work. :)

These commands will not be optimized for speed when archive_command is set:
- CREATE TABLE AS SELECT (in 8.1)
- COPY LOCK (patch submitted)

> Sorry for sounding concerned but, well, backups are very important and
> so is performance and I'm afraid either I've not read all the
> documentation about the issues being discussed here or there isn't
> enough out there to make sense of it all yet. :)

If you choose PITR, then you are safe. If you do not, the crash recovery
of the database is not endangered by these optimizations.

Hope that covers all of your concerns?

I'm just writing a course that explains many of these techniques,
available in the New Year.

Best Regards, Simon Riggs


From: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 02:37:42
Message-ID: Pine.LNX.4.58.0512232135140.24800@eon.cs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Torn pages (partial page write) are still a problem.

I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
below. I think in this way, we can always gaurantee its correctness and
can always improve it.

To Use It
----------
A "BEGIN TRANSACTION MINIMAL XLOG/END" block is a speicial "BEGIN/END"
transaction block. It tries to avoid unnessary xlogs but still perserves
transaction semantics. It is good for the situation that the user wants to
do a big data load. It is issued like this:

1. BEGIN TRANSACTION MINIMAL XLOG
2. ... /* statements */
3. END;

From user's view, it is almost the same as ordinary transaction: if
everything run smoothly from step 1 to 3, the transaction will be made
durable. If any step failed (including ABORT, transaction error, system
crash), it looks like nothing happened. To make life easier, no
subtransactions is allowed.

To Implement It
----------------
At step 1, we will disallow some operations, including vacuum, PITR.
At step 2, only minimal xlog entries are logged. If anything inside
failed, handle it like ordinary transaction.
At step 3, we issue a checkpoint, then mark the transaction commited. If
step 8 itself failed, handle it like ordinary transaction.

The correctness is easy: if we let "minimal xlog" equal to "all xlog",
then it is exactly the same as an ordinary transaction plus a checkpoint
inside the transaction block.

Based on the above proof, we can have the following implementation steps:
1. We first make the framework without revising any XLogInsert() - thus
the implementation is correct;
2. Examine each XLogInsert() and differenciate the content under MINIAML
XLOG is set or not.

The importance of the above steps is that it implies that there is no need
to completely pick up what are the MINIAL XLOG content are, we can do them
gradually in a non-invasive way.

Minimal Xlog
-------------
The xlog of failed transaction is not totally useless since later
transaction may reply on something it creates - for example, a new page
and its links of a btree. We have to pick up these xlogs.

RM_HEAP_ID: The problem of heap is torn page prevention. We currently copy
the whole page into xlog if it is the first time touched after a
checkpoint. So we can always have this copy to replace the data file page
which might be torn written. I didn't come up with any good method to
handle it so far, so we keep this. (We can possibly avoid copy a P_NEW
page, that's another story though). So what we can avoid xlog at least
include the insert/update/delete happened on a page that's no need to be
copied, which will give us a 50% xlog volumn/contention reduction I think.

RM_BTREE_ID/RM_HASH_ID/RM_GIST_ID: For index, things get more complex. We
need the xlogs to maintain the structure of the btree index, like the
pointers, high key etc, but the content is not necessarily needed. Need
more research here.

RM_XLOG_ID/RM_XACT_ID/RM_SMGR_ID/RM_CLOG_ID/RM_DBASE_ID/RM_TBLSPC_ID/RM_MULTIXACT_ID/RM_SEQ_ID:
It is hard to avoid much here, but they are not the important volume
contribution of xlogs.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 03:00:39
Message-ID: 11802.1135393239@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
> below. I think in this way, we can always gaurantee its correctness and
> can always improve it.

I think the entire idea is a waste of time anyway. If we have the COPY
case covered then we have done the only case that has any practical use
AFAICS --- you don't use UPDATE or DELETE for bulk loads, and you don't
use retail INSERTs either if you are striving for speed.

I don't want to see us mucking with the WAL logic without a *whole* lot
better rationale than has been presented so far.

regards, tom lane


From: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 03:29:50
Message-ID: Pine.LNX.4.58.0512232225430.25023@eon.cs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 23 Dec 2005, Tom Lane wrote:
> Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> > I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
> > below. I think in this way, we can always gaurantee its correctness and
> > can always improve it.
>
> I think the entire idea is a waste of time anyway. If we have the COPY
> case covered then we have done the only case that has any practical use
> AFAICS --- you don't use UPDATE or DELETE for bulk loads, and you don't
> use retail INSERTs either if you are striving for speed.
>

There are several posts on the list asking about NOLOGGING option
presented in Oracle. User may need it to do bulk updates against the
database. I don't think we plan to support it since it does not gaurantee
transaction semantics. But MINIMAL XLOG is something that we are afford to
do and not invasive change needed AFAICS.

Regards,
Qingqing


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 03:41:42
Message-ID: 87fyoj6u15.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> > I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
> > below. I think in this way, we can always gaurantee its correctness and
> > can always improve it.
>
> I think the entire idea is a waste of time anyway. If we have the COPY
> case covered then we have done the only case that has any practical use
> AFAICS --- you don't use UPDATE or DELETE for bulk loads, and you don't
> use retail INSERTs either if you are striving for speed.

Well the "industry standard" approach for loading data is called "ETL". I
forget what the E stands for, but the idea is to first load the data into a
table exactly as it appears in the input data.

Then the "T" stands for "transform". This could include just normalizing the
input data into the correct format but it could also include looking up
foreign key values for input strings, checking for duplicate records, etc.
This would be a series of UPDATEs and DELETEs.

I guess the "L" stands for load, where you insert it into the real tables.

Up until the point where you load it into the real tables you don't really
care about the transactional integrity of the data. If it's lost you can
always just restart the process.

In many cases you could use temporary tables, but sometimes you might want
multiple processes or multiple transactions to be able to see the data.
Consider for example a loading process that includes a human operator
reviewing the data before the data is approved to be loaded into the final
tables.

But I don't see turning on and off the WAL on a per-transaction basis to be
useful. Every transaction in the system is affected by the WAL status of every
other transaction working with the same tables. It doesn't serve any purpose
to have one transaction bypassing the WAL while everyone else does WAL logging
for the same table; they're all going to lose if the system crashes.

It seems to me the only rational way to approach this is to have a per-table
flag that sets that table to be non-logged. Essentially changing a table's
behaviour to that of a temporary table except that other transactions can see
it. If the system crashes the table is truncated on system restore.

The only problem I have with this is that it smells too much like MySQL MyISAM
tables...

--
greg


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 03:50:57
Message-ID: doiggp$68a$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Greg Stark" <gsstark(at)mit(dot)edu> wrote
>
> But I don't see turning on and off the WAL on a per-transaction basis to
> be
> useful. Every transaction in the system is affected by the WAL status of
> every
> other transaction working with the same tables. It doesn't serve any
> purpose
> to have one transaction bypassing the WAL while everyone else does WAL
> logging
> for the same table; they're all going to lose if the system crashes.
>
Sure, so a minimal amount xlog is required. And to make finished transaction
durable, issue a checkpoint.

> It seems to me the only rational way to approach this is to have a
> per-table
> flag that sets that table to be non-logged. Essentially changing a table's
> behaviour to that of a temporary table except that other transactions can
> see
> it. If the system crashes the table is truncated on system restore.
>
> The only problem I have with this is that it smells too much like MySQL
> MyISAM
> tables...
>
Table are related, so table A references table B. So set a per-table flag is
hard to use or doesn't work.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 04:06:21
Message-ID: 12176.1135397181@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> It seems to me the only rational way to approach this is to have a per-table
> flag that sets that table to be non-logged. Essentially changing a table's
> behaviour to that of a temporary table except that other transactions can see
> it.

But what's the point? Nowhere in your scenario is there a reason why
we need to have multiple sessions working on the data being loaded.
So a temp table solves the problem perfectly. (Temp tables do span
transactions, just not sessions.)

I've got a fundamental philosophical objection to proposals in this
line, which I suppose I'd better state for the record. I don't like
inventing nonstandard SQL extensions or peculiar semantics just to gain
performance. It imposes extra complexity on users that they could do
without; the first time someone loses critical data because he didn't
fully understand the tradeoffs involved, he'll have a justifiable gripe.
I also don't like playing Oracle's or MySQL's game by inventing
proprietary syntax. We claim to believe in standards compliance, so we
should have a pretty high bar for inventing nonstandard syntax. When
the proposed change has a narrow use-case and is only going to yield
marginal improvements even in that case, I think we should just say no.

Bottom line: if we can't figure out how to do it transparently, I think
we haven't thought hard enough.

regards, tom lane


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 04:19:01
Message-ID: 1135397941.7514.45.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> In many cases you could use temporary tables, but sometimes you might want
> multiple processes or multiple transactions to be able to see the data.

Could always implement GLOBAL TEMP tables that have the ability to use
these kinds of shortcuts.
--


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 07:01:17
Message-ID: 87acer6ksi.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > It seems to me the only rational way to approach this is to have a per-table
> > flag that sets that table to be non-logged. Essentially changing a table's
> > behaviour to that of a temporary table except that other transactions can see
> > it.
>
> But what's the point? Nowhere in your scenario is there a reason why
> we need to have multiple sessions working on the data being loaded.
> So a temp table solves the problem perfectly. (Temp tables do span
> transactions, just not sessions.)

Well I gave one example of a use case where you want to span sessions. To be
more explicit, if you have some batch job loading data that you then want to
have a human review via some other path such as a web page before continuing
with the process.

There's nothing saying all the data is coming from the same source at all. You
could have some of the data being loaded by a psql COPY command and then other
data being loaded by a perl script. Being tied to having the two executed from
the same session is an arbitrary limit on the architecture.

> I've got a fundamental philosophical objection to proposals in this
> line, which I suppose I'd better state for the record. I don't like
> inventing nonstandard SQL extensions or peculiar semantics just to gain
> performance. It imposes extra complexity on users that they could do
> without; the first time someone loses critical data because he didn't
> fully understand the tradeoffs involved, he'll have a justifiable gripe.
> I also don't like playing Oracle's or MySQL's game by inventing
> proprietary syntax. We claim to believe in standards compliance, so we
> should have a pretty high bar for inventing nonstandard syntax. When
> the proposed change has a narrow use-case and is only going to yield
> marginal improvements even in that case, I think we should just say no.
>
> Bottom line: if we can't figure out how to do it transparently, I think
> we haven't thought hard enough.

I don't think 'alter table set logged off' complicates users' lives too much.

But I like the other user's suggestion too. If there's a standards blessed
feature to have temporary tables that are visible in other sessions then
perhaps that's what we've arrived at from another direction.

This means the final load into production data would still have to be logged,
so this wouldn't get everything the original posters were looking for. But
getting around that does mean risking someone's production data at some point.

--
greg


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 08:23:42
Message-ID: 20051224082342.GA8707@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 23, 2005 at 11:19:01PM -0500, Rod Taylor wrote:
> > In many cases you could use temporary tables, but sometimes you might want
> > multiple processes or multiple transactions to be able to see the data.
>
> Could always implement GLOBAL TEMP tables that have the ability to use
> these kinds of shortcuts.

Indeed, that's the thought I get while reading this thread.

TEMP tables are not xlogged

For global temp tables I'd suggest creating a schema pg_globaltemp or
some such where:

1. No Xlog, like TEMP tables
2. Use shared buffer cache like normal tables.
3. On startup after unclean shutdown, delete anything in that schema

And (AFAIK) we get extra SQL complience to boot. If you really want an
extension, how about one to un-TEMP-ify a temp table.

That would allow the following:

CREATE GLOBAL TEMP TABLE blah INHERITS myhugetable;

-- Load data, munge, etc... no xlog traffic
-- If system crashes here, nothing to recover

ALTER TABLE blah ALTER SCHEMA TO public; -- untempify

-- If PITR, write Xlog data now
-- If no PITR, just move file and update catalog

How does this not do what people are asking for?
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 14:18:17
Message-ID: 1135433897.2964.650.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-23 at 22:41 -0500, Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> > > I revised the idea with MINIMAL XLOG (instead of WITHOUT XLOG) like the
> > > below. I think in this way, we can always gaurantee its correctness and
> > > can always improve it.
> >
> > I think the entire idea is a waste of time anyway. If we have the COPY
> > case covered then we have done the only case that has any practical use
> > AFAICS --- you don't use UPDATE or DELETE for bulk loads, and you don't
> > use retail INSERTs either if you are striving for speed.
>
> Well the "industry standard" approach for loading data is called "ETL". I
> forget what the E stands for, but the idea is to first load the data into a
> table exactly as it appears in the input data.
>
> Then the "T" stands for "transform". This could include just normalizing the
> input data into the correct format but it could also include looking up
> foreign key values for input strings, checking for duplicate records, etc.
> This would be a series of UPDATEs and DELETEs.
>
> I guess the "L" stands for load, where you insert it into the real tables.

Just for completeness: the whole data load area is typically known as
ETL now, but there are two main approaches: ETL and ELT

ETL stands for
Extract - take the data from the source system
Transform - the process of mapping from source to target data model
Load - load the transformed data into the target system

In this mode, the only SQL used is COPY, during the "L" phase.

"ETL" is the "industry standard" as set by vendors who sell tools that
work that way: Informatica, Ascential etc. Another, equally viable route
is similar to your description, but is known as ELT, which is supported
by vendors such as Oracle (proprietary-only) and Sunopsis (open):

Extract - take the data from the source system
Load - load the data into the DW staging area in same form as source
Tranform - transform the data using SQL commands

In the latter ELT approach, you use COPY during the "L" phase, but then
you definitely do use INSERT SELECT and sometimes UPDATE commands,
rarely DELETEs during the "T" phase.

The ELT approach is the main basis for a *large* number of very large
Oracle, Teradata and DB2 data warehouses that have custom-built load
procedures. Check out this link for a high level explanation,
http://www.sunopsis.com/corporate/us/products/sunopsis/snps_dc.htm
but noting that Sunopsis did not invent this approach, or the name ELT.

> Up until the point where you load it into the real tables you don't really
> care about the transactional integrity of the data. If it's lost you can
> always just restart the process.

> In many cases you could use temporary tables, but sometimes you might want
> multiple processes or multiple transactions to be able to see the data.
> Consider for example a loading process that includes a human operator
> reviewing the data before the data is approved to be loaded into the final
> tables.

Agreed

> But I don't see turning on and off the WAL on a per-transaction basis to be
> useful. Every transaction in the system is affected by the WAL status of every
> other transaction working with the same tables. It doesn't serve any purpose
> to have one transaction bypassing the WAL while everyone else does WAL logging
> for the same table; they're all going to lose if the system crashes.

Yes, I really don't like that way.

> It seems to me the only rational way to approach this is to have a per-table
> flag that sets that table to be non-logged. Essentially changing a table's
> behaviour to that of a temporary table except that other transactions can see
> it. If the system crashes the table is truncated on system restore.

Often, you need to speed up the load into your largest tables. The data
you are loading often comes from a file that you can backed-up, so if
the database crashes during the load, you can always restart the load.
But that doesn't mean you want to lose the data that is already there if
you crash.

That's exactly what COPY LOCK has been designed to do. It is analagous
to Oracle's sql*loader direct path nologging mode.

Best Regards, Simon Riggs


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 15:16:52
Message-ID: 874q4y7cez.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> CREATE GLOBAL TEMP TABLE blah INHERITS myhugetable;

I don't think you can have your temp table inherit from a real table. That
would make your real table have temp table semantics. Ie, records in it will
spontaneously disappear on reboot.

But you could use "LIKE" to define the table which I think does what you want
here.

Partitioned table support would eventually include the ability to take the
table (once you've done the untempify thing below) and move it into the
partitioned table as a partition.

> -- Load data, munge, etc... no xlog traffic
> -- If system crashes here, nothing to recover
>
> ALTER TABLE blah ALTER SCHEMA TO public; -- untempify

Personally I don't like the idea of tying temporary table status with the
schema. Is this in the standard? It seems to me that you should be able to
create temporary tables in whatever schema you want. People sometimes use
schemas to separate different modules of their application. It's strange to
force them to share a single namespace for temporary tables.

I would think untempify should be a separate special command, not some hidden
feature in ALTER SCHEMA. It makes it less surprising when someone reads the
code. It also makes it more likely someone who needs the functionality will
find it.

ALTER TABLE blah SET NOT TEMPORARY

This also raises the question of whether it should be possible to do:

ALTER TABLE blah SET TEMPORARY

and when it should be possible. At a first cut, any table that isn't involved
in any foreign key relationships and isn't part of any inherited table
structure should be eligible.

Come to think of it my database would benefit from this feature. I have cache
tables I rebuild nightly. They have to be visible from every session but I
don't care about losing them on a crash since they can always be rebuilt. As
it stands they inflate the daily backups, and if I were using PITR the nightly
build would flood the PITR logs with tons of useless data. Making them global
temporary tables would let me reduce the backup size, reduce PITR volume, and
speed up the nightly cache build in one stroke.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-24 15:32:29
Message-ID: 15451.1135438349@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>> CREATE GLOBAL TEMP TABLE blah INHERITS myhugetable;

> I don't think you can have your temp table inherit from a real table.

Bear in mind also that this notion of a GLOBAL TEMP table has less than
nothing to do with what the standard says a GLOBAL TEMP table is. You
can't argue for the idea by claiming it increases our standards
compliance, because it actually is moving even further away. The spec
says that a temp table is a table that exists but starts empty in every
session, ie, data inserted by one session is not visible to any other
session. The GLOBAL/LOCAL choice does not affect this, it actually just
governs visibility of the table within different modules (a concept we
don't have at all).

An ALTER TABLE SET LOGGED/UNLOGGED switch might have some merit, but
please don't muddy the waters by confusing this with temp-table status.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-26 12:03:27
Message-ID: 1135598607.2964.702.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2005-12-24 at 10:32 -0500, Tom Lane wrote:

> An ALTER TABLE SET LOGGED/UNLOGGED switch might have some merit, but
> please don't muddy the waters by confusing this with temp-table
> status.

I would not be against such a table-level switch, but the exact
behaviour would need to be specified more closely before this became a
TODO item, IMHO.

If someone has a 100 GB table, they would not appreciate the table being
truncated if a transaction to load 1 GB of data aborts, forcing recovery
of the 100 GB table.

Best Regards, Simon Riggs


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-26 12:22:12
Message-ID: 20051226122206.GA12934@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> I would not be against such a table-level switch, but the exact
> behaviour would need to be specified more closely before this became a
> TODO item, IMHO.

Well, I think at a per table level is the only sensible level. If a
table isn't logged, neither are the indexes. After an unclean shutdown
the data could be anywhere between OK and rubbish, with no way of
finding out which way.

> If someone has a 100 GB table, they would not appreciate the table being
> truncated if a transaction to load 1 GB of data aborts, forcing recovery
> of the 100 GB table.

Ah, but wouldn't such a large table be partitioned in such a way that
you could have the most recent partition having the loaded data.
Personally, I think these "shared temp tables" have more applications
than meet the eye. I've had systems with cache tables which could be
wiped on boot. Though I think my preference would be to TRUNCATE rather
than DROP on unclean shutdown.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-26 16:00:51
Message-ID: 200512261600.jBQG0p801613@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I think this brings up an interesting distinction, that having the NO
LOGGING switch per command doesn't make sense because it modifying the
table. It has to be a per-object switch, or something that operates
only on empty tables. This is the exact same distinction we talked
about for NO LOGGING COPY.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > I would not be against such a table-level switch, but the exact
> > behaviour would need to be specified more closely before this became a
> > TODO item, IMHO.
>
> Well, I think at a per table level is the only sensible level. If a
> table isn't logged, neither are the indexes. After an unclean shutdown
> the data could be anywhere between OK and rubbish, with no way of
> finding out which way.
>
> > If someone has a 100 GB table, they would not appreciate the table being
> > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > of the 100 GB table.
>
> Ah, but wouldn't such a large table be partitioned in such a way that
> you could have the most recent partition having the loaded data.
> Personally, I think these "shared temp tables" have more applications
> than meet the eye. I've had systems with cache tables which could be
> wiped on boot. Though I think my preference would be to TRUNCATE rather
> than DROP on unclean shutdown.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.
-- End of PGP section, PGP failed!

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-27 17:07:29
Message-ID: 20051227170729.GD72143@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 24, 2005 at 02:01:17AM -0500, Greg Stark wrote:
> But I like the other user's suggestion too. If there's a standards blessed
> feature to have temporary tables that are visible in other sessions then
> perhaps that's what we've arrived at from another direction.

Having a temporary table that is visible to all sessions would have a
lot of use besides just ETL (the E stands for Extract, btw) operations.
One example is storing session data for a webapp; most people would
happily trade off losing that data on a database restart for the
increased performance of not logging.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-27 18:20:29
Message-ID: 20051227182024.GC32404@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 26, 2005 at 11:00:51AM -0500, Bruce Momjian wrote:
>
> I think this brings up an interesting distinction, that having the NO
> LOGGING switch per command doesn't make sense because it modifying the
> table. It has to be a per-object switch, or something that operates
> only on empty tables. This is the exact same distinction we talked
> about for NO LOGGING COPY.

I've thought of one other possibility, which is kind of at the extreme
end of system implementation. Given the suggestion about not losing a
whole table on unclean shutdown, how about using a single table, split.

1. When setting no logging flag, take exclusive lock and record
filesize. This size is X.
2. From now on any data before X is read-only. So no updates. Any new
data needs to be allocated at end, so no FSM either.
3. Any data added after X is not logged to xlog.
4. On unclean shutdown, truncate table to length X.
5. When logging is reenabled, set X back to infinity.

The shared temp tables discussed earlier and basically be above with
X=0. The process of untempifying a table is setting X to infinity. Temp
tables always have X=0.

This allows people to do their ELT, because they can update rows
written after X but without the logging. If you have PITR you simply
never allow changing X of any table.

But this is pretty way out though IMO.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-27 22:41:57
Message-ID: 1135723318.3533.14.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, R, 2005-12-23 kell 23:06, kirjutas Tom Lane:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > It seems to me the only rational way to approach this is to have a per-table
> > flag that sets that table to be non-logged. Essentially changing a table's
> > behaviour to that of a temporary table except that other transactions can see
> > it.
>
> But what's the point? Nowhere in your scenario is there a reason why
> we need to have multiple sessions working on the data being loaded.
> So a temp table solves the problem perfectly. (Temp tables do span
> transactions, just not sessions.)

"Human operator looking at the date before loading it into final tables"
seems to be exactly that usecase. After some script loads the data in
temp tables, you may want an operator to look at the data using some
visual app or some other script. It would be quite tricky to make them
use the same session, especially if different steps use different
technologies (shell script, psql, jdbc, python, ...)

> I've got a fundamental philosophical objection to proposals in this
> line, which I suppose I'd better state for the record. I don't like
> inventing nonstandard SQL extensions or peculiar semantics just to gain
> performance.

Out TEMP tables *already* are nonstandard. IIRC the standard temp tables
the word temp applies only for data, not the schema.

> It imposes extra complexity on users that they could do
> without; the first time someone loses critical data because he didn't
> fully understand the tradeoffs involved, he'll have a justifiable gripe.
> I also don't like playing Oracle's or MySQL's game by inventing
> proprietary syntax. We claim to believe in standards compliance, so we
> should have a pretty high bar for inventing nonstandard syntax. When
> the proposed change has a narrow use-case and is only going to yield
> marginal improvements even in that case, I think we should just say no.
>
> Bottom line: if we can't figure out how to do it transparently, I think
> we haven't thought hard enough.

I guess we can never make the database so smart that it can quess users
specific needs of durability on per-table basis. What we can do is
making different durability choices more obvious at syntax level.

I have been long musing about having multiple wal files/filesets,
perhaps per-tablespace, perhaps just specified on per-table basis.
This would give both possibility for added performance by using
more/different storages and also a possibility to select different
classes of durability.

making one of WAL files (strategies) be /dev/null would almost get us
non-logged writes, except for a little overhead in write() calls.
fsync() on /dev/null should be instantaneous .

----------------------
Hannu


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-27 22:47:31
Message-ID: 1135723651.3533.16.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, T, 2005-12-27 kell 19:20, kirjutas Martijn van
Oosterhout:
> On Mon, Dec 26, 2005 at 11:00:51AM -0500, Bruce Momjian wrote:
> >
> > I think this brings up an interesting distinction, that having the NO
> > LOGGING switch per command doesn't make sense because it modifying the
> > table. It has to be a per-object switch, or something that operates
> > only on empty tables. This is the exact same distinction we talked
> > about for NO LOGGING COPY.
>
> I've thought of one other possibility, which is kind of at the extreme
> end of system implementation. Given the suggestion about not losing a
> whole table on unclean shutdown, how about using a single table, split.
>
> 1. When setting no logging flag, take exclusive lock and record
> filesize. This size is X.
> 2. From now on any data before X is read-only. So no updates. Any new
> data needs to be allocated at end, so no FSM either.
> 3. Any data added after X is not logged to xlog.
> 4. On unclean shutdown, truncate table to length X.
> 5. When logging is reenabled, set X back to infinity.

How would it work for indexes ?

------------
Hannu


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-28 00:30:12
Message-ID: 9450.1135729812@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing <hannu(at)skype(dot)net> writes:
> making one of WAL files (strategies) be /dev/null would almost get us
> non-logged writes, except for a little overhead in write() calls.
> fsync() on /dev/null should be instantaneous .

No, you really do want to push it up to a higher level than that.
The CPU and contention costs of generating a WAL record are nontrivial,
and doing so only to throw it away is pretty silly. What's more,
pointing WAL at /dev/null would disable logging for the entire database
cluster, not just selected tables which is what people seem to be
asking for in this thread. (No, I don't wish to deal with multiple
WAL output streams...)

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-28 08:50:26
Message-ID: 20051228085021.GA24400@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 28, 2005 at 12:47:31AM +0200, Hannu Krosing wrote:
> > I've thought of one other possibility, which is kind of at the extreme
> > end of system implementation. Given the suggestion about not losing a
> > whole table on unclean shutdown, how about using a single table, split.

<snip>

> How would it work for indexes ?

Hmm, it wouldn't, so scrap that.

If any anyone ever gets indexes that work across multiple tables of an
inheritence hierarchy, you could probably make it work for this.
Otherwise... OTOH, triggers will trip you up also.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 01:58:14
Message-ID: 200512290158.jBT1wEK28785@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Having read through this thread, I would like to propose a
syntax/behavior.

I think we all now agree that the logging is more part of the table than
the command itself. Right now we have a COPY LOCK patch, but people are
going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
and all sorts of other things, so I think we are best adding an ALTER
TABLE capability. I am thinking of this syntax:

ALTER TABLE name RELIABILITY option

where "option" is:

DROP [ TABLE ON CRASH ]
DELETE [ ROWS ON CRASH ]
EXCLUSIVE
SHARE

Let me explain each option. DROP would drop the table on a restart
after a non-clean shutdown. It would do _no_ logging on the table and
allow concurrent access, plus index access. DELETE is the same as DROP,
but it just truncates the table (perhaps TRUNCATE is a better word).

EXCLUSIVE would allow only a single session to modify the table, and
would do all changes by appending to the table, similar to COPY LOCK.
EXCLUSIVE would also not allow indexes because those can not be isolated
like appending to the heap. EXCLUSIVE would write all dirty shared
buffers for the table and fsync them before committing. SHARE is the
functionality we have now, with full logging.

Does this get us any closer to a TODO item? It isn't great, but I think
it is pretty clear, and I assume pg_dump would use ALTER to load each
table. The advanage is that the COPY statements themselves are
unchanged so they would work in loading into older versions of
PostgreSQL.

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > I would not be against such a table-level switch, but the exact
> > behaviour would need to be specified more closely before this became a
> > TODO item, IMHO.
>
> Well, I think at a per table level is the only sensible level. If a
> table isn't logged, neither are the indexes. After an unclean shutdown
> the data could be anywhere between OK and rubbish, with no way of
> finding out which way.
>
> > If someone has a 100 GB table, they would not appreciate the table being
> > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > of the 100 GB table.
>
> Ah, but wouldn't such a large table be partitioned in such a way that
> you could have the most recent partition having the loaded data.
> Personally, I think these "shared temp tables" have more applications
> than meet the eye. I've had systems with cache tables which could be
> wiped on boot. Though I think my preference would be to TRUNCATE rather
> than DROP on unclean shutdown.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.
-- End of PGP section, PGP failed!

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 03:05:30
Message-ID: 43B3527A.4040709@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

now agree that the logging is more part of the table than
> the command itself. Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability. I am thinking of this syntax:
>
> ALTER TABLE name RELIABILITY option
>
> where "option" is:
>
> DROP [ TABLE ON CRASH ]
> DELETE [ ROWS ON CRASH ]
> EXCLUSIVE
> SHARE

I would say ON FAILURE (Crash just seems way to scary :))

Joshua D. Drake


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 04:09:21
Message-ID: 200512290409.jBT49LD13611@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> now agree that the logging is more part of the table than
> > the command itself. Right now we have a COPY LOCK patch, but people are
> > going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> > and all sorts of other things, so I think we are best adding an ALTER
> > TABLE capability. I am thinking of this syntax:
> >
> > ALTER TABLE name RELIABILITY option
> >
> > where "option" is:
> >
> > DROP [ TABLE ON CRASH ]
> > DELETE [ ROWS ON CRASH ]
> > EXCLUSIVE
> > SHARE
>
> I would say ON FAILURE (Crash just seems way to scary :))

Agreed, maybe ON RECOVERY.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 13:19:45
Message-ID: 1135862385.2964.804.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2005-12-28 at 20:58 -0500, Bruce Momjian wrote:
> Having read through this thread, I would like to propose a
> syntax/behavior.
>
> I think we all now agree that the logging is more part of the table than
> the command itself. Right now we have a COPY LOCK patch, but people are
> going to want to control logging for INSERT INTO ... SELECT, and UPDATE,
> and all sorts of other things, so I think we are best adding an ALTER
> TABLE capability. I am thinking of this syntax:
>
> ALTER TABLE name RELIABILITY option
>
> where "option" is:
>
> DROP [ TABLE ON CRASH ]
> DELETE [ ROWS ON CRASH ]
> EXCLUSIVE
> SHARE
>
> Let me explain each option. DROP would drop the table on a restart
> after a non-clean shutdown. It would do _no_ logging on the table and
> allow concurrent access, plus index access. DELETE is the same as DROP,
> but it just truncates the table (perhaps TRUNCATE is a better word).
>
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK.
> EXCLUSIVE would also not allow indexes because those can not be isolated
> like appending to the heap. EXCLUSIVE would write all dirty shared
> buffers for the table and fsync them before committing. SHARE is the
> functionality we have now, with full logging.
>
> Does this get us any closer to a TODO item? It isn't great, but I think
> it is pretty clear, and I assume pg_dump would use ALTER to load each
> table. The advanage is that the COPY statements themselves are
> unchanged so they would work in loading into older versions of
> PostgreSQL.

First off, thanks for summarising a complex thread.

My view would be that this thread has been complex because everybody has
expressed a somewhat different requirement, which could be broken down
as:
1. The need for a multi-user-accessible yet temporary table
2. Loading data into a table immediately after it is created (i.e. in
same transaction), including but not limited to a reload from pg_dump
3. How to load data quickly into an existing table (COPY)
4. How to add/modify data quickly in an existing table (INSERT SELECT,
UPDATE)

I can see the need for all of those individually; my existing patch
submission covers (2) and (3) only. I very much like your thought to
coalesce these various requirements into a single coherent model.

For requirement (1), table level options make sense. We would:
- CREATE TABLE ALLTHINGS
- ALTER TABLE ALLTHINGS RELIABILITY DELETE ROWS ON RECOVERY
- lots of SQL, all fast because not logged

(2) is catered for adequately by the existing COPY patch i.e. it will
detect whether a table has just been created and then avoid writing WAL.
In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
pg_dump *will* work with any other version of PostgreSQL, which *would
not* be the case if we added ALTER TABLE ... RELIABILITY statements into
it. Also, a pg_dump created at an earlier version could also be loaded
faster using the patch. The only requirement is to issue all SQL as part
of the same transaction - which is catered for by the
--single-transaction option on pg_restore and psql. So (2) is catered
for fully without the need for an ALTER TABLE ... RELIABILITY statement
or COPY LOCK.

For requirement (3), I would use table level options like this:
(the table already exists and is reasonably big; we should not assume
that everybody can and does use partitioning)
- ALTER TABLE RELIABILITY ALLTHINGS2 EXCLUSIVE
- COPY
- ALTER TABLE RELIABILITY ALLTHINGS2 SHARE

For a load into an existing table I would always do all three actions
together. COPY LOCK does exactly that *and* does it atomically.

The two ways of doing (3) have a few pros/cons either way:
Pro for ALTER TABLE:
- same syntax as req (1)
- doesn't need the keyword LOCK
- allows INSERT SELECT, UPDATE operations also (req 4)
Cons:
- existing programs have to add additional statements to take advantage
of this; with COPY LOCK we would add just a single keyword
- operation is not atomic, which might lead to some operations waiting
for a lock to operate as unlogged, since they would execute before the
second ALTER TABLE gets there
- operation will be understood by some, but not others. They will forget
to switch the RELIABILITY back on and then lose their whole table when
the database crashes. (watch...)

...but would it be a problem to have both?

So, my thinking would be to separate things into two:
a) Add a TODO item "shared temp tables" that caters for (1) and (4)

ALTER TABLE name RELIABILITY
{DELETE ROWS AT RECOVERY | FULL RECOVERY}
(syntax TBD)

which would
- truncate all rows and remove all index entries during recovery
- use shared_buffers, not temp_buffers
- never write xlog records, even when in PITR mode
- would avoid writing WAL for both heap *and* index tuples

b) Leave the COPY patch as is, since it caters for reqs (2) and (3) as
*separate* optimizations (but using a common infrastructure in code).
[This work was based upon discussions on -hackers only 6 months ago, so
its not like its been snuck in or anything
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php ]

These two thoughts are separable. There is no need to
have-both-or-neither within PostgreSQL.

Eventually, I'd like all of these options, as a database designer.

Best Regards, Simon Riggs

> --------------------------------------------------------------------------
>
> Martijn van Oosterhout wrote:
> -- Start of PGP signed section.
> > On Mon, Dec 26, 2005 at 12:03:27PM +0000, Simon Riggs wrote:
> > > I would not be against such a table-level switch, but the exact
> > > behaviour would need to be specified more closely before this became a
> > > TODO item, IMHO.
> >
> > Well, I think at a per table level is the only sensible level. If a
> > table isn't logged, neither are the indexes. After an unclean shutdown
> > the data could be anywhere between OK and rubbish, with no way of
> > finding out which way.
> >
> > > If someone has a 100 GB table, they would not appreciate the table being
> > > truncated if a transaction to load 1 GB of data aborts, forcing recovery
> > > of the 100 GB table.
> >
> > Ah, but wouldn't such a large table be partitioned in such a way that
> > you could have the most recent partition having the loaded data.
> > Personally, I think these "shared temp tables" have more applications
> > than meet the eye. I've had systems with cache tables which could be
> > wiped on boot. Though I think my preference would be to TRUNCATE rather
> > than DROP on unclean shutdown.
> >
> > Have a nice day,
> > --
> > Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > > tool for doing 5% of the work and then sitting around waiting for someone
> > > else to do the other 95% so you can sue them.
> -- End of PGP section, PGP failed!
>


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 14:35:27
Message-ID: 1135866927.61038.13.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> So, my thinking would be to separate things into two:
> a) Add a TODO item "shared temp tables" that caters for (1) and (4)
>
> ALTER TABLE name RELIABILITY
> {DELETE ROWS AT RECOVERY | FULL RECOVERY}
> (syntax TBD)

DELETE ROWS AT RECOVERY would need to be careful or disallowed when
referenced via a foreign key to ensure the database is not restored in
an inconsistent state.

--


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 15:10:40
Message-ID: 1135869040.2964.824.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-29 at 09:35 -0500, Rod Taylor wrote:
> > So, my thinking would be to separate things into two:
> > a) Add a TODO item "shared temp tables" that caters for (1) and (4)
> >
> > ALTER TABLE name RELIABILITY
> > {DELETE ROWS AT RECOVERY | FULL RECOVERY}
> > (syntax TBD)
>
> DELETE ROWS AT RECOVERY would need to be careful or disallowed when
> referenced via a foreign key to ensure the database is not restored in
> an inconsistent state.

I think we'd need to apply the same rule as we do for temp tables: they
cannot be referenced by a permanent table.

There are possibly some other restrictions also. Anyone?

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 16:05:42
Message-ID: 200512291605.jBTG5gi00396@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Bruce Momjian said:
> > DROP would drop the table on a restart
> > after a non-clean shutdown. It would do _no_ logging on the table and
> > allow concurrent access, plus index access. DELETE is the same as
> > DROP, but it just truncates the table (perhaps TRUNCATE is a better
> > word).
> >
> > EXCLUSIVE would allow only a single session to modify the table, and
> > would do all changes by appending to the table, similar to COPY LOCK.
> > EXCLUSIVE would also not allow indexes because those can not be
> > isolated like appending to the heap. EXCLUSIVE would write all dirty
> > shared buffers for the table and fsync them before committing. SHARE
> > is the functionality we have now, with full logging.
>
>
> I an horribly scared that this will be used as a "performance boost" for
> normal use. I would at least like to see some restrictions that make it
> harder to mis-use. Perhaps restrict to superuser?

Certainly restrict to table owner.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 16:12:11
Message-ID: 7273.1135872731@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> My view would be that this thread has been complex because everybody has
> expressed a somewhat different requirement, which could be broken down
> as:
> 1. The need for a multi-user-accessible yet temporary table
> 2. Loading data into a table immediately after it is created (i.e. in
> same transaction), including but not limited to a reload from pg_dump
> 3. How to load data quickly into an existing table (COPY)
> 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> UPDATE)

> I can see the need for all of those individually; my existing patch
> submission covers (2) and (3) only. I very much like your thought to
> coalesce these various requirements into a single coherent model.

However, you then seem to be arguing for still using the COPY LOCK
syntax, which I think Bruce intended would go away in favor of using
these ALTER commands. Certainly that's what I'd prefer --- COPY has
got too darn many options already.

> In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> pg_dump *will* work with any other version of PostgreSQL, which *would
> not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> it.

Wrong --- the good thing about ALTER TABLE is that an old version of
Postgres would simply reject it and keep going. Therefore we could get
the speedup in dumps without losing compatibility, which is not true
of COPY LOCK.

BTW, this is a perfect example of the use-case for not abandoning a
dump-file load simply because one command fails. (We have relied on
this sort of reasoning many times before, too, for example by using
"SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
I don't think that "wrap the whole load into begin/end" is really a very
workable answer, because there are far too many scenarios where you
can't do that. Another one where it doesn't help is a data-only dump.

regards, tom lane


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: <kleptog(at)svana(dot)org>, <simon(at)2ndquadrant(dot)com>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <gsstark(at)mit(dot)edu>, <pg(at)rbt(dot)ca>, <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 16:14:37
Message-ID: 51082.68.143.134.146.1135872877.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian said:
> DROP would drop the table on a restart
> after a non-clean shutdown. It would do _no_ logging on the table and
> allow concurrent access, plus index access. DELETE is the same as
> DROP, but it just truncates the table (perhaps TRUNCATE is a better
> word).
>
> EXCLUSIVE would allow only a single session to modify the table, and
> would do all changes by appending to the table, similar to COPY LOCK.
> EXCLUSIVE would also not allow indexes because those can not be
> isolated like appending to the heap. EXCLUSIVE would write all dirty
> shared buffers for the table and fsync them before committing. SHARE
> is the functionality we have now, with full logging.

I an horribly scared that this will be used as a "performance boost" for
normal use. I would at least like to see some restrictions that make it
harder to mis-use. Perhaps restrict to superuser?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 16:24:28
Message-ID: 7966.1135873468@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Andrew Dunstan wrote:
>> I an horribly scared that this will be used as a "performance boost" for
>> normal use. I would at least like to see some restrictions that make it
>> harder to mis-use. Perhaps restrict to superuser?

> Certainly restrict to table owner.

I can see the argument for superuser-only: decisions about data
integrity tradeoffs should be reserved to the DBA, who is the one who
will get blamed if the database loses data, no matter how stupid his
users are.

But I'm not wedded to that. I could live with table-owner.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 16:37:39
Message-ID: 200512291637.jBTGbdC03848@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > My view would be that this thread has been complex because everybody has
> > expressed a somewhat different requirement, which could be broken down
> > as:
> > 1. The need for a multi-user-accessible yet temporary table
> > 2. Loading data into a table immediately after it is created (i.e. in
> > same transaction), including but not limited to a reload from pg_dump
> > 3. How to load data quickly into an existing table (COPY)
> > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > UPDATE)
>
> > I can see the need for all of those individually; my existing patch
> > submission covers (2) and (3) only. I very much like your thought to
> > coalesce these various requirements into a single coherent model.
>
> However, you then seem to be arguing for still using the COPY LOCK
> syntax, which I think Bruce intended would go away in favor of using
> these ALTER commands. Certainly that's what I'd prefer --- COPY has
> got too darn many options already.
>
> > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > pg_dump *will* work with any other version of PostgreSQL, which *would
> > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > it.
>
> Wrong --- the good thing about ALTER TABLE is that an old version of
> Postgres would simply reject it and keep going. Therefore we could get
> the speedup in dumps without losing compatibility, which is not true
> of COPY LOCK.
>
> BTW, this is a perfect example of the use-case for not abandoning a
> dump-file load simply because one command fails. (We have relied on
> this sort of reasoning many times before, too, for example by using
> "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> I don't think that "wrap the whole load into begin/end" is really a very
> workable answer, because there are far too many scenarios where you
> can't do that. Another one where it doesn't help is a data-only dump.

Yep, Tom is echoing my reaction. There is a temptation to add things up
onto existing commands, e.g. LOCK, and while it works, it makes for some
very complex user API's. Having COPY behave differently because it is
in a transaction is fine as long as it is user-invisible, but once you
require users to do that to get the speedup, it isn't user-invisible
anymore.

(I can see it now, "Why is pg_dump putting things in transactions?",
"Because it prevents it from being logged." "Oh, should I be doing that
in my code?" "Perhaps, if you want ..." You can see where that
discussion is going. Having them see "ATER TABLE ... RELIBILITY
TRUNCATE" is very clear, and very clear on how it can be used in user
code.)

I think there is great utility in giving users one API, namely
RELIABILITY (or some other keyword), and telling them that is where they
control logging. I realize adding one keyword, LOCK, to an existing
command isn't a big deal, but once you decentralize your API enough
times, you end up with a terribly complex database system. It is this
design rigidity that helps make PostgreSQL so much easier to use than
other database systems.

I do think it is valid concern about someone use the table between the
CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow
the RELIABILITY as part of the CREATE TABLE, another is to tell users to
create the table inside a transaction.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: <pgman(at)candle(dot)pha(dot)pa(dot)us>, <kleptog(at)svana(dot)org>, <simon(at)2ndquadrant(dot)com>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <gsstark(at)mit(dot)edu>, <pg(at)rbt(dot)ca>, <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 17:20:32
Message-ID: 87vex74y73.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:

> Bruce Momjian said:
> > DROP would drop the table on a restart
> > after a non-clean shutdown. It would do _no_ logging on the table and
> > allow concurrent access, plus index access. DELETE is the same as
> > DROP, but it just truncates the table (perhaps TRUNCATE is a better
> > word).
> >
> > EXCLUSIVE would allow only a single session to modify the table, and
> > would do all changes by appending to the table, similar to COPY LOCK.
> > EXCLUSIVE would also not allow indexes because those can not be
> > isolated like appending to the heap. EXCLUSIVE would write all dirty
> > shared buffers for the table and fsync them before committing. SHARE
> > is the functionality we have now, with full logging.
>
> I an horribly scared that this will be used as a "performance boost" for
> normal use. I would at least like to see some restrictions that make it
> harder to mis-use. Perhaps restrict to superuser?

Well that's its whole purpose. At least you can hardly argue that you didn't
realize the consequences of "DELETE ROWS ON RECOVERY"... :)

Some thoughts:

a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to
use it instead of DELETE ROWS?

b) It seems like the other feature people were talking about of not logging
for a table created within the same transaction should be handled by
having this flag implicitly set for any such newly created table.
Ie, the test for whether to log would look like:

if (!table->logged && table->xid != myxid) ...

c) Every option in ALTER TABLE should be in CREATE TABLE as well.

d) Yes as someone else mentioned, this should only be allowable on a table
with no foreign keys referencing it.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-29 17:30:49
Message-ID: 200512291730.jBTHUnn09840@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
>
> > Bruce Momjian said:
> > > DROP would drop the table on a restart
> > > after a non-clean shutdown. It would do _no_ logging on the table and
> > > allow concurrent access, plus index access. DELETE is the same as
> > > DROP, but it just truncates the table (perhaps TRUNCATE is a better
> > > word).
> > >
> > > EXCLUSIVE would allow only a single session to modify the table, and
> > > would do all changes by appending to the table, similar to COPY LOCK.
> > > EXCLUSIVE would also not allow indexes because those can not be
> > > isolated like appending to the heap. EXCLUSIVE would write all dirty
> > > shared buffers for the table and fsync them before committing. SHARE
> > > is the functionality we have now, with full logging.
> >
> > I an horribly scared that this will be used as a "performance boost" for
> > normal use. I would at least like to see some restrictions that make it
> > harder to mis-use. Perhaps restrict to superuser?
>
> Well that's its whole purpose. At least you can hardly argue that you didn't
> realize the consequences of "DELETE ROWS ON RECOVERY"... :)

True. I think we are worried about non-owners using it, but the owner
had to grant permissions for others to modify it, so we might be OK.

> Some thoughts:
>
> a) I'm not sure I understand the purpose of EXCLUSIVE. When would I ever want to
> use it instead of DELETE ROWS?

Good question. The use case is doing COPY into a table that already had
data. EXCLUSIVE allows additions to the table but preserves the
existing data on a crash.

> b) It seems like the other feature people were talking about of not logging
> for a table created within the same transaction should be handled by
> having this flag implicitly set for any such newly created table.
> Ie, the test for whether to log would look like:
>
> if (!table->logged && table->xid != myxid) ...

Yes, the question is whether we want to limit users to having this
optimization _only_ when they have created the table in the same
transaction, and the short answer is we don't.

> c) Every option in ALTER TABLE should be in CREATE TABLE as well.

I looked into that and see that things like:

ALTER [ COLUMN ] column SET STATISTICS integer
ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

are not supported by CREATE TABLE, and probably shouldn't be because the
value can be changed after the table is created. I think the only
things we usually support in CREATE TABLE are those that cannot be
altered.

> d) Yes as someone else mentioned, this should only be allowable on a table
> with no foreign keys referencing it.

Right, and EXCLUSIVE can not have an index either.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 13:09:12
Message-ID: 1135948152.2862.113.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > My view would be that this thread has been complex because everybody has
> > > expressed a somewhat different requirement, which could be broken down
> > > as:
> > > 1. The need for a multi-user-accessible yet temporary table
> > > 2. Loading data into a table immediately after it is created (i.e. in
> > > same transaction), including but not limited to a reload from pg_dump
> > > 3. How to load data quickly into an existing table (COPY)
> > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > UPDATE)

> > However, you then seem to be arguing for still using the COPY LOCK
> > syntax, which I think Bruce intended would go away in favor of using
> > these ALTER commands. Certainly that's what I'd prefer --- COPY has
> > got too darn many options already.

COPY LOCK was Tom's suggestion at the end of a long discussion thread on
this precise issue. Nobody objected to it at that point; I implemented
it *exactly* that way because I wanted to very visibly follow the
consensus of the community, after informed debate.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php

Please re-read the links to previous discussions.
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
There are points there, not made by me, that still apply and need to be
considered here, yet have not been.

Just to restate my current thinking:
- agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
- we should have COPY LOCK rather than
ALTER TABLE .... RELIABILITY EXCLUSIVE
(Though I welcome better wording and syntax in either case; it is the
behaviour only that I discuss).

It seems now that we have agreed approaches for (1), (2) and (4). Please
note that I have listened to the needs of others with regard to
requirement (1), as espoused by earlier by Hannu and again now by
Martijn. Some of the points about requirement (3) I made in my previous
post have not yet been addressed, IMHO.

My mind is not fixed. AFAICS there are valid points remaining on both
sides of the discussion about loading data quickly into an existing
table.

> I do think it is valid concern about someone use the table between the
> CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow
> the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> create the table inside a transaction.

Neither solution works for this use case:

> > 3. How to load data quickly into an existing table (COPY)

This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
sense. That option means that any write lock held upon the table would
be an EXCLUSIVE table lock, so would never be a performance gain with
single row INSERT, UPDATE or DELETEs.

Following Andrew's concerns, I'd also note that ALTER TABLE requires a
much higher level of privilege to operate than does COPY. That sounds
like it will make things more secure, but all it does is open up the
administrative rights, since full ownership rights must be obtained
merely to load data.

> Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible

Good

> I think there is great utility in giving users one API, namely
> RELIABILITY (or some other keyword), and telling them that is where they
> control logging. I realize adding one keyword, LOCK, to an existing
> command isn't a big deal, but once you decentralize your API enough
> times, you end up with a terribly complex database system. It is this
> design rigidity that helps make PostgreSQL so much easier to use than
> other database systems.

I do see the appeal of your suggestion...

TRUNCATE is a special command to delete quickly. There is no requirement
to do an ALTER TABLE statement before that command executes.

Balance would suggest that a special command to load data quickly would
be reasonably accepted by users.

Minor points below:

> > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > it.
> >
> > Wrong --- the good thing about ALTER TABLE is that an old version of
> > Postgres would simply reject it and keep going. Therefore we could get
> > the speedup in dumps without losing compatibility, which is not true
> > of COPY LOCK.

That was pointing out one of Bruce's objections was not relevant because
it assumed COPY LOCK was required to make pg_restore go faster; that was
not the case - so there is no valid objection either way now.

> > BTW, this is a perfect example of the use-case for not abandoning a
> > dump-file load simply because one command fails. (We have relied on
> > this sort of reasoning many times before, too, for example by using
> > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > I don't think that "wrap the whole load into begin/end" is really a very
> > workable answer, because there are far too many scenarios where you
> > can't do that. Another one where it doesn't help is a data-only dump.

Which is why --single-transaction is not the default, per the earlier
discussion on that point (on -patches).

> Yep, Tom is echoing my reaction. There is a temptation to add things up
> onto existing commands, e.g. LOCK, and while it works, it makes for some
> very complex user API's. Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible, but once you
> require users to do that to get the speedup, it isn't user-invisible
> anymore.
>
> (I can see it now, "Why is pg_dump putting things in transactions?",
> "Because it prevents it from being logged." "Oh, should I be doing that
> in my code?" "Perhaps, if you want ..." You can see where that
> discussion is going. Having them see "ATER TABLE ... RELIBILITY
> TRUNCATE" is very clear, and very clear on how it can be used in user
> code.)

The above case is not an argument against COPY LOCK. Exactly what you
say above would still occur even when we have ALTER TABLE ...
RELIABILITY statement, since COPY LOCK and
COPY-optimized-within-same-transaction are different things.

Best Regards, Simon Riggs


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <simon(at)2ndquadrant(dot)com>
Cc: <pgman(at)candle(dot)pha(dot)pa(dot)us>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <kleptog(at)svana(dot)org>, <gsstark(at)mit(dot)edu>, <pg(at)rbt(dot)ca>, <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 14:53:33
Message-ID: 56737.68.143.134.146.1135954413.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs said:

>
> Following Andrew's concerns, I'd also note that ALTER TABLE requires a
> much higher level of privilege to operate than does COPY. That sounds
> like it will make things more secure, but all it does is open up the
> administrative rights, since full ownership rights must be obtained
> merely to load data.
>

Yeah. And since a role can own a table you could have a role and add lots of
users to it ....

My concern is more about making plain that this is for special operations,
not normal operations. Or maybe I have misunderstood the purpose.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: simon(at)2ndquadrant(dot)com, pgman(at)candle(dot)pha(dot)pa(dot)us, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 14:57:06
Message-ID: 11876.1135954626@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> Simon Riggs said:
>> Following Andrew's concerns, I'd also note that ALTER TABLE requires a
>> much higher level of privilege to operate than does COPY. That sounds
>> like it will make things more secure, but all it does is open up the
>> administrative rights, since full ownership rights must be obtained
>> merely to load data.

> My concern is more about making plain that this is for special operations,
> not normal operations. Or maybe I have misunderstood the purpose.

Rephrase that as "full ownership rights must be obtained to load data in
a way that requires dropping any existing indexes and locking out other
users of the table". I don't think the use-case for this will be very
large for non-owners, or indeed even for owners except during initial
table creation; and so I don't think the above argument is strong.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 16:49:59
Message-ID: 200512301649.jBUGnxn21488@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > > > My view would be that this thread has been complex because everybody has
> > > > expressed a somewhat different requirement, which could be broken down
> > > > as:
> > > > 1. The need for a multi-user-accessible yet temporary table
> > > > 2. Loading data into a table immediately after it is created (i.e. in
> > > > same transaction), including but not limited to a reload from pg_dump
> > > > 3. How to load data quickly into an existing table (COPY)
> > > > 4. How to add/modify data quickly in an existing table (INSERT SELECT,
> > > > UPDATE)
>
> > > However, you then seem to be arguing for still using the COPY LOCK
> > > syntax, which I think Bruce intended would go away in favor of using
> > > these ALTER commands. Certainly that's what I'd prefer --- COPY has
> > > got too darn many options already.
>
> COPY LOCK was Tom's suggestion at the end of a long discussion thread on
> this precise issue. Nobody objected to it at that point; I implemented
> it *exactly* that way because I wanted to very visibly follow the
> consensus of the community, after informed debate.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00068.php
>
> Please re-read the links to previous discussions.
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
> There are points there, not made by me, that still apply and need to be
> considered here, yet have not been.

Yes, I know we agreed to the COPY LOCK, but new features now being
requested, so we have to re-evaluate where we are going with COPY LOCK
to get a more consistent solution.

> Just to restate my current thinking:
> - agree we should have ALTER TABLE ... RELIABILITY DELETE ROWS
> - we should have COPY LOCK rather than
> ALTER TABLE .... RELIABILITY EXCLUSIVE
> (Though I welcome better wording and syntax in either case; it is the
> behaviour only that I discuss).
>
> It seems now that we have agreed approaches for (1), (2) and (4). Please
> note that I have listened to the needs of others with regard to
> requirement (1), as espoused by earlier by Hannu and again now by
> Martijn. Some of the points about requirement (3) I made in my previous
> post have not yet been addressed, IMHO.
>
> My mind is not fixed. AFAICS there are valid points remaining on both
> sides of the discussion about loading data quickly into an existing
> table.
>
> > I do think it is valid concern about someone use the table between the
> > CREATE and the ALTER TABLE RELIABILITY. One solution would be to allow
> > the RELIABILITY as part of the CREATE TABLE, another is to tell users to
> > create the table inside a transaction.
>
> Neither solution works for this use case:
>
> > > 3. How to load data quickly into an existing table (COPY)
>
> This is the only use case for which ALTER TABLE ... EXCLUSIVE makes
> sense. That option means that any write lock held upon the table would
> be an EXCLUSIVE table lock, so would never be a performance gain with
> single row INSERT, UPDATE or DELETEs.

Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use
EXCLUSIVE? We don't want to add "LOCK" to every command that might use
EXCLUSIVE. ALTER is much better for this.

I agree if we thought EXCLUSIVE would only be used for COPY, we could
use LOCK, but I am thinking it will be used for other commands as well.

> Following Andrew's concerns, I'd also note that ALTER TABLE requires a
> much higher level of privilege to operate than does COPY. That sounds
> like it will make things more secure, but all it does is open up the
> administrative rights, since full ownership rights must be obtained
> merely to load data.

True, but as pointed out by others, I don't see that happening too
often.

> > Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible
>
> Good
>
> > I think there is great utility in giving users one API, namely
> > RELIABILITY (or some other keyword), and telling them that is where they
> > control logging. I realize adding one keyword, LOCK, to an existing
> > command isn't a big deal, but once you decentralize your API enough
> > times, you end up with a terribly complex database system. It is this
> > design rigidity that helps make PostgreSQL so much easier to use than
> > other database systems.
>
> I do see the appeal of your suggestion...
>
> TRUNCATE is a special command to delete quickly. There is no requirement
> to do an ALTER TABLE statement before that command executes.

The TRUNCATE happens during recovery. There is no user interaction. It
happens because we can't restore the contents of the table in a
consistent state because no logging was used. Basically, a table marked
RELIABILITY TRUNCATE would be truncated on a recovery start of the
postmaster.

> Balance would suggest that a special command to load data quickly would
> be reasonably accepted by users.
>
>
>
>
> Minor points below:
>
> > > > In the patch, pg_dump has *not* been altered to use COPY LOCK, so a
> > > > pg_dump *will* work with any other version of PostgreSQL, which *would
> > > > not* be the case if we added ALTER TABLE ... RELIABILITY statements into
> > > > it.
> > >
> > > Wrong --- the good thing about ALTER TABLE is that an old version of
> > > Postgres would simply reject it and keep going. Therefore we could get
> > > the speedup in dumps without losing compatibility, which is not true
> > > of COPY LOCK.
>
> That was pointing out one of Bruce's objections was not relevant because
> it assumed COPY LOCK was required to make pg_restore go faster; that was
> not the case - so there is no valid objection either way now.

I don't consider the single-transaction to be a no-cost solution. You
are adding flags to commands, and you are using a dump layout for
performance where the purpose for the layout is not clear. The ALTER is
clear to the user, and it allows nologging operations to happen after
the table is created.

In fact, for use in pg_dump, I think DROP is the proper operation for
loading, not your transaction wrapping solution. We already agree we
need DROP (or TRUNCATE), so why not use that rather than the transaction
wrap idea?

> > > BTW, this is a perfect example of the use-case for not abandoning a
> > > dump-file load simply because one command fails. (We have relied on
> > > this sort of reasoning many times before, too, for example by using
> > > "SET default_with_oids" in preference to CREATE TABLE WITH/WITHOUT OIDS.)
> > > I don't think that "wrap the whole load into begin/end" is really a very
> > > workable answer, because there are far too many scenarios where you
> > > can't do that. Another one where it doesn't help is a data-only dump.
>
> Which is why --single-transaction is not the default, per the earlier
> discussion on that point (on -patches).

Right, but why not use DROP/TRUNCATE? That works for old dumps too, and
has no downsides, meaning it can be always on.

> > Yep, Tom is echoing my reaction. There is a temptation to add things up
> > onto existing commands, e.g. LOCK, and while it works, it makes for some
> > very complex user API's. Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible, but once you
> > require users to do that to get the speedup, it isn't user-invisible
> > anymore.
> >
> > (I can see it now, "Why is pg_dump putting things in transactions?",
> > "Because it prevents it from being logged." "Oh, should I be doing that
> > in my code?" "Perhaps, if you want ..." You can see where that
> > discussion is going. Having them see "ATER TABLE ... RELIBILITY
> > TRUNCATE" is very clear, and very clear on how it can be used in user
> > code.)
>
> The above case is not an argument against COPY LOCK. Exactly what you
> say above would still occur even when we have ALTER TABLE ...
> RELIABILITY statement, since COPY LOCK and
> COPY-optimized-within-same-transaction are different things.

See my posting above that we might want EXCLUSIVE for other commands,
meaning ALTER makes more sense.

So, to summarize, I think we should add DROP/TRUNCATE, and use that by
default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: simon(at)2ndquadrant(dot)com, pgman(at)candle(dot)pha(dot)pa(dot)us, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 17:39:21
Message-ID: 43B570C9.6060406@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
>
>
>>Simon Riggs said:
>>
>>
>>>Following Andrew's concerns, I'd also note that ALTER TABLE requires a
>>>much higher level of privilege to operate than does COPY. That sounds
>>>like it will make things more secure, but all it does is open up the
>>>administrative rights, since full ownership rights must be obtained
>>>merely to load data.
>>>
>>>
>
>
>
>>My concern is more about making plain that this is for special operations,
>>not normal operations. Or maybe I have misunderstood the purpose.
>>
>>
>
>Rephrase that as "full ownership rights must be obtained to load data in
>a way that requires dropping any existing indexes and locking out other
>users of the table". I don't think the use-case for this will be very
>large for non-owners, or indeed even for owners except during initial
>table creation; and so I don't think the above argument is strong.
>
>
>
>

Those restrictions aren't true of Bruce's proposed drop and
delete/truncate recovery modes, are they?

People do crazy things in pursuit of performance. Illustration: a few
months ago I was instrumenting an app (based on MySQL/ISAM) and I
noticed that under load it simply didn't update the inventory properly -
of 1000 orders placed within a few seconds it might reduce inventory by
3 or 4. I reported this and they shrugged their shoulders and said
"well, we'd have to lock the table and that would slow everything down
...".

I just want to be sure we aren't providing a footgun. "Oh, just set
recovery mode to delete. It won't make any difference unless you crash
and you'll run faster."

cheers

andrew


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 17:58:15
Message-ID: 200512301758.jBUHwFv03107@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> >>My concern is more about making plain that this is for special operations,
> >>not normal operations. Or maybe I have misunderstood the purpose.
> >>
> >>
> >
> >Rephrase that as "full ownership rights must be obtained to load data in
> >a way that requires dropping any existing indexes and locking out other
> >users of the table". I don't think the use-case for this will be very
> >large for non-owners, or indeed even for owners except during initial
> >table creation; and so I don't think the above argument is strong.
> >
> >
> >
> >
>
> Those restrictions aren't true of Bruce's proposed drop and
> delete/truncate recovery modes, are they?

Only the owner could do the ALTER, for sure, but once the owner sets it,
any user with permission to write to the table would have those
characteristics.

> People do crazy things in pursuit of performance. Illustration: a few
> months ago I was instrumenting an app (based on MySQL/ISAM) and I
> noticed that under load it simply didn't update the inventory properly -
> of 1000 orders placed within a few seconds it might reduce inventory by
> 3 or 4. I reported this and they shrugged their shoulders and said
> "well, we'd have to lock the table and that would slow everything down
> ...".
>
> I just want to be sure we aren't providing a footgun. "Oh, just set
> recovery mode to delete. It won't make any difference unless you crash
> and you'll run faster."

I think we have to trust the object owner in this case. I don't know of
any super-user-only ALTER commands, but I suppose we could set it up
that way if we wanted.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 19:28:41
Message-ID: 1135970921.5052.68.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote:

> Yes, I know we agreed to the COPY LOCK, but new features now being
> requested, so we have to re-evaluate where we are going with COPY LOCK
> to get a more consistent solution.

Thank you.

> Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
> EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use
> EXCLUSIVE? We don't want to add "LOCK" to every command that might use
> EXCLUSIVE. ALTER is much better for this.

> I agree if we thought EXCLUSIVE would only be used for COPY, we could
> use LOCK, but I am thinking it will be used for other commands as well.

Agreed, I will look to implement this.

Could the internals of my recent patch be reviewed? Changing the user
interface is less of a problem than changing the internals, which is
where the hard work takes place. I do not want to extend this work
further only to have that part rejected later.

The implications of EXCLUSIVE are:
- there will be a check on each and every I, U, D to check the state of
the relation
- *every* operation that attempts a write lock will attempt to acquire
an EXCLUSIVE full table lock instead
- following successful completion of *each* DML statement, the relation
will be heap_sync'd involving a full scan of the buffer cache

Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
How about FASTLOAD or BULKLOAD? Those words seem less likely to be
misused in the future - i.e. we are invoking a special mode, rather than
invoking a special "go faster" option.

> I don't consider the single-transaction to be a no-cost solution. You
> are adding flags to commands, and you are using a dump layout for
> performance where the purpose for the layout is not clear. The ALTER is
> clear to the user, and it allows nologging operations to happen after
> the table is created.
>
> In fact, for use in pg_dump, I think DROP is the proper operation for
> loading, not your transaction wrapping solution. We already agree we
> need DROP (or TRUNCATE), so why not use that rather than the transaction
> wrap idea?

This was discussed on-list by 2 core team members, a committer and
myself, but I see no requirements change here. You even accepted the
invisible COPY optimization in your last post - why unpick that now?
Please forgive my tone, but I am lost for reasonable yet expressive
words.

The --single-transaction mode would apply even if the dump was created
using an earlier version of pg_dump. pg_dump has *not* been altered at
all. (And I would again add that the idea was not my own)

> So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.

Would you mind stating again what you mean, just so I can understand
this? Your summary isn't enough.

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 21:14:49
Message-ID: 200512302114.jBULEno02301@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2005-12-30 at 11:49 -0500, Bruce Momjian wrote:
>
> > Yes, I know we agreed to the COPY LOCK, but new features now being
> > requested, so we have to re-evaluate where we are going with COPY LOCK
> > to get a more consistent solution.
>
> Thank you.

Good. I think we can be happy that COPY LOCK didn't get into a release,
so we don't have to support it forever. When we are adding features, we
have to consider not only the current release, but future releases and
what people will ask for in the future so the syntax can be expanded
without breaking previous usage.

> > Ah, but people wanted fast INSERT INTO ... SELECT, and that would use
> > EXCLUSIVE too. What about a massive UPDATE? Perhaps that could use
> > EXCLUSIVE? We don't want to add "LOCK" to every command that might use
> > EXCLUSIVE. ALTER is much better for this.
>
> > I agree if we thought EXCLUSIVE would only be used for COPY, we could
> > use LOCK, but I am thinking it will be used for other commands as well.
>
> Agreed, I will look to implement this.
>
> Could the internals of my recent patch be reviewed? Changing the user
> interface is less of a problem than changing the internals, which is
> where the hard work takes place. I do not want to extend this work
> further only to have that part rejected later.

OK, I will look it over this week or next.

> The implications of EXCLUSIVE are:
> - there will be a check on each and every I, U, D to check the state of
> the relation
> - *every* operation that attempts a write lock will attempt to acquire
> an EXCLUSIVE full table lock instead
> - following successful completion of *each* DML statement, the relation
> will be heap_sync'd involving a full scan of the buffer cache

Yes, I think that is it. What we can do is implement EXCLUSIVE to
affect only COPY at this point, and document that, and later add other
commands.

> Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
> How about FASTLOAD or BULKLOAD? Those words seem less likely to be
> misused in the future - i.e. we are invoking a special mode, rather than
> invoking a special "go faster" option.

The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is
probably not the best for loading. I would think TRUNCATE would be a
better option.

In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be
the same, mostly. You would create the table, set its RELIABILITY to
TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or
DEFAULT. The second ALTER has to sync all the dirty data blocks, which
the same thing EXCLUSIVE does at the conclusion of COPY.

So, we need a name for EXCLUSIVE mode that suggests how it is different
from TRUNCATE, and in this case, the difference is that EXCLUSIVE
preserves the previous contents of the table on recovery, while TRUNCATE
does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
Anyway, the keywords are easy to modify, even after the patch is
submitted. FYI, I usually go through keywords.c looking for a keyword
we already use.

> > I don't consider the single-transaction to be a no-cost solution. You
> > are adding flags to commands, and you are using a dump layout for
> > performance where the purpose for the layout is not clear. The ALTER is
> > clear to the user, and it allows nologging operations to happen after
> > the table is created.
> >
> > In fact, for use in pg_dump, I think DROP is the proper operation for
> > loading, not your transaction wrapping solution. We already agree we
> > need DROP (or TRUNCATE), so why not use that rather than the transaction
> > wrap idea?
>
> This was discussed on-list by 2 core team members, a committer and
> myself, but I see no requirements change here. You even accepted the
> invisible COPY optimization in your last post - why unpick that now?
> Please forgive my tone, but I am lost for reasonable yet expressive
> words.

Do you think you are the only one who has rewritten a patch multiple
times? We all have. The goal is to get the functionality into the
system in the most seamless way possible. Considering the number of
people who use PostgreSQL, if it takes use 10 tries, it is worth it
considering the thousands of people who will use it. Would you have us
include a sub-optimal patch and have thousands of people adjust to its
non-optimal functionality? I am sure you would not. Perhaps a company
would say, "Oh, just ship it", but we don't.

> The --single-transaction mode would apply even if the dump was created
> using an earlier version of pg_dump. pg_dump has *not* been altered at
> all. (And I would again add that the idea was not my own)

I assume you mean this:

http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php

I guess with the ALTER commands I don't see much value in the
--single-transaction flag. I am sure others suggested it, but would
they suggest it now given our current direction. The fact that the
patch was submitted does not give it any more weight --- the question is
does this feature make sense for 8.2. The goal is not to cram as many
optimizations into PostgreSQL as possible, the goal is to present a
consistent usable system to users.

> > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
>
> Would you mind stating again what you mean, just so I can understand
> this? Your summary isn't enough.

New ALTER TABLE mode, perhaps call it PERSISTENCE:

ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY

These would drop or truncate all tables with this flag on a non-clean
start of the postmaster, and write something in the server logs.
However, I don't know that we have the code in place to DROP/TRUNCATE in
recovery mode, and it would affect all databases, so it could be quite
complex to implement. In this mode, no WAL logs would be written for
table modifications, though DDL commands would have to be logged.

ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)

Table contents are preserved across recoveries, but data modifications
can happen only one at a time. I don't think we have a lock mode that
does this, so I am worried a new lock mode will have to be created. A
simplified solution at this stage would be to take an exclusive lock on
the table, but really we just need a single-writer table lock, which I
don't think we have. initially this can implemented to only affect COPY
but later can be done for other commands.

ALTER TABLE tab PERSISTENCE DEFAULT

This would be our current default mode, which is full concurrency and
persistence.

It took me over an hour to write this, but I feel the time is worth it
because of the number of users who use our software.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 22:36:24
Message-ID: 87mzii8b6f.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful
functionality but perhaps there doesn't have to be any proprietary user
interface to it at all. Why not just check if the conditions are already
present to allow the optimization and if so go ahead.

That is, if the current transaction already has an exclusive lock on the table
and there are no indexes (and PITR isn't active) then Postgres could go ahead
and use the same WAL skipping logic as the other operations that already so
so. This would work for inserts whether coming from COPY or plain SQL INSERTs.

The nice thing about this is that the user's SQL wouldn't need any proprietary
extensions at all. Just tell people to do

BEGIN;
LOCK TABLE foo;
COPY foo from ...
COMMIT;

There could be a COPY LOCK option to obtain a lock, but it would be purely for
user convenience so they don't have to bother with BEGIN and COMMIt.

The only downside is a check to see if an exclusive table lock is present on
every copy and insert. That might be significant but perhaps there are ways to
finess that. If not perhaps only doing it on COPY would be a good compromise.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 22:46:45
Message-ID: 200512302246.jBUMkjF25196@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
>
> As far as EXCLUSIVE or COPY LOCK goes, I think this would be useful
> functionality but perhaps there doesn't have to be any proprietary user
> interface to it at all. Why not just check if the conditions are already
> present to allow the optimization and if so go ahead.
>
> That is, if the current transaction already has an exclusive lock on the table
> and there are no indexes (and PITR isn't active) then Postgres could go ahead
> and use the same WAL skipping logic as the other operations that already so
> so. This would work for inserts whether coming from COPY or plain SQL INSERTs.
>
> The nice thing about this is that the user's SQL wouldn't need any proprietary
> extensions at all. Just tell people to do
>
> BEGIN;
> LOCK TABLE foo;
> COPY foo from ...
> COMMIT;
>
> There could be a COPY LOCK option to obtain a lock, but it would be purely for
> user convenience so they don't have to bother with BEGIN and COMMIt.
>
> The only downside is a check to see if an exclusive table lock is present on
> every copy and insert. That might be significant but perhaps there are ways to
> finess that. If not perhaps only doing it on COPY would be a good compromise.

Well, again, if we wanted to use EXCLUSIVE only for COPY, this might
make sense. However, also consider that the idea for EXCLUSIVE was that
users could continue read-only queries on the table while it is being
loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only
going to write into new pages.

If someone has an exclusive lock on the table and does a COPY or SELECT
INTO do we want to assume we are only going to write into new pages, and
do we want to force an exclusive lock rather than a single-writer lock?
I don't think so.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 22:50:21
Message-ID: 1135983021.5052.94.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote:

> > This was discussed on-list by 2 core team members, a committer and
> > myself, but I see no requirements change here. You even accepted the
> > invisible COPY optimization in your last post - why unpick that now?
> > Please forgive my tone, but I am lost for reasonable yet expressive
> > words.
>
> Do you think you are the only one who has rewritten a patch multiple
> times? We all have. The goal is to get the functionality into the
> system in the most seamless way possible. Considering the number of
> people who use PostgreSQL, if it takes use 10 tries, it is worth it
> considering the thousands of people who will use it. Would you have us
> include a sub-optimal patch and have thousands of people adjust to its
> non-optimal functionality? I am sure you would not. Perhaps a company
> would say, "Oh, just ship it", but we don't.

You're right.

Not like we've not been here before, eh?

[I'll look at the tech another day]

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 22:52:49
Message-ID: 200512302252.jBUMqna04275@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote:
>
> > > This was discussed on-list by 2 core team members, a committer and
> > > myself, but I see no requirements change here. You even accepted the
> > > invisible COPY optimization in your last post - why unpick that now?
> > > Please forgive my tone, but I am lost for reasonable yet expressive
> > > words.
> >
> > Do you think you are the only one who has rewritten a patch multiple
> > times? We all have. The goal is to get the functionality into the
> > system in the most seamless way possible. Considering the number of
> > people who use PostgreSQL, if it takes use 10 tries, it is worth it
> > considering the thousands of people who will use it. Would you have us
> > include a sub-optimal patch and have thousands of people adjust to its
> > non-optimal functionality? I am sure you would not. Perhaps a company
> > would say, "Oh, just ship it", but we don't.
>
> You're right.
>
> Not like we've not been here before, eh?
>
> [I'll look at the tech another day]

I know it is discouraging. I have felt it many times myself. However, I
have to keep my eye on the greater good that we are doing as a project,
and that my frustration is a small price to pay for the greater
usability we will give to our users.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 23:04:33
Message-ID: 87hd8q89vi.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> > BEGIN;
> > LOCK TABLE foo;
> > COPY foo from ...
> > COMMIT;
> >
> > There could be a COPY LOCK option to obtain a lock, but it would be purely for
> > user convenience so they don't have to bother with BEGIN and COMMIt.
> >
> > The only downside is a check to see if an exclusive table lock is present on
> > every copy and insert. That might be significant but perhaps there are ways to
> > finess that. If not perhaps only doing it on COPY would be a good compromise.
>
> Well, again, if we wanted to use EXCLUSIVE only for COPY, this might
> make sense. However, also consider that the idea for EXCLUSIVE was that
> users could continue read-only queries on the table while it is being
> loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only
> going to write into new pages.

Well I pictured the above kicking in for any insert. You can't do it on
deletes and updates anyways since torn pages could cause the table to become
corrupt.

We could add a LOCK TABLE SHARED feature to allow the appropriate type of lock
to be acquired.

But now that I think further on this that doesn't really make this free.
fsyncing a table isn't free. If some other transaction has come and done some
massive updates on the table then I come along and do a single quick insert I
don't necessarily want to fsync all those pending writes, it's cheaper to
fsync the WAL log.

> If someone has an exclusive lock on the table and does a COPY or SELECT
> INTO do we want to assume we are only going to write into new pages, and
> do we want to force an exclusive lock rather than a single-writer lock?
> I don't think so.

And only using new pages is itself a cost as well. Though I think the fact
that it would tend to mean a lot less seeking and more sequential i/o would
tend to make it a worth the extra garbage in the table.

It might be useful having some kind of LOCK TABLE SHARED anyways. It seems
silly to have the functionality in the database and not expose it for users.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-30 23:09:00
Message-ID: 200512302309.jBUN90D06794@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > > BEGIN;
> > > LOCK TABLE foo;
> > > COPY foo from ...
> > > COMMIT;
> > >
> > > There could be a COPY LOCK option to obtain a lock, but it would be purely for
> > > user convenience so they don't have to bother with BEGIN and COMMIt.
> > >
> > > The only downside is a check to see if an exclusive table lock is present on
> > > every copy and insert. That might be significant but perhaps there are ways to
> > > finess that. If not perhaps only doing it on COPY would be a good compromise.
> >
> > Well, again, if we wanted to use EXCLUSIVE only for COPY, this might
> > make sense. However, also consider that the idea for EXCLUSIVE was that
> > users could continue read-only queries on the table while it is being
> > loaded (like COPY allows now), and that in EXCLUSIVE mode, we are only
> > going to write into new pages.
>
> Well I pictured the above kicking in for any insert. You can't do it on
> deletes and updates anyways since torn pages could cause the table to become
> corrupt.

True, but UPDATE could save 1/2 the log I/O because the new rows would
not have to be logged.

> We could add a LOCK TABLE SHARED feature to allow the appropriate type of lock
> to be acquired.

Do you really want this behavior to be controlled by the locking mode?
That seem strange to me, as well as confusing, especially if you already
have some other kind of lock on the table.

> But now that I think further on this that doesn't really make this free.
> fsyncing a table isn't free. If some other transaction has come and done some
> massive updates on the table then I come along and do a single quick insert I
> don't necessarily want to fsync all those pending writes, it's cheaper to
> fsync the WAL log.

That is true.

> > If someone has an exclusive lock on the table and does a COPY or SELECT
> > INTO do we want to assume we are only going to write into new pages, and
> > do we want to force an exclusive lock rather than a single-writer lock?
> > I don't think so.
>
> And only using new pages is itself a cost as well. Though I think the fact
> that it would tend to mean a lot less seeking and more sequential i/o would
> tend to make it a worth the extra garbage in the table.
>
> It might be useful having some kind of LOCK TABLE SHARED anyways. It seems
> silly to have the functionality in the database and not expose it for users.

We could if there is a use-case for it.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: simon(at)2ndquadrant(dot)com, andrew(at)dunslane(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2005-12-31 11:59:44
Message-ID: 14969.1136030384@www6.gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:

> > The --single-transaction mode would apply even if the dump was created
> > using an earlier version of pg_dump. pg_dump has *not* been altered at
> > all. (And I would again add that the idea was not my own)
>
> I assume you mean this:
>
> http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php
>
> I guess with the ALTER commands I don't see much value in the
> --single-transaction flag. I am sure others suggested it, but would
> they suggest it now given our current direction.

I just want to add that --single-transaction has a value of it's own. There
were times when I wanted to restore parts of a dump all-or-nothing.

This is possible with PostgreSQL, unlike many other DBM systems, because
people like Tom Lane have invested in ensuring that all DDL is working
without implicitly committing an enclosing transaction.

Using pg_restore directly into a database, it is not possible to get a
single transaction right now. One has to restore to a file and manually
added BEGIN/COMMIT. Just for that I think --single-transaction is a great
addition and a missing feature.

I think more people have a use-case for that.

Best Regards,
Michael Paesold

--
Telefonieren Sie schon oder sparen Sie noch?
NEU: GMX Phone_Flat http://www.gmx.net/de/go/telefonie


From: August Zajonc <augustz(at)augustz(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-01 01:02:24
Message-ID: dp79mp$f56$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

As a user and a list lurker I very much like Bruce's proposed ALTER
TABLE syntax. COPY LOCK (and the variants I can imagine being required
for all the other types of cases) don't seem as appealing.

And ALTER TABLE seems to make it clear it is an object level change,
feels like it fits the internal model of the change better.

As a user a quick note that I've really found the documentation strong,
which makes a big difference. Probably a low glory thing but much
appreciated.

Big props everyone I see posting for keeping an eye (and code) on the
various optimizations, a lot of the use cases are ones I can relate too
(beyond even the ETL ones). I'm always impressed at the range of areas
folks are looking at improving, and hope to see it encouraged, even with
something as useless as thanks :)

To a good new year... and a great past one.

- August


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL bypass for INSERT, UPDATE and DELETE?
Date: 2006-01-03 13:08:46
Message-ID: 1136293726.5052.143.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-22 at 12:12 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Currently, CTAS optimization requires a heap_sync during ExecEndPlan. It
> > would be easy enough to extend this so that it also works for INSERT,
> > UPDATE and DELETE.
>
> If you tried to do it that way you'd break the system completely. Not
> all updates go through the executor.
>
> I think it's a bad idea anyway; you'd be adding overhead to the lowest
> level routines in order to support a feature that would be very seldom
> used, at least in comparison to the number of times those routines are
> executed.

The current thinking seems to be that we should implement an ALTER TABLE
RELIABILITY statement that applies to COPY, INSERT, UPDATE and DELETE.

> If you tried to do it that way you'd break the system completely. Not
> all updates go through the executor.

Where would I put a heap_sync to catch all of the I, U, D cases?
(Possibly multiple places).

Or were you thinking of things like ALTER TABLE TYPE?
Or perhaps inheritance?

Best Regards, Simon Riggs


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 15:45:21
Message-ID: 20060103154521.GC82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 29, 2005 at 11:24:28AM -0500, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Andrew Dunstan wrote:
> >> I an horribly scared that this will be used as a "performance boost" for
> >> normal use. I would at least like to see some restrictions that make it
> >> harder to mis-use. Perhaps restrict to superuser?
>
> > Certainly restrict to table owner.
>
> I can see the argument for superuser-only: decisions about data
> integrity tradeoffs should be reserved to the DBA, who is the one who
> will get blamed if the database loses data, no matter how stupid his
> users are.
>
> But I'm not wedded to that. I could live with table-owner.

I dislike restricting to super-user, and to some extent even table
owner. The reason is that if you have some automated batch process, you
don't want that process running as a superuser. Also, it is often
awkward to require that the user running that batch own the table.

I'd much rather see this as a grantable permission on the table. (The
same is true with truncate, btw). This way, if a DBA knew he could trust
a specific role, he could allow for these operations on a specific
table.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 15:58:34
Message-ID: 20060103155834.GD82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 30, 2005 at 12:58:15PM -0500, Bruce Momjian wrote:
> Andrew Dunstan wrote:
> > >>My concern is more about making plain that this is for special operations,
> > >>not normal operations. Or maybe I have misunderstood the purpose.
> > >>
> > >>
> > >
> > >Rephrase that as "full ownership rights must be obtained to load data in
> > >a way that requires dropping any existing indexes and locking out other
> > >users of the table". I don't think the use-case for this will be very
> > >large for non-owners, or indeed even for owners except during initial
> > >table creation; and so I don't think the above argument is strong.
> > >
> >
> > Those restrictions aren't true of Bruce's proposed drop and
> > delete/truncate recovery modes, are they?
>
> Only the owner could do the ALTER, for sure, but once the owner sets it,
> any user with permission to write to the table would have those
> characteristics.

Dumb question: if the ALTER is done inside a transaction, and then
reverted at the end of the transaction, does that mean that no other
transactions would have those permissions? I think the general use-case
is that you only one the session doing the ALTER to be able to use these
special modes, not anyone else who happens to be hitting the table at
that time...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:21:37
Message-ID: 20060103162137.GO6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Jim C. Nasby (jnasby(at)pervasive(dot)com) wrote:
> I dislike restricting to super-user, and to some extent even table
> owner. The reason is that if you have some automated batch process, you
> don't want that process running as a superuser. Also, it is often
> awkward to require that the user running that batch own the table.

The owner of the table could be a role which the batch runner is part of
(along with whatever other roles you wish to have 'owner'-level
permissions on the table).

> I'd much rather see this as a grantable permission on the table. (The
> same is true with truncate, btw). This way, if a DBA knew he could trust
> a specific role, he could allow for these operations on a specific
> table.

In general, I do prefer that permissions be seperably grantable. Being
able to grant 'truncate' permissions would be really nice. Is the only
reason such permission doesn't exist due to no one working on it, or is
there other disagreement about it?

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:26:51
Message-ID: 16809.1136305611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> Dumb question: if the ALTER is done inside a transaction, and then
> reverted at the end of the transaction, does that mean that no other
> transactions would have those permissions? I think the general use-case
> is that you only one the session doing the ALTER to be able to use these
> special modes, not anyone else who happens to be hitting the table at
> that time...

Such an ALTER would certainly require exclusive lock on the table,
so I'm not sure that I see much use-case for doing it like that.
You'd want to do the ALTER and commit so as not to lock other people
out of the table entirely while doing the bulk data-pushing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:29:02
Message-ID: 16856.1136305742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> In general, I do prefer that permissions be seperably grantable. Being
> able to grant 'truncate' permissions would be really nice. Is the only
> reason such permission doesn't exist due to no one working on it, or is
> there other disagreement about it?

Lack of appetite for having forty nonstandard kinds of privilege,
I suppose ;-)

Given that we now have roles, it's fairly easy to grant "table owner"
to trusted people, so the use-case for special privilege types has
dropped off dramatically IMHO.

regards, tom lane


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:38:52
Message-ID: 20060103163852.GI82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 03, 2006 at 11:29:02AM -0500, Tom Lane wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > In general, I do prefer that permissions be seperably grantable. Being
> > able to grant 'truncate' permissions would be really nice. Is the only
> > reason such permission doesn't exist due to no one working on it, or is
> > there other disagreement about it?
>
> Lack of appetite for having forty nonstandard kinds of privilege,
> I suppose ;-)
>
> Given that we now have roles, it's fairly easy to grant "table owner"
> to trusted people, so the use-case for special privilege types has
> dropped off dramatically IMHO.

Yeah, I hadn't thought about that. I agree; if you trust some process
enough to have MVCC-affecting rights then you should be able to trust it
with full ownership rights.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:43:25
Message-ID: 20060103164325.GJ82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > Dumb question: if the ALTER is done inside a transaction, and then
> > reverted at the end of the transaction, does that mean that no other
> > transactions would have those permissions? I think the general use-case
> > is that you only one the session doing the ALTER to be able to use these
> > special modes, not anyone else who happens to be hitting the table at
> > that time...
>
> Such an ALTER would certainly require exclusive lock on the table,
> so I'm not sure that I see much use-case for doing it like that.
> You'd want to do the ALTER and commit so as not to lock other people
> out of the table entirely while doing the bulk data-pushing.

Maybe this just isn't clear, but would EXCLUSIVE block writes from all
other sessions then? The post I replied to mentioned that the ALTER
would affect all backends is why I'm wondering...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:48:01
Message-ID: 17173.1136306881@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote:
>> Such an ALTER would certainly require exclusive lock on the table,
>> so I'm not sure that I see much use-case for doing it like that.
>> You'd want to do the ALTER and commit so as not to lock other people
>> out of the table entirely while doing the bulk data-pushing.

> Maybe this just isn't clear, but would EXCLUSIVE block writes from all
> other sessions then?

I don't think it should (which implies that EXCLUSIVE is a bad name).
My point is that ALTER RELIABILITY would have to gain exclusive lock
for long enough to change the table's reliability marking --- you have
to synchronize such a change with other transactions' activity on the
table, and table-level locks are the only mechanism we have for that.
It's not different from a schema change such as adding a column.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:54:01
Message-ID: 20060103165359.GP6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > In general, I do prefer that permissions be seperably grantable. Being
> > able to grant 'truncate' permissions would be really nice. Is the only
> > reason such permission doesn't exist due to no one working on it, or is
> > there other disagreement about it?
>
> Lack of appetite for having forty nonstandard kinds of privilege,
> I suppose ;-)
>
> Given that we now have roles, it's fairly easy to grant "table owner"
> to trusted people, so the use-case for special privilege types has
> dropped off dramatically IMHO.

The problem is that you might want to grant 'truncate' to people who
*aren't* particularly trusted. For truncate, at least I have a
real-world use-case for it. I've got a number of users who are required
to fill in on the order of 300 seperate tables. We provide a number of
different ways of doing the uploads:
ODBC
phppgadmin
Our own phppgadmin-like interface
Web-based streaming CSV uploader

The CSV uploader is by far the most popular because it allows them to
easily reload a table from the files they have. The uploader starts off
with a 'delete from x' right now. I've been looking at (but don't
particularly like) writing a setuid pl/pgsql function so that I can make
the uploader be able to truncate the tables. This would almost entirely
eliminate the need to do vacuums on the tables. It's very rare for
there to be multiple things happening on a given database at a given
time too.

What does happen though, is that these tables are often used immediately
after they've been uploaded which means they needs to be analyzed too.
Again, something which would be very nice if the uploader could do.
Same with vacuums, in fact, if the uploader could vacuum the tables I
probably wouldn't need truncate as badly (though it'd still be nice),
since I could just delete from table; vacuum;

What these users *can't* do, by any means, is drop tables, or change the
structure or types or keys or anything else having to do with the table
definitions.

Writing setuid pl/pgsql functions for each of these is circumventing the
ACL and permission system of the database; working *around* it instead
of *with* it, which makes me somewhat nervous and feels like a lack in
the database. :/

Thanks!

Stephen


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 16:55:15
Message-ID: 20060103165515.GQ6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Jim C. Nasby (jnasby(at)pervasive(dot)com) wrote:
> Yeah, I hadn't thought about that. I agree; if you trust some process
> enough to have MVCC-affecting rights then you should be able to trust it
> with full ownership rights.

About that, I have to disagree. :) I've got a case where this isn't
true, see my other post please. Giving someone truncate rights is
*very* different from allowing them to drop a table or change the
definition of the table.

Thanks,

Stephen


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 17:08:05
Message-ID: 200601031708.k03H85j27170@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote:
> >> Such an ALTER would certainly require exclusive lock on the table,
> >> so I'm not sure that I see much use-case for doing it like that.
> >> You'd want to do the ALTER and commit so as not to lock other people
> >> out of the table entirely while doing the bulk data-pushing.
>
> > Maybe this just isn't clear, but would EXCLUSIVE block writes from all
> > other sessions then?
>
> I don't think it should (which implies that EXCLUSIVE is a bad name).

Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words
I proposed were PRESERVE or STABLE.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 17:37:32
Message-ID: 17841.1136309852@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> The problem is that you might want to grant 'truncate' to people who
> *aren't* particularly trusted. For truncate, at least I have a
> real-world use-case for it.

I don't find this use-case particularly convincing. If the users are
allowed to delete all data in a given table, then that table must be
dedicated to them anyway; so it's not that easy to see why you can't
risk giving them ownership rights on it. The worst they can do is
screw up their own data, no?

In any case, I don't see what's so wrong with the model of using
SECURITY DEFINER interface functions when you want a security
restriction that's finer-grain than the system provides. I really
*don't* want to see us trying to, say, categorize every variety of
ALTER TABLE as a separately grantable privilege. I could live with
something like a catchall "ADMIN" privilege ... except it's not
clear how that would differ from ownership.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, simon(at)2ndquadrant(dot)com, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 18:30:56
Message-ID: 20060103183056.GR6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I don't find this use-case particularly convincing. If the users are
> allowed to delete all data in a given table, then that table must be
> dedicated to them anyway; so it's not that easy to see why you can't
> risk giving them ownership rights on it. The worst they can do is
> screw up their own data, no?

Being able to delete all data in a given table in no way implies
ownership rights. The tables are part of a specification which the
users are being asked to respond to. Being able to change the table
types or remove the constraints put on the tables would allow the
users to upload garbage which would then affect downstream processing.

We can't guarentee this won't happen anyway but we try to confine the
things they can mess up to a reasonable set which we can check for (and
do, through a rather involved error checking system). There are *alot*
of things built on top of the table structures and having them change
would basically break the whole system (without the appropriate changes
being made to the other parts of the system).

> In any case, I don't see what's so wrong with the model of using
> SECURITY DEFINER interface functions when you want a security
> restriction that's finer-grain than the system provides. I really
> *don't* want to see us trying to, say, categorize every variety of
> ALTER TABLE as a separately grantable privilege. I could live with
> something like a catchall "ADMIN" privilege ... except it's not
> clear how that would differ from ownership.

I don't think anyone's asked for 'ALTER TABLE' privileges to be
seperately grantable. It seems to me that the privileges which *need*
to be grantable are ones associated with DML statements. I would
classify TRUNCATE, VACUUM and ANALYZE as DML statements (along with
select, insert, update, and delete). They're PostgreSQL-specific DML
statements but they still fall into that category. I don't think
it's a coincidence that the SQL-defined DML statements are all,
individually, grantable.

That doesn't mean I think we should get rid of RULE, REFERENCES or
TRIGGER, though honestly I've very rarely needed to grant any of them
(I don't think I've ever granted RULE or TRIGGER...). References is
DDL-oriented, but for *other* tables; RULE and TRIGGER are DDL and I
can't really justify why someone other than the owner would need them
but I'm guessing someone's using them. I don't think their existance
should imply that if we ever change the grants again we have to include
all types of 'ALTER TABLE', etc, though.

Thanks,

Stephen


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 21:16:55
Message-ID: 20060103211655.GR82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 03, 2006 at 12:08:05PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > > On Tue, Jan 03, 2006 at 11:26:51AM -0500, Tom Lane wrote:
> > >> Such an ALTER would certainly require exclusive lock on the table,
> > >> so I'm not sure that I see much use-case for doing it like that.
> > >> You'd want to do the ALTER and commit so as not to lock other people
> > >> out of the table entirely while doing the bulk data-pushing.
> >
> > > Maybe this just isn't clear, but would EXCLUSIVE block writes from all
> > > other sessions then?
> >
> > I don't think it should (which implies that EXCLUSIVE is a bad name).
>
> Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words
> I proposed were PRESERVE or STABLE.

This seems to seriously limit the usefulness, though. You'll only want
to use EXCLUSIVE/PRESERVE/STABLE when you've got a specific set of DML
to do, that you know you can recover from. But if at the same time some
other part of the system could be doing what it thinks will be ACID DML
to that same table, you're now in trouble.

At a minimum that would need to be clearly spelled out in the docs. I
think it also makes a very strong use-case for exposing table-level
shared locks as well, since that would at least allow other backends to
continue reading from the table.

Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only
appending new pages, it would be nice if other backends could continue
performing updates at the same time, assuming there's free space
available elsewhere within the table (and that you'd be able to recover
those logged changes regardless of the non-logged operations). But
that's a pretty lofty goal...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 21:20:47
Message-ID: 200601032120.k03LKl609990@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:
> > > I don't think it should (which implies that EXCLUSIVE is a bad name).
> >
> > Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words
> > I proposed were PRESERVE or STABLE.
>
> This seems to seriously limit the usefulness, though. You'll only want
> to use EXCLUSIVE/PRESERVE/STABLE when you've got a specific set of DML
> to do, that you know you can recover from. But if at the same time some
> other part of the system could be doing what it thinks will be ACID DML
> to that same table, you're now in trouble.
>
> At a minimum that would need to be clearly spelled out in the docs. I
> think it also makes a very strong use-case for exposing table-level
> shared locks as well, since that would at least allow other backends to
> continue reading from the table.

We would be creating a new lock type for this.

> Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only
> appending new pages, it would be nice if other backends could continue
> performing updates at the same time, assuming there's free space
> available elsewhere within the table (and that you'd be able to recover
> those logged changes regardless of the non-logged operations). But
> that's a pretty lofty goal...

"Idealistically", yep. It would be great if we could put a helmet on
and the computer would read your mind. :-)

Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
happening is never going to be implemented because it is just too hard
to do, and too prone to error.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 21:27:50
Message-ID: 20060103212750.GT82560@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 03, 2006 at 04:20:47PM -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > > > I don't think it should (which implies that EXCLUSIVE is a bad name).
> > >
> > > Agreed, EXCLUSIVE was used to mean an _exclusive_ writer. The new words
> > > I proposed were PRESERVE or STABLE.
> >
> > This seems to seriously limit the usefulness, though. You'll only want
> > to use EXCLUSIVE/PRESERVE/STABLE when you've got a specific set of DML
> > to do, that you know you can recover from. But if at the same time some
> > other part of the system could be doing what it thinks will be ACID DML
> > to that same table, you're now in trouble.
> >
> > At a minimum that would need to be clearly spelled out in the docs. I
> > think it also makes a very strong use-case for exposing table-level
> > shared locks as well, since that would at least allow other backends to
> > continue reading from the table.
>
> We would be creating a new lock type for this.

Sorry if I've just missed this in the thread, but what would the new
lock type do? My impression is that as it stands you can either do:

BEGIN;
ALTER TABLE EXCLUSIVE;
...
ALTER TABLE SHARE; --fsync
COMMIT;

Which would block all other access to the table as soon as the first
ALTER TABLE happens. Or you can:

ALTER TABLE EXCLUSIVE;
...
ALTER TABLE SHARE;

Which means that between the two ALTER TABLES every backend that does
DML on that table will not have that DML logged, but because there's no
exclusive lock that DML would be allowed to occur.

BTW, there might be some usecase for the second scenario, in which case
it would probably be better to tell the user to aquire a table-lock on
their own rather than do it automatically as part of the update...

> Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
> happening is never going to be implemented because it is just too hard
> to do, and too prone to error.

What I figured. Never hurts to ask though. :)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, simon(at)2ndquadrant(dot)com, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 22:38:25
Message-ID: 200601032238.k03McP804163@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:
> > We would be creating a new lock type for this.
>
> Sorry if I've just missed this in the thread, but what would the new
> lock type do? My impression is that as it stands you can either do:
>
> BEGIN;
> ALTER TABLE EXCLUSIVE;
> ...
> ALTER TABLE SHARE; --fsync
> COMMIT;
>
> Which would block all other access to the table as soon as the first
> ALTER TABLE happens. Or you can:
>
> ALTER TABLE EXCLUSIVE;
> ...
> ALTER TABLE SHARE;
>
> Which means that between the two ALTER TABLES every backend that does
> DML on that table will not have that DML logged, but because there's no
> exclusive lock that DML would be allowed to occur.

Right, the DML will be single-threaded and fsync of all dirty pages will
happen before commit of each transaction.

> BTW, there might be some usecase for the second scenario, in which case
> it would probably be better to tell the user to aquire a table-lock on
> their own rather than do it automatically as part of the update...

> > Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
> > happening is never going to be implemented because it is just too hard
> > to do, and too prone to error.
>
> What I figured. Never hurts to ask though. :)

Actually, it does hurt because it generates discussion volume for no
purpose.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 22:53:53
Message-ID: 1136328833.5052.223.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-30 at 16:14 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > The implications of EXCLUSIVE are:
> > - there will be a check on each and every I, U, D to check the state of
> > the relation
> > - *every* operation that attempts a write lock will attempt to acquire
> > an EXCLUSIVE full table lock instead
> > - following successful completion of *each* DML statement, the relation
> > will be heap_sync'd involving a full scan of the buffer cache
>
> Yes, I think that is it. What we can do is implement EXCLUSIVE to
> affect only COPY at this point, and document that, and later add other
> commands.
>
> > Can I clarify the wording of the syntax? Is EXCLUSIVE the right word?
> > How about FASTLOAD or BULKLOAD? Those words seem less likely to be
> > misused in the future - i.e. we are invoking a special mode, rather than
> > invoking a special "go faster" option.
>
> The problem with the FASTLOAD/BULKLOAD words is that EXCLUSIVE mode is
> probably not the best for loading. I would think TRUNCATE would be a
> better option.
>
> In fact, in loading a table, I think both EXCLUSIVE and TRUNCATE would be
> the same, mostly. You would create the table, set its RELIABILITY to
> TRUNCATE, COPY into the table, then set the RELIABILITY to SHARE or
> DEFAULT. The second ALTER has to sync all the dirty data blocks, which
> the same thing EXCLUSIVE does at the conclusion of COPY.
>
> So, we need a name for EXCLUSIVE mode that suggests how it is different
> from TRUNCATE, and in this case, the difference is that EXCLUSIVE
> preserves the previous contents of the table on recovery, while TRUNCATE
> does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
> Anyway, the keywords are easy to modify, even after the patch is
> submitted. FYI, I usually go through keywords.c looking for a keyword
> we already use.

I'm very happy for suggestions on what these new modes are called.

> > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
> >
> > Would you mind stating again what you mean, just so I can understand
> > this? Your summary isn't enough.
>
> New ALTER TABLE mode, perhaps call it PERSISTENCE:
>
> ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
> ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY
>
> These would drop or truncate all tables with this flag on a non-clean
> start of the postmaster, and write something in the server logs.
> However, I don't know that we have the code in place to DROP/TRUNCATE in
> recovery mode, and it would affect all databases, so it could be quite
> complex to implement. In this mode, no WAL logs would be written for
> table modifications, though DDL commands would have to be logged.

Right now, this will be a TODO item... it looks like it will take some
thought to implement correctly.

> ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)
>
> Table contents are preserved across recoveries, but data modifications
> can happen only one at a time. I don't think we have a lock mode that
> does this, so I am worried a new lock mode will have to be created. A
> simplified solution at this stage would be to take an exclusive lock on
> the table, but really we just need a single-writer table lock, which I
> don't think we have. initially this can implemented to only affect COPY
> but later can be done for other commands.

ExclusiveLock locks out everything apart from readers, no new lock mode
AFAICS. Implementing that is little additional work for COPY.

Tom had a concern about setting this for I, U, D commands via the
executor. Not sure what the details of that are, as yet.

We can use either of the unlogged modes for pg_dump, so I'd suggest its
this one. Everybody happy with this being the new default in pg_dump, or
should it be an option?

> ALTER TABLE tab PERSISTENCE DEFAULT
>
> This would be our current default mode, which is full concurrency and
> persistence.

I'm thinking whether the ALTER TABLE statement might be better with two
bool flags rather than a 3-state char.

flag 1: ENABLE LOGGING | DISABLE LOGGING

flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY

Giving 3 possible sets of options:

-- the default
ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default)

-- EXCLUSIVE mode
ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY;
...which would be used like this
ALTER TABLE mytable DISABLE LOGGING;
COPY or other bulk data manipulation SQL
ALTER TABLE mytable ENABLE LOGGING;
...since FULL RECOVERY is the default.

-- multiuser temp table mode
ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY;
...which would usually be left on all the time

which only uses one new keyword LOGGING and yet all the modes are fairly
explicit as to what they do.

An alternative might be the slightly more verbose:
ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK;
which would be turned off by
ALTER TABLE mytable ENABLE LOGGING;

Comments?

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 23:10:16
Message-ID: 1136329816.5052.239.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2006-01-03 at 16:20 -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:

> > Idealistically, if EXCLUSIVE/PRESERVE/STABLE does it's thing by only
> > appending new pages, it would be nice if other backends could continue
> > performing updates at the same time, assuming there's free space
> > available elsewhere within the table (and that you'd be able to recover
> > those logged changes regardless of the non-logged operations). But
> > that's a pretty lofty goal...
>
> "Idealistically", yep. It would be great if we could put a helmet on
> and the computer would read your mind. :-)
>
> Basically meaning your idea of update while EXCLUSIVE/PRESERVE/STABLE is
> happening is never going to be implemented because it is just too hard
> to do, and too prone to error.

The reason for locking the whole table was to ensure that we do not have
a mixture of logged and non-logged writers writing to the same data
blocks, since that could damage blocks unrecoverably in the event of a
crash. (Though perhaps only if full_block_writes is on)

The ALTER TABLE .. EXCLUSIVE/(insert name) mode would mean that *any*
backend who took a write lock on the table, would lock out the whole
table. So this new mode is not restricted to the job/user who ran the
ALTER TABLE command. (I would note that that is how Oracle and Teradata
do this for pre-load utility table locking, but why should we follow
them on that?)

Currently, when we add a new row when the FSM is empty, we check the
last block of the table. That would cause multiple writers to access the
same blocks and so we would be in danger. The only way to avoid that
would be for logged writers (who would use the FSM if it were not empty)
to notify back to the FSM that they have just added a block - and remove
the behaviour to look for the last block.

Anyway, one step at a time. *Maybe* we can do that in the future, but
right now I'd like to add the basic fast write/load functionality.

Also, I think I will do the docs first this time, just so everyone can
read what we're getting ahead of time, to ensure we all agree.

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 23:23:54
Message-ID: 1136330634.5052.247.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2006-01-03 at 17:38 -0500, Bruce Momjian wrote:

> Right, the DML will be single-threaded and fsync of all dirty pages will
> happen before commit of each transaction.

heap_sync() would occur at end of statement, as it does with CTAS. We
could delay until EOT but I'm not sure I see why; in most cases they'd
be the same point anyway.

I'd been toying with the idea of making the freshly added blocks live
only in temp_buffers to avoid the shared_buffers overhead, but that was
starting to sounds too wierd for my liking.

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Michael Paesold <mpaesold(at)gmx(dot)at>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, andrew(at)dunslane(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us, kleptog(at)svana(dot)org, gsstark(at)mit(dot)edu, pg(at)rbt(dot)ca, zhouqq(at)cs(dot)toronto(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-03 23:58:09
Message-ID: 1136332689.5052.263.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2005-12-31 at 12:59 +0100, Michael Paesold wrote:
> Bruce Momjian wrote:
>
> > > The --single-transaction mode would apply even if the dump was created
> > > using an earlier version of pg_dump. pg_dump has *not* been altered at
> > > all. (And I would again add that the idea was not my own)
> >
> > I assume you mean this:
> >
> > http://archives.postgresql.org/pgsql-patches/2005-12/msg00257.php
> >
> > I guess with the ALTER commands I don't see much value in the
> > --single-transaction flag. I am sure others suggested it, but would
> > they suggest it now given our current direction.
>
> I just want to add that --single-transaction has a value of it's own. There
> were times when I wanted to restore parts of a dump all-or-nothing.
>
> This is possible with PostgreSQL, unlike many other DBM systems, because
> people like Tom Lane have invested in ensuring that all DDL is working
> without implicitly committing an enclosing transaction.
>
> Using pg_restore directly into a database, it is not possible to get a
> single transaction right now. One has to restore to a file and manually
> added BEGIN/COMMIT. Just for that I think --single-transaction is a great
> addition and a missing feature.
>
> I think more people have a use-case for that.

I did originally separate the --single-transaction patch for this
reason. I think its a valid patch on its own and its wrapped and ready
to go, with some deletions from the doc patch.

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-04 00:11:55
Message-ID: 1136333515.5052.273.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> Having COPY behave differently because it is
> in a transaction is fine as long as it is user-invisible, but once you
> require users to do that to get the speedup, it isn't user-invisible
> anymore.

Since we're agreed on adding ALTER TABLE rather than COPY LOCK, we have
our explicit mechanism for speedup.

However, it costs a single line of code and very very little execution
time to add in the optimization to COPY to make it bypass WAL when
executed in the same transaction that created the table. Everything else
is already there.

As part of the use_wal test:
+ if (resultRelInfo->ri_NumIndices == 0 &&
+ !XLogArchivingActive() &&
>> (cstate->rel->rd_createSubid != InvalidSubTransactionId ))
+ use_wal = false;

the value is already retrieved from cache...

Can anyone see a reason *not* to put that change in also? We just don't
advertise it as the "suggested" route to gaining performance, nor would
we rely on it for pg_dump/restore performance.

Best Regards, Simon Riggs


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-04 01:16:19
Message-ID: 200601031716.20102.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce,

> > > Basically meaning your idea of update while
> > > EXCLUSIVE/PRESERVE/STABLE is happening is never going to be
> > > implemented because it is just too hard to do, and too prone to
> > > error.
> >
> > What I figured. Never hurts to ask though. :):)
>
> Actually, it does hurt because it generates discussion volume for no
> purpose.

Zowie!!

Surely you didn't mean that the way it sounded?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-04 04:24:09
Message-ID: 200601040424.k044O9k13992@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Bruce,
>
> > > > Basically meaning your idea of update while
> > > > EXCLUSIVE/PRESERVE/STABLE is happening is never going to be
> > > > implemented because it is just too hard to do, and too prone to
> > > > error.
> > >
> > > What I figured. Never hurts to ask though. :):)
> >
> > Actually, it does hurt because it generates discussion volume for no
> > purpose.
>
> Zowie!!
>
> Surely you didn't mean that the way it sounded?

Actually, I did. Throwing out random ideas that have little useful
purpose and just confuse the discussion is not helpful.

"Wouldn't it be nice if PostgreSQL allowed commands to be typed
backwards so people could program by looking at the screen through a
mirror" is not helpful.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-05 17:22:28
Message-ID: 200601051722.k05HMSM02052@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> > So, we need a name for EXCLUSIVE mode that suggests how it is different
> > from TRUNCATE, and in this case, the difference is that EXCLUSIVE
> > preserves the previous contents of the table on recovery, while TRUNCATE
> > does not. Do you want to call the mode PRESERVE, or EXCLUSIVE WRITER?
> > Anyway, the keywords are easy to modify, even after the patch is
> > submitted. FYI, I usually go through keywords.c looking for a keyword
> > we already use.
>
> I'm very happy for suggestions on what these new modes are called.
>
> > > > So, to summarize, I think we should add DROP/TRUNCATE, and use that by
> > > > default (or optionally off?) in pg_dump, and, assuming we want EXCLUSIVE
> > > > for more than just COPY, we need to add ALTER TABLE EXCLUSIVE.
> > >
> > > Would you mind stating again what you mean, just so I can understand
> > > this? Your summary isn't enough.
> >
> > New ALTER TABLE mode, perhaps call it PERSISTENCE:
> >
> > ALTER TABLE tab PERSISTENCE DROP ON RECOVERY
> > ALTER TABLE tab PERSISTENCE TRUNCATE ON RECOVERY
> >
> > These would drop or truncate all tables with this flag on a non-clean
> > start of the postmaster, and write something in the server logs.
> > However, I don't know that we have the code in place to DROP/TRUNCATE in
> > recovery mode, and it would affect all databases, so it could be quite
> > complex to implement. In this mode, no WAL logs would be written for
> > table modifications, though DDL commands would have to be logged.
>
> Right now, this will be a TODO item... it looks like it will take some
> thought to implement correctly.

OK, I know my suggestions have made it more complicated.

TODO added:

* Allow control over which tables are WAL-logged

Allow tables to bypass WAL writes and just fsync() dirty pages on
commit. To do this, only a single writer can modify the table, and
writes must happen only on new pages. Readers can continue accessing
the table. This would affect COPY, and perhaps INSERT/UPDATE too.
Another option is to avoid transaction logging entirely and truncate
or drop the table on crash recovery. These should be implemented
using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE |
STABLE | DEFAULT ]. Tables using non-default logging should not use
referential integrity with default-logging tables, and tables using
stable logging probably can not have indexes. [walcontrol]

> > ALTER TABLE tab PERSISTENCE PRESERVE (or STABLE?)
> >
> > Table contents are preserved across recoveries, but data modifications
> > can happen only one at a time. I don't think we have a lock mode that
> > does this, so I am worried a new lock mode will have to be created. A
> > simplified solution at this stage would be to take an exclusive lock on
> > the table, but really we just need a single-writer table lock, which I
> > don't think we have. initially this can implemented to only affect COPY
> > but later can be done for other commands.
>
> ExclusiveLock locks out everything apart from readers, no new lock mode
> AFAICS. Implementing that is little additional work for COPY.

Nice.

> Tom had a concern about setting this for I, U, D commands via the
> executor. Not sure what the details of that are, as yet.

That is much more complicated than the COPY-only idea, for sure. I am
thinking we could add the ALTER syntax and just do COPY at this stage,
meaning that I/U/D still do full logging until we get to improving them.
The big benefit is that the user API doesn't need to change when we
improve the code. In fact I think we could do the TRUNCATE/DROP easily
for I/U/D, but the STABLE option would require work and we don't need to
implement it in the first patch.

> We can use either of the unlogged modes for pg_dump, so I'd suggest its
> this one. Everybody happy with this being the new default in pg_dump, or
> should it be an option?
>
> > ALTER TABLE tab PERSISTENCE DEFAULT
> >
> > This would be our current default mode, which is full concurrency and
> > persistence.
>
> I'm thinking whether the ALTER TABLE statement might be better with two
> bool flags rather than a 3-state char.
>
> flag 1: ENABLE LOGGING | DISABLE LOGGING
>
> flag 2: FULL RECOVERY | TRUNCATE ON RECOVERY
>
> Giving 3 possible sets of options:
>
> -- the default
> ALTER TABLE mytable ENABLE LOGGING FULL RECOVERY; (default)
>
> -- EXCLUSIVE mode
> ALTER TABLE mytable DISABLE LOGGING FULL RECOVERY;
> ...which would be used like this
> ALTER TABLE mytable DISABLE LOGGING;
> COPY or other bulk data manipulation SQL
> ALTER TABLE mytable ENABLE LOGGING;
> ...since FULL RECOVERY is the default.
>
> -- multiuser temp table mode
> ALTER TABLE mytable DISABLE LOGGING TRUNCATE ON RECOVERY;
> ...which would usually be left on all the time
>
> which only uses one new keyword LOGGING and yet all the modes are fairly
> explicit as to what they do.
>
> An alternative might be the slightly more verbose:
> ALTER TABLE mytable DISABLE LOGGING FORCE EXCLUSIVE TABLE LOCK;
> which would be turned off by
> ALTER TABLE mytable ENABLE LOGGING;
>
> Comments?

I had the same idea originally, but avoided it because the logging
really does affect what other options you can use. For example, if you
want truncate on recovery, you certainly do not want logging, so it
seems the options are not really independent. In fact if someone asks
for truncate on recovery, do we automatically turn off logging for them,
or throw an error, or a warning. It just seemed too error-prone and
confusing, though perhaps more logical. Of course, if others like the
above, we can do it.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-05 17:27:05
Message-ID: 200601051727.k05HR5p02803@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2005-12-29 at 11:37 -0500, Bruce Momjian wrote:
> > Having COPY behave differently because it is
> > in a transaction is fine as long as it is user-invisible, but once you
> > require users to do that to get the speedup, it isn't user-invisible
> > anymore.
>
> Since we're agreed on adding ALTER TABLE rather than COPY LOCK, we have
> our explicit mechanism for speedup.
>
> However, it costs a single line of code and very very little execution
> time to add in the optimization to COPY to make it bypass WAL when
> executed in the same transaction that created the table. Everything else
> is already there.
>
> As part of the use_wal test:
> + if (resultRelInfo->ri_NumIndices == 0 &&
> + !XLogArchivingActive() &&
> >> (cstate->rel->rd_createSubid != InvalidSubTransactionId ))
> + use_wal = false;
>
> the value is already retrieved from cache...
>
> Can anyone see a reason *not* to put that change in also? We just don't
> advertise it as the "suggested" route to gaining performance, nor would
> we rely on it for pg_dump/restore performance.

Seems like a nice optimization.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-01-05 21:56:21
Message-ID: 1136498181.21025.285.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2006-01-05 at 12:27 -0500, Bruce Momjian wrote:

> Seems like a nice optimization.

Negative thoughts: Toast tables have a toast index on them, yes? We have
agreed that we cannot use the optimization if we have indexes on the
main table. It follows that we cannot use the optimization if we have
*any* toasted data, since that would require a pointer between two
blocks, which would not be correctly recovered following a crash. If we
log the toast table then there could be a mismatch between heap and
toast table; if we don't log the toast table there could be a mismatch
between toast table and toast index.

We can test to see if the toast table is empty when we do ALTER TABLE,
but loading operations may try to create toasted data rows.

Presumably that means we must either:
i) abort a COPY if we get a toastable value
ii) if we get a toastable value, insert the row into a new block, which
we do logging of, then also log the toast insert and the toast index
insert - i.e. some blocks we log, others not

This is still useful for many applications, IMHO, but the list of
restrictions seems to be growing. Worse, we wouldn't know that the toast
tables were empty until after we did the COPY TO for a pg_dump, so we
wouldn't be able to retrospectively add an ALTER TABLE command ahead of
the COPY.

Thoughts? Hopefully there are some flaws in my thinking here,

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-04 03:29:48
Message-ID: 200602040329.k143Tmw02546@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Based on this, I think we should just implement the TRUNCATE/DROP option
for the table, and avoid the idea of allowing non-logged operations on a
table that has any data we want recovered after a crash.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Thu, 2006-01-05 at 12:27 -0500, Bruce Momjian wrote:
>
> > Seems like a nice optimization.
>
> Negative thoughts: Toast tables have a toast index on them, yes? We have
> agreed that we cannot use the optimization if we have indexes on the
> main table. It follows that we cannot use the optimization if we have
> *any* toasted data, since that would require a pointer between two
> blocks, which would not be correctly recovered following a crash. If we
> log the toast table then there could be a mismatch between heap and
> toast table; if we don't log the toast table there could be a mismatch
> between toast table and toast index.
>
> We can test to see if the toast table is empty when we do ALTER TABLE,
> but loading operations may try to create toasted data rows.
>
> Presumably that means we must either:
> i) abort a COPY if we get a toastable value
> ii) if we get a toastable value, insert the row into a new block, which
> we do logging of, then also log the toast insert and the toast index
> insert - i.e. some blocks we log, others not
>
> This is still useful for many applications, IMHO, but the list of
> restrictions seems to be growing. Worse, we wouldn't know that the toast
> tables were empty until after we did the COPY TO for a pg_dump, so we
> wouldn't be able to retrospectively add an ALTER TABLE command ahead of
> the COPY.
>
> Thoughts? Hopefully there are some flaws in my thinking here,
>
> Best Regards, Simon Riggs
>
>
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 00:07:57
Message-ID: 1139270877.1258.61.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2006-02-03 at 22:29 -0500, Bruce Momjian wrote:

> Based on this, I think we should just implement the TRUNCATE/DROP option
> for the table, and avoid the idea of allowing non-logged operations on a
> table that has any data we want recovered after a crash.

Well the only other option is this:

Implement an UNDO at abort like we do for DDL, which would truncate the
table back down to the starting point. That can be made to work for both
cases.

In addition if the starting point was > 0 then we'd need to perform a
VACUUM style operation to remove any index pointers with tids into the
to-be-truncated blocks. That would then make it work for the
with-indexes and/or with-toast cases.

If starting point == 0 we would just truncate the indexes and toast
stuff too.

Most importantly we'd need to do this at recovery time. That bit will
take a bit of work to make it happen right, but seems doable.

So we cover both cases at once, using one lot of logic. But there is a
fair amount of work there, so I'll need to consider whether its 8.2
material or not.

Best Regards, Simon Riggs


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 02:08:08
Message-ID: 200602070208.k17288K04692@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I have split up the TODO items as:

* Allow WAL logging to be turned off for a table, but the table
might be dropped or truncated during crash recovery [walcontrol]

Allow tables to bypass WAL writes and just fsync() dirty pages on
commit. This should be implemented using ALTER TABLE, e.g. ALTER
TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
non-default logging should not use referential integrity with
default-logging tables. A table without dirty buffers during a
crash could perhaps avoid the drop/truncate.

* Allow WAL logging to be turned off for a table, but the table would
avoid being truncated/dropped [walcontrol]

To do this, only a single writer can modify the table, and writes
must happen only on new pages so the new pages can be removed during
crash recovery. Readers can continue accessing the table. Such
tables probably cannot have indexes. One complexity is the handling
of indexes on TOAST tables.

I think the first one is possible, while the second has many
complexities that make its implementation suspect.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Fri, 2006-02-03 at 22:29 -0500, Bruce Momjian wrote:
>
> > Based on this, I think we should just implement the TRUNCATE/DROP option
> > for the table, and avoid the idea of allowing non-logged operations on a
> > table that has any data we want recovered after a crash.
>
> Well the only other option is this:
>
> Implement an UNDO at abort like we do for DDL, which would truncate the
> table back down to the starting point. That can be made to work for both
> cases.
>
> In addition if the starting point was > 0 then we'd need to perform a
> VACUUM style operation to remove any index pointers with tids into the
> to-be-truncated blocks. That would then make it work for the
> with-indexes and/or with-toast cases.
>
> If starting point == 0 we would just truncate the indexes and toast
> stuff too.
>
> Most importantly we'd need to do this at recovery time. That bit will
> take a bit of work to make it happen right, but seems doable.
>
> So we cover both cases at once, using one lot of logic. But there is a
> fair amount of work there, so I'll need to consider whether its 8.2
> material or not.
>
> Best Regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 02:24:31
Message-ID: 43E804DF.5020408@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> * Allow WAL logging to be turned off for a table, but the table
> might be dropped or truncated during crash recovery [walcontrol]
>
> Allow tables to bypass WAL writes and just fsync() dirty pages on
> commit. This should be implemented using ALTER TABLE, e.g. ALTER
> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
> non-default logging should not use referential integrity with
> default-logging tables. A table without dirty buffers during a
> crash could perhaps avoid the drop/truncate.

This would be such a sweet feature for website session tables...

Chris


From: Doug McNaught <doug(at)mcnaught(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 02:27:45
Message-ID: 87u0bbx5r2.fsf@asmodeus.mcnaught.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
> non-default logging should not use referential integrity with
> default-logging tables.

I have to say this smells way too much like MySQL for me to feel
comfortable. But that's just my opinion. :)

-Doug


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 04:07:41
Message-ID: D8417238-847B-4591-A730-899206F3B760@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I was thinking the exact same thing. Except the "and just fsync()
dirty pages on commit" part. Wouldn't that actually make the
situation worse? I thought the whole point of WAL was that it was
more efficient to fsync all of the changes in one sequential write in
one file rather than fsyncing all of the separate dirty pages.

On Feb 6, 2006, at 7:24 PM, Christopher Kings-Lynne wrote:

>> * Allow WAL logging to be turned off for a table, but the table
>> might be dropped or truncated during crash recovery [walcontrol]
>>
>> Allow tables to bypass WAL writes and just fsync() dirty pages on
>> commit. This should be implemented using ALTER TABLE, e.g. ALTER
>> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
>> non-default logging should not use referential integrity with
>> default-logging tables. A table without dirty buffers during a
>> crash could perhaps avoid the drop/truncate.
>
> This would be such a sweet feature for website session tables...
>
> Chris
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 04:13:59
Message-ID: 200602070413.k174DxY07346@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rick Gigger wrote:
> I was thinking the exact same thing. Except the "and just fsync()
> dirty pages on commit" part. Wouldn't that actually make the
> situation worse? I thought the whole point of WAL was that it was
> more efficient to fsync all of the changes in one sequential write in
> one file rather than fsyncing all of the separate dirty pages.

Uh, supposedly the WAL traffic is not as efficient as fsyncing whole
pages if you are doing lots of full pages.

---------------------------------------------------------------------------

>
> On Feb 6, 2006, at 7:24 PM, Christopher Kings-Lynne wrote:
>
> >> * Allow WAL logging to be turned off for a table, but the table
> >> might be dropped or truncated during crash recovery [walcontrol]
> >>
> >> Allow tables to bypass WAL writes and just fsync() dirty pages on
> >> commit. This should be implemented using ALTER TABLE, e.g. ALTER
> >> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
> >> non-default logging should not use referential integrity with
> >> default-logging tables. A table without dirty buffers during a
> >> crash could perhaps avoid the drop/truncate.
> >
> > This would be such a sweet feature for website session tables...
> >
> > Chris
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 06:51:13
Message-ID: DF68221A-EE93-4F19-8872-9C2A3BF3F3A3@alpinenetworking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Rick Gigger wrote:
>> I was thinking the exact same thing. Except the "and just fsync()
>> dirty pages on commit" part. Wouldn't that actually make the
>> situation worse? I thought the whole point of WAL was that it was
>> more efficient to fsync all of the changes in one sequential write in
>> one file rather than fsyncing all of the separate dirty pages.
>
> Uh, supposedly the WAL traffic is not as efficient as fsyncing whole
> pages if you are doing lots of full pages.

So then you would want to use this particular feature only when
updating/inserting large amounts of info at a time then?

For instance if you have a table with rows that are 12 bytes wide
using this feature would be very bad because it would fsync out a
whole bunch of extra data just to get those 12 bytes written. But on
a table that wrote out several k of data at a time it would help
because it would be filling up entire pages and not doing any
wasteful fsyncing?

I guess that probably would help session tables because it would
avoid writing the data twice.

In the case of session tables though I for one don't care if that
data is recoverable or not. If my database just crashed I've
probably got bigger problems then just dropped sessions. Would it be
possible to a) Not WAL log that table, b) not fsync that table at
all. Then if that table is in an inconsistent state just recreate
the table?

I'm guessing that is getting a little too crazy but I have been
thinking about setting up a separate postgres instance listening on a
different port with fsync off as a possible way to avoid the fsync
overhead for the sessions.

> ----------------------------------------------------------------------
> -----
>
>
>>
>> On Feb 6, 2006, at 7:24 PM, Christopher Kings-Lynne wrote:
>>
>>>> * Allow WAL logging to be turned off for a table, but the table
>>>> might be dropped or truncated during crash recovery [walcontrol]
>>>>
>>>> Allow tables to bypass WAL writes and just fsync() dirty
>>>> pages on
>>>> commit. This should be implemented using ALTER TABLE, e.g.
>>>> ALTER
>>>> TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using
>>>> non-default logging should not use referential integrity with
>>>> default-logging tables. A table without dirty buffers during a
>>>> crash could perhaps avoid the drop/truncate.
>>>
>>> This would be such a sweet feature for website session tables...
>>>
>>> Chris
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 2: Don't 'kill -9' the postmaster
>>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square,
> Pennsylvania 19073
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-07 09:40:37
Message-ID: 1139305237.1258.98.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2006-02-06 at 21:07 -0700, Rick Gigger wrote:
> I was thinking the exact same thing. Except the "and just fsync()
> dirty pages on commit" part. Wouldn't that actually make the
> situation worse? I thought the whole point of WAL was that it was
> more efficient to fsync all of the changes in one sequential write in
> one file rather than fsyncing all of the separate dirty pages.

This would apply to only a single relation, so would be just as
efficient a write to the database as to WAL. The proposed route is to
sync to the database, but not to WAL, thus halving the required I/O.

Yes, its designed for large data loads.

Best Regards, Simon Riggs


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Rick Gigger <rick(at)alpinenetworking(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-08 01:44:24
Message-ID: 43E94CF8.9070408@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> This would apply to only a single relation, so would be just as
> efficient a write to the database as to WAL. The proposed route is to
> sync to the database, but not to WAL, thus halving the required I/O.
>
> Yes, its designed for large data loads.

A question - would setting fsync=off while restoring a multi-gig dump
(during an upgrade) improve performance?

Chris


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Rick Gigger <rick(at)alpinenetworking(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Rod Taylor <pg(at)rbt(dot)ca>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bizgres-general] WAL bypass for INSERT, UPDATE and
Date: 2006-02-08 02:00:51
Message-ID: 20060208020051.GB57004@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 08, 2006 at 09:44:24AM +0800, Christopher Kings-Lynne wrote:
> >This would apply to only a single relation, so would be just as
> >efficient a write to the database as to WAL. The proposed route is to
> >sync to the database, but not to WAL, thus halving the required I/O.
> >
> >Yes, its designed for large data loads.
>
>
> A question - would setting fsync=off while restoring a multi-gig dump
> (during an upgrade) improve performance?

Yes.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461