Quick Links

unlogged tables

Lists:	pgsql-hackers

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	unlogged tables
Date:	2010-11-14 00:16:30
Message-ID:	AANLkTin0tGeMhXXEnDpFv3dpE4KvofV4YkYhM++qy_pe@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a series of three patches related to unlogged tables.

1. The first one (relpersistence-v1) is a mostly mechanical patch that
replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
(a character), so that we can support more than two values. BE SURE
YOU INITDB, since the old catalog format will not work with this patch
applied.

2. The second one (unlogged-tables-v1) adds support for unlogged
tables by adding a new supported value for relpersistence. I made this
work by having backend that creates an unlogged relation write out an
"init" fork for that relation. The main fork is nuked and replaced by
the contents of the init fork during startup. But I haven't made this
code work yet for index types other than btree, so attempting to
define a non-btree index on an unlogged relation will currently result
in an error. I don't think that's probably too hard to fix, but I
haven't done it yet.

3. The third patch (relax-sync-commit-v1) allows asynchronous commit
even when synchronous_commit=on if the transaction has not written
WAL. Of course, a read-only transaction won't even have an XID and
therefore won't need a commit record, so what this is really doing is
allowing transactions that have written only to temp - or unlogged -
tables to commit asynchronously. This should be OK, because if the
system crashes before the commit record hits the disk, we haven't
really lost anything we wouldn't lose anyway: the temp tables will
disappear on restart, and the unlogged ones will be truncated. This
path actually could be applied independently of the first two, if I
adjusted the comments a bit.

Review and testing would be appreciated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
relpersistence-v1.patch	application/octet-stream	65.3 KB
unlogged-tables-v1.patch	application/octet-stream	52.7 KB
relax-sync-commit-v1.patch	application/octet-stream	3.1 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 00:41:59
Message-ID:	26250.1289695319@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> 2. The second one (unlogged-tables-v1) adds support for unlogged
> tables by adding a new supported value for relpersistence. I made this
> work by having backend that creates an unlogged relation write out an
> "init" fork for that relation. The main fork is nuked and replaced by
> the contents of the init fork during startup. But I haven't made this
> code work yet for index types other than btree, so attempting to
> define a non-btree index on an unlogged relation will currently result
> in an error. I don't think that's probably too hard to fix, but I
> haven't done it yet.

That seems pretty gross. If you're going to have to take a special
action at startup anyway, why wouldn't it take the form of "truncate,
then if it's an index, call the appropriate ambuild function"? Maybe
that's a bit ugly, but at least the ugliness is localized rather than
scribbled all over the filesystem. I'm also concerned about possible
failure modes having to do with the "init fork" being missing or
corrupted.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 00:54:11
Message-ID:	AANLkTi=xL3LbY7L2S3nSf0sVEicncRkYffsRRA=bZC=4@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> 2. The second one (unlogged-tables-v1) adds support for unlogged
>> tables by adding a new supported value for relpersistence. I made this
>> work by having backend that creates an unlogged relation write out an
>> "init" fork for that relation. The main fork is nuked and replaced by
>> the contents of the init fork during startup. But I haven't made this
>> code work yet for index types other than btree, so attempting to
>> define a non-btree index on an unlogged relation will currently result
>> in an error. I don't think that's probably too hard to fix, but I
>> haven't done it yet.
>
> That seems pretty gross. If you're going to have to take a special
> action at startup anyway, why wouldn't it take the form of "truncate,
> then if it's an index, call the appropriate ambuild function"?

We've been over this ground before. You can't read from non-shared
catalogs without binding to a database, and you must reinitialize all
unlogged relations before opening the database for a connection. So
what you're proposing would involving launching a worker process for
each database in the cluster, having it do its thing and then exit,
and only after all that's done opening the database for connections.
That seems vastly more complex and less performant than what I've done
here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 00:59:35
Message-ID:	26619.1289696375@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Here is a series of three patches related to unlogged tables.
> 1. The first one (relpersistence-v1) is a mostly mechanical patch that
> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
> (a character), so that we can support more than two values. BE SURE
> YOU INITDB, since the old catalog format will not work with this patch
> applied.

While I'm griping ... is there a really good reason to do it that way,
rather than adding a new column? This will break clients that are
looking at relistemp. Maybe there aren't any, but I wouldn't bet on
that, and it doesn't seem like you're buying a lot by creating this
incompatibility. I would also argue that temp-ness is a distinct
concept from logged-ness.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 01:14:02
Message-ID:	4CDF37DA.1020009@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/13/2010 07:59 PM, Tom Lane wrote:
> I would also argue that temp-ness is a distinct
> concept from logged-ness.

I agree.

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 01:15:15
Message-ID:	26911.1289697315@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> That seems pretty gross. If you're going to have to take a special
>> action at startup anyway, why wouldn't it take the form of "truncate,
>> then if it's an index, call the appropriate ambuild function"?

> We've been over this ground before. You can't read from non-shared
> catalogs without binding to a database, and you must reinitialize all
> unlogged relations before opening the database for a connection. So
> what you're proposing would involving launching a worker process for
> each database in the cluster, having it do its thing and then exit,
> and only after all that's done opening the database for connections.
> That seems vastly more complex and less performant than what I've done
> here.

The fact that it's easy doesn't make it workable. I would point out for
starters that AMs might (do) put WAL locations and/or XIDs into indexes.
Occasionally copying very old LSNs or XIDs back into active files seems
pretty dangerous.

Cleanup at first connection is something we've been avoiding for years,
but maybe it's time to bite the bullet and do that?

BTW, how will all of this activity look to a hot-standby slave?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 01:55:48
Message-ID:	AANLkTi=VR=mC=JxwFOLz8O+Q+OVNtGKXhuj2YyqEL6hk@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 13, 2010 at 7:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Here is a series of three patches related to unlogged tables.
>> 1. The first one (relpersistence-v1) is a mostly mechanical patch that
>> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
>> (a character), so that we can support more than two values. BE SURE
>> YOU INITDB, since the old catalog format will not work with this patch
>> applied.
>
> While I'm griping ... is there a really good reason to do it that way,
> rather than adding a new column? This will break clients that are
> looking at relistemp. Maybe there aren't any, but I wouldn't bet on
> that, and it doesn't seem like you're buying a lot by creating this
> incompatibility. I would also argue that temp-ness is a distinct
> concept from logged-ness.

I think that would be a recipe for bugs. Look at the three new macros
I introduced. If you keep relistemp around, then any code which
relies on it is likely testing for one of those three things, or maybe
even something subtly different from any of them, as in the cases
where I needed to add a switch statement. The way I see it, this is
ultimately a four-level hierarchy: permanent tables (write WAL, shared
buffers, ordinary namespace), unlogged tables (don't write WAL, shared
buffers, ordinary namespace), global temporary tables (don't write
WAL, local buffers, ordinary namespace), local temporary tables (don't
write WAL, local buffers, private namespace). Even if we don't end up
implementing global temporary tables in the way I'm envisioning (I
know you have an alternate proposal), it seem inevitable that a
boolean for relistemp isn't going to be sufficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 02:09:15
Message-ID:	AANLkTi=V=CWSuw3ESni=knbx1ZM8LscOn28Zr=0DEVE5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 13, 2010 at 8:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Nov 13, 2010 at 7:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> That seems pretty gross. If you're going to have to take a special
>>> action at startup anyway, why wouldn't it take the form of "truncate,
>>> then if it's an index, call the appropriate ambuild function"?
>
>> We've been over this ground before. You can't read from non-shared
>> catalogs without binding to a database, and you must reinitialize all
>> unlogged relations before opening the database for a connection. So
>> what you're proposing would involving launching a worker process for
>> each database in the cluster, having it do its thing and then exit,
>> and only after all that's done opening the database for connections.
>> That seems vastly more complex and less performant than what I've done
>> here.
>
> The fact that it's easy doesn't make it workable. I would point out for
> starters that AMs might (do) put WAL locations and/or XIDs into indexes.
> Occasionally copying very old LSNs or XIDs back into active files seems
> pretty dangerous.

I haven't examined the GIST, GIN, or hash index code in detail so I am
not sure whether there are any hazards there; the btree case does not
seem to have any issues of this type. Certainly, if an index AM puts
an XID into an empty index, that's gonna break. I would consider that
a pretty odd thing to do, though. An LSN seems less problematic since
the LSN space does not wrap; it should just look like an index that
was created a long time ago and never updated (which, in effect, it
is).

> Cleanup at first connection is something we've been avoiding for years,
> but maybe it's time to bite the bullet and do that?

There would certainly be some advantage to doing cleanup at first
connection even if we stick with the overall approach I've adopted
here, because you could avoid the overhead of cleaning up databases
that are never actually accessed. There are a few downsides, though.
If you happened to leave a large amount of unlogged data on disk after
a crash, and then for some reason never connected to that database
again, you'd never reclaim that disk space; although perhaps you could
somehow arrange for autovacuum to clean up in that case. Also, the
first connection to the offending database would need to lock out all
other connections until cleanup was completed, although I suppose
that's still better than doing the cleanup in the startup process as
is presently the case. I guess the main problem is you'd need a
reliable and *inexpensive* way of identifying the first connection to
each database. Paying something extra at startup time is better than
paying even a small penalty on each individual connection; goodness
knows our connections are expensive enough already.

> BTW, how will all of this activity look to a hot-standby slave?

The table will appear to exist but you'll get an error if you try to
access it. I think at present it'll complain about the underying
files being missing; that could probably be fine-tuned if we're so
inclined.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 02:17:36
Message-ID:	AANLkTimucBwh4QRdqf80R5V-7UpwCqbEqT43XjtX_Aob@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Cleanup at first connection is something we've been avoiding for years,
> but maybe it's time to bite the bullet and do that?
>

Another alternative is to initialize the unlogged tables when you
first access them. If you try to open a table and there are no files
attached them go ahead and initialize it by creating an empty table
and building any indexes.

Hm, I had been assuming recovery would be responsible for cleaning up
the tables even if the first access is responsible for rebuilding
them. But there's a chance there have been no modifications to them
since the last checkpoint. But in that case the data in them is fine.
It would be a weird interface if it only cleared them out sometimes
based on unpredictable timing though. Avoiding that does require some
kind of alternate storage scheme other than the WAL to indicate what
needs to be cleared out. .init files are as good a mechanism even if
they just mean "unlink this file on startup".

--
greg

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-14 02:35:01
Message-ID:	AANLkTimxoBCG3suEF3nbo-4sxsr_RPXUAQKUOwJx_Vkx@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Cleanup at first connection is something we've been avoiding for years,
>> but maybe it's time to bite the bullet and do that?
>
> Another alternative is to initialize the unlogged tables when you
> first access them. If you try to open a table and there are no files
> attached them go ahead and initialize it by creating an empty table
> and building any indexes.

I thought about that (I've thought about a lot of things in regards to
this feature...). One problem is that you presumably will need to
open the relation before you can decide whether this is the first
access since restart. But by the time you've opened them, you've
already taken an AccessShareLock, and you'll presumably need something
a whole lot stronger than that to do the rebuild. Lock upgrades are
usually a good thing to avoid when possible, although maybe it would
be OK in this case, not sure. Another problem is that it's not too
clear to me where you'd hook in the logic to do the cleanup. The
relcache code seems like an awfully low-level place to be trying to
perpetrate this sort of monkey business.

> Hm, I had been assuming recovery would be responsible for cleaning up
> the tables even if the first access is responsible for rebuilding
> them. But there's a chance there have been no modifications to them
> since the last checkpoint. But in that case the data in them is fine.
> It would be a weird interface if it only cleared them out sometimes
> based on unpredictable timing though. Avoiding that does require some
> kind of alternate storage scheme other than the WAL to indicate what
> needs to be cleared out. .init files are as good a mechanism even if
> they just mean "unlink this file on startup".

One idea I had was to trigger the rebuild when we notice that the main
relation fork is missing. Then the startup code can just notice the
init fork, annihilate everything else, and call it good. However, this
appears to require modifying some fairly fundamental assumptions of
the current system. smgr.c/md.c believe that nobody should ever try
to read a nonexistent block, and unconditionally throw an error if the
caller tries to do so. You could provide a mode where they don't do
that, and instead return an error indication to the caller. Then you
could add an additional ReadBuffer mode, say RBM_FAIL, to let the
error percolate back up through that layer to the index AM or heap
code, which could then try to upgrade its lock and recreate the main
fork. However, I really couldn't work up much enthusiasm for
implementing this feature in a way that requires drilling a hole in
the abstraction stack from top to bottom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Marti Raudsepp <marti(at)juffo(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 15:54:48
Message-ID:	AANLkTi=n-PcrnYzHWG9JNoGkk0SHoOA3kudCC+7O5TMQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Here is a series of three patches related to unlogged tables.

Just wondering, have you thought of any mechanisms how application
code might detect that an unlogged table was truncated due to restart?
While polling with something like "SELECT 1 FROM table LIMIT 1" might
work, it's an awful hack.

One obvious use case for these unlogged tables would be materalized
views. I think it would be useful to execute e.g. a TRUNCATE trigger
so that an the view could be initialized. If an exclusive lock were
passed on to the trigger procedure, this could even be done in a
race-condition-free manner as far as I can tell.

Would there be a problem with invoking this trigger from the session
that first touches the table?

Regards,
Marti

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marti Raudsepp <marti(at)juffo(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 16:22:28
Message-ID:	AANLkTi=t+ukXuyCJmBhyYJvAKM3WKUQ9R9uFqnjOGeb3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Sun, Nov 14, 2010 at 02:16, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Here is a series of three patches related to unlogged tables.
>
> Just wondering, have you thought of any mechanisms how application
> code might detect that an unlogged table was truncated due to restart?
> While polling with something like "SELECT 1 FROM table LIMIT 1" might
> work, it's an awful hack.
>
> One obvious use case for these unlogged tables would be materalized
> views. I think it would be useful to execute e.g. a TRUNCATE trigger
> so that an the view could be initialized. If an exclusive lock were
> passed on to the trigger procedure, this could even be done in a
> race-condition-free manner as far as I can tell.
>
> Would there be a problem with invoking this trigger from the session
> that first touches the table?

Yeah, this infrastructure doesn't really allow that. The truncate
happens way too early on in startup to execute any user-provided code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marti Raudsepp <marti(at)juffo(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 16:25:15
Message-ID:	15440.1289838315@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> Would there be a problem with invoking this trigger from the session
> that first touches the table?

Other than security?

regards, tom lane

From:	Marti Raudsepp <marti(at)juffo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 17:02:53
Message-ID:	AANLkTimQ-JmVGJ595f2oLiV5d8WASqQBLKoTd846HZoK@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> Would there be a problem with invoking this trigger from the session
>> that first touches the table?
>
> Other than security?

Right, I guess that would only make sense with SECURITY DEFINER.

On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>> Just wondering, have you thought of any mechanisms how application
>> code might detect that an unlogged table was truncated due to restart?

> Yeah, this infrastructure doesn't really allow that. The truncate
> happens way too early on in startup to execute any user-provided code.

The truncate itself can be performed early and set a flag somewhere
that would invoke a trigger on the first access. I suppose it cannot
be called a "truncate trigger" then.

Or maybe provide hooks for pgAgent instead?

Do you see any alternatives to be notified of unlogged table
truncates? Without notification, this feature would seem to have
limited usefulness.

Regards,
Marti

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marti Raudsepp <marti(at)juffo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 17:32:49
Message-ID:	AANLkTikXsmB16YL0ag68P8Tm4ULyGrPmBeg5SxFE7L=R@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 15, 2010 at 12:02 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Mon, Nov 15, 2010 at 18:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>>> Would there be a problem with invoking this trigger from the session
>>> that first touches the table?
>>
>> Other than security?
>
> Right, I guess that would only make sense with SECURITY DEFINER.
>
> On Mon, Nov 15, 2010 at 18:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Nov 15, 2010 at 10:54 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> Just wondering, have you thought of any mechanisms how application
>>> code might detect that an unlogged table was truncated due to restart?
>
>> Yeah, this infrastructure doesn't really allow that. The truncate
>> happens way too early on in startup to execute any user-provided code.
>
> The truncate itself can be performed early and set a flag somewhere
> that would invoke a trigger on the first access. I suppose it cannot
> be called a "truncate trigger" then.
>
> Or maybe provide hooks for pgAgent instead?
>
> Do you see any alternatives to be notified of unlogged table
> truncates? Without notification, this feature would seem to have
> limited usefulness.

Well, you're only monitoring for a server restart. That's probably
something you need a way to monitor for anyway. I don't think we have
a function that exposes the time of the last server restart at the SQL
level, but maybe we should. You can monitor for it by watching the
logs, of course.

This is really intended for things like caches of session information
where loss is annoying (because users have to log back into the
webapp, or whatever) but not so critical that we want to take a
performance penalty to prevent it. It will also be helpful to people
who want to make PG run very very quickly even at the risk of data
loss, as in the recent discussion on -performance and some
conversations I had at PG West; it provides a more structured, and
hopefully also more performant, alternative to turning off fsync,
full_page_writes, and synchronous commit. For some such apps, it may
be sufficient to check for truncating at each reconnect, which will be
a whole lot easier than what they have to do now (which is rebuild the
entire cluster every time PG restarts).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-15 17:45:25
Message-ID:	AANLkTinhMkhOCCxJpjgSZ5d9Fe1vgFnAgZ=SnyxF1csQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 15, 2010 at 11:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Yeah, this infrastructure doesn't really allow that. The truncate
> happens way too early on in startup to execute any user-provided code.

But you could use the very feature of unlogged tables to know if
you've "initialized" some unlogged table by using an unlogged table to
note the initilization.

If the value you expect isn't in your "noted" table, you know that
it's been truncated...

Sure, it's "app side", but the hole point of unlogged tables it to
allow optimzations when the "appside" knows the data's dispensable and
rebuild-able.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-16 19:49:17
Message-ID:	1289936957.31200.22.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On lör, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
> 1. The first one (relpersistence-v1) is a mostly mechanical patch that
> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
> (a character), so that we can support more than two values. BE SURE
> YOU INITDB, since the old catalog format will not work with this patch
> applied.

Btw., I would recommend that even in-progress or proposed patches
include catversion updates, which helps communicate the message such as
yours in a more robust manner and also reduces the chance of forgetting
the catversion change in the final commit.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-16 20:08:24
Message-ID:	AANLkTikRazDB6fFPNoH1fqD7NZCegNwvN65WPgtEW_FY@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On lör, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
>> 1. The first one (relpersistence-v1) is a mostly mechanical patch that
>> replaces pg_class.relistemp (a Boolean) with pg_class.relpersistence
>> (a character), so that we can support more than two values. BE SURE
>> YOU INITDB, since the old catalog format will not work with this patch
>> applied.
>
> Btw., I would recommend that even in-progress or proposed patches
> include catversion updates, which helps communicate the message such as
> yours in a more robust manner and also reduces the chance of forgetting
> the catversion change in the final commit.

I thought we had a policy of NOT doing that, because of the merge
conflicts thereby created. It's also hard to know what value to set
it to; whatever you pick will certainly be obsolete by commit time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-16 20:26:17
Message-ID:	1289939177.31200.31.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On tis, 2010-11-16 at 15:08 -0500, Robert Haas wrote:
> > Btw., I would recommend that even in-progress or proposed patches
> > include catversion updates, which helps communicate the message such
> as
> > yours in a more robust manner and also reduces the chance of
> forgetting
> > the catversion change in the final commit.
>
> I thought we had a policy of NOT doing that, because of the merge
> conflicts thereby created.

I don't know, but I for one *want* the merge conflict, because if I'm
actually merging two diverging lines of system catalog changes, then I
had better stop and think about it.

> It's also hard to know what value to set
> it to; whatever you pick will certainly be obsolete by commit time.

Well, the most important thing is that it's different from the last
value, but I have occasionally wondered about a way to support tagging
branches separately.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-16 21:04:27
Message-ID:	5836.1289941467@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Nov 16, 2010 at 2:49 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> Btw., I would recommend that even in-progress or proposed patches
>> include catversion updates,

> I thought we had a policy of NOT doing that, because of the merge
> conflicts thereby created. It's also hard to know what value to set
> it to; whatever you pick will certainly be obsolete by commit time.

Well, my expectation would be that the committer would reset the
catversion to current date when it goes into master. The question is
whether such a practice would be (a) helpful to testers and/or (b)
useful to the committer.

As for (a), it likely would be, except that a patch that's not very
recent is almost certainly going to get a merge failure here when the
tester tries to apply it locally. That doesn't really seem like a gain.
Still, I can see the point of forcing an initdb when needed.

As for (b), I'm not sure I buy Peter's argument about a merge conflict
on that being a helpful flag. I don't see any reason to think that
system catalog changes are much more (or less) likely to result in
hidden merge conflicts than any other type of change. I'm not
personally going to rely on a submitter's determination of whether a
catversion bump is needed anyhow.

So I lean towards being happy with the current approach, though I could
be talked into the other given a better argument for it.

regards, tom lane

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-11-16 21:37:06
Message-ID:	1289943426.31200.50.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On tis, 2010-11-16 at 16:04 -0500, Tom Lane wrote:
> Well, my expectation would be that the committer would reset the
> catversion to current date when it goes into master. The question is
> whether such a practice would be (a) helpful to testers and/or (b)
> useful to the committer.

As with most such things, it's a matter of personal preference. I
started doing this out of necessity a while ago, and it has turned out
to be very helpful.

> As for (a), it likely would be, except that a patch that's not very
> recent is almost certainly going to get a merge failure here when the
> tester tries to apply it locally. That doesn't really seem like a gain.

Arguably, it means that the patch should be updated. At least, it's a
warning sign to the reviewer.

> Still, I can see the point of forcing an initdb when needed.

Especially because it prevents novice patch reviewers from mixing
mismatching source and data directories and wasting time on obscure
"bugs".

> As for (b), I'm not sure I buy Peter's argument about a merge conflict
> on that being a helpful flag. I don't see any reason to think that
> system catalog changes are much more (or less) likely to result in
> hidden merge conflicts than any other type of change.

Actually, in a recent sample, the likelihood for a hidden merge conflict
in near 100% because different patches keep reassigning the same OID for
new database objects.

In addition, there is the Git philosophy argument that every branch
should stand on its own. If more than one person collaborates on a
branch for more than one week, all the original reasons for having the
catversion in the first place come back into play. And so while I do
not wish to be radical about requiring catversion updates in random
patches, we should recognize the possibility that catversion updates
outside of the mainline are reasonable.

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-12-15 07:20:17
Message-ID:	4D086C31.2050600@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14.11.2010 02:16, Robert Haas wrote:
> 3. The third patch (relax-sync-commit-v1) allows asynchronous commit
> even when synchronous_commit=on if the transaction has not written
> WAL. Of course, a read-only transaction won't even have an XID and
> therefore won't need a commit record, so what this is really doing is
> allowing transactions that have written only to temp - or unlogged -
> tables to commit asynchronously. This should be OK, because if the
> system crashes before the commit record hits the disk, we haven't
> really lost anything we wouldn't lose anyway: the temp tables will
> disappear on restart, and the unlogged ones will be truncated. This
> path actually could be applied independently of the first two, if I
> adjusted the comments a bit.

Looks ok. I'd suggest rewording this comment though:

/*
* Check if we want to commit asynchronously. If we're doing cleanup of
* any non-temp rels or committing any command that wanted to force sync
* commit, then we must flush XLOG immediately. (We must not allow
* asynchronous commit if there are any non-temp tables to be deleted,
* because we might delete the files before the COMMIT record is flushed to
* disk. We do allow asynchronous commit if all to-be-deleted tables are
* temporary though, since they are lost anyway if we crash.) Otherwise,
* we can defer the flush if either (1) the user has set synchronous_commit
* = off, or (2) the current transaction has not performed any WAL-logged
* operation. This latter case can arise if the only writes performed by
* the current transaction target temporary or unlogged relations. Loss
* of such a transaction won't matter anyway, because temp tables will be
* lost after a crash anyway, and unlogged ones will be truncated.
*/

It's a bit hard to follow, as it first lists exceptions on when we must
flush XLOG immediately, and then lists conditions on when we can skip
it. How about:

/*
* Check if we can commit asynchronously. We can skip flushing the XLOG
* if synchronous_commit=off, or if the current transaction has not
* performed any WAL-logged operation. The latter case can arise if the
* only writes performed by the current transaction target temporary or
* unlogged relations. Loss of such a transaction won't matter anyway,
* because temp tables will be lost after a crash anyway, and unlogged
* ones will be truncated.
*
* However, if we're doing cleanup of any non-temp rels or committing
* any command that wanted to force sync commit, then we must flush
* XLOG immediately anyway. (We must not allow asynchronous commit if
* there are any non-temp tables to be deleted, because we might delete
* the files before the COMMIT record is flushed to disk. We do allow
* asynchronous commit if all to-be-deleted tables are temporary
* though, since they are lost anyway if we crash.)
*/

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-12-15 09:06:43
Message-ID:	1292404003.2737.5724.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote:
> I think that would be a recipe for bugs. Look at the three new macros
> I introduced. If you keep relistemp around, then any code which
> relies on it is likely testing for one of those three things, or maybe
> even something subtly different from any of them, as in the cases
> where I needed to add a switch statement. The way I see it, this is
> ultimately a four-level hierarchy

That argument isn't clear enough to avoid me agreeing so far with Tom
and Andrew that logged-ness is separate from temp-ness. As you say
though, it might be a recipe for bugs, so please explain a little more.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-12-15 09:20:41
Message-ID:	1292404841.2737.5749.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:

> 3. The third patch (relax-sync-commit-v1) allows asynchronous commit
> even when synchronous_commit=on if the transaction has not written
> WAL. Of course, a read-only transaction won't even have an XID and
> therefore won't need a commit record, so what this is really doing is
> allowing transactions that have written only to temp - or unlogged -
> tables to commit asynchronously.

I like this, great idea.

Avoiding the commit record entirely will break Hot Standby though, since
we rely on the assumption that all xids that are assigned are also
logged. The xids would be "known assigned", yet since they never
actually appear they will clog up the machinery (pun unintended).

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-12-15 15:08:47
Message-ID:	AANLkTik=g1XZOSvmzZtC8tKycVnr7NrFbi49GWDgMNeN@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 15, 2010 at 4:20 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sat, 2010-11-13 at 19:16 -0500, Robert Haas wrote:
>
>> 3. The third patch (relax-sync-commit-v1) allows asynchronous commit
>> even when synchronous_commit=on if the transaction has not written
>> WAL. Of course, a read-only transaction won't even have an XID and
>> therefore won't need a commit record, so what this is really doing is
>> allowing transactions that have written only to temp - or unlogged -
>> tables to commit asynchronously.
>
> I like this, great idea.
>
> Avoiding the commit record entirely will break Hot Standby though, since
> we rely on the assumption that all xids that are assigned are also
> logged. The xids would be "known assigned", yet since they never
> actually appear they will clog up the machinery (pun unintended).

Uggh, that's a really, really bad pun.

I made the same observation to Tom somewhere-or-other (must have been
a different thread because I don't see it on this one), along with the
further observation that we actually could suppress the commit record
entirely if wal_level < hot_standby, but I'm not sure there's enough
benefit to doing that to worry about the additional complexity.
Changing it from a foreground flush to a background flush already wins
so much that I don't really see the point of doing anything further.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: unlogged tables
Date:	2010-12-15 15:39:32
Message-ID:	AANLkTimQ4Qax89X13gCdJh_vBgnz88c24BpMugp3=FG=@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 15, 2010 at 4:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sat, 2010-11-13 at 20:55 -0500, Robert Haas wrote:
>> I think that would be a recipe for bugs. Look at the three new macros
>> I introduced. If you keep relistemp around, then any code which
>> relies on it is likely testing for one of those three things, or maybe
>> even something subtly different from any of them, as in the cases
>> where I needed to add a switch statement. The way I see it, this is
>> ultimately a four-level hierarchy
>
> That argument isn't clear enough to avoid me agreeing so far with Tom
> and Andrew that logged-ness is separate from temp-ness. As you say
> though, it might be a recipe for bugs, so please explain a little more.

Sure. Most of the existing checks for rd_istemp were actually
checking whether the relation required WAL-logging. If there's any
third-party code out there that is checking rd_istemp, it likely also
needs to be revised to check whether WAL-logging is needed, not
whether the relation is temp. The way I've coded it, such code will
fail to compile, and can be very easily fixed by substituting a call
to RelationNeedsWAL() or RelationUsesLocalBuffers() or
RelationUsesTempNamespace(), depending on which property the caller
actually cares about. That's better than having the code compile, but
then not work as expected.

As of today, RelationNeedsWAL() always gives an answer which is
directly opposite to the answer given by RelationUsesLocalBuffers()
and RelationUsesTempNamespace(). But the main unlogged tables patch
changes that. RelationNeedsWAL() will return true for permanent
tables and false for unlogged and temp tables, while
RelationUsesLocalBuffers() and RelationUsesTempNamespace() will return
false for permanent and unlogged tables and true for temp tables.
When and if we get global temporary tables, there will be a further
split between RelationUsesLocalBuffers() and
RelationUsesTempNamespace(). The former will return true for both
global and local temporary tables, and the latter only for local
temporary tables.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company