Quick Links

Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	remove flatfiles.c
Date:	2009-08-31 22:53:21
Message-ID:	20090831225321.GM6060@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This patch removes flatfiles.c for good.

It doesn't change the keeping of locks in dbcommands.c and user.c,
because at least some of them are still required.

Regarding sync commits that previously happen and now won't, I think the
only case worth worrying about is the one in vacuum.c. Do we need a
ForceSyncCommit() in there? I'm not sure if vacuum itself already
forces sync commit.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment	Content-Type	Size
remove-flatfiles.patch	text/x-diff	24.3 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-08-31 23:04:33
Message-ID:	4329.1251759873@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> This patch removes flatfiles.c for good.

Aw, you beat me to it.

> Regarding sync commits that previously happen and now won't, I think the
> only case worth worrying about is the one in vacuum.c. Do we need a
> ForceSyncCommit() in there? I'm not sure if vacuum itself already
> forces sync commit.

Hmm, I had been assuming we wouldn't need that anymore.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-08-31 23:16:10
Message-ID:	20090831231610.GN6060@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:

> > Regarding sync commits that previously happen and now won't, I think the
> > only case worth worrying about is the one in vacuum.c. Do we need a
> > ForceSyncCommit() in there? I'm not sure if vacuum itself already
> > forces sync commit.
>
> Hmm, I had been assuming we wouldn't need that anymore.

The comment in user.c and dbcommands.c says

/*
* Force synchronous commit, thus minimizing the window between
* creation of the database files and commital of the transaction. If
* we crash before committing, we'll have a DB that's taking up disk
* space but is not in pg_database, which is not good.
*/
ForceSyncCommit();

so I think those ones are still necessary. There's another call in
RenameDatabase() which I don't think needs a sync commit (because it
won't change the dir name), and one in vacuum.c:

/*
! * If we were able to advance datfrozenxid, mark the flat-file copy of
! * pg_database for update at commit, and see if we can truncate pg_clog.
! * Also force update if the shared XID-wrap-limit info is stale.
*/
if (dirty || !TransactionIdLimitIsValid())
- {
- database_file_update_needed();
vac_truncate_clog(newFrozenXid);
- }
}

AFAICT this doesn't need a sync commit. (Right now, VACUUM FULL forces
one, but lazy vacuum doesn't).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-08-31 23:33:23
Message-ID:	8810.1251761603@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> Hmm, I had been assuming we wouldn't need that anymore.

> The comment in user.c and dbcommands.c says [...]
> so I think those ones are still necessary.

Yeah, after a look through the code I think you can trust the associated
comments: if it says it needs sync commit, put in ForceSyncCommit, else
we don't need it.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 07:43:58
Message-ID:	1251791038.2889.42.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-08-31 at 18:53 -0400, Alvaro Herrera wrote:

VACUUM FULL requires ForceSyncCommit().

Not sure why removing them elsewhere is important? Getting robustness
wrong is a big, bad thing and this opens us to future error. We already
tuned VACUUM so it does very little if it has no work to do, why would
one extra I/O improve things so much? If it ain't broke...

VACUUM does so many things that I'd rather have it all safely on disk.
I'd feel happier with the rule "VACUUM always sync commits", so we all
remember it and can rely upon it to be the same from release to release.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 13:58:43
Message-ID:	13589.1251813523@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> VACUUM does so many things that I'd rather have it all safely on disk.
> I'd feel happier with the rule "VACUUM always sync commits", so we all
> remember it and can rely upon it to be the same from release to release.

Non-FULL vacuum has *never* done a sync commit, except in the unusual
corner case that it moves the database's datfrozenxid, which is a corner
case that didn't even exist until fairly recently. I think the argument
that we should have it force sync for no reason whatsoever is silly.
We get beat up on a regular basis about "spikes" in response time;
why would you want to have vacuum creating one when it doesn't need to?

As for the FULL case, the sync commit is to try to protect a horribly
unsafe kluge that should go away entirely (if vacuum full itself doesn't
go away entirely). That's hardly something I want to institutionalize
either.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 17:07:07
Message-ID:	1251824827.2889.149.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:

> We get beat up on a regular basis about "spikes" in response time;
> why would you want to have vacuum creating one when it doesn't need
> to?

If one I/O on a background utility can cause such a spike, we are in
serious shitake. I would be more comfortable if the various important
things VACUUM does were protected by sync commit. I see no reason to
optimise away one I/O just because we might theoretically do so. Any
mistake in the theory and we are exposed. Why take the risk? We do many
things to check and secure our data, why not this one? If this was
suggested separately it as an optimisation you'd laugh and say why
bother?

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 18:10:24
Message-ID:	20529.1251828624@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Tue, 2009-09-01 at 09:58 -0400, Tom Lane wrote:
>> We get beat up on a regular basis about "spikes" in response time;
>> why would you want to have vacuum creating one when it doesn't need
>> to?

> If one I/O on a background utility can cause such a spike, we are in
> serious shitake. I would be more comfortable if the various important
> things VACUUM does were protected by sync commit. I see no reason to
> optimise away one I/O just because we might theoretically do so. Any
> mistake in the theory and we are exposed. Why take the risk?

*WHAT* risk? Most vacuums do not do a sync commit, and never have.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 22:53:57
Message-ID:	407d949e0909011553wa616516g6f2bbecebcc6816@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 2:58 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We get beat up on a regular basis about "spikes" in response time;
> why would you want to have vacuum creating one when it doesn't need to?

Isn't this sync commit just going to do the same thing that the wal
writer is going to do in at most 200ms anyways?

> As for the FULL case, the sync commit is to try to protect a horribly
> unsafe kluge that should go away entirely (if vacuum full itself doesn't
> go away entirely).

I'm all for throwing away VACUUM FULL btw. I was thinking of proposing
that we replace it with something like CLUSTER which just rewrites the
tuples in the order it finds them.

The use cases where VACUUM FULL wins currently are where storing two
copies of the table and its indexes concurrently just isn't practical.
Also perhaps tables where there are too many large indexes to make
rebuilding them all in one maintenance window practical.

I don't see any way to address these problems without something as
complex as xvac and moved_in/moved_off and without the index bloat
problems. I think we could improve the i/o access patterns we have
currently which make vacuum full so slow, but the fundamental problems
would remain.

So the question is whether those use cases are worth keeping our
existing vacuum full for or whether we could do without it and just
recommend partitioning for people with tables large enough to make
table rewrites impractical.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 23:01:34
Message-ID:	20090901230134.GB7243@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:

> The use cases where VACUUM FULL wins currently are where storing two
> copies of the table and its indexes concurrently just isn't practical.

Yeah, but then do you really need to use VACUUM FULL? If that's really
a problem then there ain't that many dead tuples around.

> Also perhaps tables where there are too many large indexes to make
> rebuilding them all in one maintenance window practical.

If that's the concern maybe we oughta do something about concurrently
re-creating those indexes somehow. Plain REINDEX doesn't work of
course, but maybe we can do some trick with creating a new index and
dropping the original one afterwards.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 23:34:07
Message-ID:	407d949e0909011634q716b254fhe4365e6faef67e0c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
Herrera<alvherre(at)commandprompt(dot)com> wrote:
>> The use cases where VACUUM FULL wins currently are where storing two
>> copies of the table and its indexes concurrently just isn't practical.
>
> Yeah, but then do you really need to use VACUUM FULL? If that's really
> a problem then there ain't that many dead tuples around.

That's what I want to believe. But picture if you have, say a
1-terabyte table which is 50% dead tuples and you don't have a spare
1-terabytes to rewrite the whole table.

>> Also perhaps tables where there are too many large indexes to make
>> rebuilding them all in one maintenance window practical.
>
> If that's the concern maybe we oughta do something about concurrently
> re-creating those indexes somehow. Plain REINDEX doesn't work of
> course, but maybe we can do some trick with creating a new index and
> dropping the original one afterwards.

Well that doesn't really work if you want to rewrite the table.
CLUSTER has to rebuild all the indexes when it's done.

I think the solution for both of these is actually partitioning. The
bottom line is that having a single table which contains very large
amounts of data is awkward to maintain.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 23:42:56
Message-ID:	10271.1251848576@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
> Herrera<alvherre(at)commandprompt(dot)com> wrote:
>>> The use cases where VACUUM FULL wins currently are where storing two
>>> copies of the table and its indexes concurrently just isn't practical.
>>
>> Yeah, but then do you really need to use VACUUM FULL? If that's really
>> a problem then there ain't that many dead tuples around.

> That's what I want to believe. But picture if you have, say a
> 1-terabyte table which is 50% dead tuples and you don't have a spare
> 1-terabytes to rewrite the whole table.

But trying to VACUUM FULL that table is going to be horridly painful
too, and you'll still have bloated indexes afterwards. You might as
well just live with the 50% waste, especially since if you did a
full-table update once you'll probably do it again sometime.

I'm having a hard time believing that VACUUM FULL really has any
interesting use-case anymore.

regards, tom lane

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-01 23:56:52
Message-ID:	4A9DB4C4.2060503@cheapcomplexdevices.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:
>
> That's what I want to believe. But picture if you have, say a
> 1-terabyte table which is 50% dead tuples and you don't have a spare
> 1-terabytes to rewrite the whole table.

Could one hypothetically do
update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
vacuum;
and repeat until max(ctid) is small enough?

Sure, it'll take longer than vacuum full; but at first glance
it seems lightweight enough to do even on a live, heavily accessed
table.

IIRC I tried something like this once, and it worked to some extent,
but after a few loops didn't shrink the table as much as I had expected.

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 01:29:33
Message-ID:	20090902012933.GC7243@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ron Mayer wrote:
> Greg Stark wrote:
> >
> > That's what I want to believe. But picture if you have, say a
> > 1-terabyte table which is 50% dead tuples and you don't have a spare
> > 1-terabytes to rewrite the whole table.
>
> Could one hypothetically do
> update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
> vacuum;
> and repeat until max(ctid) is small enough?

I remember Hannu Krosing said they used something like that to shrink
really bloated tables. Maybe we should try to explicitely support a
mechanism that worked in that fashion. I think I tried it at some point
and found that the problem with it was that ctid was too limited in what
it was able to do.

The neat thing is that now that we have the visibility fork, each vacuum
needn't scan the whole table each time.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 01:34:58
Message-ID:	751261b20909011834g5e08e7c0hc8e789f15b3fecb8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 19:34, Greg Stark <gsstark(at)mit(dot)edu> wrote:

> On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
> Herrera<alvherre(at)commandprompt(dot)com> wrote:
> >> The use cases where VACUUM FULL wins currently are where storing two
> >> copies of the table and its indexes concurrently just isn't practical.
> >
> > Yeah, but then do you really need to use VACUUM FULL? If that's really
> > a problem then there ain't that many dead tuples around.
>
> That's what I want to believe. But picture if you have, say a
> 1-terabyte table which is 50% dead tuples and you don't have a spare
> 1-terabytes to rewrite the whole table.
>

It would be interesting if there was something between VACUUM FULL and
CLUSTER which could, say, work on a single 1GB segment at a time in a manner
similar to cluster.

You would still end up with index bloat like vacuum full, though perhaps not
as bad, but shuffling around the tuples should be faster.

The idea here is that the files can be truncated individually. Two 500MB
files is pretty much the same as a single 1GB file on disk.

Of course, I'm hand waving and don't have the technical expertise to figure
out if it can be done easily within PostgreSQL.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 02:55:14
Message-ID:	603c8f070909011955t5046cbffx272584cb005fb12f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
>> Herrera<alvherre(at)commandprompt(dot)com> wrote:
>>>> The use cases where VACUUM FULL wins currently are where storing two
>>>> copies of the table and its indexes concurrently just isn't practical.
>>>
>>> Yeah, but then do you really need to use VACUUM FULL? If that's really
>>> a problem then there ain't that many dead tuples around.
>
>> That's what I want to believe. But picture if you have, say a
>> 1-terabyte table which is 50% dead tuples and you don't have a spare
>> 1-terabytes to rewrite the whole table.
>
> But trying to VACUUM FULL that table is going to be horridly painful
> too, and you'll still have bloated indexes afterwards. You might as
> well just live with the 50% waste, especially since if you did a
> full-table update once you'll probably do it again sometime.
>
> I'm having a hard time believing that VACUUM FULL really has any
> interesting use-case anymore.

What if your large table doesn't have an index? Then there's no way to cluster.

I'm a bit skeptical about partitioning as a solution, too. The
planner is just not clever enough with partitioned tables, yet.

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 02:56:04
Message-ID:	603c8f070909011956p62d0cc7ct55af7d1b16f2ddb6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
Herrera<alvherre(at)commandprompt(dot)com> wrote:
> Ron Mayer wrote:
>> Greg Stark wrote:
>> >
>> > That's what I want to believe. But picture if you have, say a
>> > 1-terabyte table which is 50% dead tuples and you don't have a spare
>> > 1-terabytes to rewrite the whole table.
>>
>> Could one hypothetically do
>> update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
>> vacuum;
>> and repeat until max(ctid) is small enough?
>
> I remember Hannu Krosing said they used something like that to shrink
> really bloated tables. Maybe we should try to explicitely support a
> mechanism that worked in that fashion. I think I tried it at some point
> and found that the problem with it was that ctid was too limited in what
> it was able to do.

I think a way to incrementally shrink large tables would be enormously
beneficial. Maybe vacuum could try to do a bit of that each time it
runs.

...Robert

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 02:58:46
Message-ID:	20090902025846.GD7243@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:
> On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > But trying to VACUUM FULL that table is going to be horridly painful
> > too, and you'll still have bloated indexes afterwards. You might as
> > well just live with the 50% waste, especially since if you did a
> > full-table update once you'll probably do it again sometime.
> >
> > I'm having a hard time believing that VACUUM FULL really has any
> > interesting use-case anymore.
>
> What if your large table doesn't have an index? Then there's no way to cluster.

But there's nothing saying we cannot provide a version of CLUSTER that
does not follow any index and just copies the live tuples.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 03:02:01
Message-ID:	603c8f070909012002u4f38ec4ds54913ec96e0538a2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 10:58 PM, Alvaro
Herrera<alvherre(at)commandprompt(dot)com> wrote:
> Robert Haas escribió:
>> On Tue, Sep 1, 2009 at 7:42 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> > But trying to VACUUM FULL that table is going to be horridly painful
>> > too, and you'll still have bloated indexes afterwards. You might as
>> > well just live with the 50% waste, especially since if you did a
>> > full-table update once you'll probably do it again sometime.
>> >
>> > I'm having a hard time believing that VACUUM FULL really has any
>> > interesting use-case anymore.
>>
>> What if your large table doesn't have an index? Then there's no way to cluster.
>
> But there's nothing saying we cannot provide a version of CLUSTER that
> does not follow any index and just copies the live tuples.

Agreed.

...Robert

From:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 05:30:01
Message-ID:	3073cc9b0909012230p6dc5bf91v39597a4d38e88de7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>
> I'm a bit skeptical about partitioning as a solution, too. The
> planner is just not clever enough with partitioned tables, yet.
>

analyze and vacuum a *very* big table and even scan a huge index is
not a joke neither...
and yes the planner is not very clever about partitioning and
certainly that is something we need to fix not something we have to
live with... no that that will be easy but hey! we have very brilliant
people here (you being one of them)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 13:55:36
Message-ID:	407d949e0909020655wa523e1am79691fc806747cfc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 6:30 AM, Jaime
Casanova<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Tue, Sep 1, 2009 at 9:55 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>
>> I'm a bit skeptical about partitioning as a solution, too. The
>> planner is just not clever enough with partitioned tables, yet.

Yeah, we need to fix that :)

I think we're already reaching the point where the pains of dealing
with partitioned tables are usually less than the pains of dealing
with VACUUM FULL.

> analyze and vacuum a *very* big table and even scan a huge index is
> not a joke neither...

Hm, not sure I see this. The sample size for Analyze is not dependent
on the size of the table. Only on the stats_target. And vacuum with
the VM is now going to be dependent only on the number of updates to
the table, not on the size of the table.

The problem use cases we have today are only when you really do have
enough dead space to clean up that you want to compact the file -- but
not so much that it's worth rewriting the whole table using CLUSTER or
ALTER TABLE.

Perhaps we should go one version with a enable_legacy_full_vacuum
which defaults to off. That would at least let us hear about use cases
where people are unhappy with a replacement.

I did start a while ago on a replacement which used the existing
rewrite mechanism to do the equivalent of cluster without changing the
ordering. I forget where I left that but I could go back and look at
it. I'll be busy for the next few weeks though so it won't be right
away.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 17:41:59
Message-ID:	4A9EAE67.6060802@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

All,

> I'm having a hard time believing that VACUUM FULL really has any
> interesting use-case anymore.

Basically, for:
a) people who don't understand CLUSTER (easily fixed, simply create a
VACUUM FULL command which just does CLUSTER on the primary key)

b) people who are completely out of space on disk and are trying to
shrink the database to free up space.

For (b), I think it's OK to just tell those people that they need to
move the database files or find something else to delete. Most of the
time, they have to do that *anyway* in order for VACUUM FULL to work,
since the transaction log is on the same disk. We just need a little
more documentation, is all.

> The problem use cases we have today are only when you really do have
> enough dead space to clean up that you want to compact the file -- but
> not so much that it's worth rewriting the whole table using CLUSTER or
> ALTER TABLE.

I haven't seen this use-case in the field. I'm not sure that it
actually exists. Anyone run across a case where this made sense?

Recently I actually had a client dump and reload their database rather
than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
more than 18.

> Perhaps we should go one version with a enable_legacy_full_vacuum
> which defaults to off. That would at least let us hear about use cases
> where people are unhappy with a replacement.

I think we do need to do this, just because people won't have changed
their admin scripts. But the goal should be to dump VACUUM FULL
entirely by 8.6 if we *don't* get serious use-cases.

> I did start a while ago on a replacement which used the existing
> rewrite mechanism to do the equivalent of cluster without changing the
> ordering. I forget where I left that but I could go back and look at
> it. I'll be busy for the next few weeks though so it won't be right
> away.

This would be very helpful.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 17:46:59
Message-ID:	1251913619.8406.37.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-02 at 10:41 -0700, Josh Berkus wrote:
> All,
>
>
> > I'm having a hard time believing that VACUUM FULL really has any
> > interesting use-case anymore.
>
> Basically, for:
> a) people who don't understand CLUSTER (easily fixed, simply create a
> VACUUM FULL command which just does CLUSTER on the primary key)
>
> b) people who are completely out of space on disk and are trying to
> shrink the database to free up space.
>
> For (b), I think it's OK to just tell those people that they need to
> move the database files or find something else to delete. Most of the
> time, they have to do that *anyway* in order for VACUUM FULL to work,
> since the transaction log is on the same disk. We just need a little
> more documentation, is all.

Right.

>
> > The problem use cases we have today are only when you really do have
> > enough dead space to clean up that you want to compact the file -- but
> > not so much that it's worth rewriting the whole table using CLUSTER or
> > ALTER TABLE.
>
> I haven't seen this use-case in the field. I'm not sure that it
> actually exists. Anyone run across a case where this made sense?
>

No.

> Recently I actually had a client dump and reload their database rather
> than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
> more than 18.
>

Exactly.

> > Perhaps we should go one version with a enable_legacy_full_vacuum
> > which defaults to off. That would at least let us hear about use cases
> > where people are unhappy with a replacement.
>
> I think we do need to do this, just because people won't have changed
> their admin scripts. But the goal should be to dump VACUUM FULL
> entirely by 8.6 if we *don't* get serious use-cases.
>

Agreed, but I think we shouldn't even put it in the postgresql.conf by
default. Just document that it exists. Settings for the sake of settings
(even ones that may have a corner case) seem to confuse users.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 17:52:11
Message-ID:	407d949e0909021052p79e661b3hc7e42ee2d2c0767a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh(at)agliodbs(dot)com> wrote:
> All,
>
>
>> I'm having a hard time believing that VACUUM FULL really has any
>> interesting use-case anymore.
>
> Basically, for:
> a) people who don't understand CLUSTER (easily fixed, simply create a
> VACUUM FULL command which just does CLUSTER on the primary key)

I don't think we want to cluster on the primary key. I think we just
want to rewrite the table keeping the same physical ordering.

>> The problem use cases we have today are only when you really do have
>> enough dead space to clean up that you want to compact the file -- but
>> not so much that it's worth rewriting the whole table using CLUSTER or
>> ALTER TABLE.
>
> I haven't seen this use-case in the field. I'm not sure that it
> actually exists. Anyone run across a case where this made sense?

Well I've certainly seen people whose disks are more than 50% full.
They tend to be the same people who want to compact their tables. I
can't say whether any of them had a single table with associated
indexes that were taking up more than 50% but it's not uncommon to
have a single table that dominates your database.

> Recently I actually had a client dump and reload their database rather
> than running VACUUM FULL; a reload took 4 hours but VACUUM FULL took
> more than 18.
>
>> Perhaps we should go one version with a enable_legacy_full_vacuum
>> which defaults to off. That would at least let us hear about use cases
>> where people are unhappy with a replacement.
>
> I think we do need to do this, just because people won't have changed
> their admin scripts. But the goal should be to dump VACUUM FULL
> entirely by 8.6 if we *don't* get serious use-cases.

We could deal with the admin scripts by making VACUUM FULL do the new
behaviour. But I actually don't really like that. I wold prefer to
break VACUUM FULL since anyone doing it routinely is probably
mistaken. We could name the command something which is more
descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
that.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>,"Greg Stark" <gsstark(at)mit(dot)edu>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 17:57:29
Message-ID:	4A9E6BB9020000250002A8D6@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> wrote:

> I don't think we want to cluster on the primary key. I think we just
> want to rewrite the table keeping the same physical ordering.

Well if that's what you want to do, couldn't you do something like?:

Lock the table.
Prop all indexes
Pass the heap with two pointers, one to the first available empty
space and one to the first non-dead row past that, and move inside the
existing file.
Rebuild the indexes.
Release the lock.

-Kevin

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 18:01:02
Message-ID:	4A9EB2DE.6090108@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg,

> I don't think we want to cluster on the primary key. I think we just
> want to rewrite the table keeping the same physical ordering.

Agreed.

> Well I've certainly seen people whose disks are more than 50% full.
> They tend to be the same people who want to compact their tables. I
> can't say whether any of them had a single table with associated
> indexes that were taking up more than 50% but it's not uncommon to
> have a single table that dominates your database.

Those people would also need for the tables involved to be fairly small,
or to be able to afford a lot of downtime. VACUUM FULL on a 100GB table
with current commodity servers can take upwards of 8 hours. I really
think the cases of people who have more available downtime than disk
space is is vanishingly small group.

However, I'll do a survey. Why not?

> We could deal with the admin scripts by making VACUUM FULL do the new
> behaviour. But I actually don't really like that. I wold prefer to
> break VACUUM FULL since anyone doing it routinely is probably
> mistaken. We could name the command something which is more
> descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
> that.

Agreed. I like VACUUM REWRITE, as it makes it fairly clear what's going on.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 18:07:04
Message-ID:	1251914824.8406.41.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-02 at 11:01 -0700, Josh Berkus wrote:
> Greg,
>
> > I don't think we want to cluster on the primary key. I think we just
> > want to rewrite the table keeping the same physical ordering.
>
> Agreed.

Are we sure about that? I would argue that the majority of users out
their (think Django and other Web*Frameworks) are all searching
primarily by primary key + other anyway.

We could always offer the ability to vacuum full (cluster) on index foo
but default to the primary key.

Joshua D. Drake
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 18:31:26
Message-ID:	603c8f070909021131g4a5513c6i24b6ec5355bf12f6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 1:52 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> We could deal with the admin scripts by making VACUUM FULL do the new
> behaviour. But I actually don't really like that. I wold prefer to
> break VACUUM FULL since anyone doing it routinely is probably
> mistaken.

So I have a script that goes and finds bloated tables and runs VACUUM
FULL on them in the middle of the night if the bloat passes a certain
threshold. The tables are small enough and the number of users is low
enough that this doesn't cause any problems for me. I'm OK if the
name of the command changes, but I'd like there to be a command that I
can pass a table name to and get my table debloated without having to
make any follow-on decisions (such as picking an index to cluster by).

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 18:31:30
Message-ID:	14718.1251916290@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh(at)agliodbs(dot)com> wrote:
>>> Perhaps we should go one version with a enable_legacy_full_vacuum
>>> which defaults to off. That would at least let us hear about use cases
>>> where people are unhappy with a replacement.
>>
>> I think we do need to do this, just because people won't have changed
>> their admin scripts. But the goal should be to dump VACUUM FULL
>> entirely by 8.6 if we *don't* get serious use-cases.

What's wrong with just ignoring the FULL option? It's a reserved
word anyway because of FULL OUTER JOINs, so there's no syntactic
benefit to be had from eliminating it from the VACUUM syntax.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 18:54:40
Message-ID:	15083.1251917680@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So I have a script that goes and finds bloated tables and runs VACUUM
> FULL on them in the middle of the night if the bloat passes a certain
> threshold. The tables are small enough and the number of users is low
> enough that this doesn't cause any problems for me. I'm OK if the
> name of the command changes, but I'd like there to be a command that I
> can pass a table name to and get my table debloated without having to
> make any follow-on decisions (such as picking an index to cluster by).

I think we *should* have a command that works like CLUSTER except it just
seqscans the source table without depending on any particular index.
Whether it's called VACUUM FULL or something else is a detail.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:01:06
Message-ID:	407d949e0909021201j3b09e178t99de07d6bd1a7cc6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 6:57 PM, Kevin
Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> wrote:
>
>> I don't think we want to cluster on the primary key. I think we just
>> want to rewrite the table keeping the same physical ordering.
>
> Well if that's what you want to do, couldn't you do something like?:
>
> Lock the table.
> Prop all indexes
> Pass the heap with two pointers, one to the first available empty
> space and one to the first non-dead row past that, and move inside the
> existing file.
> Rebuild the indexes.
> Release the lock.

Well dropping the indexes and moving tuples are both "hard" if you
care about crash-safety and transactional integrity.

The way we rewrite tables now is:

Lock table
Create new filenode.
Scan old table and copy each record into the new filenode keeping
update chains intact.
Rebuild all indexes for the table (using a similar strategy with new
relfilenodes)
Commit the transaction

If the transaction aborts at any point you still have the old pg_class
record which points to the old relfilenode and all the old indexes are
still valid.

We have all the pieces we need to do this, it's just a matter of
putting them together with a command to call them.

A big part of what VACUUM FULL is annoying is the complexity of moving
tuples in place. VACUUM FULL has to mark the old tuples and the new
copies with its xid. It can't truncate the relation until it commits
that xid.

Actually I wonder how much performance improvement would come on
normal DML just from not having to check xvac in the visibility
checks. It's probably not much but...

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:04:27
Message-ID:	603c8f070909021204o6b78ca5coba1494ff028f372b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 2:31 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> On Wed, Sep 2, 2009 at 6:41 PM, Josh Berkus<josh(at)agliodbs(dot)com> wrote:
>>>> Perhaps we should go one version with a enable_legacy_full_vacuum
>>>> which defaults to off. That would at least let us hear about use cases
>>>> where people are unhappy with a replacement.
>>>
>>> I think we do need to do this, just because people won't have changed
>>> their admin scripts. But the goal should be to dump VACUUM FULL
>>> entirely by 8.6 if we *don't* get serious use-cases.
>
>> We could deal with the admin scripts by making VACUUM FULL do the new
>> behaviour. But I actually don't really like that. I wold prefer to
>> break VACUUM FULL since anyone doing it routinely is probably
>> mistaken. We could name the command something which is more
>> descriptive like VACUUM REWRITE or VACUUM REBUILD or something like
>> that.
>
> What's wrong with just ignoring the FULL option? It's a reserved
> word anyway because of FULL OUTER JOINs, so there's no syntactic
> benefit to be had from eliminating it from the VACUUM syntax.

Silent behavior changes are usually a bad idea.

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:10:53
Message-ID:	603c8f070909021210l3d0ede5fg6ae0ebd9a44b9f5c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 2:54 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> So I have a script that goes and finds bloated tables and runs VACUUM
>> FULL on them in the middle of the night if the bloat passes a certain
>> threshold. The tables are small enough and the number of users is low
>> enough that this doesn't cause any problems for me. I'm OK if the
>> name of the command changes, but I'd like there to be a command that I
>> can pass a table name to and get my table debloated without having to
>> make any follow-on decisions (such as picking an index to cluster by).
>
> I think we *should* have a command that works like CLUSTER except it just
> seqscans the source table without depending on any particular index.
> Whether it's called VACUUM FULL or something else is a detail.

Yeah. We could do this by extending the syntax for cluster (e.g.
CLUSTER [VERBOSE] tablename [USING indexname | WITH NO INDEX]), but
I'm not sure whether that has any real advantage over just using the
existing command name.

I confess to being a little fuzzy on the details of how this
implementation (seq-scanning the source table for live tuples) is
different/better from the current VACUUM FULL implementation. Can
someone fill me in?

...Robert

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:30:26
Message-ID:	407d949e0909021230q40d520far51de9e63162f5861@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
> I confess to being a little fuzzy on the details of how this
> implementation (seq-scanning the source table for live tuples) is
> different/better from the current VACUUM FULL implementation. Can
> someone fill me in?

VACUUM FULL is a *lot* more complex.

It scans pages *backwards* from the end (which does wonderful things
on rotating media). Marks each live tuple it finds as "moved off",
finds a new place for it (using the free space map I think?). Insert
the tuple on the new page and marks it "moved in" and updates the
indexes.

Then it commits the transaction but keeps the lock. Then it has to
vacuum all the indexes of the references to the old tuples at the end
of the table. I think it has to commit that too before it can finally
truncate the table.

The backwards scan is awful for rotating media. The reading from the
end and writing to the beginning is bad too, though hopefully the
cache can help that.

A lot of the complexity comes in from other parts of the system that
have to be aware of tuples that have been "moved off" or "moved in".
They have to be able to check whether the vacuum committed or not.

That reminds me there was another proposal to do an "online" vacuum
full similar to our concurrent index builds. Do noop-updates to tuples
at the end of the table, hopefully finding space for them earlier in
the table. Wait until those transactions are no longer visible to
anyone else and then truncate. (Actually I think you could just not do
anything and let regular lazy vacuum do the truncate). That might be a
good practical alternative for sites where copying their entire table
isn't practical.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:45:49
Message-ID:	25528.1251920749@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> The backwards scan is awful for rotating media. The reading from the
> end and writing to the beginning is bad too, though hopefully the
> cache can help that.

Yeah. And all that pales in comparison to what happens in the indexes.
You have to insert index entries (retail) for each moved-in tuple,
then after doing the intermediate commit you run around and remove
the index entries for the moved-off tuples. Lots of nonsequential
access to insert the entries. The cleanup isn't so bad --- it's
comparable to what regular lazy VACUUM has to do --- but that's just
one step in a very expensive process.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 19:49:46
Message-ID:	26362.1251920986@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> It scans pages *backwards* from the end (which does wonderful things
> on rotating media). Marks each live tuple it finds as "moved off",
> finds a new place for it (using the free space map I think?).

BTW, VACUUM FULL doesn't use the free space map --- that code predates
the FSM by a lot. It builds its own map of free space during its
initial lazy-VACUUM-equivalent scan that just removes dead tuples.
While I don't think this hurts performance any, I have seen reports of
VACUUM FULL failing outright on large tables because it runs out of
memory for this map. So that's still another way in which it's not
actually all that useful for huge tables.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 20:03:19
Message-ID:	20090902200319.GC5314@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane escribió:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > It scans pages *backwards* from the end (which does wonderful things
> > on rotating media). Marks each live tuple it finds as "moved off",
> > finds a new place for it (using the free space map I think?).
>
> BTW, VACUUM FULL doesn't use the free space map --- that code predates
> the FSM by a lot. It builds its own map of free space during its
> initial lazy-VACUUM-equivalent scan that just removes dead tuples.

Another weird consequence of this is that it bails out if it finds a
tuple larger than it can fit in one of the earlier pages; if there's
dead space to be compacted before that, it's not compacted.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 20:13:53
Message-ID:	603c8f070909021313n70759644x9078cdc7756394ee@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 3:30 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Wed, Sep 2, 2009 at 8:10 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> I confess to being a little fuzzy on the details of how this
>> implementation (seq-scanning the source table for live tuples) is
>> different/better from the current VACUUM FULL implementation. Can
>> someone fill me in?
>
>
> VACUUM FULL is a *lot* more complex.
>
> It scans pages *backwards* from the end (which does wonderful things
> on rotating media). Marks each live tuple it finds as "moved off",
> finds a new place for it (using the free space map I think?). Insert
> the tuple on the new page and marks it "moved in" and updates the
> indexes.
>
> Then it commits the transaction but keeps the lock. Then it has to
> vacuum all the indexes of the references to the old tuples at the end
> of the table. I think it has to commit that too before it can finally
> truncate the table.
>
> The backwards scan is awful for rotating media. The reading from the
> end and writing to the beginning is bad too, though hopefully the
> cache can help that.
>
> A lot of the complexity comes in from other parts of the system that
> have to be aware of tuples that have been "moved off" or "moved in".
> They have to be able to check whether the vacuum committed or not.

Ugh.

> That reminds me there was another proposal to do an "online" vacuum
> full similar to our concurrent index builds. Do noop-updates to tuples
> at the end of the table, hopefully finding space for them earlier in
> the table. Wait until those transactions are no longer visible to
> anyone else and then truncate. (Actually I think you could just not do
> anything and let regular lazy vacuum do the truncate). That might be a
> good practical alternative for sites where copying their entire table
> isn't practical.

I don't have a strong opinion about whether it's better to take an
exclusive lock on everything and reorganize freely, or whether it's
better to try to use MVCC semantics to reduce the locking impact (at a
cost of difficulty in the presence of long-running transactions). But
I think it would be really nice to have an incremental way to reduce
table bloat. Locking a table (or even better, part of a table) for a
couple of seconds once an hour for several days or weeks figures to be
practical in some (many?) environments where locking a table for
minutes or hours is not.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 20:37:06
Message-ID:	28748.1251923826@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Another weird consequence of this is that it bails out if it finds a
> tuple larger than it can fit in one of the earlier pages; if there's
> dead space to be compacted before that, it's not compacted.

I don't find a lot wrong with that. The code defines its purpose as
being to shorten the table file length. Once it hits a page that
can't be emptied, it cannot shorten the file any further, so why
shouldn't it stop?

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 20:39:51
Message-ID:	20090902203950.GG5314@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane escribió:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Another weird consequence of this is that it bails out if it finds a
> > tuple larger than it can fit in one of the earlier pages; if there's
> > dead space to be compacted before that, it's not compacted.
>
> I don't find a lot wrong with that. The code defines its purpose as
> being to shorten the table file length. Once it hits a page that
> can't be emptied, it cannot shorten the file any further, so why
> shouldn't it stop?

All that work, and it wasn't capable of defragging the other pages? At
the very least it could register them in the FSM.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 20:42:59
Message-ID:	28853.1251924179@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane escribi:
>> I don't find a lot wrong with that. The code defines its purpose as
>> being to shorten the table file length. Once it hits a page that
>> can't be emptied, it cannot shorten the file any further, so why
>> shouldn't it stop?

> All that work, and it wasn't capable of defragging the other pages? At
> the very least it could register them in the FSM.

You mean like vac_update_fsm() ?

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 22:54:36
Message-ID:	20090902225436.GA5896@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane escribió:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Tom Lane escribi:
> >> I don't find a lot wrong with that. The code defines its purpose as
> >> being to shorten the table file length. Once it hits a page that
> >> can't be emptied, it cannot shorten the file any further, so why
> >> shouldn't it stop?
>
> > All that work, and it wasn't capable of defragging the other pages? At
> > the very least it could register them in the FSM.
>
> You mean like vac_update_fsm() ?

Huh :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-02 22:55:19
Message-ID:	4A9EF7D7.6020508@cheapcomplexdevices.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Tue, Sep 1, 2009 at 9:29 PM, Alvaro
> Herrera<alvherre(at)commandprompt(dot)com> wrote:
>> Ron Mayer wrote:
>>> Greg Stark wrote:
>>>> That's what I want to believe. But picture if you have, say a
>>>> 1-terabyte table which is 50% dead tuples and you don't have a spare
>>>> 1-terabytes to rewrite the whole table.
>>> Could one hypothetically do
>>> update bigtable set pk = pk where ctid in (select ctid from bigtable order by ctid desc limit 100);
>>> vacuum;
>>> and repeat until max(ctid) is small enough?
>> I remember Hannu Krosing said they used something like that to shrink
>> really bloated tables. Maybe we should try to explicitely support a
>> mechanism that worked in that fashion. I think I tried it at some point
>> and found that the problem with it was that ctid was too limited in what
>> it was able to do.
>
> I think a way to incrementally shrink large tables would be enormously
> beneficial. Maybe vacuum could try to do a bit of that each time it
> runs.

Yet when I try it now, I'm having trouble making it work.
Would you expect the ctid to be going down in the psql session
shown below? I wonder why it isn't.

regression=# create table shrink_test as select * from tenk1;
SELECT
regression=# delete from shrink_test where (unique2 % 2) = 0;
DELETE 5000
regression=# create index "shrink_test(unique1)" on shrink_test(unique1);
CREATE INDEX
regression=# select max(ctid) from shrink_test;
max
----------
(333,10)
(1 row)

regression=# update shrink_test set unique1=unique1 where ctid in (select ctid from shrink_test order by ctid desc limit 100);
UPDATE 100
regression=# vacuum shrink_test;
VACUUM
regression=# select max(ctid) from shrink_test;
max
----------
(333,21)
(1 row)

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-03 00:31:11
Message-ID:	407d949e0909021731q49fd757dl6ef655d9c30a1330@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 11:55 PM, Ron Mayer<rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> Yet when I try it now, I'm having trouble making it work.
> Would you expect the ctid to be going down in the psql session
> shown below? I wonder why it isn't.

Even before HOT we preferentially tried to put updated tuples on the
same page they were on before. On pre-8.3 if you did these updates
*without* the vacuum they would eventually be forced to find a new
page and hopefully would find one earlier in the table.

On 8.4 HOT will (hopefully) prevent even that from working. Unless you
have a long-running transaction in the background it will clean up the
old tuples in the chain on the page each time the page fills up.
You've deleted half the tuples on the page so the updates will always
fit in that space.

Eventually you'll hit the maximum number of tuples allowed on the page
dead or alive. But the vacuums are defeating that too. A special
purpose command could work around all of this.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: remove flatfiles.c
Date:	2009-09-03 00:37:16
Message-ID:	407d949e0909021737h5683f4dex9b848eab9fe145cd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 2, 2009 at 8:45 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> The backwards scan is awful for rotating media. The reading from the
>> end and writing to the beginning is bad too, though hopefully the
>> cache can help that.
>
> Yeah. And all that pales in comparison to what happens in the indexes.
> You have to insert index entries (retail) for each moved-in tuple,

Hm, that could be addressed by buffering index inserts in backend
local memory. That's something Heikki proposed a long time ago
primarily for improving bulk data loading. Basically it would be a
retail version of the bulk loader that we saw at the 10th anniversary
where you merge a sorted list into the index.

You would still have to flush the buffer at transaction commit but
even if it only buffered a few dozen tuples if they're in the same
region of the index it would be a win. In this case it could probably
buffer hundreds and merge them all into the index en masse.

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	daveg <daveg(at)sonic(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-03 23:39:59
Message-ID:	20090903233959.GB6207@sonic.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
> > Herrera<alvherre(at)commandprompt(dot)com> wrote:
> >>> The use cases where VACUUM FULL wins currently are where storing two
> >>> copies of the table and its indexes concurrently just isn't practical.
> >>
> >> Yeah, but then do you really need to use VACUUM FULL? If that's really
> >> a problem then there ain't that many dead tuples around.
>
> > That's what I want to believe. But picture if you have, say a
> > 1-terabyte table which is 50% dead tuples and you don't have a spare
> > 1-terabytes to rewrite the whole table.
>
> But trying to VACUUM FULL that table is going to be horridly painful
> too, and you'll still have bloated indexes afterwards. You might as
> well just live with the 50% waste, especially since if you did a
> full-table update once you'll probably do it again sometime.
>
> I'm having a hard time believing that VACUUM FULL really has any
> interesting use-case anymore.

I have a client who uses temp tables heavily, hundreds of thousands of creates
and drops per day. They also have long running queries. The only thing that
keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
a few times a day. Without that pg_class, pg_attribute etc quickly balloon to
thousands of pages.

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	daveg <daveg(at)sonic(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-03 23:57:25
Message-ID:	4AA057E5.1030102@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

daveg wrote:
> On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
>
>> Greg Stark <gsstark(at)mit(dot)edu> writes:
>>
>>> On Wed, Sep 2, 2009 at 12:01 AM, Alvaro
>>> Herrera<alvherre(at)commandprompt(dot)com> wrote:
>>>
>>>>> The use cases where VACUUM FULL wins currently are where storing two
>>>>> copies of the table and its indexes concurrently just isn't practical.
>>>>>
>>>> Yeah, but then do you really need to use VACUUM FULL? If that's really
>>>> a problem then there ain't that many dead tuples around.
>>>>
>>> That's what I want to believe. But picture if you have, say a
>>> 1-terabyte table which is 50% dead tuples and you don't have a spare
>>> 1-terabytes to rewrite the whole table.
>>>
>> But trying to VACUUM FULL that table is going to be horridly painful
>> too, and you'll still have bloated indexes afterwards. You might as
>> well just live with the 50% waste, especially since if you did a
>> full-table update once you'll probably do it again sometime.
>>
>> I'm having a hard time believing that VACUUM FULL really has any
>> interesting use-case anymore.
>>
>
> I have a client who uses temp tables heavily, hundreds of thousands of creates
> and drops per day. They also have long running queries. The only thing that
> keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
> a few times a day. Without that pg_class, pg_attribute etc quickly balloon to
> thousands of pages.
>
>
>

That's a rate of more than one create and drop per second. How does your
client handle the fact that VACUUM FULL will exclusively lock those
catalog tables? Without knowing more, it looks like a bit of a design issue.

cheers

andrew

From:	daveg <daveg(at)sonic(dot)net>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-04 00:12:01
Message-ID:	20090904001201.GC6207@sonic.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 03, 2009 at 07:57:25PM -0400, Andrew Dunstan wrote:
> daveg wrote:
> >On Tue, Sep 01, 2009 at 07:42:56PM -0400, Tom Lane wrote:
> >>I'm having a hard time believing that VACUUM FULL really has any
> >>interesting use-case anymore.
> >
> >I have a client who uses temp tables heavily, hundreds of thousands of
> >creates
> >and drops per day. They also have long running queries. The only thing that
> >keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
> >a few times a day. Without that pg_class, pg_attribute etc quickly balloon
> >to thousands of pages.
>
> That's a rate of more than one create and drop per second. How does your
> client handle the fact that VACUUM FULL will exclusively lock those
> catalog tables? Without knowing more, it looks like a bit of a design issue.

I'd say it is several per second.

They wait for the catalog locks sometimes. This is not an interactive
application so that is somewhat acceptable. It also occasionally causes
deadlocks which is less agreeable.

There are various reasons for the heavy use of temps, mainly having to do
with loading external feeds or reusing intermediate query results in a series
of queries.

It would be great if there was a way to have temp tables that
did not get cataloged, eg local cache only.

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	daveg(at)sonic(dot)net
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 18:37:24
Message-ID:	4AA15E64.9020308@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

All,

>>> I have a client who uses temp tables heavily, hundreds of thousands of
>>> creates
>>> and drops per day. They also have long running queries. The only
thing that
>>> keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
>>> a few times a day. With

Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 18:48:42
Message-ID:	24477.1252090122@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> I have a client who uses temp tables heavily, hundreds of thousands of
>>> creates
>>> and drops per day. They also have long running queries. The only
>>> thing that
>>> keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
>>> a few times a day. With

> Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
> to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.

I don't think I believe the claim above that vacuum full is actually
necessary. Reasonably aggressive regular vacuuming ought to do it.

We used to have a bug that caused row deletions during backend shutdown
to not get reported to the stats collector; which had the effect that
dead catalog entries for temp tables didn't get counted, and so autovac
didn't hit the catalogs often enough, and so you'd get bloat in exactly
this scenario. I suspect the claim that manual vacuum full is necessary
is based on obsolete experience from before that bug got stomped.
It's hardly an ideal solution anyway given what an exclusive lock on
pg_class will do to the rest of the system --- and a cluster-like
cleanup won't be any better about that.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 18:53:04
Message-ID:	603c8f070909041153m5aef37detdc4f384a2967ced9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 4, 2009 at 2:48 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>> I have a client who uses temp tables heavily, hundreds of thousands of
>>>> creates
>>>> and drops per day. They also have long running queries. The only
>>>> thing that
>>>> keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
>>>> a few times a day. With
>
>> Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
>> to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.
>
> I don't think I believe the claim above that vacuum full is actually
> necessary. Reasonably aggressive regular vacuuming ought to do it.
>
> We used to have a bug that caused row deletions during backend shutdown
> to not get reported to the stats collector; which had the effect that
> dead catalog entries for temp tables didn't get counted, and so autovac
> didn't hit the catalogs often enough, and so you'd get bloat in exactly
> this scenario. I suspect the claim that manual vacuum full is necessary
> is based on obsolete experience from before that bug got stomped.
> It's hardly an ideal solution anyway given what an exclusive lock on
> pg_class will do to the rest of the system --- and a cluster-like
> cleanup won't be any better about that.

I'm confused. Are you saying that pg_class will never get bloated, so
we don't need a way to debloat it? I realize that with HOT bloat is
much less of a problem than it used to be, but surely it's not
altogether impossible...

...Robert

From:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 19:01:27
Message-ID:	4AA16407.8050605@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane írta:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>
>>>> I have a client who uses temp tables heavily, hundreds of thousands of
>>>> creates
>>>> and drops per day. They also have long running queries. The only
>>>> thing that
>>>> keeps catalog bloat somewhat in check is vacuum full on bloated catalogs
>>>> a few times a day. With
>>>>
>
>
>> Actually, this is a good point ... if we dropped VACUUM FULL, we'd need
>> to also be able to call CLUSTER (or VACUUM REWRITE) on the system catalogs.
>>
>
> I don't think I believe the claim above that vacuum full is actually
> necessary. Reasonably aggressive regular vacuuming ought to do it.
>

How about setting a non-100% fillfactor on catalog tables?
Maybe by default? That would also avoid most of the bloat, wouldn't it?

> We used to have a bug that caused row deletions during backend shutdown
> to not get reported to the stats collector; which had the effect that
> dead catalog entries for temp tables didn't get counted, and so autovac
> didn't hit the catalogs often enough, and so you'd get bloat in exactly
> this scenario. I suspect the claim that manual vacuum full is necessary
> is based on obsolete experience from before that bug got stomped.
> It's hardly an ideal solution anyway given what an exclusive lock on
> pg_class will do to the rest of the system --- and a cluster-like
> cleanup won't be any better about that.
>
> regards, tom lane
>
>

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 19:10:44
Message-ID:	3451.1252091444@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I'm confused. Are you saying that pg_class will never get bloated, so
> we don't need a way to debloat it? I realize that with HOT bloat is
> much less of a problem than it used to be, but surely it's not
> altogether impossible...

Well, it's certainly *possible*, I'm just questioning the assertion that
it's necessarily a common situation.

regards, tom lane

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 19:27:24
Message-ID:	1252092444.18360.0.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-04 at 15:10 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I'm confused. Are you saying that pg_class will never get bloated, so
> > we don't need a way to debloat it? I realize that with HOT bloat is
> > much less of a problem than it used to be, but surely it's not
> > altogether impossible...
>
> Well, it's certainly *possible*, I'm just questioning the assertion that
> it's necessarily a common situation.

Depends on your definition of common. It is very easy for someone to
blow away their vacuum settings in such a way that it will become
bloated pretty quick.

Joshua D. Drake

>
> regards, tom lane
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 19:28:18
Message-ID:	20090904192818.GL5603@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Joshua D. Drake escribió:
> On Fri, 2009-09-04 at 15:10 -0400, Tom Lane wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > I'm confused. Are you saying that pg_class will never get bloated, so
> > > we don't need a way to debloat it? I realize that with HOT bloat is
> > > much less of a problem than it used to be, but surely it's not
> > > altogether impossible...
> >
> > Well, it's certainly *possible*, I'm just questioning the assertion that
> > it's necessarily a common situation.
>
> Depends on your definition of common. It is very easy for someone to
> blow away their vacuum settings in such a way that it will become
> bloated pretty quick.

No problem, just CLUSTER that table same as today.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 19:36:37
Message-ID:	17447.1252092997@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> No problem, just CLUSTER that table same as today.

Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
can't change its relfilenode. If you do, backends won't know where to
read pg_class to find out its relfilenode.

I was wondering whether maintenance operations like "vacuum rewrite"
could get away with filling a new table file and then moving it into
place with rename(2), which is guaranteed atomic (at least on sane
filesystems). The idea doesn't work right off because (1) you need
to atomically install the updated indexes too, and (2) the table
might span more than one segment file. But maybe we could think of
something.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 20:01:00
Message-ID:	19750.1252094460@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> No problem, just CLUSTER that table same as today.

> Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
> can't change its relfilenode. If you do, backends won't know where to
> read pg_class to find out its relfilenode.

> I was wondering whether maintenance operations like "vacuum rewrite"
> could get away with filling a new table file and then moving it into
> place with rename(2), which is guaranteed atomic (at least on sane
> filesystems). The idea doesn't work right off because (1) you need
> to atomically install the updated indexes too, and (2) the table
> might span more than one segment file. But maybe we could think of
> something.

Hmm ... reading that over again, it seems like there is a pretty
obvious solution. The sticking point --- not only for pg_class,
but for shared catalogs such as pg_database --- is the lack of a
way to track relfilenode if it ever changes. What if we keep
the relfilenode of these critical tables someplace else? For
instance, we could have a "map" file in each database holding
the relfilenode of pg_class, and one in $PGDATA/global holding
the relfilenodes of the shared catalogs and indexes. It'd be
possible to update a map file atomically via the rename(2) trick.
Then we teach relcache or some similar place to believe the map
files over the contents of pg_class.

This looks sort of like a reversion to flat files, but it has a couple
of saving graces:
1. The set of entries is fixed and small, so there's no performance
issue looming with database growth.
2. We could not synchronize updates with transaction commit, which was
always the real Achilles' heel of the flat files. But I think we don't
need to, if we restrict the set of operations that can change the
relfilenodes of critical tables to "maintenance" operations that only
rewrite the table contents and don't make any logical changes in the
contents. Given that restriction, transaction commit isn't actually
important; it will be the act of moving the updated map file into place
that effectively is commit for these operations. If you crash after
that, your change is still effective.

Thoughts?

regards, tom lane

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 20:37:39
Message-ID:	m21vmm1m5o.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> No problem, just CLUSTER that table same as today.
>
> Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
> can't change its relfilenode. If you do, backends won't know where to
> read pg_class to find out its relfilenode.

Why can't MVCC apply here? You'd have two versions of the pg_class entry
that just has been CLUSTERed, and you keep the old relfilenode arround
too. MVCC applies, and you teach vacuum to clean out the old file when
cleaning out the no more visible tuple.

Or you take necessary locks to protect the operation.

Something so obvious that I'll regret asking why it can not be done this
way sure will get back from this mail :)

Regards.
--
dim

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 20:43:09
Message-ID:	20090904204309.GM5603@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dimitri Fontaine escribió:
> Hi,
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >> No problem, just CLUSTER that table same as today.
> >
> > Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
> > can't change its relfilenode. If you do, backends won't know where to
> > read pg_class to find out its relfilenode.
>
> Why can't MVCC apply here? You'd have two versions of the pg_class entry
> that just has been CLUSTERed, and you keep the old relfilenode arround
> too. MVCC applies, and you teach vacuum to clean out the old file when
> cleaning out the no more visible tuple.

It just doesn't work. pg_class (and various other rels) are special
because they are needed to bootstrap the catalog system. See
RelationCacheInitializePhase3. It wouldn't be possible to figure out
what's pg_class relfilenode until you have read it from pg_class, which
is precisely what we're trying to do.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 20:54:28
Message-ID:	m2skf2zb0b.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Dimitri Fontaine escribió:
>> Why can't MVCC apply here? You'd have two versions of the pg_class entry
>> that just has been CLUSTERed, and you keep the old relfilenode arround
>> too. MVCC applies, and you teach vacuum to clean out the old file when
>> cleaning out the no more visible tuple.
>
> It just doesn't work. pg_class (and various other rels) are special
> because they are needed to bootstrap the catalog system. See
> RelationCacheInitializePhase3. It wouldn't be possible to figure out
> what's pg_class relfilenode until you have read it from pg_class, which
> is precisely what we're trying to do.

Well at bootstrap time I guess noone is able to disturb the system by
placing a concurrent CLUSTER pg_class; call. Once started, do those rels
still need to have a special behavior?

I guess I'm being dense, will now let people in the know find a solution...
--
dim

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 21:00:53
Message-ID:	20090904210053.GN5603@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dimitri Fontaine escribió:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Dimitri Fontaine escribió:
> >> Why can't MVCC apply here? You'd have two versions of the pg_class entry
> >> that just has been CLUSTERed, and you keep the old relfilenode arround
> >> too. MVCC applies, and you teach vacuum to clean out the old file when
> >> cleaning out the no more visible tuple.
> >
> > It just doesn't work. pg_class (and various other rels) are special
> > because they are needed to bootstrap the catalog system. See
> > RelationCacheInitializePhase3. It wouldn't be possible to figure out
> > what's pg_class relfilenode until you have read it from pg_class, which
> > is precisely what we're trying to do.
>
> Well at bootstrap time I guess noone is able to disturb the system by
> placing a concurrent CLUSTER pg_class; call. Once started, do those rels
> still need to have a special behavior?

The relcache need to be bootstrapped more than once, not just at
initdb's bootstrap. (I guess you could try a breakpoint in formrdesc)

> I guess I'm being dense, will now let people in the know find a solution...

Yeah, well, Tom just posted a possible solution :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 21:01:25
Message-ID:	21715.1252098085@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> Well at bootstrap time I guess noone is able to disturb the system by
> placing a concurrent CLUSTER pg_class; call. Once started, do those rels
> still need to have a special behavior?

It doesn't matter, if you fail to get past bootstrap because you
couldn't find pg_class. The existing design for this is absolutely
dependent on the fact that pg_class has a fixed relfilenode = fixed
physical file name. MVCC has nothing to do with it.

See my followon message for a sketch of a possible solution. Basically
it's pushing the fixed file name over to another place ...

regards, tom lane

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-04 21:27:19
Message-ID:	m24oriz9hk.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> The relcache need to be bootstrapped more than once, not just at
> initdb's bootstrap. (I guess you could try a breakpoint in formrdesc)

Ok so in RelationCacheInitializePhase3 we have formrdesc calls:

formrdesc("pg_class", false,
true, Natts_pg_class, Desc_pg_class);

That will do this:
/*
* initialize relation id from info in att array (my, this is ugly)
*/
RelationGetRelid(relation) = relation->rd_att->attrs[0]->attrelid;
relation->rd_rel->relfilenode = RelationGetRelid(relation);

And this uses data from pg_attribute.h which looks like this:

#define Schema_pg_class \
{ 1259, {"relname"}, 19, -1, 0, NAMEDATALEN, 1, 0, -1, -1, false, 'p', 'c', true, false, false, true, 0, { 0 } }, \

So in the source I'm reading, pg_class relfilenode is 1259, the system
knows how to get there, then read real values instead of bootstrap mode
dummy one.

Except that the bootstrap mode is used at initdb, at system startup and
anytime there's a cache miss?

So to be able to CLUSTER we have to find a way for cache miss to get the
file named 1259 or the new value, whatever it is after CLUSTER changed
it. I guess symlinks are not allowed?

> Yeah, well, Tom just posted a possible solution :-)

Flat file are back? really?

Regards,
--
dim

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-05 01:28:17
Message-ID:	603c8f070909041828s80f6cf3kf3f7de7ca3a07fe2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> No problem, just CLUSTER that table same as today.
>
>> Uh, no, that was Josh's point: you can't CLUSTER pg_class, because you
>> can't change its relfilenode. If you do, backends won't know where to
>> read pg_class to find out its relfilenode.
>
>> I was wondering whether maintenance operations like "vacuum rewrite"
>> could get away with filling a new table file and then moving it into
>> place with rename(2), which is guaranteed atomic (at least on sane
>> filesystems). The idea doesn't work right off because (1) you need
>> to atomically install the updated indexes too, and (2) the table
>> might span more than one segment file. But maybe we could think of
>> something.
>
> Hmm ... reading that over again, it seems like there is a pretty
> obvious solution. The sticking point --- not only for pg_class,
> but for shared catalogs such as pg_database --- is the lack of a
> way to track relfilenode if it ever changes. What if we keep
> the relfilenode of these critical tables someplace else? For
> instance, we could have a "map" file in each database holding
> the relfilenode of pg_class, and one in $PGDATA/global holding
> the relfilenodes of the shared catalogs and indexes. It'd be
> possible to update a map file atomically via the rename(2) trick.
> Then we teach relcache or some similar place to believe the map
> files over the contents of pg_class.
>
> This looks sort of like a reversion to flat files, but it has a couple
> of saving graces:
> 1. The set of entries is fixed and small, so there's no performance
> issue looming with database growth.
> 2. We could not synchronize updates with transaction commit, which was
> always the real Achilles' heel of the flat files. But I think we don't
> need to, if we restrict the set of operations that can change the
> relfilenodes of critical tables to "maintenance" operations that only
> rewrite the table contents and don't make any logical changes in the
> contents. Given that restriction, transaction commit isn't actually
> important; it will be the act of moving the updated map file into place
> that effectively is commit for these operations. If you crash after
> that, your change is still effective.

This doesn't seem totally horrible. But, before you go do it, do we
have a clearly-defined plan for the rest of the project? Because we
only need this if we're absolutely confident that rewriting the table
in place is just not an option worth keeping around. It's unclear to
me that everyone is convinced of that, and even if they are, it's
unclear to me what we plan to implement instead.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-05 01:37:34
Message-ID:	24852.1252114654@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hmm ... reading that over again, it seems like there is a pretty
>> obvious solution.

> This doesn't seem totally horrible. But, before you go do it, do we
> have a clearly-defined plan for the rest of the project?

Rest of what project? Removing vacuum full isn't a necessary component
of that. It would enable doing CLUSTER on pg_class, and it would
eliminate the crock of REINDEX having to reindex shared indexes
in-place. It could probably be justified even without any changes in
our approach to vacuum.

> ... only need this if we're absolutely confident that rewriting the table
> in place is just not an option worth keeping around. It's unclear to
> me that everyone is convinced of that, and even if they are, it's
> unclear to me what we plan to implement instead.

I thought we were pretty well agreed that a seqscan variant of
CLUSTER would be worth doing. Whether we take the next step by
eliminating vacuum full is a different question, but the shape of
the substitute seems perfectly clear.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, daveg(at)sonic(dot)net, Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2009-09-05 03:03:52
Message-ID:	603c8f070909042003p6c0b9768t9667266c12f80c1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 4, 2009 at 9:37 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Sep 4, 2009 at 4:01 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Hmm ... reading that over again, it seems like there is a pretty
>>> obvious solution.
>
>> This doesn't seem totally horrible. But, before you go do it, do we
>> have a clearly-defined plan for the rest of the project?
>
> Rest of what project? Removing vacuum full isn't a necessary component
> of that. It would enable doing CLUSTER on pg_class, and it would
> eliminate the crock of REINDEX having to reindex shared indexes
> in-place. It could probably be justified even without any changes in
> our approach to vacuum.

OK, I'm sold.

>> ... only need this if we're absolutely confident that rewriting the table
>> in place is just not an option worth keeping around. It's unclear to
>> me that everyone is convinced of that, and even if they are, it's
>> unclear to me what we plan to implement instead.
>
> I thought we were pretty well agreed that a seqscan variant of
> CLUSTER would be worth doing. Whether we take the next step by
> eliminating vacuum full is a different question, but the shape of
> the substitute seems perfectly clear.

Well, there were some other ideas discussed, but perhaps that's the
only one that had a clear consensus.

...Robert

From:	Andrew McNamara <andrewm(at)object-craft(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: remove flatfiles.c
Date:	2009-09-16 06:33:40
Message-ID:	20090916063341.0735C5AC0D6@longblack.object-craft.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> That's what I want to believe. But picture if you have, say a
>> 1-terabyte table which is 50% dead tuples and you don't have a spare
>> 1-terabytes to rewrite the whole table.
>
>But trying to VACUUM FULL that table is going to be horridly painful
>too, and you'll still have bloated indexes afterwards. You might as
>well just live with the 50% waste, especially since if you did a
>full-table update once you'll probably do it again sometime.
>
>I'm having a hard time believing that VACUUM FULL really has any
>interesting use-case anymore.

This was almost exactly the scenario I faced recently. A production
database unexpectedly filled up its partition. On investigation, we found
a developer had added a component to the application that updated every
row in one table each day, exhausting the free space map. Over time,
most of the tables in the system had grown to contain 50-70% dead tuples.

The owner of the system was understandably reluctant to dump and restore
the system, and there wasn't enough space left on the system to rewrite
any of the large tables. In the end, I dropped a table (the one owned
by the offending developer... 8-), and this gave me just enough space
to VACUUM FULL one table at a time.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2010-01-31 20:36:23
Message-ID:	15745.1264970183@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Back in September I wrote:
> ... The sticking point --- not only for pg_class,
> but for shared catalogs such as pg_database --- is the lack of a
> way to track relfilenode if it ever changes. What if we keep
> the relfilenode of these critical tables someplace else? For
> instance, we could have a "map" file in each database holding
> the relfilenode of pg_class, and one in $PGDATA/global holding
> the relfilenodes of the shared catalogs and indexes. It'd be
> possible to update a map file atomically via the rename(2) trick.
> Then we teach relcache or some similar place to believe the map
> files over the contents of pg_class.

Thinking about this some more, I can see one small disadvantage:
for the relations that we use the map file for, pg_class.relfilenode
would not be trustworthy. This would not affect most of the system
internals (which will be looking at the relcache's copy, which would
be kept valid by the relcache code). But it would affect user queries,
such as for example attempts to use contrib/oid2name to identify a
file on-disk. The main case where pg_class.relfilenode would be
likely to be out-of-sync is for shared catalogs. We could keep it
up to date in most cases for local catalogs, but there's no hope
of reaching into other databases' pg_class when a shared catalog
is relocated.

What I'd suggest doing about this is:

(1) Store zero in pg_class.relfilenode for those catalogs for which
the map is used. This at least makes it obvious that the value
you're looking at isn't valid.

(2) Provide a SQL function to extract the real relfilenode of any
specified pg_class entry. We'd have to modify oid2name and
pg_dump to know to use the function instead of looking at the
column.

There might be some other client-side code that would be broken
until it got taught about the function, but hopefully not much.

Thoughts?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date:	2010-01-31 22:58:55
Message-ID:	603c8f071001311458i767e8c04l69c38c00053287b3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Back in September I wrote:
>> ... The sticking point --- not only for pg_class,
>> but for shared catalogs such as pg_database --- is the lack of a
>> way to track relfilenode if it ever changes. What if we keep
>> the relfilenode of these critical tables someplace else? For
>> instance, we could have a "map" file in each database holding
>> the relfilenode of pg_class, and one in $PGDATA/global holding
>> the relfilenodes of the shared catalogs and indexes. It'd be
>> possible to update a map file atomically via the rename(2) trick.
>> Then we teach relcache or some similar place to believe the map
>> files over the contents of pg_class.
>
> Thinking about this some more, I can see one small disadvantage:
> for the relations that we use the map file for, pg_class.relfilenode
> would not be trustworthy. This would not affect most of the system
> internals (which will be looking at the relcache's copy, which would
> be kept valid by the relcache code). But it would affect user queries,
> such as for example attempts to use contrib/oid2name to identify a
> file on-disk. The main case where pg_class.relfilenode would be
> likely to be out-of-sync is for shared catalogs. We could keep it
> up to date in most cases for local catalogs, but there's no hope
> of reaching into other databases' pg_class when a shared catalog
> is relocated.
>
> What I'd suggest doing about this is:
>
> (1) Store zero in pg_class.relfilenode for those catalogs for which
> the map is used. This at least makes it obvious that the value
> you're looking at isn't valid.
>
> (2) Provide a SQL function to extract the real relfilenode of any
> specified pg_class entry. We'd have to modify oid2name and
> pg_dump to know to use the function instead of looking at the
> column.
>
> There might be some other client-side code that would be broken
> until it got taught about the function, but hopefully not much.
>
> Thoughts?

Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).

...Robert