Quick Links

Feedback on getting rid of VACUUM FULL

Lists:	pgsql-hackers

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:10:35
Message-ID:	4AB12A1B.2050401@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hackers,

Here's the feedback on replacing VACUUM FULL with VACUUM REWRITE:
http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959

Of note:

a) To date, I have yet to hear a single person bring up an actual
real-life use-case where VACUUM FULL was desireable and REWRITE would
not be. Lots of people have said something hypothetical, but nobody has
come forward with a "I have this database X and several times Y
happened, and only FULL would work ...". This makes me think that there
very likey are no actual use cases where we need to preserve FULL.

b) Several people have strongly pushed for a phased removal of FULL over
more than one PG version, with a warning message about depreciation.

c) Vivek had some points about required implementation:

"However, there still must be a way to compact the tables that is mvcc
safe. From what I have read and recall, cluster is not. Thus, the vacuum
rewrite would be a mandatory feature (or cluster could be made mvcc safe)."

Is Vivek correct about this? News to me ...

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:20:27
Message-ID:	4AB0E61B020000250002B1D5@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> a) To date, I have yet to hear a single person bring up an actual
> real-life use-case where VACUUM FULL was desireable and REWRITE
> would not be.

Would rewrite have handled this?:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01016.php

-Kevin

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:23:04
Message-ID:	4AB12D08.3000204@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus wrote:
> Hackers,
>
> Here's the feedback on replacing VACUUM FULL with VACUUM REWRITE:
> http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959
>
> Of note:
>
> a) To date, I have yet to hear a single person bring up an actual
> real-life use-case where VACUUM FULL was desireable and REWRITE would
> not be. Lots of people have said something hypothetical, but nobody has
> come forward with a "I have this database X and several times Y
> happened, and only FULL would work ...". This makes me think that there
> very likey are no actual use cases where we need to preserve FULL.

Well, Andrew McNamara just posted today:
http://archives.postgresql.org/message-id/20090916063341.0735C5AC0D6@longblack.object-craft.com.au

Had VACUUM FULL not been available, though, I'm pretty sure he would've
come up with something else instead.

> c) Vivek had some points about required implementation:
>
> "However, there still must be a way to compact the tables that is mvcc
> safe. From what I have read and recall, cluster is not. Thus, the vacuum
> rewrite would be a mandatory feature (or cluster could be made mvcc safe)."
>
> Is Vivek correct about this? News to me ...

No, that was fixed in 8.3.

I was just going to post that we should make a decision about this,
because ISTM there's some code in Simon's hot standby patch that is only
required to support VACUUM FULL. If we make the decision that we drop
VACUUM FULL in 8.5, we can take that part out of the patch now. It's not
a huge amount of code, but still.

I'm in favor of removing VACUUM FULL in 8.5. To replace it, we should offer:

1) VACUUM REWRITE, which is like CLUSTER but doesn't use an index, and
2) Another utility that does something like UPDATE ... WHERE ctid > ? to
move tuples to lower pages. It will be different from current VACUUM
FULL in some ways. It won't require a table lock, for example, but it
won't be able to move update chains as nicely. But it would be trivial
to write one, so I think we should offer that as a contrib module.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Hannu Krosing <hannu(at)krosing(dot)net>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:37:13
Message-ID:	1253126233.778.26.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 11:10 -0700, Josh Berkus wrote:
> Hackers,
>
> Here's the feedback on replacing VACUUM FULL with VACUUM REWRITE:
> http://it.toolbox.com/blogs/database-soup/getting-rid-of-vacuum-full-feedback-needed-33959
>
> Of note:
>
> a) To date, I have yet to hear a single person bring up an actual
> real-life use-case where VACUUM FULL was desireable and REWRITE would
> not be.

The only case is when you are out of disk space and can't afford to
write out a full set of live rows.

What I'd like to propose is VACUUM FULL CONCURRENTLY, which similar to
VACUUM CONCURRENTLY, would actually do the compaction phase, that is,
move simultaneously from two directions, from start, to find empty space
and from end to find tuples. for each sufficiently large empty space the
forward scan finds it would take one or more tuples from the reverse
scan and then "null update" those to the empty space found by the
free-space-scan beginning. it should do that in small chunks, say one
page at a time, so it will minimally interfere with OLTP loads.

Once these two scans meet, you can stop and either run an non full
vacuum, or just continue in similar fashion to non-full vacuum and do
the cleanups of indexes and heap.

You may need to repeat this a few times to get actual shrinkage but it
has the very real advantage of being usable on 24/7 systems, which
neither VACUUM FULL nor CLUSTER possess.

At some point I actually had external scripts doing similar stuff for
on-line table shrinking, the only difference being that I could not move
the tuple towards beginning right away (pg preferred in-page updates)
and had to keep doing null updates (id=id where id) until the page
number in ctid changed.

> Lots of people have said something hypothetical, but nobody has
> come forward with a "I have this database X and several times Y
> happened, and only FULL would work ...". This makes me think that there
> very likey are no actual use cases where we need to preserve FULL.
>
> b) Several people have strongly pushed for a phased removal of FULL over
> more than one PG version, with a warning message about depreciation.
>
> c) Vivek had some points about required implementation:
>
> "However, there still must be a way to compact the tables that is mvcc
> safe. From what I have read and recall, cluster is not. Thus, the vacuum
> rewrite would be a mandatory feature (or cluster could be made mvcc safe)."
>
> Is Vivek correct about this? News to me ...

It used to be true at some point, probably not true any more.

IIRC, the problem was, that old table was not locked during rewrite and
thus some code could be updating the old heap even while the data had
been muved to the new one.

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> www.pgexperts.com
>

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:40:09
Message-ID:	1253126409.778.28.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:

> I was just going to post that we should make a decision about this,
> because ISTM there's some code in Simon's hot standby patch that is only
> required to support VACUUM FULL. If we make the decision that we drop
> VACUUM FULL in 8.5, we can take that part out of the patch now. It's not
> a huge amount of code, but still.
>
> I'm in favor of removing VACUUM FULL in 8.5. To replace it, we should offer:
>
> 1) VACUUM REWRITE, which is like CLUSTER but doesn't use an index, and
> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
> move tuples to lower pages. It will be different from current VACUUM
> FULL in some ways. It won't require a table lock, for example, but it
> won't be able to move update chains as nicely. But it would be trivial
> to write one, so I think we should offer that as a contrib module.

I have not checked, but I suspect pg_reorg may already be doing
something similar http://pgfoundry.org/forum/forum.php?forum_id=1561

> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:40:50
Message-ID:	1253126450.29243.287.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:
> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
> move tuples to lower pages. It will be different from current VACUUM
> FULL in some ways. It won't require a table lock, for example, but it
> won't be able to move update chains as nicely. But it would be trivial
> to write one, so I think we should offer that as a contrib module.

An advantage here is that it would allow people to do a "partial vacuum
full" to gradually move tuples from the end of the relation to the
beginning. That would allow vacuums in between the updates to free the
index tuples, preventing index bloat.

Another thing to think about is that lazy vacuum only shrinks the heap
file if it happens to be able to acquire an access exclusive lock.
Because vacuum can't be run inside a transaction block, I don't think
there's currently a way to ensure that the heap file actually gets
shrunk. How about we provide some way to make it acquire an access
exclusive lock at the beginning, but still perform a lazy vacuum?

Regards,
Jeff Davis

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:41:03
Message-ID:	1253126463.778.29.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The only case is when you are out of disk space and can't afford to
write out a full set of live rows.

Once these two scans meet, you can stop and either run an non full
vacuum, or just continue in similar fashion to non-full vacuum and do
the cleanups of indexes and heap.

You may need to repeat this a few times to get actual shrinkage but it
has the very real advantage of being usable on 24/7 systems, which
neither VACUUM FULL nor CLUSTER possess.

It used to be true at some point, probably not true any more.

IIRC, the problem was, that old table was not locked during rewrite and
thus some code could be updating the old heap even while the data had
been muved to the new one.

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> www.pgexperts.com
>
--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:42:18
Message-ID:	4AB1318A.6070803@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/16/09 11:20 AM, Kevin Grittner wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> a) To date, I have yet to hear a single person bring up an actual
>> real-life use-case where VACUUM FULL was desireable and REWRITE
>> would not be.
>
> Would rewrite have handled this?:
>
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01016.php

Ok, that sounds like a real use case.

However, given Heikki's post about FULL being an issue for Hot Standby,
I'm more inclined to provide a workaround ... for example, allowing
REWRITE to write to a designated tablespace, which would allow people to
use a portable drive or similar for the extra disk space.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:48:16
Message-ID:	4AB132F0.8020901@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu,

> The only case is when you are out of disk space and can't afford to
> write out a full set of live rows.

Well, it's actually rather specific. You need to have:

a) *Some* free disk space (FULL requires extra disk) but not enough to
copy one entire table and its indexes.

b) be already down or willing to accept the long downtime which comes
with FULL more than you're willing to go out and get some extra disk or
move your database to a new share.

There's no question that this combination is fairly circumstantial and
represents a minority of potential vacuum cases. Unfortunately, it does
seem to represent some real-life ones, so we have to take those into
account.

> What I'd like to propose is VACUUM FULL CONCURRENTLY, which similar to
> VACUUM CONCURRENTLY, would actually do the compaction phase, that is,
> move simultaneously from two directions, from start, to find empty space
> and from end to find tuples. for each sufficiently large empty space the
> forward scan finds it would take one or more tuples from the reverse
> scan and then "null update" those to the empty space found by the
> free-space-scan beginning. it should do that in small chunks, say one
> page at a time, so it will minimally interfere with OLTP loads.

How would this work with HS?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:49:27
Message-ID:	1253126967.778.36.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 13:20 -0500, Kevin Grittner wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
> > a) To date, I have yet to hear a single person bring up an actual
> > real-life use-case where VACUUM FULL was desireable and REWRITE
> > would not be.
>
> Would rewrite have handled this?:
>
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01016.php

If REWRITE is just a CLUSTER using seqscan, then no

If it is a sequence of

1. ordinary VACUUM (it can't run out of FSM anymore, no?)
2. a process moving live tuples from end (using reverse seqscan) to free
space found scanning in first-to-last direction, either one tuple at a
time or one page at a time, until the two scans meet
3. another ordinary VACUUM to actually reclaim the free space

4. repeat a few times so that tuples at the end of relation (for
whatever reason) added while doing 1-3 are also moved towards beginning

then yes, it would have taken some time, but it would have definitely
helped

It would still have caused index bloat, so to get full benefit of it,
one should have finished it up with an equivalent of CONCURRENT REINDEX.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 18:53:42
Message-ID:	1253127222.778.38.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 11:48 -0700, Josh Berkus wrote:
> Hannu,
>
> > The only case is when you are out of disk space and can't afford to
> > write out a full set of live rows.
>
> Well, it's actually rather specific. You need to have:
>
> a) *Some* free disk space (FULL requires extra disk) but not enough to
> copy one entire table and its indexes.
>
> b) be already down or willing to accept the long downtime which comes
> with FULL more than you're willing to go out and get some extra disk or
> move your database to a new share.
>
> There's no question that this combination is fairly circumstantial and
> represents a minority of potential vacuum cases. Unfortunately, it does
> seem to represent some real-life ones, so we have to take those into
> account.

Agreed.

> > What I'd like to propose is VACUUM FULL CONCURRENTLY, which similar to
> > VACUUM CONCURRENTLY, would actually do the compaction phase, that is,
> > move simultaneously from two directions, from start, to find empty space
> > and from end to find tuples. for each sufficiently large empty space the
> > forward scan finds it would take one or more tuples from the reverse
> > scan and then "null update" those to the empty space found by the
> > free-space-scan beginning. it should do that in small chunks, say one
> > page at a time, so it will minimally interfere with OLTP loads.
>
> How would this work with HS?

Exactly the same as just doing a lot of UPDATE's which move tuples
around between pages.

It actually _is_ a lots of updates, just with extra condition that tuple
is always moved to lowest available free slot.

> --
> Josh Berkus
> PostgreSQL Experts Inc.
> www.pgexperts.com
>
--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 19:16:37
Message-ID:	4AB13995.6040901@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu,

> If it is a sequence of
>
> 1. ordinary VACUUM (it can't run out of FSM anymore, no?)
> 2. a process moving live tuples from end (using reverse seqscan) to free
> space found scanning in first-to-last direction, either one tuple at a
> time or one page at a time, until the two scans meet
> 3. another ordinary VACUUM to actually reclaim the free space
>
> 4. repeat a few times so that tuples at the end of relation (for
> whatever reason) added while doing 1-3 are also moved towards beginning

Sounds good, you want to code it for 8.5?

I could actually see two tools, one VACUUM FULL CONCURRENTLY and one
VACUUM REWRITE. The first would be "in place" and the second would be
"fast". Both should work better with HS than current VF does.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 20:53:57
Message-ID:	4AB15065.1000607@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing wrote:
> On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:
>> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
>> move tuples to lower pages. It will be different from current VACUUM
>> FULL in some ways. It won't require a table lock, for example, but it
>> won't be able to move update chains as nicely. But it would be trivial
>> to write one, so I think we should offer that as a contrib module.
>
> I have not checked, but I suspect pg_reorg may already be doing
> something similar http://pgfoundry.org/forum/forum.php?forum_id=1561

Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
and swapping relfilenodes afterwards. More like the VACUUM REWRITE
that's been discussed.

For the kicks, I looked at what it would take to write a utility like
that. It turns out to be quite trivial, patch attached. It uses the same
principle as VACUUM FULL, scans from the end, moving tuples to
lower-numbered pages until it can't do it anymore. It requires a small
change to heap_update(), to override the preference to store the new
tuple on the same page as the old one, but other than that, it's all in
the external module.

To test:

-- Create and populate test table
CREATE TABLE foo (id int4 PRIMARY KEY);
INSERT INTO foo SELECT a FROM generate_series(1,100000) a;

-- Delete a lot of tuples from the beginning. This creates the hole that
we want to compact out.
DELETE FROM foo WHERE id < 90000;

-- Vacuum to remove the dead tuples
VACUUM VERBOSE foo;

-- Run the utility to "move" the tuples
SELECT vacuumfull('foo');

-- Vacuum table again to remove the old tuple versions of the moved rows
and truncate the file.

VACUUM VERBOSE foo;

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
vacuumfull-contrib-1.patch	text/x-diff	14.6 KB

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 21:07:25
Message-ID:	1253135245.778.62.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 23:53 +0300, Heikki Linnakangas wrote:
> Hannu Krosing wrote:
> > On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:
> >> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
> >> move tuples to lower pages. It will be different from current VACUUM
> >> FULL in some ways. It won't require a table lock, for example, but it
> >> won't be able to move update chains as nicely. But it would be trivial
> >> to write one, so I think we should offer that as a contrib module.
> >
> > I have not checked, but I suspect pg_reorg may already be doing
> > something similar http://pgfoundry.org/forum/forum.php?forum_id=1561
>
> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
> and swapping relfilenodes afterwards. More like the VACUUM REWRITE
> that's been discussed.
>
> For the kicks, I looked at what it would take to write a utility like
> that. It turns out to be quite trivial, patch attached. It uses the same
> principle as VACUUM FULL, scans from the end, moving tuples to
> lower-numbered pages until it can't do it anymore. It requires a small
> change to heap_update(), to override the preference to store the new
> tuple on the same page as the old one, but other than that, it's all in
> the external module.

Exactly as I hoped :D

One thing that would be harder to do, and which CLUSTER currently does
is introducing empty space within pages, based on fillfactor.

Doing that would need a similar, though reversed strategy. But it is
probably not something that is often needed, as a an update on page with
no free space would eventually do almost the same.

> To test:
>
> -- Create and populate test table
> CREATE TABLE foo (id int4 PRIMARY KEY);
> INSERT INTO foo SELECT a FROM generate_series(1,100000) a;
>
> -- Delete a lot of tuples from the beginning. This creates the hole that
> we want to compact out.
> DELETE FROM foo WHERE id < 90000;
>
> -- Vacuum to remove the dead tuples
> VACUUM VERBOSE foo;
>
> -- Run the utility to "move" the tuples
> SELECT vacuumfull('foo');
>
> -- Vacuum table again to remove the old tuple versions of the moved rows
> and truncate the file.
>
> VACUUM VERBOSE foo;

Now, if you could just make vacuumfull('foo'); run in multiple
transactions (say one per N tuples moved, or even per N seconds spent)
to make it friendlier for OLTP workloads, which then dont have to wait
for the whole thing to finish in order to proceed with update of a moved
tuple (and also to deal with deadloks from trying to move an updated
tuple) then I'd claim we have a much better VACUUM FULL :)

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 21:29:22
Message-ID:	1253136562.29243.348.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 23:53 +0300, Heikki Linnakangas wrote:
> For the kicks, I looked at what it would take to write a utility like
> that. It turns out to be quite trivial, patch attached. It uses the same
> principle as VACUUM FULL, scans from the end, moving tuples to
> lower-numbered pages until it can't do it anymore. It requires a small
> change to heap_update(), to override the preference to store the new
> tuple on the same page as the old one, but other than that, it's all in
> the external module.

It fails at initdb time for me:
FATAL: unrecognized heap_update status: 5
STATEMENT: REVOKE ALL on pg_authid FROM public;

Regards,
Jeff Davis

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 21:43:45
Message-ID:	603c8f070909161443n46f36466mbc615cd2d4d47ac9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 16, 2009 at 4:53 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Hannu Krosing wrote:
>> On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:
>>> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
>>> move tuples to lower pages. It will be different from current VACUUM
>>> FULL in some ways. It won't require a table lock, for example, but it
>>> won't be able to move update chains as nicely. But it would be trivial
>>> to write one, so I think we should offer that as a contrib module.
>>
>> I have not checked, but I suspect pg_reorg may already be doing
>> something similar http://pgfoundry.org/forum/forum.php?forum_id=1561
>
> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
> and swapping relfilenodes afterwards. More like the VACUUM REWRITE
> that's been discussed.
>
> For the kicks, I looked at what it would take to write a utility like
> that. It turns out to be quite trivial, patch attached. It uses the same
> principle as VACUUM FULL, scans from the end, moving tuples to
> lower-numbered pages until it can't do it anymore. It requires a small
> change to heap_update(), to override the preference to store the new
> tuple on the same page as the old one, but other than that, it's all in
> the external module.
>
> To test:
>
> -- Create and populate test table
> CREATE TABLE foo (id int4 PRIMARY KEY);
> INSERT INTO foo SELECT a FROM generate_series(1,100000) a;
>
> -- Delete a lot of tuples from the beginning. This creates the hole that
> we want to compact out.
> DELETE FROM foo WHERE id < 90000;
>
> -- Vacuum to remove the dead tuples
> VACUUM VERBOSE foo;
>
> -- Run the utility to "move" the tuples
> SELECT vacuumfull('foo');
>
> -- Vacuum table again to remove the old tuple versions of the moved rows
> and truncate the file.
>
> VACUUM VERBOSE foo;

I think this should be in core, not a contrib module.

I also wonder whether we should consider teaching regular VACUUM to do
a little of this every time it's run. Right now, once your table gets
bloated, it stays bloated forever, until you intervene. Making it
slowly get better by itself would reduce the number of people who live
with the problem for a month or a year before writing in to say
"Access to this table seems really slow...".

...Robert

From:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 22:06:31
Message-ID:	3073cc9b0909161506n595db8d2q4ca812f3b37fb8e2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 16, 2009 at 1:42 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 9/16/09 11:20 AM, Kevin Grittner wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>
>>> a) To date, I have yet to hear a single person bring up an actual
>>> real-life use-case where VACUUM FULL was desireable and REWRITE
>>> would not be.
>>
>> Would rewrite have handled this?:
>>
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01016.php
>
> Ok, that sounds like a real use case.
>
> However, given Heikki's post about FULL being an issue for Hot Standby,
> I'm more inclined to provide a workaround ... for example, allowing
> REWRITE to write to a designated tablespace, which would allow people to
> use a portable drive or similar for the extra disk space.
>

if you have a portable drive at hand you can create a tablespace in
that dirve, move the table to that tablespace, return to the old
tablespace and drop the new tblspc... ok, one command for all that
could be handy but not a need...

the real problem is when you *don't* have more space... i have been
recently in that situation and vaccum full was a life saver but the
only reason that server came to that situation was a horribly fsm
configuration and a bad design that forces an incredible amount of
updates...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Hannu Krosing" <hannu(at)2ndquadrant(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 23:31:40
Message-ID:	4AB12F0C020000250002B20C@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think this should be in core, not a contrib module.

+1

> I also wonder whether we should consider teaching regular VACUUM to
> do a little of this every time it's run. Right now, once your table
> gets bloated, it stays bloated forever, until you intervene. Making
> it slowly get better by itself would reduce the number of people who
> live with the problem for a month or a year before writing in to say
> "Access to this table seems really slow...".

+1 if feasible. That would be a very nice feature.

-Kevin

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-16 23:34:03
Message-ID:	4AB175EB.8030206@cheapcomplexdevices.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Hannu Krosing wrote:
>>> On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:
>>>> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
> I also wonder whether we should consider teaching regular VACUUM to do
> a little of this every time it's run. Right now, once your table gets

Having it be built into VACUUM would surprise me a bit, but I wonder
if autovacuum could detect when such a tuple-mover would be useful,
and run one before it does a VACUUM if needed.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 00:36:57
Message-ID:	3409.1253147817@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> For the kicks, I looked at what it would take to write a utility like
> that. It turns out to be quite trivial, patch attached.

I don't think you've really thought this through; particularly not this:

> + rel = heap_open(relid, AccessShareLock);

You can NOT modify a relation with only AccessShareLock, and frankly
I doubt you should be doing this with less than exclusive lock. Which
would make the thing quite unpleasant to use in practice.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:00:21
Message-ID:	3631.1253149221@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I'm in favor of removing VACUUM FULL in 8.5. To replace it, we should offer:

> 1) VACUUM REWRITE, which is like CLUSTER but doesn't use an index, and

Check, although I'm not eager to make REWRITE a fully reserved word,
which is what this would entail. I would propose that we call this
VACUUM FULL.

> 2) Another utility that does something like UPDATE ... WHERE ctid > ? to
> move tuples to lower pages. It will be different from current VACUUM
> FULL in some ways. It won't require a table lock, for example, but it
> won't be able to move update chains as nicely.

I think it does require a table lock; you are ignoring the impact on
concurrent transactions of changing existing tuples' CTIDs (and XMINs).
In particular this could absolutely NOT be a standard part of plain
vacuum, despite all the wishful thinking going on downthread. But if
we get rid of old-style VACUUM FULL then we do need something to cover
those few-and-far-between situations where you really do desperately
need to compact a table in place; and a utility like this seems like a
reasonable solution. I'm thinking in particular that it should be
possible to have it move just a bounded number of tuples at a time,
so that you could do a VACUUM to clean out the indexes in between
move passes. Otherwise you run the risk of running out of disk space
anyway, due to index bloat.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:02:34
Message-ID:	1253149354.9666.95.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:23 +0300, Heikki Linnakangas wrote:

All it saves is a few hacks, which realistically don't cause anything
more than an eyesore. VF has been ugly for years so we don't need to
react quickly and I don't want to delay HS. Please let's not focus on
side problems.

> I'm in favor of removing VACUUM FULL in 8.5. To replace it, we should offer:
>
> 1) VACUUM REWRITE, which is like CLUSTER but doesn't use an index, and

I think that can be called VACUUM FULL also. We are just changing the
internal implementation after all. There are too many scripts that
already invoke VF to ask people to rewrite.

Hmmm, I think such a utility could easily cause more complaints than
VACUUM FULL unless we had a few other things as well. It doesn't move
update chains so it will mean that the table will not be able to shrink
immediately, nor even for a long time afterwards if there are long
queries. If a table were concurrently updated then this would not help
at all, unless the FSM channelled *all* backends carefully to parts of
the table that would help the process rather than hinder it. It will
also bloat indexes just as VF does now. REINDEX CONCURRENTLY would help
with that and we need it anyway for other reasons - and it needs to be
invoked by autovacuum.

A better way would be to have the FSM sense that packing was needed and
then alter the path transactions take so that they naturally begin to
repack the table over time. That way we wouldn't need to run a utility
at all in most cases.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:07:36
Message-ID:	1253149656.9666.99.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 11:40 -0700, Jeff Davis wrote:

> Another thing to think about is that lazy vacuum only shrinks the heap
> file if it happens to be able to acquire an access exclusive lock.
> Because vacuum can't be run inside a transaction block, I don't think
> there's currently a way to ensure that the heap file actually gets
> shrunk. How about we provide some way to make it acquire an access
> exclusive lock at the beginning, but still perform a lazy vacuum?

I think it would be useful to have an additional option to force VACUUM
to wait for the lock so it can truncate. It's annoying to have to re-run
VACUUM just to give it a chance at the lock again.

--
Simon Riggs www.2ndQuadrant.com

From:	Andrew McNamara <andrewm(at)object-craft(dot)com(dot)au>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:09:01
Message-ID:	20090917010901.DE6B65AC0D6@longblack.object-craft.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>Well, Andrew McNamara just posted today:
>http://archives.postgresql.org/message-id/20090916063341.0735C5AC0D6@longblack.object-craft.com.au
>
>Had VACUUM FULL not been available, though, I'm pretty sure he would've
>come up with something else instead.

Indeed I would have. And it was our own slackness that got us into
the situation.

Several people suggested using a portable drive - in this case, it would
not have been practical as the machines are physically managed by another
group at a remote location (the paperwork would be the real blocker).
Getting more drives added to the SAN would have been even more painful.

My preference would be to keep the VACUUM FULL command, but to reimplement
it as a table rewriter (like CLUSTER?).

I see little risk to changing the behaviour without changing the name -
only experts are currently aware exactly what it actually does, and they
are more likely to keep an eye out for changes like this.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:18:08
Message-ID:	1253150288.9666.108.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 20:36 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > For the kicks, I looked at what it would take to write a utility like
> > that. It turns out to be quite trivial, patch attached.
>
> I don't think you've really thought this through; particularly not this:
>
> > + rel = heap_open(relid, AccessShareLock);
>
> You can NOT modify a relation with only AccessShareLock, and frankly
> I doubt you should be doing this with less than exclusive lock. Which
> would make the thing quite unpleasant to use in practice.

C'mon, we know he knows that.

But I guess we should define the locking requirement for such a utility
explicitly: ShareUpdateExclusiveLock, please.

What we need is VACUUM FULL CONCURRENTLY and REINDEX CONCURRENTLY.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:19:09
Message-ID:	3836.1253150349@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> What we need is VACUUM FULL CONCURRENTLY and REINDEX CONCURRENTLY.

VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
possible doesn't make it so.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:22:50
Message-ID:	3893.1253150570@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> I think it would be useful to have an additional option to force VACUUM
> to wait for the lock so it can truncate. It's annoying to have to re-run
> VACUUM just to give it a chance at the lock again.

It would be better to separate out the truncate-what-you-can behavior
as an entirely distinct operation.

If we go with Heikki's plan of a new special operation that moves tuples
down without trying to preserve XMINs, then we could have that thing
truncate any empty end pages as its first (not last) step. But it might
be more useful/flexible if they were just two separate ops.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:32:45
Message-ID:	1253151165.9666.121.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:00 -0400, Tom Lane wrote:

> But if
> we get rid of old-style VACUUM FULL then we do need something to cover
> those few-and-far-between situations where you really do desperately
> need to compact a table in place; and a utility like this seems like a
> reasonable solution. I'm thinking in particular that it should be
> possible to have it move just a bounded number of tuples at a time,
> so that you could do a VACUUM to clean out the indexes in between
> move passes. Otherwise you run the risk of running out of disk space
> anyway, due to index bloat.

Agreed to all of the above, though I see some challenges.

The way I read the thread so far is that there are multiple
requirements:
* Shrink a table efficiently - when time and space available to do so
* Shrink a table in place - when no space available
* Shrink a table concurrently - when no dedicated time available

We probably can't do all of them at once, but we do need all of them, at
various times.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:35:25
Message-ID:	3999.1253151325@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> The way I read the thread so far is that there are multiple
> requirements:

> * Shrink a table efficiently - when time and space available to do so

To be addressed by the CLUSTER-based solution (VACUUM REWRITE or
whatever we call it).

> * Shrink a table in place - when no space available

To be addressed by the UPDATE-style tuple-mover (which could be thought
of as VACUUM FULL rewritten to not use any special mechanisms).

> * Shrink a table concurrently - when no dedicated time available

Wishful thinking, which should not stop us from proceeding with the
solutions we know how to implement.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:41:13
Message-ID:	603c8f070909161841m27f0f65av7711317a036382ef@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 16, 2009 at 9:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> The way I read the thread so far is that there are multiple
>> requirements:
>
>> * Shrink a table efficiently - when time and space available to do so
>
> To be addressed by the CLUSTER-based solution (VACUUM REWRITE or
> whatever we call it).
>
>> * Shrink a table in place - when no space available
>
> To be addressed by the UPDATE-style tuple-mover (which could be thought
> of as VACUUM FULL rewritten to not use any special mechanisms).
>
>> * Shrink a table concurrently - when no dedicated time available
>
> Wishful thinking, which should not stop us from proceeding with the
> solutions we know how to implement.

The UPDATE-style tuple-mover might work for this too, for certain
workloads. If most of your transactions are short, and the server
load is not too high, it might be OK to lock the table, move a few
tuples, lock the table, move a few tuples, etc. Now if you have
long-running transactions, not so much.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 01:48:20
Message-ID:	4523.1253152100@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Sep 16, 2009 at 9:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>> * Shrink a table concurrently - when no dedicated time available
>>
>> Wishful thinking, which should not stop us from proceeding with the
>> solutions we know how to implement.

> The UPDATE-style tuple-mover might work for this too, for certain
> workloads. If most of your transactions are short, and the server
> load is not too high, it might be OK to lock the table, move a few
> tuples, lock the table, move a few tuples, etc. Now if you have
> long-running transactions, not so much.

Yeah, I was just wondering about that myself. Seems like there would
be lots of situations where short exclusive-lock intervals could be
tolerated, even though not long ones. So that's another argument
for being able to set an upper bound on how many tuples get moved
per call.

regards, tom lane

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 03:07:28
Message-ID:	1253156848.12967.6.camel@jdavis
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:48 -0400, Tom Lane wrote:
> Yeah, I was just wondering about that myself. Seems like there would
> be lots of situations where short exclusive-lock intervals could be
> tolerated, even though not long ones.

But a short-lived exclusive lock can turn into a long-lived exclusive
lock if there are long-lived transactions ahead of it in the queue. We
probably don't want to automate anything by default that acquires
exclusive locks, even for a short time. However, I agree that it's fine
in many situations if the administrator is choosing it.

Regards,
Jeff Davis

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 03:12:46
Message-ID:	5317.1253157166@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2009-09-16 at 21:48 -0400, Tom Lane wrote:
>> Yeah, I was just wondering about that myself. Seems like there would
>> be lots of situations where short exclusive-lock intervals could be
>> tolerated, even though not long ones.

> But a short-lived exclusive lock can turn into a long-lived exclusive
> lock if there are long-lived transactions ahead of it in the queue. We
> probably don't want to automate anything by default that acquires
> exclusive locks, even for a short time. However, I agree that it's fine
> in many situations if the administrator is choosing it.

Right, which is why autovacuum can't have anything to do with this.
But as an emergency recovery tool it seems reasonable enough.

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 05:12:09
Message-ID:	4AB1C529.9020806@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Sep 16, 2009 at 9:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>>> * Shrink a table concurrently - when no dedicated time available
>>> Wishful thinking, which should not stop us from proceeding with the
>>> solutions we know how to implement.
>
>> The UPDATE-style tuple-mover might work for this too, for certain
>> workloads. If most of your transactions are short, and the server
>> load is not too high, it might be OK to lock the table, move a few
>> tuples, lock the table, move a few tuples, etc. Now if you have
>> long-running transactions, not so much.
>
> Yeah, I was just wondering about that myself. Seems like there would
> be lots of situations where short exclusive-lock intervals could be
> tolerated, even though not long ones.

That was my thinking. The tuple moving can block if another backend is
doing updates concurrently, and the moving can block other backends from
updating (and cause serialization errors). But that seems like a
perfectly acceptable limitation that we can simply document. Surely it's
better than taking an ExclusiveLock.

> So that's another argument
> for being able to set an upper bound on how many tuples get moved
> per call.

Yeah, that would alleviate it. We could write a client utility to call
it repeatedly, and perhaps VACUUMs in between, to make it easier to use.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 07:47:25
Message-ID:	1253173645.9666.130.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 23:12 -0400, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > On Wed, 2009-09-16 at 21:48 -0400, Tom Lane wrote:
> >> Yeah, I was just wondering about that myself. Seems like there would
> >> be lots of situations where short exclusive-lock intervals could be
> >> tolerated, even though not long ones.
>
> > But a short-lived exclusive lock can turn into a long-lived exclusive
> > lock if there are long-lived transactions ahead of it in the queue. We
> > probably don't want to automate anything by default that acquires
> > exclusive locks, even for a short time. However, I agree that it's fine
> > in many situations if the administrator is choosing it.
>
> Right, which is why autovacuum can't have anything to do with this.

We already do this and we already solved the problem associated with it.
VACUUM tries to grab a conditional lock to shrink the table. We can do
the same thing here, just retry the lock for each chunk cleaned.

--
Simon Riggs www.2ndQuadrant.com

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 07:53:44
Message-ID:	1253174024.778.112.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > What we need is VACUUM FULL CONCURRENTLY and REINDEX CONCURRENTLY.
>
> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
> possible doesn't make it so.

It depends on what do you mean by "VACUUM FULL"

if VACUUM FULL is just something that works on a table ends up with
(mostly) compacted one, then doing this CONCURRENTLY should not be
impossible.

If you mean the current version of VACUUM FULL, then this is impossible
indeed.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Joshua Tolley <eggyknap(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 13:35:44
Message-ID:	20090917133544.GE25739@eddie
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 16, 2009 at 09:48:20PM -0400, Tom Lane wrote:
> Seems like there would
> be lots of situations where short exclusive-lock intervals could be
> tolerated, even though not long ones. So that's another argument
> for being able to set an upper bound on how many tuples get moved
> per call.

Presumably this couldn't easily be an upper bound on the time spent moving
tuples, rather than an upper bound on the number of tuples moved?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 13:45:32
Message-ID:	603c8f070909170645t2023fc6eq5166f58a97a6675b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 17, 2009 at 9:35 AM, Joshua Tolley <eggyknap(at)gmail(dot)com> wrote:
> On Wed, Sep 16, 2009 at 09:48:20PM -0400, Tom Lane wrote:
>> Seems like there would
>> be lots of situations where short exclusive-lock intervals could be
>> tolerated, even though not long ones. So that's another argument
>> for being able to set an upper bound on how many tuples get moved
>> per call.
>
> Presumably this couldn't easily be an upper bound on the time spent moving
> tuples, rather than an upper bound on the number of tuples moved?

It's probably not worth it. There shouldn't be a tremendous amount of
variability in how long it takes to move N tuples, so it's just a
matter of finding the right value of N for your system and workload.
Making the code more complicated so that it's easier to tune something
that isn't very hard to tune anyway doesn't seem like a good
trade-off.

(Plus, of course, you can't stop in the middle: so you'd end up moving
a few tuples and then trying to estimate whether you had enough time
left to move a few more... and maybe being wrong... blech.)

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 14:21:54
Message-ID:	10262.1253197314@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
>> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
>> possible doesn't make it so.

> It depends on what do you mean by "VACUUM FULL"

Anything that moves tuples is not acceptable as a hidden background
operation, because it will break applications that depend on CTID.

The utility Heikki is talking about is something that DBAs would
invoke explicitly, presumably with an understanding of the side effects.

regards, tom lane

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Joshua Tolley <eggyknap(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 14:24:53
Message-ID:	1253197493.778.176.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 09:45 -0400, Robert Haas wrote:
> On Thu, Sep 17, 2009 at 9:35 AM, Joshua Tolley <eggyknap(at)gmail(dot)com> wrote:
> > On Wed, Sep 16, 2009 at 09:48:20PM -0400, Tom Lane wrote:
> >> Seems like there would
> >> be lots of situations where short exclusive-lock intervals could be
> >> tolerated, even though not long ones. So that's another argument
> >> for being able to set an upper bound on how many tuples get moved
> >> per call.
> >
> > Presumably this couldn't easily be an upper bound on the time spent moving
> > tuples, rather than an upper bound on the number of tuples moved?
>
> It's probably not worth it. There shouldn't be a tremendous amount of
> variability in how long it takes to move N tuples, so it's just a
> matter of finding the right value of N for your system and workload.

If you already have found the free space and the tuples to move, and
they both are evenly distributed, then it should take more or less than
same time to move them.

If you yet have to find the tuples, one by one and then place them in
small free slots on pages far apart then it takes significantly longer
than just moving full pages.

Also, associated index updates can be of very different length,
especially for huge indexes where you may not only end up doing lots of
page splits, but may also need to read in large sets of pages from disk.

> Making the code more complicated so that it's easier to tune something
> that isn't very hard to tune anyway doesn't seem like a good
> trade-off.

I think that just making sure that pessimal cases don't happen should be
enough, maybe just check for too-much-time-in-transaction after each N
pages touched.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 14:31:08
Message-ID:	1253197868.778.181.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 10:21 -0400, Tom Lane wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
> >> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
> >> possible doesn't make it so.
>
> > It depends on what do you mean by "VACUUM FULL"
>
> Anything that moves tuples is not acceptable as a hidden background
> operation,

I did not mean VACUUM FULL to be run as a hidden background operation.
just as something that does not need everything else to be shut down.

> because it will break applications that depend on CTID.

Do you know of any such applications out in the wild ?

> The utility Heikki is talking about is something that DBAs would
> invoke explicitly, presumably with an understanding of the side effects.

Like VACUUM FULL ?

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 14:32:42
Message-ID:	10393.1253197962@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> On Thu, 2009-09-17 at 10:21 -0400, Tom Lane wrote:
>> because it will break applications that depend on CTID.

> Do you know of any such applications out in the wild ?

Yes, they're out there.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 14:45:23
Message-ID:	10524.1253198723@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> On Thu, 2009-09-17 at 09:45 -0400, Robert Haas wrote:
>> Making the code more complicated so that it's easier to tune something
>> that isn't very hard to tune anyway doesn't seem like a good
>> trade-off.

> I think that just making sure that pessimal cases don't happen should be
> enough, maybe just check for too-much-time-in-transaction after each N
> pages touched.

If people think that a runtime limit is the most natural way to control
this, I don't see a reason not to do it that way. I would envision
checking the elapsed time once per page or few pages; shouldn't be a
huge amount of effort or complication ...

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 15:25:28
Message-ID:	603c8f070909170825m8499bd7w63d3ea8c409c8fe9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 17, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>> On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
>>> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
>>> possible doesn't make it so.
>
>> It depends on what do you mean by "VACUUM FULL"
>
> Anything that moves tuples is not acceptable as a hidden background
> operation, because it will break applications that depend on CTID.

I'm a bit confused. CTIDs change all the time anyway, whenever you
update the table. What could someone possibly be using them for?

...Robert

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 15:34:14
Message-ID:	1253201654.9666.181.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 10:45 -0400, Tom Lane wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > On Thu, 2009-09-17 at 09:45 -0400, Robert Haas wrote:
> >> Making the code more complicated so that it's easier to tune something
> >> that isn't very hard to tune anyway doesn't seem like a good
> >> trade-off.
>
> > I think that just making sure that pessimal cases don't happen should be
> > enough, maybe just check for too-much-time-in-transaction after each N
> > pages touched.
>
> If people think that a runtime limit is the most natural way to control
> this, I don't see a reason not to do it that way. I would envision
> checking the elapsed time once per page or few pages; shouldn't be a
> huge amount of effort or complication ...

Yes, I think time is the most natural way. Currently, VACUUM provides an
effective max impact time since it locks one block at any one time and
therefore limits how long users need wait for it. We need a way to
specify the maximum time we are prepared for an update/delete
transaction to wait when this utility runs (in ms). That way we can
easily assess the impact on transactional systems.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 15:41:26
Message-ID:	1253202086.9666.188.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 11:25 -0400, Robert Haas wrote:
> On Thu, Sep 17, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> >> On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
> >>> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
> >>> possible doesn't make it so.
> >
> >> It depends on what do you mean by "VACUUM FULL"
> >
> > Anything that moves tuples is not acceptable as a hidden background
> > operation, because it will break applications that depend on CTID.
>
> I'm a bit confused. CTIDs change all the time anyway, whenever you
> update the table. What could someone possibly be using them for?

This part of the thread is somewhat strange. I don't think anybody was
suggesting the thing that Tom has assumed was meant, so how that chimera
would work isn't important. So, moving on...

The update utility being discussed is in danger of confusing these two
goals
* compact the table using minimal workspace
* compact the table with minimal interruption to concurrent updaters

We really *need* it to do the first for when emergencies arrive, but
most of the time we'd like it do the the second one. They aren't
necessarily the same thing and I don't want us to forget the "using
minimal workspace" requirement because the other one sounds so juicy.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:08:04
Message-ID:	4AB25EE4.4090708@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Thu, Sep 17, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>>> On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
>>>> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
>>>> possible doesn't make it so.
>>> It depends on what do you mean by "VACUUM FULL"
>> Anything that moves tuples is not acceptable as a hidden background
>> operation, because it will break applications that depend on CTID.
>
> I'm a bit confused. CTIDs change all the time anyway, whenever you
> update the table. What could someone possibly be using them for?

As a unique identifier, while you hold a portal open. I recall that last
time this was discussed was wrt. HOT. At least one of the drivers used
it to implement client-side updateable cursors (ODBC if I recall
correctly). We normally guarantee that CTID of a row doesn't change
within the same transaction that you read it, but if we do UPDATEs to
move tuples behind the application's back, the UPDATEs will cause the
CTID of the row to change.

It's no different from the situation where another backend UPDATEs the
row under your nose, but it's not something you want to do automatically
without notice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:18:46
Message-ID:	20055.1253204326@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Robert Haas wrote:
>> On Thu, Sep 17, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Anything that moves tuples is not acceptable as a hidden background
>>> operation, because it will break applications that depend on CTID.

>> I'm a bit confused. CTIDs change all the time anyway, whenever you
>> update the table. What could someone possibly be using them for?

> As a unique identifier, while you hold a portal open.

Or for an update without having to hold a transaction open. We have
recommended this type of technique in the past:

select ctid, xmin, * from table where id = something;

... allow user to edit the row at his leisure ...

update table set ... where id = something and
ctid = previous value and xmin = previous value;
if rows_updated = 0 then
report error ("row was already updated by someone else");

(Actually, the ctid is only being used for fast access here; the xmin
is what is really needed to detect that someone else updated the row.
But the proposed tuple-mover would break the xmin check too.)

> It's no different from the situation where another backend UPDATEs the
> row under your nose, but it's not something you want to do automatically
> without notice.

Exactly. The application is typically going to throw a "concurrent
update" type of error when this happens, and we don't want magic
background operations to cause that.

regards, tom lane

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:23:05
Message-ID:	1253204585.778.182.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 10:32 -0400, Tom Lane wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > On Thu, 2009-09-17 at 10:21 -0400, Tom Lane wrote:
> >> because it will break applications that depend on CTID.
>
> > Do you know of any such applications out in the wild ?
>
> Yes, they're out there.

How do they deal with concurrent UPDATEs ?

> regards, tom lane
>
--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:30:20
Message-ID:	20226.1253205020@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> The update utility being discussed is in danger of confusing these two
> goals
> * compact the table using minimal workspace
> * compact the table with minimal interruption to concurrent updaters

Actually, the update utility is explicitly meant to satisfy both of
those goals (possibly with different usage styles). I don't see any
particular confusion.

regards, tom lane

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:31:29
Message-ID:	1253205089.778.188.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 12:18 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > Robert Haas wrote:
> >> On Thu, Sep 17, 2009 at 10:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Anything that moves tuples is not acceptable as a hidden background
> >>> operation, because it will break applications that depend on CTID.
>
> >> I'm a bit confused. CTIDs change all the time anyway, whenever you
> >> update the table. What could someone possibly be using them for?
>
> > As a unique identifier, while you hold a portal open.
>
> Or for an update without having to hold a transaction open. We have
> recommended this type of technique in the past:
>
> select ctid, xmin, * from table where id = something;
>
> ... allow user to edit the row at his leisure ...
>
> update table set ... where id = something and
> ctid = previous value and xmin = previous value;
> if rows_updated = 0 then
> report error ("row was already updated by someone else");
>
> (Actually, the ctid is only being used for fast access here; the xmin
> is what is really needed to detect that someone else updated the row.
> But the proposed tuple-mover would break the xmin check too.)

I have used mostly duck-typed, interface-not-identity languages lately,
so for me the natural thing to check in similar situation is if any
"interesting columns" have changed, by simply preserving old values in
user application and use these in WHERE clause of update.

Why should anyone care if there has been say a null update (set id=id
where id=...) ?

If you need real locking, then just define a locked (or locked_by or
locked_until) column and use that for concurrent edit control

> > It's no different from the situation where another backend UPDATEs the
> > row under your nose, but it's not something you want to do automatically
> > without notice.
>
> Exactly. The application is typically going to throw a "concurrent
> update" type of error when this happens, and we don't want magic
> background operations to cause that.

Would'nt current VACUUM FULL or CLUSTER cause much more grief in this
situation ?

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:31:42
Message-ID:	603c8f070909170931v6f6e9aa5mcccdc757b4fb2990@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 17, 2009 at 12:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's no different from the situation where another backend UPDATEs the
>> row under your nose, but it's not something you want to do automatically
>> without notice.
>
> Exactly. The application is typically going to throw a "concurrent
> update" type of error when this happens, and we don't want magic
> background operations to cause that.

OK, that makes sense. It seems like we more or less have consensus on
what to do here.

- Change VACUUM FULL to be the equivalent of CLUSTER-without-index.
- Add some kind of tuple mover that can be invoked when it's necessary
to incrementally compact a table in place.

This might not cover every possible use case, but it seems that it
can't be any worse than what we have now. The tuple mover seems like
a workable substitute for the current VACUUM FULL in cases where space
is limited, and by virtual of being incremental it can be used in
situations where the current VACUUM FULL can't. There could be a loss
of functionality of the tuple mover is slower than VACUUM FULL, but
the consensus seems to be that's almost impossible to contemplate.

The new VACUUM FULL behavior, OTOH, should be faster than the existing
one in cases where space consumption is not an issue.

So nothing gets any worse, and some things get better.

But who is implementing this?

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:32:28
Message-ID:	603c8f070909170932o62feac52wcc25887b08d56fa7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 17, 2009 at 12:31 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Exactly. The application is typically going to throw a "concurrent
>> update" type of error when this happens, and we don't want magic
>> background operations to cause that.
>
> Would'nt current VACUUM FULL or CLUSTER cause much more grief in this
> situation ?

No. They take an exclusive lock on the table, so this situation can't
occur in those cases, which was Tom's point.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 16:36:40
Message-ID:	20363.1253205400@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> On Thu, 2009-09-17 at 12:18 -0400, Tom Lane wrote:
>> Or for an update without having to hold a transaction open. We have
>> recommended this type of technique in the past:

> If you need real locking, then just define a locked (or locked_by or
> locked_until) column and use that for concurrent edit control

That's pessimistic locking, and it sucks for any number of reasons,
most obviously if your client crashes or otherwise forgets to release
the lock. The method I was illustrating is specifically meant for
apps that would prefer optimistic locking.

>> Exactly. The application is typically going to throw a "concurrent
>> update" type of error when this happens, and we don't want magic
>> background operations to cause that.

> Would'nt current VACUUM FULL or CLUSTER cause much more grief in this
> situation ?

Sure, but neither of those are recommended for routine maintenance
during live database operations. (What you might do during maintenance
windows is a different discussion.)

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 17:02:35
Message-ID:	1253206955.9666.205.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 12:30 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > The update utility being discussed is in danger of confusing these two
> > goals
> > * compact the table using minimal workspace
> > * compact the table with minimal interruption to concurrent updaters
>
> Actually, the update utility is explicitly meant to satisfy both of
> those goals (possibly with different usage styles). I don't see any
> particular confusion.

<sigh> It wasn't explicit until now. The confusion was you saying that
"VACUUM FULL CONCURRENTLY" was an impossible dream, that's why I've
restated it the above way so its clear what we want.

--
Simon Riggs www.2ndQuadrant.com

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 17:13:12
Message-ID:	1253207592.778.204.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 12:36 -0400, Tom Lane wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > On Thu, 2009-09-17 at 12:18 -0400, Tom Lane wrote:
> >> Or for an update without having to hold a transaction open. We have
> >> recommended this type of technique in the past:
>
> > If you need real locking, then just define a locked (or locked_by or
> > locked_until) column and use that for concurrent edit control
>
> That's pessimistic locking, and it sucks for any number of reasons,
> most obviously if your client crashes or otherwise forgets to release
> the lock.

That's the (locked_by,locked_until) case. It is used for a) telling
other potential editors that "this row is being edited" and also to time
out the lock.

> The method I was illustrating is specifically meant for
> apps that would prefer optimistic locking.

But surely any reliance on internal implementation details like CTID or -
XMIN should be discouraged in ordinanry user code, or really anything
except maintenance utilities which sometimes _have_ to do that.

Still most people would _not_ want that to fail, if someone just opended
the edit windeo and then clicked "Save" without making any changes.

Telling the user the "You can't save your edited record as somebody just
changed the xmin field seems kind of silly.

> >> Exactly. The application is typically going to throw a "concurrent
> >> update" type of error when this happens, and we don't want magic
> >> background operations to cause that.
>
> > Would'nt current VACUUM FULL or CLUSTER cause much more grief in this
> > situation ?
>
> Sure, but neither of those are recommended for routine maintenance
> during live database operations.

If they were, then we would net be having this whole discussion now.

> (What you might do during maintenance windows is a different discussion.)

I aim at 24/7 operations with no maintenance window in sight

>
> regards, tom lane
>
--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 18:33:05
Message-ID:	alpine.GSO.2.01.0909171429420.19480@westnet.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 16 Sep 2009, Tom Lane wrote:

>> * Shrink a table in place - when no space available
> To be addressed by the UPDATE-style tuple-mover (which could be thought
> of as VACUUM FULL rewritten to not use any special mechanisms).

Is there any synergy here with the needs of a future in-place upgrade
upgrade mechanism that handles page header expansion? That problem seemed
to always get stuck on the issue of how to move tuples around when the
pages were full. Not trying to drag the scope of this job out, just
looking for common ground that might be considered when designing the
tuple-mover if it could serve both purposes.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 18:45:48
Message-ID:	1253213148.778.300.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 14:33 -0400, Greg Smith wrote:
> On Wed, 16 Sep 2009, Tom Lane wrote:
>
> >> * Shrink a table in place - when no space available
> > To be addressed by the UPDATE-style tuple-mover (which could be thought
> > of as VACUUM FULL rewritten to not use any special mechanisms).
>
> Is there any synergy here with the needs of a future in-place upgrade
> upgrade mechanism that handles page header expansion? That problem seemed
> to always get stuck on the issue of how to move tuples around when the
> pages were full. Not trying to drag the scope of this job out, just
> looking for common ground that might be considered when designing the
> tuple-mover if it could serve both purposes.

I understood that the main difficulty for in-place tuple expansion was
keeping CTIDs to not need to update indexes.

Current tuple mover discussion does not address that.

But maybe something can be tahen from this discussion the other way
round - maybe we should not be afraid of doing null updates during
in-place update

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 19:42:33
Message-ID:	m2d45p73yu.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Forewords: re-reading, I hope my english will not make this sound like a
high-kick when I'm just struggling to understand what all this is
about. Sending in order not to regret missing the oportunity I think I'm
seeing...

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>> On Wed, 2009-09-16 at 21:19 -0400, Tom Lane wrote:
>>> VACUUM FULL CONCURRENTLY is a contradiction in terms. Wishing it were
>>> possible doesn't make it so.
>
>> It depends on what do you mean by "VACUUM FULL"
>
> Anything that moves tuples is not acceptable as a hidden background
> operation, because it will break applications that depend on CTID.

I though this community had the habit of pushing public interface
backward compatibility while going as far as requiring systematic full
dump and restore cycle for major version upgrade in order to allow for
internal redesign anytime in development.

And even if it's easy enough to SELECT ctid FROM table, this has always
been an implementation detail in my mind, the same way catalog layout
is.

I don't see any reason why not breaking the user visible behavior of
tuples CTID between any two major releases, all the more when the reason
we're talking about it is automated online physical optimisations, which
seems to be opening the door for bloat resistant PostgreSQL.

> The utility Heikki is talking about is something that DBAs would
> invoke explicitly, presumably with an understanding of the side effects.

That's the CLUSTER on seqscan. As far as the table rewritting goes, the
above only states your POV, based on ctid backward compatibility need
which I'm not the only one here not sharing, let alone understanding.

Am I completely wet here?
--
dim

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Greg Smith <gsmith(at)gregsmith(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 21:41:01
Message-ID:	4516.1253223661@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> But maybe something can be tahen from this discussion the other way
> round - maybe we should not be afraid of doing null updates during
> in-place update

The problem for in-place update is that it can't assume that any of the
normal infrastructure (like index insertion or WAL logging) is up.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 21:44:09
Message-ID:	4572.1253223849@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> I don't see any reason why not breaking the user visible behavior of
> tuples CTID between any two major releases,

> Am I completely wet here?

Completely. This is a user-visible behavior that we have encouraged
people to rely on, and for which there is no easy substitute.

regards, tom lane

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-17 22:10:49
Message-ID:	m2hbv143yu.fsf@hi-media.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Completely. This is a user-visible behavior that we have encouraged
> people to rely on, and for which there is no easy substitute.

Excited to have self-healing tables (against bloat), I parse this as an
opening. Previously on this thread you say:

> (Actually, the ctid is only being used for fast access here; the xmin
> is what is really needed to detect that someone else updated the row.
> But the proposed tuple-mover would break the xmin check too.)

So to have the impossible feature, we need a way not to break existing
code relying on ctid and xmin. How stretching would you consider the
idea of taking a (maybe new) table lock as soon as a SELECT output
contains system columns, this lock preventing the magic utility to
operate?

Regards,
--
dim

From:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Tom Lane EXTERN" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>
Cc:	"Hannu Krosing" <hannu(at)2ndQuadrant(dot)com>, "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-18 06:50:35
Message-ID:	D960CB61B694CF459DCFB4B0128514C203937F1F@exadv11.host.magwien.gv.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> > I don't see any reason why not breaking the user visible behavior of
> > tuples CTID between any two major releases,
>
> > Am I completely wet here?
>
> Completely. This is a user-visible behavior that we have encouraged
> people to rely on, and for which there is no easy substitute.

I second that: it would hurt to lose this generic technique for
optimistic locking.

Yours,
Laurenz Albe

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-18 06:54:04
Message-ID:	1253256845.9666.313.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2009-09-17 at 17:44 -0400, Tom Lane wrote:
> Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> > I don't see any reason why not breaking the user visible behavior of
> > tuples CTID between any two major releases,
>
> > Am I completely wet here?
>
> Completely. This is a user-visible behavior that we have encouraged
> people to rely on, and for which there is no easy substitute.

Agreed. I investigated that avenue as a possible implementation approach
when designing HOT and I didn't find anything worth taking away.

I'm very much in favour of a higher-level solution to compacting a
table, as has been discussed for the batch update utility. That avoids
most of the low-level yuck that VACUUM FULL imposes upon itself and
everyone around it. If we want to move forward long term we need to keep
the internals as clean as possible. Hot Standby would never have been
possible without that principle having already been applied across the
other subsystems.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc:	Tom Lane EXTERN <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-18 07:28:26
Message-ID:	1253258906.9666.337.camel@ebony.2ndQuadrant
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2009-09-18 at 08:50 +0200, Albe Laurenz wrote:
> Tom Lane wrote:
> > > I don't see any reason why not breaking the user visible behavior of
> > > tuples CTID between any two major releases,
> >
> > > Am I completely wet here?
> >
> > Completely. This is a user-visible behavior that we have encouraged
> > people to rely on, and for which there is no easy substitute.
>
> I second that: it would hurt to lose this generic technique for
> optimistic locking.

CTIDs don't help with optimistic locking, though it seems they can.

If you don't hold open the transaction then someone else can update the
row. That sounds good, but because of HOT it is possible that the same
CTID with the same PK value occupies that exact CTID value when you
return to check it. You think row has not been updated so you perform
your update, but it has been updated, so you overwrite previous data -
data loss. Actually worse, sporadic data loss because of race
conditions.

--
Simon Riggs www.2ndQuadrant.com

From:	marcin mank <marcin(dot)mank(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-18 08:01:04
Message-ID:	b1b9fac60909180101k794bca3dp98942faf1e267a6a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Exactly. The application is typically going to throw a "concurrent
> update" type of error when this happens, and we don't want magic
> background operations to cause that.
>

I`d give up the possibility of using CTIDs in the way You explained
for an auto-debloater without blinking an eye. Maybe we should have a
GUC to enable/disable the auto-debloater? Make it a reloption?

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Tom Lane EXTERN <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2009-09-18 08:01:35
Message-ID:	4AB33E5F.20803@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> CTIDs don't help with optimistic locking, though it seems they can.
>
> If you don't hold open the transaction then someone else can update the
> row. That sounds good, but because of HOT it is possible that the same
> CTID with the same PK value occupies that exact CTID value when you
> return to check it. You think row has not been updated so you perform
> your update, but it has been updated, so you overwrite previous data -
> data loss. Actually worse, sporadic data loss because of race
> conditions.

Yeah, you have to check xmin as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 19:04:03
Message-ID:	20150424190403.GP4369@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:

> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
> and swapping relfilenodes afterwards. More like the VACUUM REWRITE
> that's been discussed.
>
> For the kicks, I looked at what it would take to write a utility like
> that. It turns out to be quite trivial, patch attached. It uses the same
> principle as VACUUM FULL, scans from the end, moving tuples to
> lower-numbered pages until it can't do it anymore. It requires a small
> change to heap_update(), to override the preference to store the new
> tuple on the same page as the old one, but other than that, it's all in
> the external module.

More than five years have passed since Heikki posted this, and we still
haven't found a solution to the problem -- which neverthless keeps
biting people to the point that multiple "user-space" implementations of
similar techniques are out there.

I think what we need here is something that does heap_update to tuples
at the end of the table, moving them to earlier pages; then wait for old
snapshots to die (the infrastructure for which we have now, thanks to
CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
there are lots of details to resolve. It doesn't really matter that
this runs for long: a process doing this for hours might be better than
AccessExclusiveLock on the table for a much shorter period.

Are there any takers?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Cc:	Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 20:21:38
Message-ID:	553AA5D2.9080001@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/24/15 2:04 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
>> and swapping relfilenodes afterwards. More like the VACUUM REWRITE
>> that's been discussed.
>>
>> For the kicks, I looked at what it would take to write a utility like
>> that. It turns out to be quite trivial, patch attached. It uses the same
>> principle as VACUUM FULL, scans from the end, moving tuples to
>> lower-numbered pages until it can't do it anymore. It requires a small
>> change to heap_update(), to override the preference to store the new
>> tuple on the same page as the old one, but other than that, it's all in
>> the external module.
>
> More than five years have passed since Heikki posted this, and we still
> haven't found a solution to the problem -- which neverthless keeps
> biting people to the point that multiple "user-space" implementations of
> similar techniques are out there.
>
> I think what we need here is something that does heap_update to tuples
> at the end of the table, moving them to earlier pages; then wait for old
> snapshots to die (the infrastructure for which we have now, thanks to
> CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
> there are lots of details to resolve. It doesn't really matter that
> this runs for long: a process doing this for hours might be better than
> AccessExclusiveLock on the table for a much shorter period.
>
> Are there any takers?

Honestly, I'd prefer we exposed some way to influence where a new tuple
gets put, and perhaps better ways of accessing tuples on a specific
page. That would make it a lot easier to handle this in userspace, but
it would also make it easier to do things like concurrent clustering. Or
just organizing a table however you wanted.

That said, why not just pull what Heikki did into contrib, and add the
necessary mode to heap_update?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 20:34:01
Message-ID:	20150424203401.GR4369@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby wrote:

> Honestly, I'd prefer we exposed some way to influence where a new tuple gets
> put, and perhaps better ways of accessing tuples on a specific page. That
> would make it a lot easier to handle this in userspace, but it would also
> make it easier to do things like concurrent clustering. Or just organizing a
> table however you wanted.

That's great and all, but it doesn't help people who have already, for
whatever reason, ran into severe bloat and cannot take long enough
downtime to run VACUUM FULL.

> That said, why not just pull what Heikki did into contrib, and add the
> necessary mode to heap_update?

Sure, that's what I suggest. We just need to fix the bugs and (as Tom
puts it) "infelicities."

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 21:36:12
Message-ID:	553AB74C.6040109@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/24/15 3:34 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>> Honestly, I'd prefer we exposed some way to influence where a new tuple gets
>> put, and perhaps better ways of accessing tuples on a specific page. That
>> would make it a lot easier to handle this in userspace, but it would also
>> make it easier to do things like concurrent clustering. Or just organizing a
>> table however you wanted.
>
> That's great and all, but it doesn't help people who have already, for
> whatever reason, ran into severe bloat and cannot take long enough
> downtime to run VACUUM FULL.
>
>> That said, why not just pull what Heikki did into contrib, and add the
>> necessary mode to heap_update?
>
> Sure, that's what I suggest. We just need to fix the bugs and (as Tom
> puts it) "infelicities."

It looks like the biggest complaint (aside from allowing a limited
number of tuples to be moved) is in [1] and [2], where Tom is saying
that you can't simply call heap_update() like this without holding an
exclusive lock on the table. Is that because we're not actually changing
the tuple?

Another issue is both HOT and KeyUpdate; I think we need to completely
ignore/over-ride that stuff for this.

Instead of adding forcefsm, I think it would be more useful to accept a
target block number. That way we can actually control where the new
tuple goes. For this particular case we'd presumably go with normal FSM
page selection logic, but someone could chose to to do something more
sophisticated if they wanted.

[1] http://postgresql.org/message-id/3409.1253147817@sss.pgh.pa.us
[2] http://postgresql.org/message-id/3631.1253149221@sss.pgh.pa.us
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 22:30:00
Message-ID:	20150424223000.GS4369@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby wrote:

> It looks like the biggest complaint (aside from allowing a limited number of
> tuples to be moved) is in [1] and [2], where Tom is saying that you can't
> simply call heap_update() like this without holding an exclusive lock on the
> table. Is that because we're not actually changing the tuple?

That's nonsense -- obviously UPDATE can do heap_update without an
exclusive lock on the table, so the explanation must be something else.
I think his actual complaint was that you can't remove the old tuple
until concurrent readers of the table have already finished scanning it,
or you get into a situation where they might need to read the page in
which the original version resided, but your mini-vacuum already removed
it. So before removing it you need to wait until they are all finished.
This is the reason I mentioned CREATE INDEX CONCURRENTLY: if you wait
until those transactions are all gone (like CIC does), you are then free
to remove the old versions of the tuple, because you know that all
readers have a snapshot new enough to see the new version of the tuple.

> Another issue is both HOT and KeyUpdate; I think we need to completely
> ignore/over-ride that stuff for this.

You don't need anything for HOT, because cross-page updates are never
HOT. Not sure what you mean about KeyUpdate, but yeah you might need
something there -- obviously, you don't want to create multixacts
unnecessarily.

> Instead of adding forcefsm, I think it would be more useful to accept a
> target block number. That way we can actually control where the new tuple
> goes.

Whatever makes the most sense, I suppose. (Maybe we shouldn't consider
this a tweaked heap_update -- which is already complex enough -- but a
separate heapam entry point.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-24 23:00:53
Message-ID:	553ACB25.5020801@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/24/15 5:30 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>> It looks like the biggest complaint (aside from allowing a limited number of
>> tuples to be moved) is in [1] and [2], where Tom is saying that you can't
>> simply call heap_update() like this without holding an exclusive lock on the
>> table. Is that because we're not actually changing the tuple?
>
> That's nonsense -- obviously UPDATE can do heap_update without an
> exclusive lock on the table, so the explanation must be something else.
> I think his actual complaint was that you can't remove the old tuple
> until concurrent readers of the table have already finished scanning it,
> or you get into a situation where they might need to read the page in
> which the original version resided, but your mini-vacuum already removed
> it. So before removing it you need to wait until they are all finished.
> This is the reason I mentioned CREATE INDEX CONCURRENTLY: if you wait
> until those transactions are all gone (like CIC does), you are then free
> to remove the old versions of the tuple, because you know that all
> readers have a snapshot new enough to see the new version of the tuple.

Except I don't see anywhere in the patch that's actually removing the
old tuple...

>> Another issue is both HOT and KeyUpdate; I think we need to completely
>> ignore/over-ride that stuff for this.
>
> You don't need anything for HOT, because cross-page updates are never
> HOT. Not sure what you mean about KeyUpdate, but yeah you might need
> something there -- obviously, you don't want to create multixacts
> unnecessarily.

If I'm not mistaken, if there's enough room left on the page then
HeapSatisfiesHOTandKeyUpdate() will say this tuple satisfies HOT. So
we'd have to do something to over-ride that, and I don't think the
current patch does that. (It might force it to a new page anyway, but it
does nothing with satisfies_hot, which I suspect isn't safe.)

>> Instead of adding forcefsm, I think it would be more useful to accept a
>> target block number. That way we can actually control where the new tuple
>> goes.
>
> Whatever makes the most sense, I suppose. (Maybe we shouldn't consider
> this a tweaked heap_update -- which is already complex enough -- but a
> separate heapam entry point.)

Yeah, I thought about creating heap_move, but I suspect that would still
have to worry about a lot of this other stuff anyway. Far more likely
for a change to be missed in heap_move than heap_update too.

I am tempted to add a SQL heap_move function though, assuming it's not
much extra work.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-25 11:30:24
Message-ID:	CANP8+j+HioCDzzfrQ_ciytA1bjLs_rtqj9YQiwRnH2dnDr3DFw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24 April 2015 at 22:36, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> Instead of adding forcefsm, I think it would be more useful to accept a
> target block number. That way we can actually control where the new tuple
> goes. For this particular case we'd presumably go with normal FSM page
> selection logic, but someone could chose to to do something more
> sophisticated if they wanted.
>
> [1] http://postgresql.org/message-id/3409.1253147817@sss.pgh.pa.us
> [2] http://postgresql.org/message-id/3631.1253149221@sss.pgh.pa.us

I don't think specifying exact blocks will help, it will get us in more
trouble in the long run.

I think we need to be able to specify these update placement strategies

* TARGBLOCK_SAME - try to put the update on the same block if possible -
default
* TARGBLOCK_NEW - always force the update to go on a new block, to shrink
table rapidly

and these new block selection strategies

* FSM_ANY - Any block from FSM - default, as now
* FSM_NEAR - A block near the current one to maintain clustering as much as
possible - set automatically if table is clustered
* FSM_SHRINK - A block as near to block 0 as possible, while still handing
out different blocks to each backend by reselecting a block if we
experience write contention

I would suggest that if VACUUM finds the table is bloated beyond a specific
threshold it automatically puts it in FSM_SHRINK mode, and resets it back
to FSM_ANY once the bloat has reduced. That will naturally avoid bloat.

fsm modes can also be set manually to enforce bloat minimization.

We can also design a utility to actively use TARGBLOCK_NEW and FSM_SHRINK
to reduce table size without blocking writes.

But this is all stuff for 9.6...

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-27 21:21:36
Message-ID:	553EA860.8020108@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/25/15 6:30 AM, Simon Riggs wrote:
> On 24 April 2015 at 22:36, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com
> <mailto:Jim(dot)Nasby(at)bluetreble(dot)com>> wrote:
>
> Instead of adding forcefsm, I think it would be more useful to
> accept a target block number. That way we can actually control where
> the new tuple goes. For this particular case we'd presumably go with
> normal FSM page selection logic, but someone could chose to to do
> something more sophisticated if they wanted.
>
> [1] http://postgresql.org/message-id/3409.1253147817@sss.pgh.pa.us
> [2] http://postgresql.org/message-id/3631.1253149221@sss.pgh.pa.us
>
>
> I don't think specifying exact blocks will help, it will get us in more
> trouble in the long run.
>
> I think we need to be able to specify these update placement strategies
...
> and these new block selection strategies
...
> We can also design a utility to actively use TARGBLOCK_NEW and
> FSM_SHRINK to reduce table size without blocking writes.

I generally agree, but was trying to keep the scope on this more
manageable. A first step in this direction is just providing a method to
move a specific tuple to a specific page; if there's no room there throw
an error. Having some kind of SQL level support for that will be a lot
easier than adding those other modes to the FSM, and will at least allow
users to deal with bloat themselves.

> But this is all stuff for 9.6...

Definitely. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-28 18:32:01
Message-ID:	CA+TgmoaS_ymtTdGmTP15CTgMYKxEppDy_1SsGy+oqLbJyxKmhA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Apr 24, 2015 at 3:04 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Heikki Linnakangas wrote:
>> Hmm, AFAICT pg_reorg is much more complex, writing stuff to a temp table
>> and swapping relfilenodes afterwards. More like the VACUUM REWRITE
>> that's been discussed.
>>
>> For the kicks, I looked at what it would take to write a utility like
>> that. It turns out to be quite trivial, patch attached. It uses the same
>> principle as VACUUM FULL, scans from the end, moving tuples to
>> lower-numbered pages until it can't do it anymore. It requires a small
>> change to heap_update(), to override the preference to store the new
>> tuple on the same page as the old one, but other than that, it's all in
>> the external module.
>
> More than five years have passed since Heikki posted this, and we still
> haven't found a solution to the problem -- which neverthless keeps
> biting people to the point that multiple "user-space" implementations of
> similar techniques are out there.

Yeah. The problem with solving this with an update is that a
concurrent "real" update may not see the expected behavior, especially
at higher isolation levels. Tom also complained that the CTID will
change, and somebody might care about that. But I think it's pretty
clear that a lot of people will be able to live with those problems,
and those who can't will be no worse off than now.

> I think what we need here is something that does heap_update to tuples
> at the end of the table, moving them to earlier pages; then wait for old
> snapshots to die (the infrastructure for which we have now, thanks to
> CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
> there are lots of details to resolve. It doesn't really matter that
> this runs for long: a process doing this for hours might be better than
> AccessExclusiveLock on the table for a much shorter period.

Why do you need to do anything other than update the tuples and let
autovacuum clean up the mess?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-28 18:44:24
Message-ID:	20150428184424.GD4369@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Fri, Apr 24, 2015 at 3:04 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:

> > I think what we need here is something that does heap_update to tuples
> > at the end of the table, moving them to earlier pages; then wait for old
> > snapshots to die (the infrastructure for which we have now, thanks to
> > CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
> > there are lots of details to resolve. It doesn't really matter that
> > this runs for long: a process doing this for hours might be better than
> > AccessExclusiveLock on the table for a much shorter period.
>
> Why do you need to do anything other than update the tuples and let
> autovacuum clean up the mess?

Sure, that's one option. I think autovac's current approach is too
heavyweight: it always has to scan the whole relation and all the
indexes. It might be more convenient to do something more
fine-grained; for instance, maybe instead of scanning the whole
relation, start from the end of the relation walking backwards and stop
once the first page containing a live or recently-dead tuple is found.
Perhaps, while scanning the indexes you know that all CTIDs with pages
higher than some threshold value are gone; you can remove them without
scanning the heap at all perhaps.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-29 00:36:30
Message-ID:	5540278E.7080602@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4/28/15 1:32 PM, Robert Haas wrote:
>> More than five years have passed since Heikki posted this, and we still
>> >haven't found a solution to the problem -- which neverthless keeps
>> >biting people to the point that multiple "user-space" implementations of
>> >similar techniques are out there.
> Yeah. The problem with solving this with an update is that a
> concurrent "real" update may not see the expected behavior, especially
> at higher isolation levels. Tom also complained that the CTID will
> change, and somebody might care about that. But I think it's pretty
> clear that a lot of people will be able to live with those problems,
> and those who can't will be no worse off than now.

But that's the same thing that would happen during a real update, even
if it was just UPDATE SET some_field = some_field, no? Doesn't
heap_update already do everything that's necessary? Or are you worried
that doing this could be user-visible (which as long as it's a manual
process I think is OK)?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-29 17:03:28
Message-ID:	CA+Tgmoay-hU6sQaaMJOsQrF2hJdVzSXmWP8diGj1Nnvdb780LA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Apr 28, 2015 at 2:44 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
>> > I think what we need here is something that does heap_update to tuples
>> > at the end of the table, moving them to earlier pages; then wait for old
>> > snapshots to die (the infrastructure for which we have now, thanks to
>> > CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
>> > there are lots of details to resolve. It doesn't really matter that
>> > this runs for long: a process doing this for hours might be better than
>> > AccessExclusiveLock on the table for a much shorter period.
>>
>> Why do you need to do anything other than update the tuples and let
>> autovacuum clean up the mess?
>
> Sure, that's one option. I think autovac's current approach is too
> heavyweight: it always has to scan the whole relation and all the
> indexes. It might be more convenient to do something more
> fine-grained; for instance, maybe instead of scanning the whole
> relation, start from the end of the relation walking backwards and stop
> once the first page containing a live or recently-dead tuple is found.
> Perhaps, while scanning the indexes you know that all CTIDs with pages
> higher than some threshold value are gone; you can remove them without
> scanning the heap at all perhaps.

I agree that scanning all of the indexes is awfully heavy-weight, but
I don't see how we're going to get around that. The problem with
index vac is not that it's expensive to decide which CTIDs need to get
killed, but that we have to search for them in every page of the
index. Unfortunately, I have no idea how to get around that. The
only alternative approach is to regenerate the index tuples we expect
to find based on the heap tuples we're killing and search the index
for them one at a time. Tom's been opposed to that in the past, but
maybe it's worth reconsidering.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki <hlinnaka(at)iki(dot)fi>
Subject:	Re: Feedback on getting rid of VACUUM FULL
Date:	2015-04-29 17:34:51
Message-ID:	CAMkU=1xfYvO4XQxx3js+ZWw9SMDwFb56BY57EDJVMSkz04B9JA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Apr 28, 2015 at 11:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Apr 24, 2015 at 3:04 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>
> > I think what we need here is something that does heap_update to tuples
> > at the end of the table, moving them to earlier pages; then wait for old
> > snapshots to die (the infrastructure for which we have now, thanks to
> > CREATE INDEX CONCURRENTLY); then truncate the empty pages. Of course,
> > there are lots of details to resolve. It doesn't really matter that
> > this runs for long: a process doing this for hours might be better than
> > AccessExclusiveLock on the table for a much shorter period.
>
> Why do you need to do anything other than update the tuples and let
> autovacuum clean up the mess?
>

It could take a long time before autovacuum kicked in and did so. I think
a lot of time when people need this, the lack of space in the file system
is blocking some other action they want to do, so they want a definitive
answer as to when the deed is done rather than manually polling the file
system with "df". You could invoke vacuum manually rather than waiting for
autovacuum, but it would kind of suck to do that only to find out you
didn't wait long enough for all the snapshots to go away and so no space
was actually released--and I don't think we have good ways of finding out
how long is long enough. Ways of squeezing tables in the background would
be nice, but so would a way of doing it in the foreground and getting a
message when it is complete.

Cheers,

Jeff