Quick Links

Revitalising VACUUM FULL for 8.3

Lists:	pgsql-hackers

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 12:02:23
Message-ID:	1172750543.3760.1232.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Use case for VACUUM FULL is very low these days. VACUUM does the most
important part of what VACUUM FULL offers, yet does it concurrently
rather than with a full table lock. VACUUM FULL also
- has very long execution time
- generates lots of WAL traffic
- uses lots of memory while it runs
- isn't as good at compacting a relation as CLUSTER
- sometimes requires multiple runs to properly compact data

CLUSTER has
- much better compaction than VACUUM FULL when run concurrently with
other transactions (yes, really!)
- need not generate WAL, in many cases
- offers no weird failure cases

I propose to use the CLUSTER technique in place of the current VACUUM
FULL code. The command VACUUM FULL would still exist, but would execute
the command in a different manner, very similar to the way CLUSTER
works, just without the sort-the-table feature.

The benefits of this approach would be:
- VACUUM FULL would compact relations much better than it does now
- would never need multiple executions to achieve good compaction
- operate much faster, with a single pass over the main relation
- it would compact as effectively as CLUSTER, yet execute faster
- need not generate WAL while it executes
- no requirement for large maintenance_work_mem

The potential downsides of this approach are the following:
- "it uses more space"
- "it breaks MVCC"
There are good answers to both these points, so please read on.

The manual says VACUUM FULL can be speeded up by dropping and
re-creating indexes, but it is still lengthy. It is even faster to drop
the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old
table and then rebuild the indexes.

So the main use case for current VACUUM FULL is when the space to be
freed inside the table is low enough to make defraging the table quicker
than than the above workaround, yet still high enough that we were
worried enough to do a VACUUM FULL. You must also be running it
concurrently with other transactions, but clearly ones that don't
include the current table because they will be locked out by the VACUUM
FULL. That's a tough requirement because this table is by-definition one
that is/has been heavily updated/deleted. And also VACUUM hasn't been
much use at freeing space at the end of the table, which it would only
fail to do with concurrently held locks. Thats a very narrow use case
and I doubt whether it exists at all any longer.

New VACUUM FULL would perform a SeqScan of the main relation using
SnapshotNow, inserting the results into a new relfilenode. If it crashes
part way through the first phase, we drop the file being built and table
is untouched. No WAL need be written while we are doing this, except
when archive_command is set. This produces a new relation which is
tightly compacted, free of older tuples and the physical file is no
larger than required. Vacuum delay points would also be supported. No FK
checks would be required, nor would other constraints need to
be-rechecked during the insertion. Index entries would not be made
during the insertions into the second heap. Just as with CLUSTER, the
second phase would consist of rebuilding all indexes, ensuring that they
too are as compact as possible.

Space usage of VACUUM FULL could be as high as twice the target table,
but only in the case where there wasn't anything to VACUUM or truncate.
In the typical use case we would be looking to remove large numbers of
dead tuples, as well as truncate the relation, so the actual space
overhead would be more typically only about +50% of the pre-VACUUM FULL
size of the target table. If execution hits an out-of-space error then
the command can quickly recover. If space really isn't available, then
indexes can be dropped manually and the process re-executed. Arranging
for more temp space is now easier with the new temp space management
code.

Current CLUSTER does not respect MVCC. It's possible for an old
serializable transaction to miss consistent data as a result. That is a
serious problem for pg_dump and I propose to fix that in this proposal,
for both CLUSTER and the new VACUUM FULL.

I would like to introduce the concept of utility transactions. This is
any transaction that touches only one table in a transaction and is not
returning or modifying data. All utility transactions wait until they
are older than all non-utility transactions before they commit. A
utility transaction would currently be any VACUUM, VACUUM FULL and
CREATE INDEX CONCURRENTLY. That is safe because each of those commands
executes in its own transaction and doesn't touch more than one table at
a time. Once each knows there is no chance of being interfered with, it
can continue its work and commit. This technique is already in use for
CREATE INDEX CONCURRENTLY, so just needs to be extended to all other
utilities - but in a way that allows them to recognise each other. This
extends upon the thought that VACUUMs already recognise other VACUUMs
and avoid using them as part of their Snapshot.

I would also like to detect cases where CLUSTER, CREATE INDEX and ALTER
TABLE are running in their own implicit top-level transaction, so that
these too can be recognised by the server as utility transactions.

The utility transaction concept would make new VACUUM FULL MVCC-safe and
would also make most executions of CLUSTER MVCC-safe also (the implicit
top-level transaction cases).

The coding for much of this looks very straightforward. I have to
rewrite VACUUM FULL to cope with HOT anyway, so this seems like both a
faster, more robust and easier development path to take. However, none
of the above arguments rely on the acceptance of the HOT patch to
increase their relevance for this proposal.

Comments?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 12:21:50
Message-ID:	1172751710.3772.111.camel@coppola.muc.ecircle.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 13:02, Simon Riggs wrote:
> I would like to introduce the concept of utility transactions. This is
> any transaction that touches only one table in a transaction and is not
> returning or modifying data. All utility transactions wait until they
> are older than all non-utility transactions before they commit. A
> utility transaction would currently be any VACUUM, VACUUM FULL and
> CREATE INDEX CONCURRENTLY. That is safe because each of those commands
> executes in its own transaction and doesn't touch more than one table at
> a time. Once each knows there is no chance of being interfered with, it
> can continue its work and commit. This technique is already in use for
> CREATE INDEX CONCURRENTLY, so just needs to be extended to all other
> utilities - but in a way that allows them to recognise each other. This
> extends upon the thought that VACUUMs already recognise other VACUUMs
> and avoid using them as part of their Snapshot.

Wouldn't this be deadlock prone ? What if a non-utility transaction
(which could even be started before the vacuum full) blocks on the table
being vacuumed, then if the vacuum wants to wait until all non-utility
transactions finish will deadlock.

> The utility transaction concept would make new VACUUM FULL MVCC-safe and
> would also make most executions of CLUSTER MVCC-safe also (the implicit
> top-level transaction cases).

Making cluster MVCC-safe will kill my back-door of clustering a hot
table while I run a full DB backup.

Cheers,
Csaba.

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 12:32:24
Message-ID:	1172752344.3216.15.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ühel kenal päeval, N, 2007-03-01 kell 12:02, kirjutas Simon Riggs:
> Use case for VACUUM FULL is very low these days. VACUUM does the most
> important part of what VACUUM FULL offers, yet does it concurrently
> rather than with a full table lock. VACUUM FULL also
> - has very long execution time
> - generates lots of WAL traffic
> - uses lots of memory while it runs
> - isn't as good at compacting a relation as CLUSTER
> - sometimes requires multiple runs to properly compact data

Yet another way to achieve VACUUM FULL-like results would be a COMPACT
TABLE command, which would do the following:

1 - start a forward sequential scan to find free space

2 - start a backwards seqscan to find live tuples

move live tuples to free space by doing a null update (UPDATE without
changing any field values) with new version being placed to lowest
possible age until the two scans intersect.

This, together with ordinary VACUUM would achieve almost the same
results as VACUUM FULL with much lower overhead. And it can also be done
in lots of smaller transactions instead of one big one if required.

If we can trust FSM, the whole process just becomes the backward scan
and null updates until the null update does not move tuple to a lower
page. Also, for the duration of COMPACT TABLE the updated tuple should
always be placed in lowes available slot, that is no same-page updates
should be tied before going to FSM.

This has some downsides :

1 - the original xmin will be lost

2 - as with any updates, it may block/abort other concurrent updates, so
it could be a good thing to teach the update mechanism about null
updates.

Still I think that this would be the chepest way to get VACUUM FULL
behaviour without locking the whole table for long time

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 12:42:51
Message-ID:	1172752971.3216.21.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing:

> If we can trust FSM, the whole process just becomes the backward scan
> and null updates until the null update does not move tuple to a lower
> page. Also, for the duration of COMPACT TABLE the updated tuple should
> always be placed in lowes available slot, that is no same-page updates
> should be tied before going to FSM.
>
> This has some downsides :
>
> 1 - the original xmin will be lost
>
> 2 - as with any updates, it may block/abort other concurrent updates, so
> it could be a good thing to teach the update mechanism about null
> updates.
>
> Still I think that this would be the chepest way to get VACUUM FULL
> behaviour without locking the whole table for long time

This means that

VACUUM FULL mytable;

would translate to:

VACUUM mytable; -- make free space
COMPACT mytable; -- move tuples in a bunch of small transactions
-- might have a GUC for max trx length
VACUUM mytable; -- free the tuples at the end and give space back to fs

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Csaba Nagy" <nagy(at)ecircle-ag(dot)com>
Cc:	"postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 12:56:46
Message-ID:	1172753807.3760.1251.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 13:21 +0100, Csaba Nagy wrote:
> On Thu, 2007-03-01 at 13:02, Simon Riggs wrote:
> > I would like to introduce the concept of utility transactions. This is
> > any transaction that touches only one table in a transaction and is not
> > returning or modifying data. All utility transactions wait until they
> > are older than all non-utility transactions before they commit. A
> > utility transaction would currently be any VACUUM, VACUUM FULL and
> > CREATE INDEX CONCURRENTLY. That is safe because each of those commands
> > executes in its own transaction and doesn't touch more than one table at
> > a time. Once each knows there is no chance of being interfered with, it
> > can continue its work and commit. This technique is already in use for
> > CREATE INDEX CONCURRENTLY, so just needs to be extended to all other
> > utilities - but in a way that allows them to recognise each other. This
> > extends upon the thought that VACUUMs already recognise other VACUUMs
> > and avoid using them as part of their Snapshot.
>
> Wouldn't this be deadlock prone ? What if a non-utility transaction
> (which could even be started before the vacuum full) blocks on the table
> being vacuumed, then if the vacuum wants to wait until all non-utility
> transactions finish will deadlock.

Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more
frequent use case than VACUUM FULL does, even after I've made the
proposed changes.

The situation, as I understand it, would be that the utility command
waits on another transaction to complete. As soon as that other
transaction touches the locked table it will detect a simple deadlock
and the non-utility statement will abort.

> > The utility transaction concept would make new VACUUM FULL MVCC-safe and
> > would also make most executions of CLUSTER MVCC-safe also (the implicit
> > top-level transaction cases).
>
> Making cluster MVCC-safe will kill my back-door of clustering a hot
> table while I run a full DB backup.

Wow. I'll take that as a request for a NOWAIT option on utility
commands, rather than a suggestion that we shouldn't strive to make
things safe in the default case.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:04:06
Message-ID:	1172754246.3772.121.camel@coppola.muc.ecircle.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 13:56, Simon Riggs wrote:
> > Wouldn't this be deadlock prone ? What if a non-utility transaction
> > (which could even be started before the vacuum full) blocks on the table
> > being vacuumed, then if the vacuum wants to wait until all non-utility
> > transactions finish will deadlock.
>
> Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more
> frequent use case than VACUUM FULL does, even after I've made the
> proposed changes.
>
> The situation, as I understand it, would be that the utility command
> waits on another transaction to complete. As soon as that other
> transaction touches the locked table it will detect a simple deadlock
> and the non-utility statement will abort.

Fair enough.

> > > The utility transaction concept would make new VACUUM FULL MVCC-safe and
> > > would also make most executions of CLUSTER MVCC-safe also (the implicit
> > > top-level transaction cases).
> >
> > Making cluster MVCC-safe will kill my back-door of clustering a hot
> > table while I run a full DB backup.
>
> Wow. I'll take that as a request for a NOWAIT option on utility
> commands, rather than a suggestion that we shouldn't strive to make
> things safe in the default case.

Yes please... if the current behavior is possible to be triggered (by
that NOWAIT for eg.), it would be actually good to have the MVCC
behavior as default.

Thanks,
Csaba.

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Hannu Krosing" <hannu(at)skype(dot)net>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:05:28
Message-ID:	1172754328.3760.1258.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 14:32 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-03-01 kell 12:02, kirjutas Simon Riggs:
> > Use case for VACUUM FULL is very low these days. VACUUM does the most
> > important part of what VACUUM FULL offers, yet does it concurrently
> > rather than with a full table lock. VACUUM FULL also
> > - has very long execution time
> > - generates lots of WAL traffic
> > - uses lots of memory while it runs
> > - isn't as good at compacting a relation as CLUSTER
> > - sometimes requires multiple runs to properly compact data
>
> Yet another way to achieve VACUUM FULL-like results would be a COMPACT
> TABLE command, which would do the following:
>
> 1 - start a forward sequential scan to find free space
>
> 2 - start a backwards seqscan to find live tuples
>
> move live tuples to free space by doing a null update (UPDATE without
> changing any field values) with new version being placed to lowest
> possible age until the two scans intersect.

Thats almost exactly what VACUUM FULL does now, just using two passes of
the table. 1st scan builds the free block list and tuples to move list,
then it starts from back, moving rows until it gets to the front. This
does sound like it would save some I/O by avoiding the second half of
the first scan. In most other respects its the same thing, AFAICS.

CLUSTER would still achieve better compaction and you'll still need to
write lots of WAL doing this.

ISTM a radical approach is needed, so I'm very open to discussion about
this and how we cope.

If we break down the main thoughts into a few parts:

1. would like a way to CLUSTER/VACUUM FULL where we don't have to move
all of the tuple versions, just the current ones.

2. would like a way to compact a table more efficiently

Your idea does (2) in a concurrent manner, which is very good.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Hannu Krosing" <hannu(at)skype(dot)net>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:14:14
Message-ID:	1172754854.3760.1267.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 14:42 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing:
>
> > If we can trust FSM, the whole process just becomes the backward scan
> > and null updates until the null update does not move tuple to a lower
> > page. Also, for the duration of COMPACT TABLE the updated tuple should
> > always be placed in lowes available slot, that is no same-page updates
> > should be tied before going to FSM.
> >
> > This has some downsides :
> >
> > 1 - the original xmin will be lost
> >
> > 2 - as with any updates, it may block/abort other concurrent updates, so
> > it could be a good thing to teach the update mechanism about null
> > updates.
> >
> > Still I think that this would be the chepest way to get VACUUM FULL
> > behaviour without locking the whole table for long time
>
> This means that
>
> VACUUM FULL mytable;
>
> would translate to:
>
> VACUUM mytable; -- make free space
-- stop here if nothing further to do
-- stop when freespace >= remaining space in table
-- update the FSM, so that concurrent inserts/updates
put their new versions at start of table
> COMPACT mytable; -- move tuples in a bunch of small transactions
> -- might have a GUC for max trx length
-- which => always perform COLD UPDATEs, never HOT ones
> VACUUM mytable; -- free the tuples at the end and give space back to fs
-- start the VACUUM from the first non-filled block

So if we do this, we wouldn't need to worry about HOT tuples at all, nor
would we need to wait until all transactions are gone.

This would also mean that VACUUM FULL could be achieved concurrently.

I'm on the edge here, want to persuade me some more?
Pro
- we don't need any more space

Cons
- not as fast as dump-to-new-file technique
- will end up with bloated indexes and REINDEX is still not concurrent

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Hannu Krosing" <hannu(at)skype(dot)net>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:35:15
Message-ID:	E1539E0ED7043848906A8FF995BDA57901CAFFCD@m0143.s-mxs.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> -- start the VACUUM from the first non-filled block
>
> So if we do this, we wouldn't need to worry about HOT tuples
> at all, nor would we need to wait until all transactions are gone.

You need to wait until you are allowed to truncate if you want
concurrency.
Or a concurrent scan might miss a row, because the visible tuple got
truncated away.

Andreas

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:40:04
Message-ID:	1172756404.3216.34.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ühel kenal päeval, N, 2007-03-01 kell 13:14, kirjutas Simon Riggs:
> On Thu, 2007-03-01 at 14:42 +0200, Hannu Krosing wrote:
> > Ühel kenal päeval, N, 2007-03-01 kell 14:32, kirjutas Hannu Krosing:
> >
> > > If we can trust FSM, the whole process just becomes the backward scan
> > > and null updates until the null update does not move tuple to a lower
> > > page. Also, for the duration of COMPACT TABLE the updated tuple should
> > > always be placed in lowes available slot, that is no same-page updates
> > > should be tied before going to FSM.
> > >
> > > This has some downsides :
> > >
> > > 1 - the original xmin will be lost
> > >
> > > 2 - as with any updates, it may block/abort other concurrent updates, so
> > > it could be a good thing to teach the update mechanism about null
> > > updates.
> > >
> > > Still I think that this would be the chepest way to get VACUUM FULL
> > > behaviour without locking the whole table for long time
> >
> > This means that
> >
> > VACUUM FULL mytable;
> >
> > would translate to:
> >
> > VACUUM mytable; -- make free space
> -- stop here if nothing further to do
> -- stop when freespace >= remaining space in table
> -- update the FSM, so that concurrent inserts/updates
> put their new versions at start of table

This still needs some tweaking, so that same-page updates don't happen
for tuples that need moving.

> > COMPACT mytable; -- move tuples in a bunch of small transactions
> > -- might have a GUC for max trx length
> -- which => always perform COLD UPDATEs, never HOT ones

Yes, that too.

> > VACUUM mytable; -- free the tuples at the end and give space back to fs
> -- start the VACUUM from the first non-filled block
>
> So if we do this, we wouldn't need to worry about HOT tuples at all, nor
> would we need to wait until all transactions are gone.

We would still need to do something for HOT tuples that are placed in
the "dense" part of table, that is below the eventual truncation point.

> This would also mean that VACUUM FULL could be achieved concurrently.

That was the main point, yes.

> I'm on the edge here, want to persuade me some more?

It should be much simpler to do than any of the CLUSTER/COPY variants -
"just" null updates with simple new-version placement .

> Pro
> - we don't need any more space

Actually we may need a little (or a lot), as the moved tuples need index
entries. But likely it is much less than for CLUSTER variant.

> Cons
> - not as fast as dump-to-new-file technique

It depends on table contents and amount/placement of dirty space - it
may be much faster.

> - will end up with bloated indexes and REINDEX is still not concurrent

If we get plan invalidation, then doing CREATE newindex; DROP oldindex;
would be a good replacement for REINDEX. And it can be done CONCURRENTLY
as well.

Work done on COMPACT TABLE could probably be reused later for some
version of CLUSTER CONCURRENTLY which would do something similar, but
pay more attention on ordering and/or clustering and fill factor

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc:	"Hannu Krosing" <hannu(at)skype(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 13:41:44
Message-ID:	1172756505.3760.1270.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2007-03-01 at 14:35 +0100, Zeugswetter Andreas ADI SD wrote:
> > -- start the VACUUM from the first non-filled block
> >
> > So if we do this, we wouldn't need to worry about HOT tuples
> > at all, nor would we need to wait until all transactions are gone.
>
> You need to wait until you are allowed to truncate if you want
> concurrency.
> Or a concurrent scan might miss a row, because the visible tuple got
> truncated away.

I was not suggesting that we remove visible rows through truncation.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Hannu Krosing" <hannu(at)skype(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 14:03:39
Message-ID:	E1539E0ED7043848906A8FF995BDA57901CAFFE8@m0143.s-mxs.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > > -- start the VACUUM from the first non-filled block
> > >
> > > So if we do this, we wouldn't need to worry about HOT tuples at
all,
> > > nor would we need to wait until all transactions are gone.
> >
> > You need to wait until you are allowed to truncate if you want
> > concurrency.
> > Or a concurrent scan might miss a row, because the visible tuple got

> > truncated away.
>
> I was not suggesting that we remove visible rows through truncation.

Sure, unless you suggest to not truncate during this vacuum run ?
But we are talking about vacuum full, so truncation is essential.

It was suggested to do a dummy null update to move live tuples up front.
The old version is still visible for serializable txns.

Andreas

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 14:54:44
Message-ID:	1172760884.3216.37.camel@localhost.localdomain
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ühel kenal päeval, N, 2007-03-01 kell 15:03, kirjutas Zeugswetter
Andreas ADI SD:
> > > > -- start the VACUUM from the first non-filled block
> > > >
> > > > So if we do this, we wouldn't need to worry about HOT tuples at
> all,
> > > > nor would we need to wait until all transactions are gone.
> > >
> > > You need to wait until you are allowed to truncate if you want
> > > concurrency.
> > > Or a concurrent scan might miss a row, because the visible tuple got
>
> > > truncated away.
> >
> > I was not suggesting that we remove visible rows through truncation.
>
> Sure, unless you suggest to not truncate during this vacuum run ?
> But we are talking about vacuum full, so truncation is essential.
>
> It was suggested to do a dummy null update to move live tuples up front.
> The old version is still visible for serializable txns.

There should probably be a wait-other-trxs to finish stage between
COMPACT and 2nd VACUUM if truncation at this cycle is absolutely needed

> Andreas
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Hannu Krosing <hannu(at)skype(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-01 15:49:06
Message-ID:	45E6F5F2.4030109@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> This means that
>
> VACUUM FULL mytable;
>
> would translate to:
>
> VACUUM mytable; -- make free space
> COMPACT mytable; -- move tuples in a bunch of small transactions
> -- might have a GUC for max trx length
> VACUUM mytable; -- free the tuples at the end and give space back to fs

VACUUM my table [n] tuples; -- free tuples at the end but only up to [n]
tuples.

Joshua D. Drake

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To:	Hannu Krosing <hannu(at)skype(dot)net>
Cc:	Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-02 03:30:49
Message-ID:	20070302033048.GB15006@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 01, 2007 at 04:54:44PM +0200, Hannu Krosing wrote:
> > > I was not suggesting that we remove visible rows through truncation.
> >
> > Sure, unless you suggest to not truncate during this vacuum run ?
> > But we are talking about vacuum full, so truncation is essential.
> >
> > It was suggested to do a dummy null update to move live tuples up front.
> > The old version is still visible for serializable txns.
>
> There should probably be a wait-other-trxs to finish stage between
> COMPACT and 2nd VACUUM if truncation at this cycle is absolutely needed

Or you could just do a vacuum a bit later. If we've also got the DSM at
that point, vacuum should be fast.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Hannu Krosing <hannu(at)skype(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Revitalising VACUUM FULL for 8.3
Date:	2007-03-02 03:36:18
Message-ID:	20070302033618.GC15006@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 01, 2007 at 01:05:28PM +0000, Simon Riggs wrote:
> ISTM a radical approach is needed, so I'm very open to discussion about
> this and how we cope.
>
> If we break down the main thoughts into a few parts:
>
> 1. would like a way to CLUSTER/VACUUM FULL where we don't have to move
> all of the tuple versions, just the current ones.
>
> 2. would like a way to compact a table more efficiently
>
> Your idea does (2) in a concurrent manner, which is very good.

Along similar lines, I think it would also be very useful to have a mode
where any time a tuple on the last X pages gets updated it's moved off
of it's existing page in the relation. Kind of like a 'shrink the heap
in the background'. Granted, this probably won't compact as much as
something more aggressive would, but it should be essentially free.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)