From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Visibility map, partial vacuums
Date: 2008-10-27 12:03:35
Message-ID: 4905AE17.7090305@enterprisedb.com
Lists: pgsql-hackers

Here's finally my attempt at the visibility map, a.k.a. the dead space
map. It's still a work in progress, but it's time to discuss some of the
design decisions in detail. Patch attached anyway, for reference.

The visibility map is basically a bitmap with one bit per heap page, where
'1' means the page is known to contain only tuples that are visible to
everyone. Such pages don't need vacuuming, because there are no dead
tuples, and the information can also be used to skip visibility checks.
It should allow index-only scans in the future, 8.5 perhaps, but that's
not part of this patch. The visibility map is stored in a new relation
fork, alongside the main data and the FSM.

Lazy VACUUM only needs to visit pages that are '0' in the visibility
map. This allows partial vacuums, where we only need to scan those parts
of the table that need vacuuming, plus all indexes.

To avoid having to update the visibility map every time a heap page is
updated, I have added a new flag to the heap page header,
PD_ALL_VISIBLE, which indicates the same thing as a set bit in the
visibility map: all tuples on the page are known to be visible to
everyone. When a page is modified, the visibility map only needs to be
updated if PD_ALL_VISIBLE was set. That should make the impact
unnoticeable for use cases with lots of updates, where the visibility
map doesn't help, because only the first update on a page after a vacuum
needs to update the visibility map.

As a bonus, I'm using the PD_ALL_VISIBLE flag to skip visibility checks
in sequential scans. That gives a modest 5-10% speedup on my laptop for a
simple "SELECT COUNT(*) FROM foo" query, where foo is a narrow table with
just a single integer column, small enough to fit in RAM.

The critical part of this patch is keeping the PD_ALL_VISIBLE flag and
the visibility map up-to-date while avoiding race conditions. The
invariant maintained is: if the PD_ALL_VISIBLE flag is *not* set, the
corresponding bit in the visibility map must also not be set. If the
PD_ALL_VISIBLE flag is set, the bit in the visibility map can be set, or
not.

To modify a page:
If PD_ALL_VISIBLE flag is set, the bit in the visibility map is cleared
first. The heap page is kept pinned, but not locked, while the
visibility map is updated. We want to avoid holding a lock across I/O,
even though the visibility map is likely to stay in cache. After the
visibility map has been updated, the page is exclusively locked and
modified as usual, and PD_ALL_VISIBLE flag is cleared before releasing
the lock.
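
In rough code, the ordering is something like this (just a sketch, using the
patch's PD_ALL_VISIBLE macros and a simplified one-argument-per-block
visibilitymap_clear(); this is not the actual patch code):

    Page page = BufferGetPage(buf);

    /* Clear the map bit first, holding only a pin on the heap page, so
     * that any I/O on the map fork happens without a content lock held. */
    if (PageIsAllVisible(page))
        visibilitymap_clear(rel, blkno);

    /* Then lock and modify the heap page as usual. */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    /* ... insert/update/delete tuples here ... */
    if (PageIsAllVisible(page))
        PageClearAllVisible(page);
    MarkBufferDirty(buf);
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);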

To set the PD_ALL_VISIBLE flag, you must hold an exclusive lock on the
page, while you observe that all tuples on the page are visible to everyone.

To set the bit in the visibility map, you need to hold a cleanup lock on
the heap page. That keeps away other backends trying to clear the bit in
the visibility map at the same time. Note that you need to hold a lock
on the heap page to examine PD_ALL_VISIBLE, otherwise the cleanup lock
doesn't protect from the race condition.
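
Again only as a sketch, the setting side is the mirror image (same caveats;
the single-argument visibilitymap_set() is a simplification for
illustration):

    LockBufferForCleanup(buf);                /* sole pin + exclusive lock */
    if (PageIsAllVisible(BufferGetPage(buf)))
        visibilitymap_set(rel, blkno);        /* nobody can clear it now */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);      /* pin is kept by the caller */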

That's how the patch works right now. However, there's a small
performance problem with the current approach: setting the
PD_ALL_VISIBLE flag must be WAL-logged. Otherwise, this could happen:
1. All tuples on a page become visible to everyone. The inserting
transaction committed, for example. A backend sees that and sets
PD_ALL_VISIBLE.
2. Vacuum comes along, and sees that there's no work to be done on the
page. It sets the bit in the visibility map.
3. The visibility map page is flushed to disk. The heap page is not, yet.
4. Crash

The bit in the visibility map is now set, but the corresponding
PD_ALL_VISIBLE flag is not, because it never made it to disk.

I'm avoiding that at the moment by only setting PD_ALL_VISIBLE as part
of a page prune operation, and forcing a WAL record to be written even
if no other work is done on the page. The downside of that is that it
can lead to a big increase in WAL traffic after a bulk load, for
example. The first vacuum after the bulk load would have to write a WAL
record for every heap page, even though there are no dead tuples.

One option would be to just ignore that problem for now, and not
WAL-log. As long as we don't use the visibility map for anything like
index-only scans, it doesn't matter much if there are some bits set that
shouldn't be. It just means that VACUUM will skip some pages that need
vacuuming, but VACUUM FREEZE will eventually catch those. Given how
little time we have until commitfest and feature freeze, that's probably
the most reasonable thing to do. I'll follow up with other solutions to
that problem, but mainly for discussion for 8.5.

Another thing that does need to be fixed is the way that the extension
and truncation of the visibility map is handled; that's broken in the
current patch. I started working on the patch a long time ago, before
the FSM rewrite was finished, and haven't gotten around to fixing that
part yet. We already solved this for the FSM, so we could just follow
that pattern. The way we solved truncation in the FSM was to write a
separate WAL record with the new heap size, but perhaps we want to
revisit that decision instead of again adding new code to write a third
WAL record for truncation of the visibility map. smgrtruncate() writes a
WAL record of its own if any full blocks are truncated away from the
FSM, but we needed a WAL record even if no full blocks are truncated
from the FSM file, because the "tail" of the last remaining FSM page,
representing the truncated-away heap pages, still needs to be cleared.
The visibility map has the same problem.

One proposal was to piggyback on the smgrtruncate() WAL-record, and call
FreeSpaceMapTruncateRel from smgr_redo(). I considered that ugly from a
modularity point of view; smgr.c shouldn't be calling higher-level
functions. But maybe it wouldn't be that bad, after all. Or, we could
remove WAL-logging from smgrtruncate() altogether, and move it to
RelationTruncate() or another higher-level function, and handle the
WAL-logging and replay there.
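
To make the second option concrete, the structure would be roughly as below.
This is only a sketch: the WAL record construction is elided, the signatures
are approximate, and none of this is code from the patch.

    void
    RelationTruncate(Relation rel, BlockNumber nblocks)
    {
        RelationOpenSmgr(rel);

        /* write one WAL record covering the whole truncation (elided) */

        /* physically truncate the main fork ... */
        smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks, rel->rd_istemp);

        /* ... and shrink the FSM (and later the visibility map) to match */
        FreeSpaceMapTruncateRel(rel, nblocks);
    }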

There's some side-effects of partial vacuums that also need to be fixed.
First of all, the tuple count stored in pg_class is now wrong: it only
includes tuples from the pages that are visited. VACUUM VERBOSE output
needs to be changed as well to reflect that only some pages were scanned.

Other TODOs
- performance testing, to ensure that there's no significant performance
penalty.
- should add a specialized version of visibilitymap_clear() for WAL
replay, so that it wouldn't have to rely so much on the fake relcache entries.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
visibilitymap-1.patch text/x-diff 54.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-27 23:31:18
Message-ID: 26727.1225150278@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> To modify a page:
> If PD_ALL_VISIBLE flag is set, the bit in the visibility map is cleared
> first. The heap page is kept pinned, but not locked, while the
> visibility map is updated. We want to avoid holding a lock across I/O,
> even though the visibility map is likely to stay in cache. After the
> visibility map has been updated, the page is exclusively locked and
> modified as usual, and PD_ALL_VISIBLE flag is cleared before releasing
> the lock.

So after having determined that you will modify a page, you release the
ex lock on the buffer and then try to regain it later? Seems like a
really bad idea from here. What if it's no longer possible to do the
modification you intended?

> To set the PD_ALL_VISIBLE flag, you must hold an exclusive lock on the
> page, while you observe that all tuples on the page are visible to everyone.

That doesn't sound too good from a concurrency standpoint...

> That's how the patch works right now. However, there's a small
> performance problem with the current approach: setting the
> PD_ALL_VISIBLE flag must be WAL-logged. Otherwise, this could happen:

I'm more concerned about *clearing* the bit being WAL-logged. That's
necessary for correctness.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 11:25:08
Message-ID: 1225193108.3971.154.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
> One option would be to just ignore that problem for now, and not
> WAL-log.

Probably worth skipping for now, since it will cause patch conflicts if
you do. Are there any other interactions with Hot Standby?

But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
to say "page is now all visible", without too much work.

Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
to set the block flag (unlogged), but just not update VM. Separating the
two concepts should allow the visibility check speed gain to be more
generally available.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 12:57:13
Message-ID: 49070C29.9090508@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
>> One option would be to just ignore that problem for now, and not
>> WAL-log.
>
> Probably worth skipping for now, since it will cause patch conflicts if
> you do. Are there any other interactions with Hot Standby?
>
> But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
> to say "page is now all visible", without too much work.

Hmm. Even if a tuple is visible to everyone on the master, it's not
necessarily yet visible to all the read-only transactions in the slave.

> Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
> the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
> to set the block flag (unlogged), but just not update VM. Separating the
> two concepts should allow the visibility check speed gain to be more
> generally available.

Yes, that should be possible in theory. There's no version of
ConditionalLockBuffer() for conditionally upgrading a shared lock to
exclusive, but one could be added. I'm not sure if it would
be safe to set the PD_ALL_VISIBLE_FLAG while holding just a shared lock,
though. If it is, then we could do just that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 13:08:10
Message-ID: 18941.1225199290@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> ... I'm not sure if it would
> be safe to set the PD_ALL_VISIBLE_FLAG while holding just a shared lock,
> though. If it is, then we could do just that.

Seems like it must be safe. If you have shared lock on a page then no
one else could be modifying the page in a way that would falsify
PD_ALL_VISIBLE. You might have several processes concurrently try to
set the bit but that is safe (same situation as for hint bits).

The harder part is propagating the bit to the visibility map, but I
gather you intend to only allow VACUUM to do that?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 13:45:03
Message-ID: 4907175F.8050409@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> The harder part is propagating the bit to the visibility map, but I
> gather you intend to only allow VACUUM to do that?

Yep.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 13:49:47
Message-ID: 4907187B.7060103@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> To modify a page:
>> If PD_ALL_VISIBLE flag is set, the bit in the visibility map is cleared
>> first. The heap page is kept pinned, but not locked, while the
>> visibility map is updated. We want to avoid holding a lock across I/O,
>> even though the visibility map is likely to stay in cache. After the
>> visibility map has been updated, the page is exclusively locked and
>> modified as usual, and PD_ALL_VISIBLE flag is cleared before releasing
>> the lock.
>
> So after having determined that you will modify a page, you release the
> ex lock on the buffer and then try to regain it later? Seems like a
> really bad idea from here. What if it's no longer possible to do the
> modification you intended?

In the case of insert/update, you have to find a new target page. I put
the logic in RelationGetBufferForTuple(). In the case of delete and update
(old page), the flag is checked and the bit cleared just after pinning the
buffer, before doing anything else. (I note that that's not actually what
the patch is doing for heap_update; will fix..)

If we give up on the strict requirement that the bit in the visibility
map has to be cleared if the PD_ALL_VISIBLE flag on the page is not set,
then we could just update the visibility map after releasing the locks
on the heap pages. I think I'll do that for now, for simplicity.

>> To set the PD_ALL_VISIBLE flag, you must hold an exclusive lock on the
>> page, while you observe that all tuples on the page are visible to everyone.
>
> That doesn't sound too good from a concurrency standpoint...

Well, no, but it's only done in VACUUM. And pruning. I implemented it as
a new loop that calls HeapTupleSatisfiesVacuum on each tuple and checks
that xmin is old enough for live tuples, but come to think of it, we're
already calling HeapTupleSatisfiesVacuum for every tuple on the page
during VACUUM, so it should be possible to piggyback on that by
restructuring the code.
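
For illustration, the per-tuple check I have in mind is about this much (a
sketch only, not the patch's actual loop; 'tuple' is the HeapTuple being
examined and OldestXmin the usual visible-to-everyone cutoff):

    HTSV_Result state = HeapTupleSatisfiesVacuum(tuple->t_data, OldestXmin, buf);

    /* must be live, and inserted long enough ago that every snapshot sees it */
    if (state != HEAPTUPLE_LIVE ||
        !TransactionIdPrecedes(HeapTupleHeaderGetXmin(tuple->t_data), OldestXmin))
        all_visible = false;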

>> That's how the patch works right now. However, there's a small
>> performance problem with the current approach: setting the
>> PD_ALL_VISIBLE flag must be WAL-logged. Otherwise, this could happen:
>
> I'm more concerned about *clearing* the bit being WAL-logged. That's
> necessary for correctness.

Yes, clearing the PD_ALL_VISIBLE flag always needs to be WAL-logged.
There's a new boolean field in xl_heap_insert/update/delete records
indicating if the operation cleared the flag. On replay, if the flag was
cleared, the bit in the visibility map is also cleared.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 14:22:15
Message-ID: 1225203735.3971.167.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-10-28 at 14:57 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
> >> One option would be to just ignore that problem for now, and not
> >> WAL-log.
> >
> > Probably worth skipping for now, since it will cause patch conflicts if
> > you do. Are there any other interactions with Hot Standby?
> >
> > But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
> > to say "page is now all visible", without too much work.
>
> Hmm. Even if a tuple is visible to everyone on the master, it's not
> necessarily yet visible to all the read-only transactions in the slave.

Never a problem. No query can ever see the rows removed by a cleanup
record, enforced by the recovery system.

> > Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
> > the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
> > to set the block flag (unlogged), but just not update VM. Separating the
> > two concepts should allow the visibility check speed gain to be more
> > generally available.
>
> Yes, that should be possible in theory. There's no version of
> ConditionalLockBuffer() for conditionally upgrading a shared lock to
> exclusive, but one could be added. I'm not sure if it would
> be safe to set the PD_ALL_VISIBLE_FLAG while holding just a shared lock,
> though. If it is, then we could do just that.

To be honest, I'm more excited about your perf results for that than I
am about speeding up some VACUUMs.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 17:02:39
Message-ID: 490745AF.2090602@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, 2008-10-28 at 14:57 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
>>>> One option would be to just ignore that problem for now, and not
>>>> WAL-log.
>>> Probably worth skipping for now, since it will cause patch conflicts if
>>> you do. Are there any other interactions with Hot Standby?
>>>
>>> But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
>>> to say "page is now all visible", without too much work.
>> Hmm. Even if a tuple is visible to everyone on the master, it's not
>> necessarily yet visible to all the read-only transactions in the slave.
>
> Never a problem. No query can ever see the rows removed by a cleanup
> record, enforced by the recovery system.

Yes, but there's a problem with recently inserted tuples:

1. A query begins in the slave, taking a snapshot with xmax = 100. So
the effects of anything more recent should not be seen.
2. Transaction 100 inserts a tuple in the master, and commits
3. A vacuum comes along. There's no other transactions running in the
master. Vacuum sees that all tuples on the page, including the one just
inserted, are visible to everyone, and sets PD_ALL_VISIBLE flag.
4. The change is replicated to the slave.
5. The query in the slave that began at step 1 looks at the page, sees
that the PD_ALL_VISIBLE flag is set. Therefore it skips the visibility
checks, and erroneously returns the inserted tuple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 17:03:15
Message-ID: 1225213395.3971.217.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:

> Lazy VACUUM only needs to visit pages that are '0' in the visibility
> map. This allows partial vacuums, where we only need to scan those parts
> of the table that need vacuuming, plus all indexes.

Just realised that this means we still have to visit each block of a
btree index with a cleanup lock.

That means the earlier idea of saying I don't need a cleanup lock if the
page is not in memory makes a lot more sense with a partial vacuum.

1. Scan all blocks in memory for the index (and so, don't do this unless
the index is larger than a certain % of shared buffers),
2. Start reading in new blocks until you've removed the correct number
of tuples
3. Work through the rest of the blocks checking that they are either in
shared buffers and we can get a cleanup lock, or they aren't in shared
buffers and so nobody has them pinned.

If you handle step (2) intelligently with regard to index correlation, you
might not need to do much I/O at all, if any.

(1) has a good hit ratio because mostly only active tables will be
vacuumed, so they are fairly likely to be in memory.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 17:30:24
Message-ID: 1225215024.3971.221.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-10-28 at 19:02 +0200, Heikki Linnakangas wrote:

> Yes, but there's a problem with recently inserted tuples:
>
> 1. A query begins in the slave, taking a snapshot with xmax = 100. So
> the effects of anything more recent should not be seen.
> 2. Transaction 100 inserts a tuple in the master, and commits
> 3. A vacuum comes along. There's no other transactions running in the
> master. Vacuum sees that all tuples on the page, including the one just
> inserted, are visible to everyone, and sets PD_ALL_VISIBLE flag.
> 4. The change is replicated to the slave.
> 5. The query in the slave that began at step 1 looks at the page, sees
> that the PD_ALL_VISIBLE flag is set. Therefore it skips the visibility
> checks, and erroneously returns the inserted tuple.

Yep. I was thinking about FSM and row removal. So PD_ALL_VISIBLE must be
separately settable on the standby. Another reason why it should be able
to be set without a VACUUM, since there will never be one on the standby.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 17:58:12
Message-ID: 9966.1225216692@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
>> Lazy VACUUM only needs to visit pages that are '0' in the visibility
>> map. This allows partial vacuums, where we only need to scan those parts
>> of the table that need vacuuming, plus all indexes.

> Just realised that this means we still have to visit each block of a
> btree index with a cleanup lock.

Yes, and your proposal cannot fix that. Read "The Deletion Algorithm"
in nbtree/README, particularly the second paragraph.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 17:59:37
Message-ID: 9993.1225216777@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Yes, but there's a problem with recently inserted tuples:

> 1. A query begins in the slave, taking a snapshot with xmax = 100. So
> the effects of anything more recent should not be seen.
> 2. Transaction 100 inserts a tuple in the master, and commits
> 3. A vacuum comes along. There's no other transactions running in the
> master. Vacuum sees that all tuples on the page, including the one just
> inserted, are visible to everyone, and sets PD_ALL_VISIBLE flag.
> 4. The change is replicated to the slave.
> 5. The query in the slave that began at step 1 looks at the page, sees
> that the PD_ALL_VISIBLE flag is set. Therefore it skips the visibility
> checks, and erroneously returns the inserted tuple.

But this is exactly equivalent to the problem with recently deleted
tuples: vacuum on the master might take actions that are premature with
respect to the status on the slave. Whatever solution we adopt for that
will work for this too.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-10-28 18:07:17
Message-ID: 1225217237.3971.225.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-10-28 at 13:58 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
> >> Lazy VACUUM only needs to visit pages that are '0' in the visibility
> >> map. This allows partial vacuums, where we only need to scan those parts
> >> of the table that need vacuuming, plus all indexes.
>
> > Just realised that this means we still have to visit each block of a
> > btree index with a cleanup lock.
>
> Yes, and your proposal cannot fix that. Read "The Deletion Algorithm"
> in nbtree/README, particularly the second paragraph.

Yes, understood. Please read the algorithm again. It does guarantee that
each block in the index has been checked to see if nobody is pinning it;
it just avoids performing I/O to prove that.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-14 08:31:39
Message-ID: 491D376B.9000608@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Another thing that does need to be fixed is the way that the extension
> and truncation of the visibility map is handled; that's broken in the
> current patch. I started working on the patch a long time ago, before
> the FSM rewrite was finished, and haven't gotten around to fixing that
> part yet. We already solved this for the FSM, so we could just follow
> that pattern. The way we solved truncation in the FSM was to write a
> separate WAL record with the new heap size, but perhaps we want to
> revisit that decision instead of again adding new code to write a third
> WAL record for truncation of the visibility map. smgrtruncate() writes a
> WAL record of its own if any full blocks are truncated away from the
> FSM, but we needed a WAL record even if no full blocks are truncated
> from the FSM file, because the "tail" of the last remaining FSM page,
> representing the truncated-away heap pages, still needs to be cleared.
> The visibility map has the same problem.
>
> One proposal was to piggyback on the smgrtruncate() WAL-record, and call
> FreeSpaceMapTruncateRel from smgr_redo(). I considered that ugly from a
> modularity point of view; smgr.c shouldn't be calling higher-level
> functions. But maybe it wouldn't be that bad, after all. Or, we could
> remove WAL-logging from smgrtruncate() altogether, and move it to
> RelationTruncate() or another higher-level function, and handle the
> WAL-logging and replay there.

In preparation for the visibility map patch, I revisited the truncation
issue, and hacked together a patch to piggyback the FSM truncation on
the main fork smgr truncation WAL record. I moved the WAL-logging from
smgrtruncate() to RelationTruncate(). There's a new flag to
RelationTruncate indicating whether the FSM should be truncated too, and
only one truncation WAL record is written for the operation.

That does seem cleaner than the current approach where the FSM writes a
separate WAL record just to clear the bits of the last remaining FSM
page. I had to move RelationTruncate() to smgr.c, because I don't think
a function in bufmgr.c should be doing WAL-logging. However,
RelationTruncate really doesn't belong in smgr.c either. Also, now that
smgrtruncate doesn't write its own WAL record, it doesn't seem right for
smgrcreate to be doing that either.

So, I think I'll take this one step forward, and move RelationTruncate()
to a new higher level file, e.g. src/backend/catalog/storage.c, and also
create a new RelationCreateStorage() function that calls smgrcreate(),
and move the WAL-logging from smgrcreate() to RelationCreateStorage().

So, we'll have two functions in a new file:

/* Create physical storage for a relation. If 'fsm' is true, an FSM fork
is also created */
RelationCreateStorage(Relation rel, bool fsm)
/* Truncate the relation to 'nblocks' blocks. If 'fsm' is true, the FSM
is also truncated */
RelationTruncate(Relation rel, BlockNumber nblocks, bool fsm)

The next question is whether the "pending rel deletion" stuff in smgr.c
should be moved to the new file too. It seems like it would belong there
better. That would leave smgr.c as a very thin wrapper around md.c

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-14 09:44:52
Message-ID: 87r65eej57.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> The next question is whether the "pending rel deletion" stuff in smgr.c should
> be moved to the new file too. It seems like it would belong there better. That
> would leave smgr.c as a very thin wrapper around md.c

Well it's just a switch, albeit with only one case, so I wouldn't expect it to
be much more than a thin wrapper.

If we had more storage systems it might be clearer what features were common
to all of them and could be hoisted up from md.c. I'm not clear there are any
though.

Actually I wonder if an entirely in-memory storage system would help with the
"temporary table" problem on systems where the kernel is too aggressive about
flushing file buffers or metadata.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-14 13:38:26
Message-ID: 491D7F52.6070908@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> So, I think I'll take this one step forward, and move RelationTruncate()
> to a new higher level file, e.g. src/backend/catalog/storage.c, and also
> create a new RelationCreateStorage() function that calls smgrcreate(),
> and move the WAL-logging from smgrcreate() to RelationCreateStorage().
>
> So, we'll have two functions in a new file:
>
> /* Create physical storage for a relation. If 'fsm' is true, an FSM fork
> is also created */
> RelationCreateStorage(Relation rel, bool fsm)
> /* Truncate the relation to 'nblocks' blocks. If 'fsm' is true, the FSM
> is also truncated */
> RelationTruncate(Relation rel, BlockNumber nblocks, bool fsm)
>
> The next question is whether the "pending rel deletion" stuff in smgr.c
> should be moved to the new file too. It seems like it would belong there
> better. That would leave smgr.c as a very thin wrapper around md.c

This new approach feels pretty good to me; attached is a patch to do
just that. Many of the functions formerly in smgr.c are now in
src/backend/catalog/storage.c, including all the WAL-logging and pending
rel deletion stuff. I kept their old names for now, though perhaps they
should be renamed now that they're above the smgr level.

I also implemented Tom's idea of delaying creation of the FSM until it's
needed, not because of performance, but because it started to get quite
hairy to keep track of which relations should have a FSM and which
shouldn't. Creation of the FSM fork is now treated more like extending a
relation, as a non-WAL-logged operation, and it's up to freespace.c to
create the file when it's needed. There's no operation to explicitly
delete an individual fork of a relation: RelationCreateStorage only
creates the main fork, RelationDropStorage drops all forks, and
RelationTruncate truncates the FSM if and only if the FSM fork exists.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
refactor-smgr-create-truncate-1.patch text/x-diff 76.9 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-20 13:29:48
Message-ID: 4925664C.3090605@enterprisedb.com
Lists: pgsql-hackers

I committed the changes to FSM truncation yesterday, that helps with the
truncation of the visibility map as well. Attached is an updated
visibility map patch.

There are two open issues:

1. The bits in the visibility map are set in the 1st phase of lazy
vacuum. That works, but it means that after a delete or update, it takes
two vacuums until the bit in the visibility map is set. The first vacuum
removes the dead tuple, and only the second sees that there are no dead
tuples and sets the bit.

2. The output of VACUUM VERBOSE should be modified to say how many pages
were actually scanned. What other information is relevant, or no longer
relevant, with partial vacuums?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
visibilitymap-2.patch text/x-diff 48.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-23 19:05:12
Message-ID: 26361.1227467112@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I committed the changes to FSM truncation yesterday, that helps with the
> truncation of the visibility map as well. Attached is an updated
> visibility map patch.

I looked over this patch a bit ...

> 1. The bits in the visibility map are set in the 1st phase of lazy
> vacuum. That works, but it means that after a delete or update, it takes
> two vacuums until the bit in the visibility map is set. The first vacuum
> removes the dead tuple, and only the second sees that there are no dead
> tuples and sets the bit.

I think this is probably not a big issue really. The point of this change
is to optimize things for pages that are static over the long term; one
extra vacuum cycle before the page is deemed static doesn't seem like a
problem. You could even argue that this saves I/O because we don't set
the bit (and perhaps later have to clear it) until we know that the page
has stayed static across a vacuum cycle and thus has a reasonable
probability of continuing to do so.

A possible problem is that if a relation is filled all in one shot,
autovacuum would trigger a single vacuum cycle on it and then never have
a reason to trigger another; leading to the bits never getting set (or
at least not till an antiwraparound vacuum occurs). We might want to
tweak autovac so that an extra vacuum cycle occurs in this case. But
I'm not quite sure what a reasonable heuristic would be.

Some other points:

* ISTM that the patch is designed on the plan that the PD_ALL_VISIBLE
page header flag *must* be correct, but it's really okay if the backing
map bit *isn't* correct --- in particular we don't trust the map bit
when performing antiwraparound vacuums. This isn't well documented.

* Also, I see that vacuum has a provision for clearing an incorrectly
set PD_ALL_VISIBLE flag, but shouldn't it fix the map too?

* It would be good if the visibility map fork were never created until
there is occasion to set a bit in it; this would for instance typically
mean that temp tables would never have one. I think that
visibilitymap.c doesn't get this quite right --- in particular
vm_readbuf seems willing to create/extend the fork whether its extend
argument is true or not, so it looks like an inquiry operation would
cause the map fork to be created. It should be possible to act as
though a nonexistent fork just means "all zeroes".

* heap_insert's all_visible_cleared variable doesn't seem to get
initialized --- didn't your compiler complain?

* You missed updating SizeOfHeapDelete and SizeOfHeapUpdate

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-23 20:18:58
Message-ID: 1227471538.7370.14.camel@jdavis-laptop
Lists: pgsql-hackers

On Sun, 2008-11-23 at 14:05 -0500, Tom Lane wrote:
> A possible problem is that if a relation is filled all in one shot,
> autovacuum would trigger a single vacuum cycle on it and then never have
> a reason to trigger another; leading to the bits never getting set (or
> at least not till an antiwraparound vacuum occurs). We might want to
> tweak autovac so that an extra vacuum cycle occurs in this case. But
> I'm not quite sure what a reasonable heuristic would be.
>

This would only be an issue if using the visibility map for things other
than partial vacuum (e.g. index-only scan), right? If we never do
another VACUUM, we don't need partial vacuum.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-23 21:18:42
Message-ID: 27709.1227475122@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2008-11-23 at 14:05 -0500, Tom Lane wrote:
>> A possible problem is that if a relation is filled all in one shot,
>> autovacuum would trigger a single vacuum cycle on it and then never have
>> a reason to trigger another; leading to the bits never getting set (or
>> at least not till an antiwraparound vacuum occurs).

> This would only be an issue if using the visibility map for things other
> than partial vacuum (e.g. index-only scan), right? If we never do
> another VACUUM, we don't need partial vacuum.

Well, the patch already uses the page header bits for optimization of
seqscans, and could probably make good use of them for bitmap scans too.
It'd be nice if the page header bits got set even if the map bits
didn't.

Reflecting on it though, maybe Heikki described the behavior too
pessimistically anyway. If a page contains no dead tuples, it should
get its bits set on first visit anyhow, no? So for the ordinary bulk
load scenario where there are no failed insertions, the first vacuum
pass should set all the bits ... at least, if enough time has passed
for RecentXmin to be past the inserting transaction.

However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.

regards, tom lane


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-23 22:28:51
Message-ID: 4929D923.7050001@zeut.net
Lists: pgsql-hackers

Tom Lane wrote:
> However, my comment above was too optimistic, because in an insert-only
> scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.
>
> So it seems like we do indeed want to rejigger autovac's rules a bit
> to account for the possibility of wanting to apply vacuum to get
> visibility bits set.

I'm sure I'm missing something, but I thought the point of this was to
lessen the impact of VACUUM, and now you are suggesting that we have to
add vacuums to tables that have never needed one before.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 07:31:22
Message-ID: 492A584A.4050704@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Reflecting on it though, maybe Heikki described the behavior too
> pessimistically anyway. If a page contains no dead tuples, it should
> get its bits set on first visit anyhow, no? So for the ordinary bulk
> load scenario where there are no failed insertions, the first vacuum
> pass should set all the bits ... at least, if enough time has passed
> for RecentXmin to be past the inserting transaction.

Right. I did say "... after a delete or update, it takes two vacuums
until ..." in my mail.

> However, my comment above was too optimistic, because in an insert-only
> scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.
>
> So it seems like we do indeed want to rejigger autovac's rules a bit
> to account for the possibility of wanting to apply vacuum to get
> visibility bits set.

I'm not too excited about triggering an extra vacuum. As Matthew pointed
out, the point of this patch is to reduce the number of vacuums
required, not increase it. If you're not going to vacuum a table, you
don't care if the bits in the visibility map are set or not.

We could set the PD_ALL_VISIBLE flag more aggressively, outside VACUUMs,
if we want to make the seqscan optimization more effective. For example,
a seqscan could set the flag too, if it sees that all the tuples were
visible, and had the XMIN_COMMITTED and XMAX_INVALID hint bits set.
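
Just to illustrate the idea (this is not in the posted patch), the per-page
check in a seqscan could be as cheap as looking at the hint bits, done while
holding the share lock the scan already has on the buffer:

    bool         all_visible = true;
    OffsetNumber off;
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

    for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
    {
        ItemId          itemid = PageGetItemId(page, off);
        HeapTupleHeader tuple;

        if (!ItemIdIsNormal(itemid))
        {
            if (ItemIdIsDead(itemid))
            {
                all_visible = false;    /* dead stub left by pruning */
                break;
            }
            continue;                   /* unused or redirect line pointer */
        }
        tuple = (HeapTupleHeader) PageGetItem(page, itemid);
        if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
            !(tuple->t_infomask & HEAP_XMAX_INVALID))
        {
            all_visible = false;        /* hint bits not (yet) set */
            break;
        }
    }

    if (all_visible && !PageIsAllVisible(page))
        PageSetAllVisible(page);        /* unlogged, like a hint bit */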

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 08:05:06
Message-ID: 492A6032.6080000@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> * ISTM that the patch is designed on the plan that the PD_ALL_VISIBLE
> page header flag *must* be correct, but it's really okay if the backing
> map bit *isn't* correct --- in particular we don't trust the map bit
> when performing antiwraparound vacuums. This isn't well documented.

Right. Will add comments.

We can't use the map bit for antiwraparound vacuums, because the bit
doesn't tell you when the tuples have been frozen. And we can't advance
relfrozenxid if we've skipped any pages.

I've been thinking that we could add one frozenxid field to each
visibility map page, for the oldest xid on the heap pages covered by the
visibility map page. That would allow more fine-grained anti-wraparound
vacuums as well.

> * Also, I see that vacuum has a provision for clearing an incorrectly
> set PD_ALL_VISIBLE flag, but shouldn't it fix the map too?

Yes, will fix. Although, as long as we don't trust the visibility map,
no real damage would be done.

> * It would be good if the visibility map fork were never created until
> there is occasion to set a bit in it; this would for instance typically
> mean that temp tables would never have one. I think that
> visibilitymap.c doesn't get this quite right --- in particular
> vm_readbuf seems willing to create/extend the fork whether its extend
> argument is true or not, so it looks like an inquiry operation would
> cause the map fork to be created. It should be possible to act as
> though a nonexistent fork just means "all zeroes".

The visibility map won't be inquired unless you vacuum. This is a bit
tricky. In vacuum, we only know whether we can set a bit or not, after
we've acquired a cleanup lock on the page, and scanned all the tuples.
While we're holding a cleanup lock, we don't want to do I/O, which could
potentially block out other processes for a long time. So it's too late
to extend the visibility map at that point.

I agree that vm_readbuf should not create the fork if 'extend' is false;
that's an oversight. But it won't change the actual behavior, because
visibilitymap_test calls it with 'extend' true, for the reason above.

I will add comments about that, though; there's nothing describing that
currently.

> * heap_insert's all_visible_cleared variable doesn't seem to get
> initialized --- didn't your compiler complain?

Hmph, I must've been compiling with -O0.

> * You missed updating SizeOfHeapDelete and SizeOfHeapUpdate

Thanks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 14:37:59
Message-ID: 18086.1227537479@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I've been thinking that we could add one frozenxid field to each
> visibility map page, for the oldest xid on the heap pages covered by the
> visibility map page. That would allow more fine-grained anti-wraparound
> vacuums as well.

This doesn't strike me as a particularly good idea. Right now the map
is only hints as far as vacuum is concerned --- if you do the above then
the map becomes critical data. And I don't really think you'll buy
much.

> The visibility map won't be inquired unless you vacuum. This is a bit
> tricky. In vacuum, we only know whether we can set a bit or not, after
> we've acquired a cleanup lock on the page, and scanned all the tuples.
> While we're holding a cleanup lock, we don't want to do I/O, which could
> potentially block out other processes for a long time. So it's too late
> to extend the visibility map at that point.

This is no good; I think you've made the wrong tradeoffs. In
particular, even though only vacuum *currently* uses the map, you want
to extend it to be used by indexscans. So it's going to uselessly
spring into being even without vacuums.

I'm not convinced that I/O while holding cleanup lock is so bad that we
should break other aspects of the system to avoid it. However, if you
want to stick to that, how about
* vacuum page, possibly set its header bit
* release page lock (but not pin)
* if we need to set the bit, fetch the corresponding map page
(I/O might happen here)
* get share lock on heap page, then recheck its header bit;
if still set, set the map bit
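
In rough code, that'd be something like the following (a sketch only; the
map-page access is left as comments since the map's API is still in flux):

    /* 1. vacuum the page, possibly set its header bit */
    LockBufferForCleanup(buf);
    /* ... prune page, PageSetAllVisible(page) if everything qualifies ... */
    all_visible = PageIsAllVisible(page);

    /* 2. release the content lock, but keep the pin */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    if (all_visible)
    {
        /* 3. fetch the corresponding map page; any I/O happens here,
         * with no heap-page lock held */

        /* 4. re-lock the heap page (share is enough) and recheck */
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        if (PageIsAllVisible(page))
            /* set the bit for this block in the map page */ ;
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }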

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 14:40:30
Message-ID: 18123.1227537630@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> So it seems like we do indeed want to rejigger autovac's rules a bit
>> to account for the possibility of wanting to apply vacuum to get
>> visibility bits set.

> I'm not too excited about triggering an extra vacuum. As Matthew pointed
> out, the point of this patch is to reduce the number of vacuums
> required, not increase it. If you're not going to vacuum a table, you
> don't care if the bits in the visibility map are set or not.

But it's already the case that the bits provide a performance increase
to other things besides vacuum.

> We could set the PD_ALL_VISIBLE flag more aggressively, outside VACUUMs,
> if we want to make the seqscan optimization more effective. For example,
> a seqscan could set the flag too, if it sees that all the tuples were
> visible, and had the XMIN_COMMITTED and XMAX_INVALID hint bits set.

I was wondering whether we could teach heap_page_prune to set the flag
without adding any extra tuple visibility checks. A seqscan per se
shouldn't be doing this because it doesn't normally call
HeapTupleSatisfiesVacuum.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 15:23:31
Message-ID: 87ljv9rvv0.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> I've been thinking that we could add one frozenxid field to each
>> visibility map page, for the oldest xid on the heap pages covered by the
>> visibility map page. That would allow more fine-grained anti-wraparound
>> vacuums as well.
>
> This doesn't strike me as a particularly good idea. Right now the map
> is only hints as far as vacuum is concerned --- if you do the above then
> the map becomes critical data. And I don't really think you'll buy
> much.

Hm, that depends on how critical the critical data is. It's critical that the
frozenxid that autovacuum sees is no more recent than the actual frozenxid,
but not critical that it be entirely up-to-date otherwise.

So if it's possible for the frozenxid in the visibility map to go backwards
then it's no good, since if that update is lost we might skip a necessary
vacuum freeze. But if we guarantee that we never update the frozenxid in the
visibility map forward ahead of recentglobalxmin then it can't ever go
backwards. (Well, not in a way that matters)

However I'm a bit puzzled how you could possibly maintain this frozenxid. As
soon as you freeze an xid you'll have to visit all the other pages covered by
that visibility map page to see what the new value should be.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 15:55:08
Message-ID: 492ACE5C.70905@enterprisedb.com
Lists: pgsql-hackers

Gregory Stark wrote:
> However I'm a bit puzzled how you could possibly maintain this frozenxid. As
> soon as you freeze an xid you'll have to visit all the other pages covered by
> that visibility map page to see what the new value should be.

Right, you could only advance it when you scan all the pages covered by
the visibility map page. But that's better than having to scan the whole
relation.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 16:23:42
Message-ID: 87bpw5rt2p.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> Gregory Stark wrote:
>> However I'm a bit puzzled how you could possibly maintain this frozenxid. As
>> soon as you freeze an xid you'll have to visit all the other pages covered by
>> that visibility map page to see what the new value should be.
>
> Right, you could only advance it when you scan all the pages covered by the
> visibility map page. But that's better than having to scan the whole relation.

Is it? It seems like that would just move around the work. You'll still have
to visit every page once every 2B transactions or so. You'll just do it 64MB at
a time.

It's nice to smooth the work but it would be much nicer to detect that a
normal vacuum has already processed all of those pages since the last
insert/update/delete on those pages and so avoid the work entirely.

To avoid the work entirely you need some information about the oldest xid on
those pages seen by regular vacuums (and/or prunes).

We would want to skip any page which:

a) Has been visited by vacuum freeze and not been updated since

b) Has been visited by a regular vacuum and the oldest xid found was more
recent than freeze_threshold.

c) Has been updated frequently such that no old tuples remain

Ideally (b) should completely obviate the need for anti-wraparound freezes
entirely.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-24 19:03:41
Message-ID: 4502.1227553421@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> So if it's possible for the frozenxid in the visibility map to go backwards
> then it's no good, since if that update is lost we might skip a necessary
> vacuum freeze.

Seems like a lost disk write would be enough to make that happen.

Now you might argue that the odds of that are no worse than the odds of
losing an update to one particular heap page, but in this case the
single hiccup could lead to losing half a gigabyte of data (assuming 8K
page size). The leverage you get for saving vacuum freeze work is
exactly equal to the magnification factor for data loss.

regards, tom lane


From: Decibel! <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-25 23:45:36
Message-ID: 294448DB-AD20-4E4B-BDC6-6937D9E01039@decibel.org
Lists: pgsql-hackers

On Nov 23, 2008, at 3:18 PM, Tom Lane wrote:
> So it seems like we do indeed want to rejigger autovac's rules a bit
> to account for the possibility of wanting to apply vacuum to get
> visibility bits set.

That makes the idea of not writing out hint bit updates unless the
page is already dirty a lot easier to swallow, because now we'd have
a mechanism in place to ensure that they were set in a reasonable
timeframe by autovacuum. That actually wouldn't incur much extra
overhead at all, except in the case of a table that's effectively
write-only. Actually, that's not even true; you still have to
eventually freeze a write-mostly table.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-26 12:43:12
Message-ID: 492D4460.1000809@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> The visibility map won't be inquired unless you vacuum. This is a bit
>> tricky. In vacuum, we only know whether we can set a bit or not, after
>> we've acquired a cleanup lock on the page, and scanned all the tuples.
>> While we're holding a cleanup lock, we don't want to do I/O, which could
>> potentially block out other processes for a long time. So it's too late
>> to extend the visibility map at that point.
>
> This is no good; I think you've made the wrong tradeoffs. In
> particular, even though only vacuum *currently* uses the map, you want
> to extend it to be used by indexscans. So it's going to uselessly
> spring into being even without vacuums.
>
> I'm not convinced that I/O while holding cleanup lock is so bad that we
> should break other aspects of the system to avoid it. However, if you
> want to stick to that, how about
> * vacuum page, possibly set its header bit
> * release page lock (but not pin)
> * if we need to set the bit, fetch the corresponding map page
> (I/O might happen here)
> * get share lock on heap page, then recheck its header bit;
> if still set, set the map bit

Yeah, could do that.

There is another problem, though, if the map is frequently probed for
pages that don't exist in the map, or the map doesn't exist at all.
Currently, the size of the map file is kept in relcache, in the
rd_vm_nblocks_cache variable. Whenever a page is accessed that's >
rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists,
and rd_vm_nblocks_cache is updated. That means that every probe to a
non-existing page causes an lseek(), which isn't free.
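
The probe path behaves roughly like this simplified standalone sketch
(not the real code; get_nblocks_from_kernel stands in for the
smgrnblocks call, which is where the lseek happens):

#include <stdint.h>
#include <stdbool.h>

typedef uint32_t BlockNumber;

/* Stand-in for smgrnblocks(): in the real code this ends up as an lseek(). */
static BlockNumber
get_nblocks_from_kernel(void)
{
    return 10;      /* pretend the map file is currently 10 pages long */
}

/* Cached size of the visibility map fork, normally kept in the relcache. */
static BlockNumber rd_vm_nblocks_cache = 0;

/*
 * Does visibility map page 'blkno' exist?  The cached size is only a lower
 * bound, so a probe beyond it has to re-check the real file size.  For a
 * probe beyond the true end of the map we pay that lseek() every time,
 * because the cache never grows past the true size.
 */
static bool
vm_page_exists(BlockNumber blkno)
{
    if (blkno < rd_vm_nblocks_cache)
        return true;

    rd_vm_nblocks_cache = get_nblocks_from_kernel();
    return blkno < rd_vm_nblocks_cache;
}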

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-26 13:12:15
Message-ID: 5856.1227705135@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> There is another problem, though, if the map is frequently probed for
> pages that don't exist in the map, or the map doesn't exist at all.
> Currently, the size of the map file is kept in relcache, in the
> rd_vm_nblocks_cache variable. Whenever a page is accessed that's >
> rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists,
> and rd_vm_nblocks_cache is updated. That means that every probe to a
> non-existing page causes an lseek(), which isn't free.

Well, considering how seldom new pages will be added to the visibility
map, it seems to me we could afford to send out a relcache inval event
when that happens. Then rd_vm_nblocks_cache could be treated as
trustworthy.

Maybe it'd be worth doing that for the FSM too. The frequency of
invals would be higher, but then again the reference frequency is
probably higher too?
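
Something along these lines, purely as an illustration (the stand-in
functions are placeholders for the physical extension and the relcache
invalidation call):

#include <stdint.h>

typedef uint32_t BlockNumber;

static BlockNumber rd_vm_nblocks_cache = 0;     /* per-backend cached size */

/* Placeholders for the real mechanisms. */
static void physically_extend_map(BlockNumber new_nblocks) { (void) new_nblocks; }
static void send_relcache_inval(void) { }

/*
 * Extend the visibility map fork.  Broadcasting an invalidation afterwards
 * makes every other backend drop its stale cached size and re-read the
 * real one, so a backend can then treat its cached size as authoritative:
 * a probe beyond it means the page really doesn't exist, with no lseek()
 * needed on every probe.
 */
static void
extend_visibility_map(BlockNumber new_nblocks)
{
    physically_extend_map(new_nblocks);
    rd_vm_nblocks_cache = new_nblocks;
    send_relcache_inval();
}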

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-26 13:32:32
Message-ID: 492D4FF0.2000606@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> There is another problem, though, if the map is frequently probed for
>> pages that don't exist in the map, or the map doesn't exist at all.
>> Currently, the size of the map file is kept in relcache, in the
>> rd_vm_nblocks_cache variable. Whenever a page is accessed that's >
>> rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists,
>> and rd_vm_nblocks_cache is updated. That means that every probe to a
>> non-existing page causes an lseek(), which isn't free.
>
> Well, considering how seldom new pages will be added to the visibility
> map, it seems to me we could afford to send out a relcache inval event
> when that happens. Then rd_vm_nblocks_cache could be treated as
> trustworthy.
>
> Maybe it'd be worth doing that for the FSM too. The frequency of
> invals would be higher, but then again the reference frequency is
> probably higher too?

A relcache invalidation sounds awfully heavy-weight. Perhaps a
light-weight invalidation event that doesn't flush the entry altogether,
but just resets the cached sizes?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-26 13:58:30
Message-ID: 7144.1227707910@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> Well, considering how seldom new pages will be added to the visibility
>> map, it seems to me we could afford to send out a relcache inval event
>> when that happens. Then rd_vm_nblocks_cache could be treated as
>> trustworthy.

> A relcache invalidation sounds awfully heavy-weight.

It really isn't.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-26 15:44:24
Message-ID: 492D6ED8.5070105@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> Well, considering how seldom new pages will be added to the visibility
>>> map, it seems to me we could afford to send out a relcache inval event
>>> when that happens. Then rd_vm_nblocks_cache could be treated as
>>> trustworthy.
>
>> A relcache invalidation sounds awfully heavy-weight.
>
> It really isn't.

Okay, then. I'll use relcache invalidation for both the FSM and
visibility map.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-27 19:44:15
Message-ID: 492EF88F.9050709@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> There is another problem, though, if the map is frequently probed for
>> pages that don't exist in the map, or the map doesn't exist at all.
>> Currently, the size of the map file is kept in relcache, in the
>> rd_vm_nblocks_cache variable. Whenever a page is accessed that's >
>> rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists,
>> and rd_vm_nblocks_cache is updated. That means that every probe to a
>> non-existing page causes an lseek(), which isn't free.
>
> Well, considering how seldom new pages will be added to the visibility
> map, it seems to me we could afford to send out a relcache inval event
> when that happens. Then rd_vm_nblocks_cache could be treated as
> trustworthy.

Here's an updated version, with a lot of smaller cleanups, and using
relcache invalidation to notify other backends when the visibility map
fork is extended. I already committed the change to FSM to do the same.
I'm fairly happy with it now, and I'm planning to commit the patch early next week.

I modified the VACUUM VERBOSE output slightly, to print the number of
pages scanned. The added part is emphasized below:

postgres=# vacuum verbose foo;
INFO: vacuuming "public.foo"
INFO: "foo": removed 230 row versions in 10 pages
INFO: "foo": found 230 removable, 10 nonremovable row versions in *10
out of* 43 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

That seems OK to me, but maybe others have an opinion on that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map, partial vacuums
Date: 2008-11-27 20:37:48
Message-ID: 492F051C.40403@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Here's an updated version, ...

And here it is, for real...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
visibilitymap-3.patch text/x-diff 55.6 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 13:51:20
Message-ID: 20081203135120.GF3968@alvh.no-ip.org
Lists: pgsql-hackers

Heikki Linnakangas wrote:

> Hmm. It just occurred to me that I think this circumvented the
> anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid
> anymore. We'll need to disable the skipping when autovacuum is triggered
> to prevent wraparound. VACUUM FREEZE does that already, but it's
> unnecessarily aggressive in freezing.

Heh :-) Yes, this should be handled sanely, without having to invoke
FREEZE.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 13:58:05
Message-ID: 87tz9lqs2a.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> Hmm. It just occurred to me that I think this circumvented the anti-wraparound
> vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
> disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
> FREEZE does that already, but it's unnecessarily aggressive in freezing.

Having seen how the anti-wraparound vacuums work in the field I think merely
replacing it with a regular vacuum which covers the whole table will not
actually work well.

What will happen is that, because nothing else is advancing the relfrozenxid,
the age of the relfrozenxid for all tables will advance until they all hit
autovacuum_max_freeze_age. Quite often all the tables were created around the
same time so they will all hit autovacuum_max_freeze_age at the same time.

So a database which was operating fine and receiving regular vacuums at a
reasonable pace will suddenly be hit by vacuums for every table all at the
same time, 3 at a time. If you don't have vacuum_cost_delay set that will
cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
the small busy tables from getting vacuumed regularly due to the backlog in
anti-wraparound vacuums.

Worse, vacuum will set the freeze_xid to nearly the same value for all of the
tables. So it will all happen again in another 100M transactions. And again in
another 100M transactions, and again...

I think there are several things which need to happen here.

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.

2) Include a factor which spreads out the anti-wraparound freezes in the
autovacuum launcher. Some ideas:

. we could implicitly add random(vacuum_freeze_min_age) to the
autovacuum_max_freeze_age. That would spread them out evenly over 100M
transactions.

. we could check if another anti-wraparound vacuum is still running and
implicitly add a vacuum_freeze_min_age penalty to the
autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
would spread them out without introducing non-determinism, which
seems better.

. we could leave autovacuum_max_freeze_age and instead pick a semi-random
vacuum_freeze_min_age. This would mean the first set of anti-wraparound
vacuums would still be synchronized but subsequent ones might be spread
out somewhat. There's not as much room to randomize this though and it
would affect how much i/o vacuum did which makes it seem less palatable
to me.

3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
people are setting it to unreasonably high values which results in their
vacuums never completing. Actually I think what we should do is junk all
the existing parameters and replace them with a vacuum_nice_level or
vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
the other parameters as internal parameters.
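
Just to illustrate what (2) could look like, here's a made-up formula
combining the first two ideas (names, types and arithmetic are purely
illustrative, and plain integers stand in for xid ages):

#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Instead of firing an anti-wraparound vacuum as soon as
 * age(relfrozenxid) exceeds the configured maximum, push the effective
 * threshold out by a per-table random offset and by a penalty for each
 * anti-wraparound vacuum already running, so tables created at the same
 * time don't all hit the wall in the same cycle.
 */
static bool
needs_wraparound_vacuum(uint32_t relfrozenxid_age,
                        uint32_t max_freeze_age,   /* autovacuum_max_freeze_age */
                        uint32_t freeze_min_age,   /* vacuum_freeze_min_age */
                        int running_wraparound_vacuums)
{
    uint32_t jitter  = (uint32_t) rand() % (freeze_min_age + 1);
    uint32_t penalty = (uint32_t) running_wraparound_vacuums * freeze_min_age;

    return relfrozenxid_age > max_freeze_age + jitter + penalty;
}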

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 14:11:40
Message-ID: 4936939C.9080001@hagander.net
Lists: pgsql-hackers

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> Hmm. It just occurred to me that I think this circumvented the anti-wraparound
>> vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
>> disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
>> FREEZE does that already, but it's unnecessarily aggressive in freezing.
>
> Having seen how the anti-wraparound vacuums work in the field I think merely
> replacing it with a regular vacuum which covers the whole table will not
> actually work well.
>
> What will happen is that, because nothing else is advancing the relfrozenxid,
> the age of the relfrozenxid for all tables will advance until they all hit
> autovacuum_max_freeze_age. Quite often all the tables were created around the
> same time so they will all hit autovacuum_max_freeze_age at the same time.
>
> So a database which was operating fine and receiving regular vacuums at a
> reasonable pace will suddenly be hit by vacuums for every table all at the
> same time, 3 at a time. If you don't have vacuum_cost_delay set that will
> cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
> the small busy tables from getting vacuumed regularly due to the backlog in
> anti-wraparound vacuums.
>
> Worse, vacuum will set the freeze_xid to nearly the same value for all of the
> tables. So it will all happen again in another 100M transactions. And again in
> another 100M transactions, and again...
>
> I think there are several things which need to happen here.
>
> 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
> means unnecessary full table vacuums long before they accomplish anything.
>
> 2) Include a factor which spreads out the anti-wraparound freezes in the
> autovacuum launcher. Some ideas:
>
> . we could implicitly add random(vacuum_freeze_min_age) to the
> autovacuum_max_freeze_age. That would spread them out evenly over 100M
> transactions.
>
> . we could check if another anti-wraparound vacuum is still running and
> implicitly add a vacuum_freeze_min_age penalty to the
> autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
> would spread them out without introducing non-determinism, which
> seems better.
>
> . we could leave autovacuum_max_freeze_age and instead pick a semi-random
> vacuum_freeze_min_age. This would mean the first set of anti-wraparound
> vacuums would still be synchronized but subsequent ones might be spread
> out somewhat. There's not as much room to randomize this though and it
> would affect how much i/o vacuum did which makes it seem less palatable
> to me.

How about a way to say that only one (or a config parameter for <n>) of
the autovac workers can be used for anti-wraparound vacuum? Then the
other slots would still be available for the
small-but-frequently-updated tables.

> 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
> people are setting it to unreasonably high values which results in their
> vacuums never completing. Actually I think what we should do is junk all
> the existing parameters and replace them with a vacuum_nice_level or
> vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
> the other parameters as internal parameters.

It would certainly be helpful if it were just a single parameter - the
arbitrariness of the parameters there now makes them pretty hard to set
properly - or at least easy to set wrong.

//Magnus


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 16:49:18
Message-ID: 4936B88E.5090108@enterprisedb.com
Lists: pgsql-hackers

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> Hmm. It just occurred to me that I think this circumvented the anti-wraparound
>> vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
>> disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
>> FREEZE does that already, but it's unnecessarily aggressive in freezing.

FWIW, it seems the omission is actually the other way 'round. Autovacuum
always forces a full-scanning vacuum, making the visibility map useless
for autovacuum. This obviously needs to be fixed.

> What will happen is that, because nothing else is advancing the relfrozenxid,
> the age of the relfrozenxid for all tables will advance until they all hit
> autovacuum_max_freeze_age. Quite often all the tables were created around the
> same time so they will all hit autovacuum_max_freeze_age at the same time.
>
> So a database which was operating fine and receiving regular vacuums at a
> reasonable pace will suddenly be hit by vacuums for every table all at the
> same time, 3 at a time. If you don't have vacuum_cost_delay set that will
> cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
> the small busy tables from getting vacuumed regularly due to the backlog in
> anti-wraparound vacuums.
>
> Worse, vacuum will set the freeze_xid to nearly the same value for all of the
> tables. So it will all happen again in another 100M transactions. And again in
> another 100M transactions, and again...

But we already have that problem, don't we? When you initially load your
database, all tuples will have the same xmin, and all tables will have
more or less the same relfrozenxid. I guess you can argue that it
becomes more obvious if vacuums are otherwise cheaper, but I don't think
the visibility map makes that much difference to suddenly make this
issue urgent.

Agreed that it would be nice to do something about it, though.

> I think there are several things which need to happen here.
>
> 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
> means unnecessary full table vacuums long before they accomplish anything.

It allows you to truncate clog. If I did my math right, 200M
transactions amounts to ~50MB of clog. Perhaps we should still raise it;
disk space is cheap, after all.
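
(Checking that: with two clog bits per transaction, 200 million
transactions take 200M * 2 / 8 = 50 million bytes, i.e. roughly 48 MiB,
so ~50MB is right.)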

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 17:55:40
Message-ID: 87bpvtqh2b.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> Gregory Stark wrote:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>
>>> Hmm. It just occurred to me that I think this circumvented the anti-wraparound
>>> vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
>>> disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
>>> FREEZE does that already, but it's unnecessarily aggressive in freezing.
>
> FWIW, it seems the omission is actually the other way 'round. Autovacuum always
> forces a full-scanning vacuum, making the visibility map useless for
> autovacuum. This obviously needs to be fixed.

How does it do that? Is there some option in the VacStmt to control this? Do
we just need a syntax to set that option?

How easy is it to tell what percentage of the table needs to be vacuumed? If
it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
really if it's a contiguous 50% though...)

Another idea: Perhaps each page of the visibility map should have a frozenxid
(or multiple frozenxids?). Then if an individual page of the visibility map is
old we could force scanning all the heap pages covered by that map page and
update it. I'm not sure we can do that safely though without locking issues --
or is it ok because it's vacuum doing the updating?

>> Worse, vacuum will set the freeze_xid to nearly the same value for all of the
>> tables. So it will all happen again in another 100M transactions. And again in
>> another 100M transactions, and again...
>
> But we already have that problem, don't we? When you initially load your
> database, all tuples will have the same xmin, and all tables will have more or
> less the same relfrozenxid. I guess you can argue that it becomes more obvious
> if vacuums are otherwise cheaper, but I don't think the visibility map makes
> that much difference to suddenly make this issue urgent.

We already have that problem but it only bites in a specific case: if you have
no other vacuums being triggered by the regular dead tuple scale factor. The
normal case is intended to be that autovacuum triggers much more frequently
than every 100M transactions to reduce bloat.

However in practice this specific case does seem to arise alarmingly
easily. Most databases do have some large tables which are never deleted from or
updated. Also, the default scale factor of 20% is actually quite easy to never
reach if your tables are also growing quickly -- effectively moving the
goalposts further out as fast as the updates and deletes bloat the table.

The visibility map essentially widens this specific use case to cover *all*
tables. Since the relfrozenxid would never get advanced by regular vacuums the
only time it would get advanced is when they all hit the 200M wall
simultaneously.

> Agreed that it would be nice to do something about it, though.
>
>> I think there are several things which need to happen here.
>>
>> 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
>> means unnecessary full table vacuums long before they accomplish anything.
>
> It allows you to truncate clog. If I did my math right, 200M transactions
> amounts to ~50MB of clog. Perhaps we should still raise it, disk space is cheap
> after all.

Ah. Hm. Then perhaps this belongs in the realm of the config generator people
are working on. They'll need a dial to say how much disk space you expect your
database to take in addition to how much memory your machine has available.
50MB is nothing for a 1TB database, but it's kind of silly to have to keep
hundreds of megabytes of clog for a 1MB database.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 18:33:19
Message-ID: 4936D0EF.6000200@enterprisedb.com
Lists: pgsql-hackers

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Gregory Stark wrote:
>>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>>> Hmm. It just occurred to me that I think this circumvented the anti-wraparound
>>>> vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
>>>> disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
>>>> FREEZE does that already, but it's unnecessarily aggressive in freezing.
>> FWIW, it seems the omission is actually the other way 'round. Autovacuum always
>> forces a full-scanning vacuum, making the visibility map useless for
>> autovacuum. This obviously needs to be fixed.
>
> How does it do that? Is there some option in the VacStmt to control this? Do
> we just need a syntax to set that option?

The way it works now is that if VacuumStmt->freeze_min_age is not -1
(which means "use the default"), the visibility map is not used and the
whole table is scanned. Autovacuum always sets freeze_min_age, so it's
never using the visibility map. Attached is a patch I'm considering to
fix that.
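
The gist of the fix is to stop keying the "scan everything" decision off
freeze_min_age. As a simplified sketch with made-up names (the attached
patch is what actually counts):

#include <stdbool.h>

/* Simplified stand-in for the relevant part of VacuumStmt. */
typedef struct VacuumParams
{
    int  freeze_min_age;   /* -1 means "use the default" */
    bool scan_all;         /* explicitly request a full-table scan */
} VacuumParams;

/*
 * May this vacuum consult the visibility map to skip pages?  The point is
 * that merely supplying a freeze_min_age (which autovacuum always does) no
 * longer disables the map; only an explicit full-scan request does.
 */
static bool
can_use_visibility_map(const VacuumParams *params)
{
    return !params->scan_all;
}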

> How easy is it to tell what percentage of the table needs to be vacuumed? If
> it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
> really if it's a contiguous 50% though...)

Hmm. You could scan the visibility map to see how much you could skip by
using it. You could account for contiguity.
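
As a made-up illustration, treating the map as a plain bit array, the
estimate could be as simple as counting runs of all-visible pages and
ignoring runs too short to be worth skipping:

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Estimate how many heap pages a partial vacuum could skip.  'map' has one
 * bit per heap page.  To account for contiguity, only runs of at least
 * 'min_run' consecutive all-visible pages count, since skipping isolated
 * pages in the middle of an otherwise sequential read buys little.
 */
static size_t
estimate_skippable_pages(const uint8_t *map, size_t npages, size_t min_run)
{
    size_t skippable = 0;
    size_t run = 0;

    for (size_t i = 0; i < npages; i++)
    {
        bool all_visible = (map[i / 8] >> (i % 8)) & 1;

        if (all_visible)
            run++;
        else
        {
            if (run >= min_run)
                skippable += run;
            run = 0;
        }
    }
    if (run >= min_run)
        skippable += run;
    return skippable;
}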

> Another idea: Perhaps each page of the visibility map should have a frozenxid
> (or multiple frozenxids?). Then if an individual page of the visibility map is
> old we could force scanning all the heap pages covered by that map page and
> update it. I'm not sure we can do that safely though without locking issues --
> or is it ok because it's vacuum doing the updating?

We discussed that a while ago:

http://archives.postgresql.org/message-id/492A6032.6080000@enterprisedb.com

Tom was concerned about making the visibility map not just a hint but
critical data. Rightly so. This is certainly 8.5 stuff; perhaps it would
be more palatable after we get the index-only-scans working using the
visibility map, since the map would be critical data anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
visibilitymap-autovac-1.patch text/x-diff 3.4 KB

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2008-12-03 22:38:53
Message-ID: 87myfcq3ya.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:

> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> Gregory Stark wrote:
>>> 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
>>> means unnecessary full table vacuums long before they accomplish anything.
>>
>> It allows you to truncate clog. If I did my math right, 200M transactions
>> amounts to ~50MB of clog. Perhaps we should still raise it, disk space is cheap
>> after all.

Hm, the more I think about it the more this bothers me. It's another subtle
change from the current behaviour.

Currently *every* vacuum tries to truncate the clog. So you're constantly
trimming off a little bit.

With the visibility map (assuming you fix it not to do full scans all the
time) you can never truncate the clog just as you can never advance the
relfrozenxid unless you do a special full-table vacuum.

I think in practice most people had a read-only table somewhere in their
database which prevented the clog from ever being truncated anyways, so
perhaps this isn't such a big deal.

But the bottom line is that the anti-wraparound vacuums are going to be a lot
more important and much more visible now than they were in the past.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 00:55:21
Message-ID: 200901150055.n0F0tLK27057@momjian.us
Lists: pgsql-hackers


Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
when our wraparound limit is around 2B?

Also, is anything being done about the concern about 'vacuum storm'
explained below?

---------------------------------------------------------------------------

Gregory Stark wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
> > Hmm. It just occurred to me that I think this circumvented the anti-wraparound
> > vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
> > disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
> > FREEZE does that already, but it's unnecessarily aggressive in freezing.
>
> Having seen how the anti-wraparound vacuums work in the field I think merely
> replacing it with a regular vacuum which covers the whole table will not
> actually work well.
>
> What will happen is that, because nothing else is advancing the relfrozenxid,
> the age of the relfrozenxid for all tables will advance until they all hit
> autovacuum_max_freeze_age. Quite often all the tables were created around the
> same time so they will all hit autovacuum_max_freeze_age at the same time.
>
> So a database which was operating fine and receiving regular vacuums at a
> reasonable pace will suddenly be hit by vacuums for every table all at the
> same time, 3 at a time. If you don't have vacuum_cost_delay set that will
> cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
> the small busy tables from getting vacuumed regularly due to the backlog in
> anti-wraparound vacuums.
>
> Worse, vacuum will set the freeze_xid to nearly the same value for all of the
> tables. So it will all happen again in another 100M transactions. And again in
> another 100M transactions, and again...
>
> I think there are several things which need to happen here.
>
> 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
> means unnecessary full table vacuums long before they accomplish anything.
>
> 2) Include a factor which spreads out the anti-wraparound freezes in the
> autovacuum launcher. Some ideas:
>
> . we could implicitly add random(vacuum_freeze_min_age) to the
> autovacuum_max_freeze_age. That would spread them out evenly over 100M
> transactions.
>
> . we could check if another anti-wraparound vacuum is still running and
> implicitly add a vacuum_freeze_min_age penalty to the
> autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
> would spread them out without introducing non-determinism, which
> seems better.
>
> . we could leave autovacuum_max_freeze_age and instead pick a semi-random
> vacuum_freeze_min_age. This would mean the first set of anti-wraparound
> vacuums would still be synchronized but subsequent ones might be spread
> out somewhat. There's not as much room to randomize this though and it
> would affect how much i/o vacuum did which makes it seem less palatable
> to me.
>
> 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
> people are setting it to unreasonably high values which results in their
> vacuums never completing. Actually I think what we should do is junk all
> the existing parameters and replace them with a vacuum_nice_level or
> vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
> the other parameters as internal parameters.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 01:18:34
Message-ID: 496E8EEA.8050206@dunslane.net
Lists: pgsql-hackers

Bruce Momjian wrote:
> Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
> when our wraparound limit is around 2B?
>

Presumably because of this (from the docs):

"The commit status uses two bits per transaction, so if
autovacuum_freeze_max_age has its maximum allowed value of a little less
than two billion, pg_clog can be expected to grow to about half a gigabyte."

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 01:31:11
Message-ID: 200901150131.n0F1VBa10679@momjian.us
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
> > Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
> > when our wraparound limit is around 2B?
> >
>
> Presumably because of this (from the docs):
>
> "The commit status uses two bits per transaction, so if
> autovacuum_freeze_max_age has its maximum allowed value of a little less
> than two billion, pg_clog can be expected to grow to about half a gigabyte."

Oh, that's interesting; thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 01:40:33
Message-ID: 87wscxfisu.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:

> Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
> when our wraparound limit is around 2B?

I suggested raising it dramatically in the post you quote, and Heikki pointed
out that it controls the maximum amount of space the clog will take. Raising it to,
say, 800M will mean up to 200MB of space which might be kind of annoying for a
small database.

It would be nice if we could ensure the clog got trimmed frequently enough on
small databases that we could raise the max_age. It's really annoying to see
all these vacuums running 10x more often than necessary.

The rest of the thread is visible at the bottom of:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/107525

> Also, is anything being done about the concern about 'vacuum storm'
> explained below?

I'm interested too.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 06:30:34
Message-ID: 496ED80A.2080804@enterprisedb.com
Lists: pgsql-hackers

Gregory Stark wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
>> Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
>> when our wraparound limit is around 2B?
>
> I suggested raising it dramatically in the post you quote, and Heikki pointed
> out that it controls the maximum amount of space the clog will take. Raising it to,
> say, 800M will mean up to 200MB of space which might be kind of annoying for a
> small database.
>
> It would be nice if we could ensure the clog got trimmed frequently enough on
> small databases that we could raise the max_age. It's really annoying to see
> all these vacuums running 10x more often than necessary.

Well, if it's a small database, you might as well just vacuum it.

> The rest of the thread is visible at the bottom of:
>
> http://article.gmane.org/gmane.comp.db.postgresql.devel.general/107525
>
>> Also, is anything being done about the concern about 'vacuum storm'
>> explained below?
>
> I'm interested too.

The additional "vacuum_freeze_table_age" (as I'm now calling it) setting
I discussed in a later thread should alleviate that somewhat. When a
table is autovacuumed, the whole table is scanned to freeze tuples if
it's older than vacuum_freeze_table_age, and relfrozenxid is advanced.
When different tables reach the autovacuum threshold at different times,
they will also have their relfrozenxids set to different values. And in
fact no anti-wraparound vacuum is needed.

That doesn't help with read-only or insert-only tables, but that's not a
new problem.
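
To make that concrete, here's a made-up sketch of the decision (the
names are the settings being discussed, but the clamp and the exact
formula are only illustrative):

#include <stdint.h>
#include <stdbool.h>

/*
 * When autovacuum fires for a table anyway, upgrade it to a full-table,
 * relfrozenxid-advancing scan once the table has gotten old enough,
 * rather than waiting for a dedicated anti-wraparound vacuum at
 * autovacuum_freeze_max_age.
 */
static bool
scan_whole_table(uint32_t relfrozenxid_age,
                 uint32_t vacuum_freeze_table_age,
                 uint32_t autovacuum_freeze_max_age)
{
    uint32_t threshold = vacuum_freeze_table_age;

    /* Never let the threshold exceed the hard anti-wraparound limit. */
    if (threshold > autovacuum_freeze_max_age)
        threshold = autovacuum_freeze_max_age;

    return relfrozenxid_age > threshold;
}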

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 15:48:53
Message-ID: 200901151548.n0FFmr204820@momjian.us
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> >> Also, is anything being done about the concern about 'vacuum storm'
> >> explained below?
> >
> > I'm interested too.
>
> The additional "vacuum_freeze_table_age" (as I'm now calling it) setting
> I discussed in a later thread should alleviate that somewhat. When a
> table is autovacuumed, the whole table is scanned to freeze tuples if
> it's older than vacuum_freeze_table_age, and relfrozenxid is advanced.
> When different tables reach the autovacuum threshold at different times,
> they will also have their relfrozenxids set to different values. And in
> fact no anti-wraparound vacuum is needed.
>
> That doesn't help with read-only or insert-only tables, but that's not a
> new problem.

OK, is this targeted for 8.4?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 15:52:47
Message-ID: 200901151552.n0FFql905542@momjian.us
Lists: pgsql-hackers

Gregory Stark wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
> > Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
> > when our wraparound limit is around 2B?
>
> I suggested raising it dramatically in the post you quote, and Heikki pointed
> out that it controls the maximum amount of space the clog will take. Raising it to,
> say, 800M will mean up to 200MB of space which might be kind of annoying for a
> small database.
>
> It would be nice if we could ensure the clog got trimmed frequently enough on
> small databases that we could raise the max_age. It's really annoying to see
> all these vacuums running 10x more often than necessary.

I always assumed that it was our 4-byte xid that was requiring our
vacuum freeze, but I now see our limiting factor is the size of clog;
interesting.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Visibility map, partial vacuums
Date: 2009-01-15 15:57:09
Message-ID: 496F5CD5.3010300@enterprisedb.com
Lists: pgsql-hackers

Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>>>> Also, is anything being done about the concern about 'vacuum storm'
>>>> explained below?
>>> I'm interested too.
>> The additional "vacuum_freeze_table_age" (as I'm now calling it) setting
>> I discussed in a later thread should alleviate that somewhat. When a
>> table is autovacuumed, the whole table is scanned to freeze tuples if
>> it's older than vacuum_freeze_table_age, and relfrozenxid is advanced.
>> When different tables reach the autovacuum threshold at different times,
>> they will also have their relfrozenxids set to different values. And in
>> fact no anti-wraparound vacuum is needed.
>>
>> That doesn't help with read-only or insert-only tables, but that's not a
>> new problem.
>
> OK, is this targeted for 8.4?

Yes. It's been on my todo list for a long time, and I've also added it
to the Open Items list so that we don't lose track of it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com