Quick Links

Idea for getting rid of VACUUM FREEZE on cold pages

Lists:	pgsql-hackers

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-21 21:57:35
Message-ID:	4BF701CF.2090205@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

From a discussion at dinner at pgcon, I wanted to send this to the list
for people to poke holes in it:

Problem: currently, if your database has a large amount of "cold" data,
such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
needs to touch it thanks to the visibility map. However, every
freeze_age transactions, very old pages need to be sucked into memory
and rewritten just in order to freeze those pages. This can have a huge
impact on system performance, and seems unjustified because the pages
are not actually being used.

Suggested resolution: we would add a 4-byte field to the *page* header
which would track the XID wraparound count. Any page whose wraparound
count was not equal to the current one would be considered to have all
frozen tuples. This would remove the necessity to read and write old
pages just to freeze them, a humongous gain for databases with long data
retention horizons, let alone data warehouses.

All xids on the page would, necessarily, need to belong to the same
wraparound; if a page gets updated and its wraparound count (hereafter
WCID) is lower than current, all tuples on the page would be frozen
before any data is written to it. XIDs which were before the max_freeze
horizon on a page which was being written anyway would be frozen as they
are now.

Obvious issues:

(1) In a case of rows written close to the wraparound point, this would
cause a set of tuples to be frozen sooner than they would be in the
current system.

(2) It's not clear what to do with a page where there are XIDs which are
just before wraparound (like XID # 2.4b) which are still visible and
receives a write with a new cycle xid (#1).

(3) This will require changing the page structure, with all that
entails. So it should probably be done when we're making another change
(like adding CRCs).

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-22 01:33:24
Message-ID:	408F4E07-BD8A-44EF-8DE7-96FFC058864F@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On May 21, 2010, at 23:57 , Josh Berkus wrote:
> From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it:
>
> Problem: currently, if your database has a large amount of "cold" data, such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer needs to touch it thanks to the visibility map. However, every freeze_age transactions, very old pages need to be sucked into memory and rewritten just in order to freeze those pages. This can have a huge impact on system performance, and seems unjustified because the pages are not actually being used.
>
> Suggested resolution: we would add a 4-byte field to the *page* header which would track the XID wraparound count. Any page whose wraparound count was not equal to the current one would be considered to have all frozen tuples. This would remove the necessity to read and write old pages just to freeze them, a humongous gain for databases with long data retention horizons, let alone data warehouses.

If I understand this correctly, VACUUM usually only frees old tuples, but never increases the oldest xid in the pg_class record. Once that value becomes older than freeze_age, VACUUM needs to scan the whole relation to freeze old tuples. That results in most of the pages being marked dirty and subsequently being written out, causing an IO storm. If, OTOH, the wraparound count was stored in the page header, VACUUM would still need to read those pages, but wouldn't need to write them out.

Alternatively, VACUUM could freeze a few pages on each run, even if the xids are below freeze_age. It could pick those pages randomly, or maybe even prefer pages whose tuples have older xmin/xmas values. That would spread the load out more evenly, much like we try to spread checkpoints out over the whole checkpoint interval.

best regards,
Florian Pflugi

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-22 13:35:29
Message-ID:	1787.1274535329@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> From a discussion at dinner at pgcon, I wanted to send this to the list
> for people to poke holes in it:

Somebody (I think Joe or Heikki) poked a big hole in this last night at
the Royal Oak. Although the scheme would get rid of the need to replace
old XIDs with FrozenXid, it does not get rid of the need to set hint
bits before you can truncate CLOG. So in your example of an insert-only
table that's probably never read again, there's still a minimum of one
update visit required on every old page. Now that's still better than
two update visits ... but we could manage that already, just by tweaking
vacuum's heuristics about when to freeze vs when to set hint bits.

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-22 15:14:44
Message-ID:	4BF7F4E4.2040106@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Somebody (I think Joe or Heikki) poked a big hole in this last night at
> the Royal Oak. Although the scheme would get rid of the need to replace
> old XIDs with FrozenXid, it does not get rid of the need to set hint
> bits before you can truncate CLOG. So in your example of an insert-only
> table that's probably never read again, there's still a minimum of one
> update visit required on every old page. Now that's still better than
> two update visits ... but we could manage that already, just by tweaking
> vacuum's heuristics about when to freeze vs when to set hint bits.

Yeah, someone pointed that out to me too and suggested that a freeze map
was the better solution. I still think there's something we can do with
pages on the visibility map but I'll have to think about it some more.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-23 00:52:25
Message-ID:	4BF87C49.3020008@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/22/2010 9:16 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> Somebody (I think Joe or Heikki) poked a big hole in this last night at
>>> the Royal Oak. Although the scheme would get rid of the need to replace
>>> old XIDs with FrozenXid, it does not get rid of the need to set hint
>>> bits before you can truncate CLOG. So in your example of an insert-only
>>> table that's probably never read again, there's still a minimum of one
>>> update visit required on every old page. Now that's still better than
>>> two update visits ... but we could manage that already, just by tweaking
>>> vacuum's heuristics about when to freeze vs when to set hint bits.
>
>> Yeah, someone pointed that out to me too and suggested that a freeze map
>> was the better solution. I still think there's something we can do with
>> pages on the visibility map but I'll have to think about it some more.
>
> It occurred to me on the flight home that maybe we could salvage
> something from this if there were some mechanism that caused hint bits
> to get set before the page got written out from shared buffers the first
> time. This assumes that you have enough slack in shared-buffer space
> that the transactions that touched a particular page all commit or abort
> before the page first gets flushed to disk.

At least the background writer should have a few spare cycles to look
over a "to be flushed" page before writing it.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-23 01:16:57
Message-ID:	7319.1274577417@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Somebody (I think Joe or Heikki) poked a big hole in this last night at
>> the Royal Oak. Although the scheme would get rid of the need to replace
>> old XIDs with FrozenXid, it does not get rid of the need to set hint
>> bits before you can truncate CLOG. So in your example of an insert-only
>> table that's probably never read again, there's still a minimum of one
>> update visit required on every old page. Now that's still better than
>> two update visits ... but we could manage that already, just by tweaking
>> vacuum's heuristics about when to freeze vs when to set hint bits.

> Yeah, someone pointed that out to me too and suggested that a freeze map
> was the better solution. I still think there's something we can do with
> pages on the visibility map but I'll have to think about it some more.

It occurred to me on the flight home that maybe we could salvage
something from this if there were some mechanism that caused hint bits
to get set before the page got written out from shared buffers the first
time. This assumes that you have enough slack in shared-buffer space
that the transactions that touched a particular page all commit or abort
before the page first gets flushed to disk.

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-24 13:30:13
Message-ID:	4BFA7F65.8070203@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/05/10 16:35, Tom Lane wrote:
> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>> From a discussion at dinner at pgcon, I wanted to send this to the list
>> for people to poke holes in it:
>
> Somebody (I think Joe or Heikki) poked a big hole in this last night at
> the Royal Oak.

Me.

> Although the scheme would get rid of the need to replace
> old XIDs with FrozenXid, it does not get rid of the need to set hint
> bits before you can truncate CLOG.

Hmm, we don't rely on setting hint bits to truncate CLOG anymore
(http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php).
It's the replacement of xids with FrozenXid that matters, the hint bits
are really just hints.

Doesn't change the conclusion, though: you still need to replace XIDs
with FrozenXids to truncate the clog. Conceivably we could keep around
more than 2^32 transactions in clog with this scheme, but then you need
a lot more space for the clog. But perhaps it would be better to do that
than to launch anti-wraparound vacuums, or to refuse more updates in the
extreme cases.

> So in your example of an insert-only
> table that's probably never read again, there's still a minimum of one
> update visit required on every old page. Now that's still better than
> two update visits ... but we could manage that already, just by tweaking
> vacuum's heuristics about when to freeze vs when to set hint bits.

(As also discussed in the Royal Oak) I think we should simply not dirty
a page when a hint bit is updated. Reading a page from disk is
expensive, setting hint bits on the access is generally cheap compared
to that. But that is orthogonal to the idea of a per-page XID epoch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-24 14:14:40
Message-ID:	8522.1274710480@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> (As also discussed in the Royal Oak) I think we should simply not dirty
> a page when a hint bit is updated. Reading a page from disk is
> expensive, setting hint bits on the access is generally cheap compared
> to that. But that is orthogonal to the idea of a per-page XID epoch.

I'm not sure it's cheap. What you suggest would result in a substantial
increase in clog accesses, which means (1) more I/O and (2) more
contention. Certainly it's worth experimenting with, but it's no
guaranteed win.

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-24 16:44:32
Message-ID:	4BFAACF0.6080504@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I'm not sure it's cheap. What you suggest would result in a substantial
> increase in clog accesses, which means (1) more I/O and (2) more
> contention. Certainly it's worth experimenting with, but it's no
> guaranteed win.

It seems like there's a number of issues we could fix by making the CLOG
more efficient somehow -- from the elimination of hint bits to the
ability to freeze pages without writing them.

Not, of course, that I have any idea how to do that.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-24 19:49:47
Message-ID:	1274730220-sup-6350@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Josh Berkus's message of vie may 21 17:57:35 -0400 2010:

> Problem: currently, if your database has a large amount of "cold" data,
> such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
> needs to touch it thanks to the visibility map. However, every
> freeze_age transactions, very old pages need to be sucked into memory
> and rewritten just in order to freeze those pages. This can have a huge
> impact on system performance, and seems unjustified because the pages
> are not actually being used.

I think this is nonsense. If you have 3-years-old sales transactions,
and your database has any interesting churn, tuples those pages have
been frozen for a very long time *already*. The problem is vacuum
reading them in so that it can verify there's nothing to do. If we want
to avoid *reading* those pages, this solution is useless:

> Suggested resolution: we would add a 4-byte field to the *page* header
> which would track the XID wraparound count.

because you still have to read the page.

I think what you're looking for is for this Xid wraparound count to be
stored elsewhere, not inside the page. That way vacuum can read it and
skip the page without reading it altogether. I think a "freeze map" has
been mentioned downthread.

I remember mentioning some time ago that we could declare some tables as
frozen, i.e. "not needing vacuum". This strikes me as similar, except
at the page level rather than table level.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 08:41:30
Message-ID:	4BFB8D3A.4040600@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24/05/10 22:49, Alvaro Herrera wrote:
> Excerpts from Josh Berkus's message of vie may 21 17:57:35 -0400 2010:
>
>> Problem: currently, if your database has a large amount of "cold" data,
>> such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer
>> needs to touch it thanks to the visibility map. However, every
>> freeze_age transactions, very old pages need to be sucked into memory
>> and rewritten just in order to freeze those pages. This can have a huge
>> impact on system performance, and seems unjustified because the pages
>> are not actually being used.
>
> I think this is nonsense. If you have 3-years-old sales transactions,
> and your database has any interesting churn, tuples those pages have
> been frozen for a very long time *already*. The problem is vacuum
> reading them in so that it can verify there's nothing to do. If we want
> to avoid *reading* those pages, this solution is useless:
>
>> Suggested resolution: we would add a 4-byte field to the *page* header
>> which would track the XID wraparound count.
>
> because you still have to read the page.

What's missing from the suggestion is that relfrozenxid and datfrozenxid
also need to be expanded to 8-bytes. That way you effectively have
8-byte XIDs, which means that you never need to vacuum to avoid XID
wraparound.

You still need to freeze to truncate clog, though, but if you have the
disk space, you can now do that every 100 billion transactions for
example if you wish.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 17:10:35
Message-ID:	1274807228-sup-3161@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Heikki Linnakangas's message of mar may 25 04:41:30 -0400 2010:
> On 24/05/10 22:49, Alvaro Herrera wrote:

> > I think this is nonsense. If you have 3-years-old sales transactions,
> > and your database has any interesting churn, tuples those pages have
> > been frozen for a very long time *already*.

> What's missing from the suggestion is that relfrozenxid and datfrozenxid
> also need to be expanded to 8-bytes. That way you effectively have
> 8-byte XIDs, which means that you never need to vacuum to avoid XID
> wraparound.

Hmm, so are we going to use the "xid epoch" more officially? That's
entirely a new line of development, perhaps it opens new possibilities.

This sounds like extending Xid to 64 bits, without having to store the
high bits everywhere. Was this discussed in the PGCon devs meeting?

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 18:16:14
Message-ID:	27435.1274811374@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> This sounds like extending Xid to 64 bits, without having to store the
> high bits everywhere. Was this discussed in the PGCon devs meeting?

Yeah, that's what it would amount to. It was not discussed at the dev
meeting --- it was an idea that came up one evening at PGCon.

I'm not sure whether this would imply having to widen xid to 64 bits
internally. That could be a bit unpleasant as far as CPU and shared
memory space go, although every year that goes by makes 32-bit machines
less interesting as DB servers.

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 20:49:23
Message-ID:	4BFC37D3.3070908@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro,

>> This sounds like extending Xid to 64 bits, without having to store the
>> high bits everywhere. Was this discussed in the PGCon devs meeting?

Essentially, yes.

One of the main objections to raising XID to 64-bit has been the per-row
overhead. But adding 4 bytes per page wouldn't be much of an impact.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 20:53:52
Message-ID:	4BFC38E0.3040409@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/24/2010 9:30 AM, Heikki Linnakangas wrote:
> On 22/05/10 16:35, Tom Lane wrote:
>> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>>> From a discussion at dinner at pgcon, I wanted to send this to the list
>>> for people to poke holes in it:
>>
>> Somebody (I think Joe or Heikki) poked a big hole in this last night at
>> the Royal Oak.
>
> Me.
>
>> Although the scheme would get rid of the need to replace
>> old XIDs with FrozenXid, it does not get rid of the need to set hint
>> bits before you can truncate CLOG.
>
> Hmm, we don't rely on setting hint bits to truncate CLOG anymore
> (http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php).
> It's the replacement of xids with FrozenXid that matters, the hint bits
> are really just hints.
>
> Doesn't change the conclusion, though: you still need to replace XIDs
> with FrozenXids to truncate the clog. Conceivably we could keep around
> more than 2^32 transactions in clog with this scheme, but then you need
> a lot more space for the clog. But perhaps it would be better to do that
> than to launch anti-wraparound vacuums, or to refuse more updates in the
> extreme cases.

Correct. The problem actually are aborted transactions. Just because an
XID is really old doesn't mean it was committed.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-25 20:56:29
Message-ID:	4BFC397D.10703@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Correct. The problem actually are aborted transactions. Just because an
> XID is really old doesn't mean it was committed.

Yes, that's the main issue with my idea; XIDs which fell off the CLOG
would become visible even if they'd aborted.

Do we get a bit in the visibility map for a page which has aborted
transaction rows on it?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Jan Wieck <JanWieck(at)Yahoo(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 05:04:16
Message-ID:	4BFCABD0.3030905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/05/10 23:56, Josh Berkus wrote:
> Do we get a bit in the visibility map for a page which has aborted
> transaction rows on it?

If there's a tuple with an aborted xmin on a page, the bit in the
visibility map is not set. A tuple with aborted xmax doesn't matter.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 18:35:28
Message-ID:	4BFD69F0.7030105@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/25/10 10:04 PM, Heikki Linnakangas wrote:
> On 25/05/10 23:56, Josh Berkus wrote:
>> Do we get a bit in the visibility map for a page which has aborted
>> transaction rows on it?
>
> If there's a tuple with an aborted xmin on a page, the bit in the
> visibility map is not set. A tuple with aborted xmax doesn't matter.

Then it seems like pages in the visibility map, at least, would not need
to be vacuumed or frozen. Do pages persist in the visibility map
indefinitely?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 18:44:05
Message-ID:	4BFD6BF5.7090906@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26/05/10 21:35, Josh Berkus wrote:
> On 5/25/10 10:04 PM, Heikki Linnakangas wrote:
>> On 25/05/10 23:56, Josh Berkus wrote:
>>> Do we get a bit in the visibility map for a page which has aborted
>>> transaction rows on it?
>>
>> If there's a tuple with an aborted xmin on a page, the bit in the
>> visibility map is not set. A tuple with aborted xmax doesn't matter.
>
> Then it seems like pages in the visibility map, at least, would not need
> to be vacuumed or frozen. Do pages persist in the visibility map
> indefinitely?

In theory, until any tuple on the page is inserted/updated/deleted
again. However, we've been operating on the assumption that it's always
safe to clear any bit in the visibility map, without affecting
correctness. I would not like to give up that assumption, it makes life
easier.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 18:46:39
Message-ID:	4BFD6C8F.6070009@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> In theory, until any tuple on the page is inserted/updated/deleted
> again. However, we've been operating on the assumption that it's always
> safe to clear any bit in the visibility map, without affecting
> correctness. I would not like to give up that assumption, it makes life
> easier.

It wouldn't affect correctness, it would just force that page to be
vacuumed-and-frozen. I think I can make this work, let me just hammer
it out.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 19:13:59
Message-ID:	AANLkTin-c8VQw03gTffjfb_q2iUlU_3EKUbKeqDkWC_v@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 26, 2010 at 2:44 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 26/05/10 21:35, Josh Berkus wrote:
>> On 5/25/10 10:04 PM, Heikki Linnakangas wrote:
>>> On 25/05/10 23:56, Josh Berkus wrote:
>>>> Do we get a bit in the visibility map for a page which has aborted
>>>> transaction rows on it?
>>>
>>> If there's a tuple with an aborted xmin on a page, the bit in the
>>> visibility map is not set. A tuple with aborted xmax doesn't matter.
>>
>> Then it seems like pages in the visibility map, at least, would not need
>> to be vacuumed or frozen. Do pages persist in the visibility map
>> indefinitely?
>
> In theory, until any tuple on the page is inserted/updated/deleted again.
> However, we've been operating on the assumption that it's always safe to
> clear any bit in the visibility map, without affecting correctness. I would
> not like to give up that assumption, it makes life easier.

What if we drove it off of the PD_ALL_VISIBLE bit on the page itself,
rather than the visibility map bit? It would be safe to clear the
visibility map bit without touching the page, but if you clear the
PD_ALL_VISIBLE bit on the page itself then you set all the hint bits
and freeze all the tuples. In the case where the visibility map bit
gets cleared but the page-level bit is still set, a future vacuum can
notice and reset the visibility map bit. But whenever the visibility
map bit is set, you know that the page-level bit MUST be set, so you
needn't vacuum those pages, even for anti-wraparound: you know they'll
be frozen when and if they ever get written again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-26 21:01:24
Message-ID:	4BFD8C24.1020606@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> What if we drove it off of the PD_ALL_VISIBLE bit on the page itself,
> rather than the visibility map bit? It would be safe to clear the
> visibility map bit without touching the page, but if you clear the
> PD_ALL_VISIBLE bit on the page itself then you set all the hint bits
> and freeze all the tuples. In the case where the visibility map bit
> gets cleared but the page-level bit is still set, a future vacuum can
> notice and reset the visibility map bit. But whenever the visibility
> map bit is set, you know that the page-level bit MUST be set, so you
> needn't vacuum those pages, even for anti-wraparound: you know they'll
> be frozen when and if they ever get written again.

How does that get us out of reading and writing old pages, though? If
we're going to set a bit on them, we might as well freeze them.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 00:06:46
Message-ID:	4918.1274918806@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> How does that get us out of reading and writing old pages, though?

Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to solve
your problem, because they cannot become set without having visited the
page.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 00:48:36
Message-ID:	AANLkTikHa3o1XdwCFJziLUNxvs31l-RBdLWv7itKZu0O@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> How does that get us out of reading and writing old pages, though?
>
> Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to solve
> your problem, because they cannot become set without having visited the
> page.

Well, maybe I'm confused here, but arranging things so that we NEVER
have to visit the page after initially writing it seems like it's
setting the bar almost impossibly high. Consider a table that is
regularly written but append-only. Every time autovacuum kicks in,
we'll go and remove any dead tuples and then mark the pages
PD_ALL_VISIBLE and set the visibility map bits, which will cause
subsequent vacuums to ignore the all-visible portions of the table...
until anti-wraparound kicks in, at which point we'll vacuum the entire
table and freeze everything.

If, however, we decree that you can't write a new tuple into a
PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
still have the small, incremental vacuums but those are pretty cheap,
and in any event, I don't see any way to get rid of them unless
someone can devise a scheme to do away with vacuum entirely. But you
won't need the full-table vacuum to freeze tuples, because you can
freeze them opportunistically the next time those pages are written
(at which point freezing will be very cheap because the page has to be
written to disk at that point no matter what).

Maybe I'm confused.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 00:59:06
Message-ID:	5602.1274921946@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to solve
>> your problem, because they cannot become set without having visited the
>> page.

> Well, maybe I'm confused here, but arranging things so that we NEVER
> have to visit the page after initially writing it seems like it's
> setting the bar almost impossibly high.

Well, that was the use-case that Josh was on about when this idea came
up: high-volume append-only log tables that in most cases will never be
read, so his client wants to get rid of the extra I/O for maintenance
visits to once-written pages.

If you're willing to allow one visit and rewrite of each page, then
we can do that today with maybe a bit of rejiggering of vacuum's
when-to-freeze heuristics.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 01:32:07
Message-ID:	AANLkTil3irlbJRRWmIjyuCKgaT_msZuPKopwQjms0XC6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 26, 2010 at 8:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to solve
>>> your problem, because they cannot become set without having visited the
>>> page.
>
>> Well, maybe I'm confused here, but arranging things so that we NEVER
>> have to visit the page after initially writing it seems like it's
>> setting the bar almost impossibly high.
>
> Well, that was the use-case that Josh was on about when this idea came
> up: high-volume append-only log tables that in most cases will never be
> read, so his client wants to get rid of the extra I/O for maintenance
> visits to once-written pages.

Well, I'll just note that using PD_ALL_VISIBLE as I'm proposing is
basically equivalent to Josh's original proposal of using an XID epoch
except that it addresses all three of the "obvious issues" which he
noted in his original email; plus it doesn't prevent truncating CLOG
(on the assumption that we rejigger things not to consult clog when
the page is marked PD_ALL_VISIBLE).

> If you're willing to allow one visit and rewrite of each page, then
> we can do that today with maybe a bit of rejiggering of vacuum's
> when-to-freeze heuristics.

Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that
might be just as good, and simpler. Assuming the visibility map is
sufficiently crash-safe/non-buggy, we could then teach VACUUM that
it's OK to advance relfrozenxid even when doing just a partial vacuum
- because any pages that were skipped must contain only frozen tuples.
Previously you've objected to proposals in this direction because
they might destroy forensic information, but maybe we should do it
anyway.

Either way, I think if we do this it *basically* gets rid of
anti-wraparound vacuum. Yeah, we'll still do routine partial vacuums,
but what you won't have is... write the table, vacuum, vacuum, vacuum,
vacuum, OK, everything's visible to everyone, don't need to vacuum any
more... months pass... boom, unexpected full-table vacuum. The
beginning part is the same, but you get rid of the boom at the end.
The only way I see to cut down vacuum activity even further is to
freeze them (and set the visibility map bit) before evicting them from
shared_buffers. That's really the only way to get "write once and
only once", but it's pretty hit or miss, because the xmin horizon
might not advance fast enough to make it actually work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Jesper Krogh <jesper(at)krogh(dot)cc>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 05:56:16
Message-ID:	C4B54ABC-26C4-4416-AE30-B76EEF7E8C29@krogh.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27/05/2010, at 02.48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, May 26, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> How does that get us out of reading and writing old pages, though?
>>
>> Yeah. Neither PD_ALL_VISIBLE nor the visibility map are going to
>> solve
>> your problem, because they cannot become set without having visited
>> the
>> page.
>
> Well, maybe I'm confused here, but arranging things so that we NEVER
> have to visit the page after initially writing it seems like it's
> setting the bar almost impossibly high. Consider a table that is
> regularly written but append-only. Every time autovacuum kicks in,
> we'll go and remove any dead tuples and then mark the pages
> PD_ALL_VISIBLE and set the visibility map bits, which will cause
> subsequent vacuums to ignore the all-visible portions of the table...
> until anti-wraparound kicks in, at which point we'll vacuum the entire
> table and freeze everything.

Just a thought. Wouldn't a All-visible bit also enable index only
scans to some degree?

Jesper

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Jesper Krogh <jesper(at)krogh(dot)cc>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 07:17:38
Message-ID:	4BFE1C92.1000801@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27/05/10 08:56, Jesper Krogh wrote:
> Just a thought. Wouldn't a All-visible bit also enable index only scans
> to some degree?

Yes. In fact, that's one reason I implemented the visibility map in the
first place. I started working on index-only scans based on that last
year, if you search the archives for index-only scans you'll find those
discussions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 18:00:24
Message-ID:	4BFEB338.6040801@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Well, maybe I'm confused here, but arranging things so that we NEVER
> have to visit the page after initially writing it seems like it's
> setting the bar almost impossibly high.

That is the use case, though. What I've encountered so far at 3 client
sites is tables which are largely append-only, with a few selects and
very few updates (< 2%) on recent data. In general, once data gets
flushed out of memory, it goes to disk and never gets recalled, and
certainly not written. Thinks are hunky-dory until we reach
max_freeze_age, at which point the server has to chew through hundreds
of gigabytes of old data just to freeze them, sometimes bringing the
application to a halt in the process.

The user's perspective on this is quite reasonable: if I haven't
selected these pages, and I haven't written to them, why does autovacuum
need to visit them and screw up my server performance?

> Consider a table that is
> regularly written but append-only. Every time autovacuum kicks in,
> we'll go and remove any dead tuples and then mark the pages
> PD_ALL_VISIBLE and set the visibility map bits, which will cause
> subsequent vacuums to ignore the all-visible portions of the table...
> until anti-wraparound kicks in, at which point we'll vacuum the entire
> table and freeze everything.
>
> If, however, we decree that you can't write a new tuple into a
> PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
> still have the small, incremental vacuums but those are pretty cheap,

That only works if those pages were going to be autovacuumed anyway. In
the case outlined above (which I've seen at 3 different production sites
this year), they wouldn't be; a table with less than 2% updates and
deletes does not get vacuumed until max_freeze_age for any reason. For
that matter, pages which are getting autovacuumed are not a problem,
period; they're being read and written and freezing them is not an issue.

I'm not seeing a way of fixing this common issue short of overhauling
CLOG, or of creating a freeze_map. Darn.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 18:17:15
Message-ID:	4BFEB72B.1040602@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/26/10 6:32 PM, Robert Haas wrote:
> Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that
> might be just as good, and simpler. Assuming the visibility map is
> sufficiently crash-safe/non-buggy, we could then teach VACUUM that
> it's OK to advance relfrozenxid even when doing just a partial vacuum
> - because any pages that were skipped must contain only frozen tuples.
> Previously you've objected to proposals in this direction because
> they might destroy forensic information, but maybe we should do it
> anyway.

It would be an improvement, and easier than the various ways of never
having to visit the pages, which are all fairly intensive. Given the
destruction of rollback information, though, we'd probably want a way to
switch this behaviour on and off as an autovacuum setting.

Does this send us down the wrong path, though? I thought we wanted to
think about removing hint bits so that we could implement things like
CRCs. No?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 18:49:25
Message-ID:	AANLkTil6xfScg0nFQoEwzU5CT0LOWDjdS-8XtpKCoZ0Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 27, 2010 at 2:17 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 5/26/10 6:32 PM, Robert Haas wrote:
>> Hmm, yeah. Maybe we should freeze when we set PD_ALL_VISIBLE; that
>> might be just as good, and simpler. Assuming the visibility map is
>> sufficiently crash-safe/non-buggy, we could then teach VACUUM that
>> it's OK to advance relfrozenxid even when doing just a partial vacuum
>> - because any pages that were skipped must contain only frozen tuples.
>> Previously you've objected to proposals in this direction because
>> they might destroy forensic information, but maybe we should do it
>> anyway.
>
> It would be an improvement, and easier than the various ways of never
> having to visit the pages, which are all fairly intensive. Given the
> destruction of rollback information, though, we'd probably want a way to
> switch this behaviour on and off as an autovacuum setting.

It's not going to destroy anything that is needed for rollback unless
there's a bug - PD_ALL_VISIBLE only gets set when all tuples on the
page are visible to all backends. That can't happen until all
transactions that wrote the page, and all others that have a lower
xmin, have committed. That having been said, if making it a GUC makes
people less nervous about doing it, then +1 from me.

> Does this send us down the wrong path, though? I thought we wanted to
> think about removing hint bits so that we could implement things like
> CRCs. No?

PD_ALL_VISIBLE is a page-level bit, not a tuple-level bit, and I
strongly suspect it's not going anywhere. It's critical
infrastructure for index-only scans, among other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 18:52:42
Message-ID:	AANLkTikkboKhIuReDhoH7p7QHD_Qdbb47B0HLwFbnuim@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 27, 2010 at 2:00 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Well, maybe I'm confused here, but arranging things so that we NEVER
>> have to visit the page after initially writing it seems like it's
>> setting the bar almost impossibly high.
>
> That is the use case, though. What I've encountered so far at 3 client
> sites is tables which are largely append-only, with a few selects and
> very few updates (< 2%) on recent data. In general, once data gets
> flushed out of memory, it goes to disk and never gets recalled, and
> certainly not written.

We might be able to optimize this case if the transactions are small,
such that they commit before dirtying too large a fraction of
shared_buffers. We could - at least in theory - teach the bgwriter or
some other process to freeze them before writing them to disk the
first time. But if the blocks have to be written to disk before
transaction commit it seems to me we're DOA, unless we're willing to
retain arbitrarily large amounts of CLOG.

What might be more practical is to try to find ways to spread out the
I/O so that it doesn't happen all at once in a huge ornery spike.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Jesper Krogh <jesper(at)krogh(dot)cc>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 19:08:06
Message-ID:	139C8666-3F12-4380-A4E7-8B23A511D2A0@krogh.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27/05/2010, at 20.00, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>
>> Well, maybe I'm confused here, but arranging things so that we NEVER
>> have to visit the page after initially writing it seems like it's
>> setting the bar almost impossibly high.
>
> That is the use case, though. What I've encountered so far at 3
> client
> sites is tables which are largely append-only, with a few selects and
> very few updates (< 2%) on recent data. In general, once data gets
> flushed out of memory, it goes to disk and never gets recalled, and
> certainly not written. Thinks are hunky-dory until we reach
> max_freeze_age, at which point the server has to chew through hundreds
> of gigabytes of old data just to freeze them, sometimes bringing the
> application to a halt in the process.

The data doesn't get in there in " no time" if autovacuum was aware of
inserts too it would incrementally freeze the table as it grows.

It would still cause it to be read in again but not in a big chunck.

Couldn't pages that are totally filled by the same transaction, be
frozen on the initial write?

Jesper - given my limited knowledge about how it works.

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>,"Jesper Krogh" <jesper(at)krogh(dot)cc>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 19:15:56
Message-ID:	4BFE7E9C0200002500031B89@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jesper Krogh <jesper(at)krogh(dot)cc> wrote:

> Couldn't pages that are totally filled by the same transaction, be

> frozen on the initial write?

As far as I'm aware, that can only be done if:

(a) The tuples were written within the same transaction which
created or truncated the table.

*or*

(b) The writing transaction and all transactions concurrent to it
have completed by the time the page is about to be written.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 19:39:47
Message-ID:	AANLkTimy4hDyTqbrsi2XIXDklRRJkcA55myaxL5EZrAc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
>
>> Couldn't pages that are totally filled by the same transaction, be
>
>> frozen on the initial write?
>
> As far as I'm aware, that can only be done if:
>
> (a) The tuples were written within the same transaction which
> created or truncated the table.
>
> *or*
>
> (b) The writing transaction and all transactions concurrent to it
> have completed by the time the page is about to be written.

Actually, I think this is true only in case (b). In case (a), you
mess up visibility with respect to other command-IDs within the
transaction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Jesper Krogh" <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 19:52:35
Message-ID:	4BFE87330200002500031B94@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner

>> (a) The tuples were written within the same transaction which
>> created or truncated the table.

> In case (a), you mess up visibility with respect to other
> command-IDs within the transaction.

Surely that problem is surmountable?

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 19:56:44
Message-ID:	AANLkTinZ0m7iS2bvFQlxIWXWf8LSMRML06n86Y2_Ax3E@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
>
>>> (a) The tuples were written within the same transaction which
>>> created or truncated the table.
>
>> In case (a), you mess up visibility with respect to other
>> command-IDs within the transaction.
>
> Surely that problem is surmountable?

I proposed an idea at PGCon, but I believe Tom and Heikki thought it
was far too grotty to consider.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Jesper Krogh" <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 20:01:09
Message-ID:	4BFE89350200002500031B9B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I proposed an idea at PGCon, but I believe Tom and Heikki thought
> it was far too grotty to consider.

Well, as an alternative -- don't we have some information about the
relation pinned which could hold the xid of its creator? If the
tuple is frozen check to see if your transaction is the creator and
behave like you created the tuple (which, in fact, you did)?

-Kevin

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 20:21:55
Message-ID:	4BFED463.7090109@joeconway.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/27/2010 12:39 PM, Robert Haas wrote:
> On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
>>
>>> Couldn't pages that are totally filled by the same transaction, be
>>
>>> frozen on the initial write?
>>
>> As far as I'm aware, that can only be done if:
>>
>> (a) The tuples were written within the same transaction which
>> created or truncated the table.
>>
>> *or*
>>
>> (b) The writing transaction and all transactions concurrent to it
>> have completed by the time the page is about to be written.
>
> Actually, I think this is true only in case (b). In case (a), you
> mess up visibility with respect to other command-IDs within the
> transaction.
>

(a) can work if it is all in one command, CREATE TABLE AS SELECT...

Additionally we were discussing COPY in the FROM clause, which means you
could CREATE TABLE AS SELECT ... FROM (COPY ...). That would allow bulk
loading with hint bits already set (and tuples frozen?).

Joe

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joe Conway" <mail(at)joeconway(dot)com>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Jesper Krogh" <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-27 20:33:34
Message-ID:	4BFE90CE0200002500031BA7@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>>Joe Conway <mail(at)joeconway(dot)com> wrote:

> (a) can work if it is all in one command, CREATE TABLE AS
> SELECT...

> Additionally we were discussing COPY in the FROM clause, which
> means you could CREATE TABLE AS SELECT ... FROM (COPY ...). That
> would allow bulk loading with hint bits already set (and tuples
> frozen?).

As long as it's hinted and frozen after a pg_dump -1 | psql I'll be
happy.

-Kevin

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-05-28 07:48:36
Message-ID:	4BFF7554.8050609@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27/05/10 22:56, Robert Haas wrote:
> On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>> On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner
>>
>>>> (a) The tuples were written within the same transaction which
>>>> created or truncated the table.
>>
>>> In case (a), you mess up visibility with respect to other
>>> command-IDs within the transaction.
>>
>> Surely that problem is surmountable?
>
> I proposed an idea at PGCon, but I believe Tom and Heikki thought it
> was far too grotty to consider.

No, I think it's surmountable too. We discussed hacks to teach the MVCC
checks that all frozen tuples on a table that was created in the same
transaction (i.e. the same cases where we skip WAL logging) were
actually created by the running transaction, and check commandid
accordingly.

Or detect simple DML commands where we know that the command doesn't
read the table. COPY would usually fall into that category, though
non-immutable input functions make that a bit iffy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 10:38:35
Message-ID:	4C0634AB.4010206@pws.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28/05/10 04:00, Josh Berkus wrote:
>> Consider a table that is
>> regularly written but append-only. Every time autovacuum kicks in,
>> we'll go and remove any dead tuples and then mark the pages
>> PD_ALL_VISIBLE and set the visibility map bits, which will cause
>> subsequent vacuums to ignore the all-visible portions of the table...
>> until anti-wraparound kicks in, at which point we'll vacuum the entire
>> table and freeze everything.
>>
>> If, however, we decree that you can't write a new tuple into a
>> PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
>> still have the small, incremental vacuums but those are pretty cheap,
>>
> That only works if those pages were going to be autovacuumed anyway. In
> the case outlined above (which I've seen at 3 different production sites
> this year), they wouldn't be; a table with less than 2% updates and
> deletes does not get vacuumed until max_freeze_age for any reason. For
> that matter, pages which are getting autovacuumed are not a problem,
> period; they're being read and written and freezing them is not an issue.
>
> I'm not seeing a way of fixing this common issue short of overhauling
> CLOG, or of creating a freeze_map. Darn.
>
Don't you not get a positive enough effect by adjusting the table's
autovacuum_min_freeze_age and autovacuum_max_freeze_age. If you set
those numbers small, it appears to me that you would get very quickly to
a state where the vacuum would example only the most recent part of the
table rather than the whole thing. Does that give you enough of a win
that it stops the scanning and writing of the whole table which reduces
the performance problem being experienced. It's not a complete
solution, but does it go someway?

Regards

Russell

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 17:51:21
Message-ID:	1275500917-sup-2984@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Russell Smith's message of mié jun 02 06:38:35 -0400 2010:

> Don't you not get a positive enough effect by adjusting the table's
> autovacuum_min_freeze_age and autovacuum_max_freeze_age. If you set
> those numbers small, it appears to me that you would get very quickly to
> a state where the vacuum would example only the most recent part of the
> table rather than the whole thing.

The problem is that vacuum doesn't know that a certain part of the table
is already frozen. It needs to scan it completely anyways. If we had a
"frozen" map, we could mark pages that are completely frozen and thus do
not need any vacuuming; but we don't (I don't recall the reasons for
this. Maybe it's just that no one has gotten around to it, or maybe
there's something else).

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 18:05:31
Message-ID:	9200.1275501931@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> The problem is that vacuum doesn't know that a certain part of the table
> is already frozen. It needs to scan it completely anyways. If we had a
> "frozen" map, we could mark pages that are completely frozen and thus do
> not need any vacuuming; but we don't (I don't recall the reasons for
> this. Maybe it's just that no one has gotten around to it, or maybe
> there's something else).

Offhand I think the reason is that you'd have to trust the frozen bit
to be 100% correct (or at least never set to 1 in error). Currently,
both the FSM and visibility forks are just hints, and we won't suffer
data corruption if they're wrong; so we don't get too tense about WAL
logging or fsync'ing updates. I believe Heikki is looking into what
it'd take to make the visibility map 100% reliable, in connection with
the desire for index-only scans. If we get that and the overhead isn't
too terrible maybe we could build a frozen-status map the same way.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 18:16:33
Message-ID:	AANLkTikNFAHMBg8jnmA7kFWxwh-9TcSerIwRdnPnHA7f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 2, 2010 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> The problem is that vacuum doesn't know that a certain part of the table
>> is already frozen. It needs to scan it completely anyways. If we had a
>> "frozen" map, we could mark pages that are completely frozen and thus do
>> not need any vacuuming; but we don't (I don't recall the reasons for
>> this. Maybe it's just that no one has gotten around to it, or maybe
>> there's something else).
>
> Offhand I think the reason is that you'd have to trust the frozen bit
> to be 100% correct (or at least never set to 1 in error). Currently,
> both the FSM and visibility forks are just hints, and we won't suffer
> data corruption if they're wrong; so we don't get too tense about WAL
> logging or fsync'ing updates. I believe Heikki is looking into what
> it'd take to make the visibility map 100% reliable, in connection with
> the desire for index-only scans. If we get that and the overhead isn't
> too terrible maybe we could build a frozen-status map the same way.

We could, but I think we'd be better off just freezing at the time we
mark the page PD_ALL_VISIBLE and then using the visibility map for
both purposes. Keeping around the old xmin values after every tuple
on the page is visible to every running transaction is useful only for
forensics, and building a whole new freeze map just to retain that
information longer (and eventually force a massive anti-wraparound
vacuum) seems like overkill.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 19:10:23
Message-ID:	1275505502-sup-6468@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:

> We could, but I think we'd be better off just freezing at the time we
> mark the page PD_ALL_VISIBLE and then using the visibility map for
> both purposes. Keeping around the old xmin values after every tuple
> on the page is visible to every running transaction is useful only for
> forensics, and building a whole new freeze map just to retain that
> information longer (and eventually force a massive anti-wraparound
> vacuum) seems like overkill.

Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
all the way forward to OldestXmin is a bit much, methinks.

Besides, there's another argument for not freezing tuples immediately:
they may be updated shortly thereafter, causing extra churn for no gain.

I'd prefer a setting that would tell the system to freeze all tuples
that fall within a safety range whenever any tuple in the page is frozen
-- weren't you working on a patch to do this? (was it Jeff Davis?)

(BTW maybe instead of separate visibility and freeze maps we could have
two bits in the visibility map?)

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-02 23:22:51
Message-ID:	201006022322.o52NMpE05053@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of mi jun 02 14:16:33 -0400 2010:
>
> > We could, but I think we'd be better off just freezing at the time we
> > mark the page PD_ALL_VISIBLE and then using the visibility map for
> > both purposes. Keeping around the old xmin values after every tuple
> > on the page is visible to every running transaction is useful only for
> > forensics, and building a whole new freeze map just to retain that
> > information longer (and eventually force a massive anti-wraparound
> > vacuum) seems like overkill.
>
> Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
> all the way forward to OldestXmin is a bit much, methinks.
>
> Besides, there's another argument for not freezing tuples immediately:
> they may be updated shortly thereafter, causing extra churn for no gain.
>
> I'd prefer a setting that would tell the system to freeze all tuples
> that fall within a safety range whenever any tuple in the page is frozen
> -- weren't you working on a patch to do this? (was it Jeff Davis?)
>
> (BTW maybe instead of separate visibility and freeze maps we could have
> two bits in the visibility map?)

Yeah, the two-bits idea was suggested during the conversation core had
about the issue.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-03 01:39:55
Message-ID:	AANLkTikh4ZbN1aby7NY1RmEqH0oH5jJvdV0EE6xJIwNn@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 2, 2010 at 3:10 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:
>
>> We could, but I think we'd be better off just freezing at the time we
>> mark the page PD_ALL_VISIBLE and then using the visibility map for
>> both purposes. Keeping around the old xmin values after every tuple
>> on the page is visible to every running transaction is useful only for
>> forensics, and building a whole new freeze map just to retain that
>> information longer (and eventually force a massive anti-wraparound
>> vacuum) seems like overkill.
>
> Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
> all the way forward to OldestXmin is a bit much, methinks.

Why? If it's just for forensics, those are some pretty expensive
forensics - it eventually costs you an additional complete rewrite of
every page.

> Besides, there's another argument for not freezing tuples immediately:
> they may be updated shortly thereafter, causing extra churn for no gain.

But if you were going to update PD_ALL_VISIBLE, then you were going to
write the page anyway. You might as well freeze everything at the
same time so you don't have to come back.

Alternatively, you could do what I suggested upthread and just believe
PD_ALL_VISIBLE over the individual tuple xmins. Then you don't have
to freeze the page until it's next written, but you still get to keep
your forensic info.

> I'd prefer a setting that would tell the system to freeze all tuples
> that fall within a safety range whenever any tuple in the page is frozen
> -- weren't you working on a patch to do this? (was it Jeff Davis?)

Not me. I don't think that's going to help a whole lot, though. In
many of the painful scenarios, every tuple on the page will have the
same XID, and therefore they'll all be frozen at the same time anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-03 10:41:27
Message-ID:	AANLkTimmk5RnMpQbk7ZLv65YlydQ9fD4bPft_LQRrVt5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

So I think the scheme in the original post of this thread is workable.
Not as described but could be made to work. In which case I think it's
preferable to a freeze map -- which I had previously assumed we would
need eventually.

The problem with the scheme originally described is that it assumed
you could have an cycle counter and then arrange that all the xids on
the page are within that cycle. That doesn't work because you could
easily have two live xids on the page that belong to two cycles -- one
FirstNormalTransactionId and one MaxTransactionId.

I think to make it work you need to store a whole 64-bit reference
transaction id consisting of both a cycle counter and a transaction
id. The invariant for the page is that every xid on the page can be
compared to that reference transaction id using normal transactionid
semantics. Actually I think the easiest way to do that is to set it to
the oldest xid on the page. The first thing to do before comparing any
transaction id on the page with a real transaction id would be to
figure out whether the reference xid is comparable to the live xid,
which if it's the oldest xid on the page implies they'll all be
comparable.

The way to maintain that invariant would be that any xid insertion on
the page must advance the reference xid if it's not comparable to the
newly inserted xid. It has to be advanced to the oldest xid that's
still comparable with the newly inserted xid. Any xids on the page
that are older than the new refernce xid have to be frozen or removed.
I'm not sure how to do that without keeping clog forever though.

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 11:47:19
Message-ID:	4C08E7C7.2090204@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/2/2010 2:16 PM, Robert Haas wrote:
> On Wed, Jun 2, 2010 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> The problem is that vacuum doesn't know that a certain part of the table
>>> is already frozen. It needs to scan it completely anyways. If we had a
>>> "frozen" map, we could mark pages that are completely frozen and thus do
>>> not need any vacuuming; but we don't (I don't recall the reasons for
>>> this. Maybe it's just that no one has gotten around to it, or maybe
>>> there's something else).
>>
>> Offhand I think the reason is that you'd have to trust the frozen bit
>> to be 100% correct (or at least never set to 1 in error). Currently,
>> both the FSM and visibility forks are just hints, and we won't suffer
>> data corruption if they're wrong; so we don't get too tense about WAL
>> logging or fsync'ing updates. I believe Heikki is looking into what
>> it'd take to make the visibility map 100% reliable, in connection with
>> the desire for index-only scans. If we get that and the overhead isn't
>> too terrible maybe we could build a frozen-status map the same way.
>
> We could, but I think we'd be better off just freezing at the time we
> mark the page PD_ALL_VISIBLE and then using the visibility map for
> both purposes. Keeping around the old xmin values after every tuple
> on the page is visible to every running transaction is useful only for
> forensics, and building a whole new freeze map just to retain that
> information longer (and eventually force a massive anti-wraparound
> vacuum) seems like overkill.

Agreed.

The whole business of minimum freeze age always struck me as leaving
bread crumbs behind. Other than forensics, what is the actual value of
that overhead?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 12:01:00
Message-ID:	4C08EAFC.5060609@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of mié jun 02 14:16:33 -0400 2010:
>
>> We could, but I think we'd be better off just freezing at the time we
>> mark the page PD_ALL_VISIBLE and then using the visibility map for
>> both purposes. Keeping around the old xmin values after every tuple
>> on the page is visible to every running transaction is useful only for
>> forensics, and building a whole new freeze map just to retain that
>> information longer (and eventually force a massive anti-wraparound
>> vacuum) seems like overkill.
>
> Reducing the xid wraparound horizon "a bit" is reasonable, but moving it
> all the way forward to OldestXmin is a bit much, methinks.

Why?

>
> Besides, there's another argument for not freezing tuples immediately:
> they may be updated shortly thereafter, causing extra churn for no gain.

What extra churn does it create if the tuple can be frozen before the
bgwriter ever writes the page in the first place?

>
> I'd prefer a setting that would tell the system to freeze all tuples
> that fall within a safety range whenever any tuple in the page is frozen
> -- weren't you working on a patch to do this? (was it Jeff Davis?)

I just see a lot of cost caused by this "safety range". I yet have to
see its real value, other than "feel good".

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 14:18:57
Message-ID:	2390.1275661137@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
>> I'd prefer a setting that would tell the system to freeze all tuples
>> that fall within a safety range whenever any tuple in the page is frozen
>> -- weren't you working on a patch to do this? (was it Jeff Davis?)

> I just see a lot of cost caused by this "safety range". I yet have to
> see its real value, other than "feel good".

Jan, you don't know what you're talking about. I have repeatedly had
cases where being able to look at xmin was critical to understanding
a bug. I *will not* hold still for a solution that effectively reduces
min_freeze_age to zero.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 14:57:50
Message-ID:	AANLkTik2T8Jo5JkJQLxoGOqqwLIHh4JiwDborhULzIas@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 4, 2010 at 10:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
>>> I'd prefer a setting that would tell the system to freeze all tuples
>>> that fall within a safety range whenever any tuple in the page is frozen
>>> -- weren't you working on a patch to do this? (was it Jeff Davis?)
>
>> I just see a lot of cost caused by this "safety range". I yet have to
>> see its real value, other than "feel good".
>
> Jan, you don't know what you're talking about. I have repeatedly had
> cases where being able to look at xmin was critical to understanding
> a bug. I *will not* hold still for a solution that effectively reduces
> min_freeze_age to zero.

So, we're talking in circles here. I've already proposed a method
that would avoid the need to wipe out the xmins:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01485.php

And you said that if we were going to do that we might as well just
freeze sooner:

http://archives.postgresql.org/pgsql-hackers/2010-05/msg01548.php

If you don't want to freeze sooner, let's go back to the method
described in the first email.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Jan Wieck" <JanWieck(at)Yahoo(dot)com>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 15:13:00
Message-ID:	4C08D1AC0200002500031F4A@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:

>> I just see a lot of cost caused by this "safety range". I yet
>> have to see its real value, other than "feel good".
>
> Jan, you don't know what you're talking about. I have repeatedly
> had cases where being able to look at xmin was critical to
> understanding a bug. I *will not* hold still for a solution that
> effectively reduces min_freeze_age to zero.

In my experience with my own environment, I can honestly say that
it's clear that not freezing tuples quickly adds more cost than
running with cassert on. If we had to run in production with one or
the other, I would definitely choose cassert from a performance
perspective; which one would do more to find bugs? Why do we view
them so differently?

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Jan Wieck" <JanWieck(at)Yahoo(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 15:45:12
Message-ID:	18122.1275666312@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> In my experience with my own environment, I can honestly say that
> it's clear that not freezing tuples quickly adds more cost than
> running with cassert on. If we had to run in production with one or
> the other, I would definitely choose cassert from a performance
> perspective; which one would do more to find bugs? Why do we view
> them so differently?

The reason for not recommending cassert in production builds is not
cost but stability. Per the fine manual:

Also, having the tests turned on won't necessarily enhance the
stability of your server! The assertion checks are not categorized
for severity, and so what might be a relatively harmless bug will
still lead to server restarts if it triggers an assertion
failure. This option is not recommended for production use, but
you should have it on for development work or when running a beta
version.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 15:49:34
Message-ID:	AANLkTimINO9iAhvPboj2F5iVzSYS0k1h5DEmusSZx3Vb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 4, 2010 at 11:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> In my experience with my own environment, I can honestly say that
>> it's clear that not freezing tuples quickly adds more cost than
>> running with cassert on. If we had to run in production with one or
>> the other, I would definitely choose cassert from a performance
>> perspective; which one would do more to find bugs? Why do we view
>> them so differently?
>
> The reason for not recommending cassert in production builds is not
> cost but stability. Per the fine manual:
>
> Also, having the tests turned on won't necessarily enhance the
> stability of your server! The assertion checks are not categorized
> for severity, and so what might be a relatively harmless bug will
> still lead to server restarts if it triggers an assertion
> failure. This option is not recommended for production use, but
> you should have it on for development work or when running a beta
> version.

We routinely castigate people for benchmarking done with cassert
turned on, and tell them their numbers are meaningless.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 15:59:25
Message-ID:	18960.1275667165@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jun 4, 2010 at 11:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The reason for not recommending cassert in production builds is not
>> cost but stability.

> We routinely castigate people for benchmarking done with cassert
> turned on, and tell them their numbers are meaningless.

I didn't say it wasn't expensive ;-). But Kevin's question seemed to
be based on the assumption that runtime cost was the only negative.
It wouldn't be terribly hard to make a variant of cassert that skips
two or three of the most expensive things (particularly memory context
checking and CLOBBER_FREED_MEMORY), and from a cost perspective that
would be totally reasonable to run in production. We haven't done it
because of the stability issue.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 16:18:10
Message-ID:	4C08E0F20200002500031F90@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> But Kevin's question seemed to be based on the assumption that
> runtime cost was the only negative. It wouldn't be terribly hard
> to make a variant of cassert that skips two or three of the most
> expensive things (particularly memory context checking and
> CLOBBER_FREED_MEMORY), and from a cost perspective that would be
> totally reasonable to run in production. We haven't done it
> because of the stability issue.

Fair enough. I was thinking of them both as debugging features,
which had various ideas roiling around in my head. Having run
hundreds of databases 24/7 for years without ever needing this
information, but paying the cost for it one way or another every
day, my perspective is that it would be A Good Thing if it could
just be turned on when needed. If you have recurring bug that can
be arranged, but in those cases you have other options; so I'm
assuming you want this kept because it is primarily of forensic
value after a non-repeatable bug has munged something?

Another thought bouncing around was that these breadcrumbs are
expensive; I was trying to think of some other way to capture the
information which would be cheaper, but I haven't thought of
anything, and I'm far from certain that cheaper breadcrumbs to
answer the need can be developed. The best thought I've had so far
is that if someone kept WAL files long enough the evidence might be
in there somewhere....

-Kevin

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 16:23:33
Message-ID:	201006041623.o54GNXH26338@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> Fair enough. I was thinking of them both as debugging features,
> which had various ideas roiling around in my head. Having run
> hundreds of databases 24/7 for years without ever needing this
> information, but paying the cost for it one way or another every
> day, my perspective is that it would be A Good Thing if it could
> just be turned on when needed. If you have recurring bug that can
> be arranged, but in those cases you have other options; so I'm
> assuming you want this kept because it is primarily of forensic
> value after a non-repeatable bug has munged something?
>
> Another thought bouncing around was that these breadcrumbs are
> expensive; I was trying to think of some other way to capture the
> information which would be cheaper, but I haven't thought of
> anything, and I'm far from certain that cheaper breadcrumbs to
> answer the need can be developed. The best thought I've had so far
> is that if someone kept WAL files long enough the evidence might be
> in there somewhere....

The idea that thousands of Postgres installations are slower just so we
can occasionally debug xmin/xmax issues seems way off balance to me.
If people want debugging, let them modify the freeze age settings; the
defaults should not favor debugging when there is a measurable cost
involved. How many times in the past five years have we even needed
such debugging information, and also are cases where we could not have
told the user to change freeze settings to get us that info?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 16:28:01
Message-ID:	19374.1275668881@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> ... my perspective is that it would be A Good Thing if it could
> just be turned on when needed. If you have recurring bug that can
> be arranged, but in those cases you have other options; so I'm
> assuming you want this kept because it is primarily of forensic
> value after a non-repeatable bug has munged something?

Yeah, that's exactly the problem. When you realize you need it,
it's too late.

> The best thought I've had so far
> is that if someone kept WAL files long enough the evidence might be
> in there somewhere....

Hm, that is an excellent point. The WAL trace would actually be a lot
superior in terms of being able to figure out what went wrong. But
I don't quite see how we tell people "either keep xmin or keep your
old WAL". Also, for production sites the amount of WAL you'd have to
hang onto seems a bit daunting. Other problems are the cost of shipping
it to a developer, and the impracticality of sanitizing private data in
it before you show it to somebody.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 16:38:16
Message-ID:	4C08E5A80200002500031FA2@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:

>> The best thought I've had so far is that if someone kept WAL
>> files long enough the evidence might be in there somewhere....
>
> Hm, that is an excellent point. The WAL trace would actually be a
> lot superior in terms of being able to figure out what went wrong.
> But I don't quite see how we tell people "either keep xmin or keep
> your old WAL". Also, for production sites the amount of WAL you'd
> have to hang onto seems a bit daunting.

Any thoughts on how far back the WAL would need to go to deal with
the issues where such information has been useful? (For example, we
always have at least two weeks worth, but I don't know if that's a
useful range or not.)

> Other problems are the cost of shipping it to a developer, and the
> impracticality of sanitizing private data in it before you show it
> to somebody.

Yeah, this wouldn't be a practical answer to the need unless
PostgreSQL shipped with a tool which could scan WAL and extract the
relevant information (probably under direction of someone from the
list or a private support organization). Is the required
information predictable enough to make developing such a tool a
tractable problem?

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 17:20:15
Message-ID:	20146.1275672015@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hm, that is an excellent point. The WAL trace would actually be a
>> lot superior in terms of being able to figure out what went wrong.
>> But I don't quite see how we tell people "either keep xmin or keep
>> your old WAL". Also, for production sites the amount of WAL you'd
>> have to hang onto seems a bit daunting.

> Any thoughts on how far back the WAL would need to go to deal with
> the issues where such information has been useful? (For example, we
> always have at least two weeks worth, but I don't know if that's a
> useful range or not.)

Well, it's a "how long does it take you to notice data corruption"
kind of issue. The most recent case I can think of where xmin was
helpful was in trying to sort out a problem with an index being
inconsistent with the heap, which manifested as wrong query answers
for the user. I don't know how long it took him to recognize and
report the problem. (We never did locate the bug-if-any, IIRC...
it would have been much more helpful to have the WAL trace. xmin
did let me rule out some theories, though.)

>> Other problems are the cost of shipping it to a developer, and the
>> impracticality of sanitizing private data in it before you show it
>> to somebody.

> Yeah, this wouldn't be a practical answer to the need unless
> PostgreSQL shipped with a tool which could scan WAL and extract the
> relevant information (probably under direction of someone from the
> list or a private support organization). Is the required
> information predictable enough to make developing such a tool a
> tractable problem?

Hard to tell. If we were actually going in this direction we'd
want to write a much better WAL-text-dump tool than we have, and then
in principle somebody could sanitize the text output before shipping
it off. But going through a large volume of data that way could be
pretty impractical. Also, we (or at least I) have nearly zip experience
with trying to debug problems by examining WAL, so it's not real clear
to me which details might be important.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 17:35:59
Message-ID:	20379.1275672959@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> The idea that thousands of Postgres installations are slower just so we
> can occasionally debug xmin/xmax issues seems way off balance to me.

There's no evidence whatsoever that the scope of the problem is that large.

> If people want debugging, let them modify the freeze age settings; the
> defaults should not favor debugging when there is a measurable cost
> involved. How many times in the past five years have we even needed
> such debugging information, and also are cases where we could not have
> told the user to change freeze settings to get us that info?

You're missing the point here: this is something we need when trying
to make sense of cases that are hard or impossible to reproduce.
Retroactively changing the freeze policy isn't possible.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 17:52:50
Message-ID:	4C08F7220200002500031FC1@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> The idea that thousands of Postgres installations are slower just
>> so we can occasionally debug xmin/xmax issues seems way off
>> balance to me.
>
> There's no evidence whatsoever that the scope of the problem is
> that large.

Well, are we agreed that the current approach means that insertion
of a heap tuple normally requires it to be written to disk three
times, with two of those WAL-logged? And that deletion of a tuple
generally requires the same? I'd say that constitutes prima facie
evidence that any PostgreSQL installation doing any significant
number of writes is slower because of this. Are you suggesting
there aren't thousands of such installations, or that the repeated
disk writes are generally free?

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 17:56:54
Message-ID:	4C08F8160200002500031FCC@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> If we were actually going in this direction we'd want to write a
> much better WAL-text-dump tool than we have, and then in principle
> somebody could sanitize the text output before shipping it off.

I wouldn't think this would be practical unless there was a way to
scan the WAL files and dump only the bits related to the affected
pages, and probably with at least an option (possibly default) to
just mention the data type and length, rather than showing the
actual values -- how often are the values relevant, anyway? (Not
rhetorical; I really don't know.)

-Kevin

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 17:57:58
Message-ID:	201006041757.o54Hvwt17648@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > The idea that thousands of Postgres installations are slower just so we
> > can occasionally debug xmin/xmax issues seems way off balance to me.
>
> There's no evidence whatsoever that the scope of the problem is that large.
>
> > If people want debugging, let them modify the freeze age settings; the
> > defaults should not favor debugging when there is a measurable cost
> > involved. How many times in the past five years have we even needed
> > such debugging information, and also are cases where we could not have
> > told the user to change freeze settings to get us that info?
>
> You're missing the point here: this is something we need when trying
> to make sense of cases that are hard or impossible to reproduce.
> Retroactively changing the freeze policy isn't possible.

With in-place VACUUM FULL gone in 9.0, will there be as much need for
xmin/xmax forensics?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 19:35:29
Message-ID:	22317.1275680129@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> With in-place VACUUM FULL gone in 9.0, will there be as much need for
> xmin/xmax forensics?

You know perfectly well that no one could answer that question.
(Or at least not answer it on the basis of facts available today.)

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 19:39:07
Message-ID:	201006041939.o54Jd7p08264@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > With in-place VACUUM FULL gone in 9.0, will there be as much need for
> > xmin/xmax forensics?
>
> You know perfectly well that no one could answer that question.
> (Or at least not answer it on the basis of facts available today.)

Well, guess then. In the past, how many forensic cases were needed for
in-place VACUUM FULL bugs, vs. other cases?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 20:18:58
Message-ID:	22886.1275682738@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> With in-place VACUUM FULL gone in 9.0, will there be as much need for
>>> xmin/xmax forensics?
>>
>> You know perfectly well that no one could answer that question.
>> (Or at least not answer it on the basis of facts available today.)

> Well, guess then.

I already told you my opinion on this matter. Since you're prepared
to discount that, I don't see why you'd put any credence in my
evidence-free guesses.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 20:27:19
Message-ID:	1275682945-sup-9013@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Bruce Momjian's message of vie jun 04 15:39:07 -0400 2010:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > With in-place VACUUM FULL gone in 9.0, will there be as much need for
> > > xmin/xmax forensics?
> >
> > You know perfectly well that no one could answer that question.
> > (Or at least not answer it on the basis of facts available today.)
>
> Well, guess then. In the past, how many forensic cases were needed for
> in-place VACUUM FULL bugs, vs. other cases?

I don't understand the question. I know I have debugged a bunch of
cases of data corruption, and having xmin/xmax around has been truly
useful. VACUUM FULL has never been involved (that I know of -- most of
our customers tend not to run it AFAIK), so why would I care about
whether it's gone in 9.0? Note that it's not always about PG bugs; but
in the cases where xmin=FrozenXid for all/most involved tuples, the
problems are more difficult to track down.

Yes, VACUUM FULL had bugs too -- I, for one, welcome our new
not-in-place VACUUM FULL overlord.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-04 20:35:55
Message-ID:	201006042035.o54KZtT19853@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of vie jun 04 15:39:07 -0400 2010:
> > Tom Lane wrote:
> > > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > > With in-place VACUUM FULL gone in 9.0, will there be as much need for
> > > > xmin/xmax forensics?
> > >
> > > You know perfectly well that no one could answer that question.
> > > (Or at least not answer it on the basis of facts available today.)
> >
> > Well, guess then. In the past, how many forensic cases were needed for
> > in-place VACUUM FULL bugs, vs. other cases?
>
> I don't understand the question. I know I have debugged a bunch of
> cases of data corruption, and having xmin/xmax around has been truly
> useful. VACUUM FULL has never been involved (that I know of -- most of
> our customers tend not to run it AFAIK), so why would I care about
> whether it's gone in 9.0? Note that it's not always about PG bugs; but
> in the cases where xmin=FrozenXid for all/most involved tuples, the
> problems are more difficult to track down.
>
> Yes, VACUUM FULL had bugs too -- I, for one, welcome our new
> not-in-place VACUUM FULL overlord.

OK, so we had lots of forensics the didn't involve VACUUM FULL. That's
what I wanted to know.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-07 22:54:13
Message-ID:	4C0D7895.30503@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> If we were actually going in this direction we'd
> want to write a much better WAL-text-dump tool than we have, and then
> in principle somebody could sanitize the text output before shipping
> it off. But going through a large volume of data that way could be
> pretty impractical. Also, we (or at least I) have nearly zip experience
> with trying to debug problems by examining WAL, so it's not real clear
> to me which details might be important.
>

There's another interesting thing about moving in this direction too.
Systems that have WAL archiving setup that run into a problem could end
up being a much richer source for historical analysis of how the system
got into the bad state than is available right now. Typically those can
have longer histories available than you'll find on a primary that's
recycling segments all the time.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 01:26:14
Message-ID:	20100608012614.GH21875@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> If we were actually going in this direction we'd
> want to write a much better WAL-text-dump tool than we have, and then
> in principle somebody could sanitize the text output before shipping
> it off. But going through a large volume of data that way could be
> pretty impractical. Also, we (or at least I) have nearly zip experience
> with trying to debug problems by examining WAL, so it's not real clear
> to me which details might be important.

Just an off-the-wall thought, but, would it be possible to have a tool
which read WAL backwards and compared entries in the WAL against entries
on disk? I realize that you'd only see one version of a particular
block and then have to skip any updates which are earlier than it, but
it seems like you could cover a pretty large chunk of the recent changes
to the database using this approach..

Thanks,

Stephen

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 12:27:11
Message-ID:	AANLkTint9RWHsO9xOQljtGL4cJ969IFy8Uu-m_JoaXaL@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 8, 2010 at 2:26 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Just an off-the-wall thought, but, would it be possible to have a tool
> which read WAL backwards and compared entries in the WAL against entries
> on disk? I realize that you'd only see one version of a particular
> block and then have to skip any updates which are earlier than it, but
> it seems like you could cover a pretty large chunk of the recent changes
> to the database using this approach..

I assume you mean back out the changes incrementally until you find a
full_page_write and see if it matches? And continue comparing with
full_page_writes once per checkpoint? I don't think the WAL has enough
information to replay backwards though. For example vacuum cleanup
records just list the tids to remove. They don't have the contents to
replace there.

--
greg

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 12:58:02
Message-ID:	20100608125802.GM21875@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Greg Stark (gsstark(at)mit(dot)edu) wrote:
> I assume you mean back out the changes incrementally until you find a
> full_page_write and see if it matches?

To be honest, you're already assuming I know more about how this all
works than I do. :) The gist of my thought was simply- we write out
block changes to the WAL, including data in many cases. If we were to
look at the very end of the WAL, at the last piece of data written
there, and the data files have supposedly been flushed, then what's in
the WAL at that point should match what's in the data files, right? If
it doesn't, that'd be bad.

> And continue comparing with
> full_page_writes once per checkpoint?

If we could only do it when there's a full page write, then perhaps that
would work as well, but I thought we tracked them at a lower level. In
any case, the idea is the same- compare what's in WAL to what's supposed
to be on disk, and alarm whenever there's a clear error.

> I don't think the WAL has enough
> information to replay backwards though. For example vacuum cleanup
> records just list the tids to remove. They don't have the contents to
> replace there.

Right, you couldn't actually move the database backwards in time using
this tool (because we only write out new data, we don't write out what
was in that block/page before the write)- that isn't the idea or intent.
It would just be a tool that someone could run against a database where
they've detected corruption (or, I dunno, more frequently, to perhaps
catch corruption faster?), to see if the problem is a PG bug or a
hardware/kernel/etc issue. In fact, if you could somehow do this
against a backup that's been taken using pg_start_backup/pg_stop_backup,
that would be pretty awesome.

I know that if such a tool existed, I'd be happy to run it as part of my
regular backup routines- I *always* have all the WALs from my last
backup to my next backup (and typically farther back than that, eg: if I
run full backups weekly, I'll have 4 full backups + all 4 weeks of WALs,
to be able to replay back to any point in the month..).

The big question that I have is- would this actually be productive?
Would it actually be able to catch hardware corruption or help at all
with PG bugs? Those are the things I'm not really sure about.

Thanks,

Stephen

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 14:04:51
Message-ID:	4C0E4E03.2060805@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 6/8/2010 8:27 AM, Greg Stark wrote:
> On Tue, Jun 8, 2010 at 2:26 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> Just an off-the-wall thought, but, would it be possible to have a tool
>> which read WAL backwards and compared entries in the WAL against entries
>> on disk? I realize that you'd only see one version of a particular
>> block and then have to skip any updates which are earlier than it, but
>> it seems like you could cover a pretty large chunk of the recent changes
>> to the database using this approach..
>
> I assume you mean back out the changes incrementally until you find a
> full_page_write and see if it matches? And continue comparing with
> full_page_writes once per checkpoint? I don't think the WAL has enough
> information to replay backwards though. For example vacuum cleanup
> records just list the tids to remove. They don't have the contents to
> replace there.

You can't back out changes. WAL does not contain before images.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Wieck <JanWieck(at)Yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 20:55:01
Message-ID:	1276030501.12489.3814.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
> >> I'd prefer a setting that would tell the system to freeze all tuples
> >> that fall within a safety range whenever any tuple in the page is frozen
> >> -- weren't you working on a patch to do this? (was it Jeff Davis?)
>
> > I just see a lot of cost caused by this "safety range". I yet have to
> > see its real value, other than "feel good".
>
> Jan, you don't know what you're talking about. I have repeatedly had
> cases where being able to look at xmin was critical to understanding
> a bug. I *will not* hold still for a solution that effectively reduces
> min_freeze_age to zero.

Recent history shows Tom's view to be the most useful one: its useful to
keep seeing the xmin. The last time we altered the way we set hint bits
we caused multiple data loss bugs doing it. We will need to debug things
and the WAL is always long gone (great idea though).

Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
keep the xmin but also can see it is frozen?

We already WAL-log certain flag settings, so why not this one also?

--
Simon Riggs www.2ndQuadrant.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 20:58:02
Message-ID:	AANLkTikYI4Q1tC_lvleBTna5q9Tjw4C0M2c_gUjdGZzi@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 8, 2010 at 4:55 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:
>> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> > On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
>> >> I'd prefer a setting that would tell the system to freeze all tuples
>> >> that fall within a safety range whenever any tuple in the page is frozen
>> >> -- weren't you working on a patch to do this? (was it Jeff Davis?)
>>
>> > I just see a lot of cost caused by this "safety range". I yet have to
>> > see its real value, other than "feel good".
>>
>> Jan, you don't know what you're talking about. I have repeatedly had
>> cases where being able to look at xmin was critical to understanding
>> a bug. I *will not* hold still for a solution that effectively reduces
>> min_freeze_age to zero.
>
> Recent history shows Tom's view to be the most useful one: its useful to
> keep seeing the xmin. The last time we altered the way we set hint bits
> we caused multiple data loss bugs doing it. We will need to debug things
> and the WAL is always long gone (great idea though).
>
> Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
> keep the xmin but also can see it is frozen?

We could do that, but I think the point of this exercise is to reduce
I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
not clear how such a flag would help with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 21:08:13
Message-ID:	1276031293.12489.3932.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-06-08 at 16:58 -0400, Robert Haas wrote:
> On Tue, Jun 8, 2010 at 4:55 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Fri, 2010-06-04 at 10:18 -0400, Tom Lane wrote:
> >> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> >> > On 6/2/2010 3:10 PM, Alvaro Herrera wrote:
> >> >> I'd prefer a setting that would tell the system to freeze all tuples
> >> >> that fall within a safety range whenever any tuple in the page is frozen
> >> >> -- weren't you working on a patch to do this? (was it Jeff Davis?)
> >>
> >> > I just see a lot of cost caused by this "safety range". I yet have to
> >> > see its real value, other than "feel good".
> >>
> >> Jan, you don't know what you're talking about. I have repeatedly had
> >> cases where being able to look at xmin was critical to understanding
> >> a bug. I *will not* hold still for a solution that effectively reduces
> >> min_freeze_age to zero.
> >
> > Recent history shows Tom's view to be the most useful one: its useful to
> > keep seeing the xmin. The last time we altered the way we set hint bits
> > we caused multiple data loss bugs doing it. We will need to debug things
> > and the WAL is always long gone (great idea though).
> >
> > Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
> > keep the xmin but also can see it is frozen?
>
> We could do that, but I think the point of this exercise is to reduce
> I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
> not clear how such a flag would help with that.

Hmmm: You suggested a variant of this idea, so whatever reasoning was
behind your suggestion would be shared here, surely?

Tom has been saying we cannot freeze early because we need to keep
xmins. I agree with that. This suggestion shows it is possible to freeze
a tuple AND keep its xmin. So that removes the argument that we should
freeze more aggressively (whenever we write the block) and can thus
reduce longer term I/O costs.

--
Simon Riggs www.2ndQuadrant.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 22:03:32
Message-ID:	AANLkTikFDdmukPzoAT4m3LidN1pqGbo6Gt24mL3P1r5k@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 8, 2010 at 5:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> > Why not just have a separate flag for HEAP_XMIN_FROZEN, that way we can
>> > keep the xmin but also can see it is frozen?
>>
>> We could do that, but I think the point of this exercise is to reduce
>> I/O - specifically, I/O caused by anti-wraparound vacuums - and I'm
>> not clear how such a flag would help with that.
>
> Hmmm: You suggested a variant of this idea, so whatever reasoning was
> behind your suggestion would be shared here, surely?
>
> Tom has been saying we cannot freeze early because we need to keep
> xmins. I agree with that. This suggestion shows it is possible to freeze
> a tuple AND keep its xmin. So that removes the argument that we should
> freeze more aggressively (whenever we write the block) and can thus
> reduce longer term I/O costs.

OK, yes, I see what you're getting at now. There are two possible
ways to do freeze the tuples and keep the xmin: we can either rely on
the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
not sure which way is better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 22:18:26
Message-ID:	1276035506.12489.4314.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
> OK, yes, I see what you're getting at now. There are two possible
> ways to do freeze the tuples and keep the xmin: we can either rely on
> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
> not sure which way is better.

Doing it at tuple level is more flexible and allows more aggressive
freezing. It also works better with existing tuple visibility code.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 22:35:00
Message-ID:	11823.1276036500@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
>> OK, yes, I see what you're getting at now. There are two possible
>> ways to do freeze the tuples and keep the xmin: we can either rely on
>> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
>> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
>> not sure which way is better.

> Doing it at tuple level is more flexible and allows more aggressive
> freezing. It also works better with existing tuple visibility code.

I agree, relying on a page-level bit (or field) is unpleasant in a
number of ways.

But none of this accomplishes a damn thing towards the original goal,
which was to avoid an extra disk write associated with freezing (not
to mention an extra write for setting the transaction-committed hint
bit). Setting a bit is no cheaper from that standpoint than changing
the xmin field.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Wieck <janwieck(at)yahoo(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-08 23:07:05
Message-ID:	1276038360-sup-4047@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of mar jun 08 18:35:00 -0400 2010:

> But none of this accomplishes a damn thing towards the original goal,
> which was to avoid an extra disk write associated with freezing (not
> to mention an extra write for setting the transaction-committed hint
> bit). Setting a bit is no cheaper from that standpoint than changing
> the xmin field.

... unless the bit is outside the page itself -- so we get back to the
idea of a freeze map.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 00:34:02
Message-ID:	AANLkTikMyZaw4wrfb9C3oBnFjJ0jOHhQhtOqg8JU7xP-@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 8, 2010 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
>>> OK, yes, I see what you're getting at now. There are two possible
>>> ways to do freeze the tuples and keep the xmin: we can either rely on
>>> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
>>> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
>>> not sure which way is better.
>
>> Doing it at tuple level is more flexible and allows more aggressive
>> freezing. It also works better with existing tuple visibility code.
>
> I agree, relying on a page-level bit (or field) is unpleasant in a
> number of ways.
>
> But none of this accomplishes a damn thing towards the original goal,
> which was to avoid an extra disk write associated with freezing (not
> to mention an extra write for setting the transaction-committed hint
> bit). Setting a bit is no cheaper from that standpoint than changing
> the xmin field.

Except for insert-only tables, I don't believe this is true. If you
freeze all tuples by the time the pages are marked all-visible,
perhaps via the xmin-preserving mechanism Simon suggested, then you
can use the visibility map to skip anti-wraparound vacuum as well as
regular vacuum. That sounds to me like it's accomplishing something.
Is it a complete solution? No. Is it better than what we have now?
Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 06:38:04
Message-ID:	1276065484.12489.6336.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-06-08 at 18:35 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
> >> OK, yes, I see what you're getting at now. There are two possible
> >> ways to do freeze the tuples and keep the xmin: we can either rely on
> >> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
> >> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
> >> not sure which way is better.
>
> > Doing it at tuple level is more flexible and allows more aggressive
> > freezing. It also works better with existing tuple visibility code.
>
> I agree, relying on a page-level bit (or field) is unpleasant in a
> number of ways.
>
> But none of this accomplishes a damn thing towards the original goal,
> which was to avoid an extra disk write associated with freezing (not
> to mention an extra write for setting the transaction-committed hint
> bit). Setting a bit is no cheaper from that standpoint than changing
> the xmin field.

No, it doesn't of itself, but if you raise a complaint then we must
first address the complaint as a sub-topic before we continue the main
discussion around $TOPIC. My proposal removes the barrier that early
freezing would overwrite xmin values, so early freezing need not have
any negative effects.

The general idea is to hide the "third write" (freezing) on a tuple by
making it happen at the same time as another tuple's "second
write" (hint bit setting).

I'm happy to let that continue by the OPs.

--
Simon Riggs www.2ndQuadrant.com

From:	marcin mank <marcin(dot)mank(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 10:45:32
Message-ID:	AANLkTins8pinYCPYpnVZTfnTrrnbJ_OZSV7jJEfP4t2W@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 9, 2010 at 12:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Tue, 2010-06-08 at 18:03 -0400, Robert Haas wrote:
>>> OK, yes, I see what you're getting at now. There are two possible
>>> ways to do freeze the tuples and keep the xmin: we can either rely on
>>> the PD_ALL_VISIBLE page-level bit (as I previously proposed) or we can
>>> additionally have a HEAP_XMIN_FROZEN bit as you propose here. I am
>>> not sure which way is better.
>
>> Doing it at tuple level is more flexible and allows more aggressive
>> freezing. It also works better with existing tuple visibility code.
>
> I agree, relying on a page-level bit (or field) is unpleasant in a
> number of ways.
>
> But none of this accomplishes a damn thing towards the original goal,
> which was to avoid an extra disk write associated with freezing (not
> to mention an extra write for setting the transaction-committed hint
> bit). Setting a bit is no cheaper from that standpoint than changing
> the xmin field.
>

Could a tuple wih the bit set be considered frozen already? Would we
actually ever need to rewrite the xmin, even for anti-wraparound
reasons?

Greetings
Marcin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 16:09:36
Message-ID:	26093.1276099776@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jun 8, 2010 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> But none of this accomplishes a damn thing towards the original goal,
>> which was to avoid an extra disk write associated with freezing (not
>> to mention an extra write for setting the transaction-committed hint
>> bit). Setting a bit is no cheaper from that standpoint than changing
>> the xmin field.

> Except for insert-only tables, I don't believe this is true.

But insert-only tables are exactly the case that Josh complained about
to start with.

> If you
> freeze all tuples by the time the pages are marked all-visible,
> perhaps via the xmin-preserving mechanism Simon suggested, then you
> can use the visibility map to skip anti-wraparound vacuum as well as
> regular vacuum. That sounds to me like it's accomplishing something.
> Is it a complete solution? No. Is it better than what we have now?
> Yes.

I do like the idea of using a status bit rather than FrozenXid to mark a
frozen tuple, because that eliminates the conflict between wanting to
freeze aggressively for performance reasons and wanting to preserve Xids
for forensic reasons. But it doesn't seem to do much for Josh's
original problem.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 16:49:43
Message-ID:	AANLkTilIKnm29dEoAEhWtt6kE1NxnWUVZ0bg-fqTRAbI@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 9, 2010 at 12:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If you
>> freeze all tuples by the time the pages are marked all-visible,
>> perhaps via the xmin-preserving mechanism Simon suggested, then you
>> can use the visibility map to skip anti-wraparound vacuum as well as
>> regular vacuum. That sounds to me like it's accomplishing something.
>> Is it a complete solution? No. Is it better than what we have now?
>> Yes.
>
> I do like the idea of using a status bit rather than FrozenXid to mark a
> frozen tuple, because that eliminates the conflict between wanting to
> freeze aggressively for performance reasons and wanting to preserve Xids
> for forensic reasons. But it doesn't seem to do much for Josh's
> original problem.

OK, I see. So maybe we add a Todo to implement that, and then keep
thinking about how to fix Josh's problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	marcin mank <marcin(dot)mank(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-09 17:18:18
Message-ID:	27580.1276103898@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

marcin mank <marcin(dot)mank(at)gmail(dot)com> writes:
> Could a tuple wih the bit set be considered frozen already? Would we
> actually ever need to rewrite the xmin, even for anti-wraparound
> reasons?

That's exactly what Simon is suggesting: if we had a tuple status flag
with the semantics of "this xmin is known visible to all current and
future transactions", we could consider setting that bit to be the moral
equivalent of freezing the tuple. The tuple visibility tests would
never actually consult clog for such an xmin and thus we'd never have to
replace it with FrozenXid.

But this doesn't in itself save us any work: we'd still need to treat
setting that bit as a WAL-logged operation, and we'd still need to have
VACUUM track the oldest not-thus-hinted xmins. What it does do is
eliminate the conflict between wanting to freeze tuples aggressively for
various performance reasons and wanting to preserve original xmin values
for forensic reasons.

I wonder how this might play into Heikki's ideas about making the
visibility map trustworthy. If we WAL-logged the operation of "set all
the per-tuple VISIBLE-TO-ALL bits on this page, as well as the page's
bit in the visibility map", then that end of things would be
trustworthy. And all the operations that have to unset the map bit
are already WAL-logged.

regards, tom lane

From:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-10 19:04:02
Message-ID:	4C113722.5010000@Yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Seems I underestimated the importance of forensic breadcrumbs.

On 6/9/2010 12:09 PM, Tom Lane wrote:
> I do like the idea of using a status bit rather than FrozenXid to mark a
> frozen tuple, because that eliminates the conflict between wanting to
> freeze aggressively for performance reasons and wanting to preserve Xids
> for forensic reasons. But it doesn't seem to do much for Josh's
> original problem.

Depends. Specifically on transaction profiles and how long the blocks
linger around before being written. If you can set the all visible bit
by the time, the page is written the first time, what bit including the
is-frozen one cannot be set at that time too?

Maybe some analysis on the typical behavior of such system is in order.
Especially the case Josh was mentioning seems to be a typical single
insert logging style application, with little else going on on that
particular database. I can't reveal specifics about that particular
case, but think of something like taking frequent sensor readings, that
need to be kept for years for forensics in case there is a product
recall some day.

And even if some cases still required another page write because those
frozen bits cannot be set on first write, this seems to be a win-win. We
would get rid of the FrozenXid completely and shift to a bit, so we can
effectively have a min_ freeze_age of zero while keeping the xid's forever.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-10 19:16:38
Message-ID:	20828.1276197398@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Depends. Specifically on transaction profiles and how long the blocks
> linger around before being written. If you can set the all visible bit
> by the time, the page is written the first time, what bit including the
> is-frozen one cannot be set at that time too?

All-visible and is-frozen would be the same bit ...

> And even if some cases still required another page write because those
> frozen bits cannot be set on first write, this seems to be a win-win. We
> would get rid of the FrozenXid completely and shift to a bit, so we can
> effectively have a min_ freeze_age of zero while keeping the xid's forever.

Right. I don't see any downside, other than eating another status bit
per tuple, which we can afford.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-10 21:58:06
Message-ID:	AANLkTimPDQTKUkUTEnEvNIlVZ08IDOhCaIMRM7gnMxe4@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> I think to make it work you need to store a whole 64-bit reference
> transaction id consisting of both a cycle counter and a transaction
> id. The invariant for the page is that every xid on the page can be
> compared to that reference transaction id using normal transactionid
> semantics. Actually I think the easiest way to do that is to set it to
> the oldest xid on the page. The first thing to do before comparing any
> transaction id on the page with a real transaction id would be to
> figure out whether the reference xid is comparable to the live xid,
> which if it's the oldest xid on the page implies they'll all be
> comparable.
>
> The way to maintain that invariant would be that any xid insertion on
> the page must advance the reference xid if it's not comparable to the
> newly inserted xid. It has to be advanced to the oldest xid that's
> still comparable with the newly inserted xid. Any xids on the page
> that are older than the new refernce xid have to be frozen or removed.
> I'm not sure how to do that without keeping clog forever though.

So the more I think about this the more I think it's unavoidable that
we would need to retain clog forever.

I think the goal here is to be able to load data into the database and
then never write the data ever again. Even if you visit the page years
later after the transaction ids have wrapped around several times. In
that case there's no avoiding that you'll need to know whether that
transaction committed or aborted.

Now we could make a bet that most transactions commit and therefore we
could keep a list of aborted transactions only which we might be able
to keep "forever" in very little space if very few transactions abort.
Presumably we would only use this form once the transaction was about
to be truncated out of clog. I'm not too happy with the assumption
that there aren't many aborts though. Someone could come along with a
use case where they have lots of aborts and run into strange
limitations and performance characteristics.

Alternatively we could do something like keeping a list of tables
touched by any transaction. Then vacuum could look for any
non-committed transactions old enough to be in danger of aging out of
clog and ensure those tables are frozen. But any tables which have
never been touched by any such old transaction could be left alone.
when we read in the page we'll be able to recognize the old
transactions as committed if they're beyond the end of the clog
horizon.

I don't really like that idea either because it leaves performance
really quite unpredictable. I could have a large table that goes
unvacuumed for a long time -- then when I come along with some tiny
query where I hit C-c and cause an abort I suddenly set a trap which
causes a huge vacuum freeze to fire off.

--
greg

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-12 15:49:45
Message-ID:	201006121549.o5CFnjJ18975@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:
> On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > I think to make it work you need to store a whole 64-bit reference
> > transaction id consisting of both a cycle counter and a transaction
> > id. The invariant for the page is that every xid on the page can be
> > compared to that reference transaction id using normal transactionid
> > semantics. Actually I think the easiest way to do that is to set it to
> > the oldest xid on the page. The first thing to do before comparing any
> > transaction id on the page with a real transaction id would be to
> > figure out whether the reference xid is comparable to the live xid,
> > which if it's the oldest xid on the page implies they'll all be
> > comparable.
> >
> > The way to maintain that invariant would be that any xid insertion on
> > the page must advance the reference xid if it's not comparable to the
> > newly inserted xid. It has to be advanced to the oldest xid that's
> > still comparable with the newly inserted xid. Any xids on the page
> > that are older than the new refernce xid have to be frozen or removed.
> > I'm not sure how to do that without keeping clog forever though.
>
> So the more I think about this the more I think it's unavoidable that
> we would need to retain clog forever.
>
> I think the goal here is to be able to load data into the database and
> then never write the data ever again. Even if you visit the page years
> later after the transaction ids have wrapped around several times. In
> that case there's no avoiding that you'll need to know whether that
> transaction committed or aborted.

I think we might need two bits, one commited and all visible, and
another aborted and all vislble.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea for getting rid of VACUUM FREEZE on cold pages
Date:	2010-06-12 16:54:55
Message-ID:	28123.1276361695@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I think we might need two bits, one commited and all visible, and
> another aborted and all vislble.

Huh? The latter means "vacuumable".

regards, tom lane