Re: getting rid of freezing

Lists: pgsql-hackers
From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: getting rid of freezing
Date: 2013-05-23 17:51:48
Message-ID: 20130523175148.GA29374@alap2.anarazel.de

Hi,

After having discussed $subject briefly over dinner yesterday, while I
should have been preparing the slides for my talk, I noticed that there
might be a rather easy way to get rid of freezing.

I think that the existence of hint bits and the crash-safe visibility
maps should provide sufficient tooling to make freezing unnecessary
without losing much information for debugging if we modify the way
vacuum works a bit.

Currently, aside from recovery, we only set all visible in vacuum.

vacuumlazy.c's lazy_scan_heap currently works like:

for (blkno = 0; blkno < nblocks; blkno++)
{
    /* skip pages the visibility map already marks all-visible */
    if (!scan_all && all_visible_according_to_vm)
        continue;

    /* cannot lock buffer immediately */
    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!scan_all)
            continue;

        /* don't block if we don't need freezing */
        if (!lazy_check_needs_freeze(buf))
            continue;

        /* now wait for cleanup lock */
        LockBufferForCleanup(buf);
    }

    for (tuple in all_tuples)
    {
        cleanup_tuple();
    }

    if (nfrozen > 0)
        log_heap_freeze();

    if (all_visible)
    {
        PageSetAllVisible(page);
        visibilitymap_set(page);
    }
}

In other words, if we don't need to make sure there aren't any old
tuples, we only scan the parts of the relation that are not yet marked
all-visible. If we are doing a freeze vacuum we scan the whole relation,
waiting for a cleanup lock on the buffer if necessary.
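
A minimal sketch of the page-selection rule described above, as a toy C function. The name `toy_pages_to_scan` and the plain `bool` array standing in for the visibility map are assumptions made for illustration; the real map is a bitmap kept in its own relation fork.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the heap pages a vacuum pass would visit.
 * vm_all_visible[blk] is true when the (toy) visibility map marks the
 * page all-visible. With scan_all set, every page is visited.
 */
size_t
toy_pages_to_scan(const bool *vm_all_visible, size_t nblocks, bool scan_all)
{
    size_t n = 0;

    assert(vm_all_visible != NULL || nblocks == 0);

    for (size_t blk = 0; blk < nblocks; blk++)
    {
        if (scan_all || !vm_all_visible[blk])
            n++;
    }
    return n;
}
```

With most pages already all-visible, the difference between the two modes is the whole cost of a freeze vacuum that the proposal tries to avoid.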

We currently need to make sure we scanned the whole relation and have
frozen everything to have a sensible relfrozenxid for a relation.

So, what I propose instead is basically:
1) only vacuum non-all-visible pages, even when doing it for
anti-wraparound
2) When we can set all-visible guarantee that all tuples on the page are
fully hinted. During recovery do the same, so we don't need to log
all hint bits.
We can do this with only an exclusive lock on the buffer, we don't
need a cleanup lock.
3) When we cannot mark a page all-visible or we cannot get the cleanup
lock, remember the oldest xmin on that page. We could set all visible
in the former case, but we want the page to be cleaned up sometime
soonish.
4) If we can get the cleanup lock, purge dead tuples from the page and
the indexes, just as today. Set the page as all-visible.

That way we know that any page that is all-visible doesn't ever need to
look at xmin/xmax since we are sure to have set all relevant hint
bits.
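
The four steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not PostgreSQL code: `VacPage`, `toy_vacuum_page` and all field names are made up, the lock outcome is passed in as a flag, and xids are compared as plain integers, ignoring wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VAC_MAX_TUPLES 8

/* Hypothetical, simplified page; not PostgreSQL's real page layout. */
typedef struct VacPage
{
    int      ntuples;
    uint32_t xmin[VAC_MAX_TUPLES];   /* inserting xid per tuple */
    bool     dead[VAC_MAX_TUPLES];   /* tuple can be removed */
    bool     hinted[VAC_MAX_TUPLES]; /* hint bits are set */
    bool     all_visible;
} VacPage;

/*
 * Process one page. Returns the oldest xmin that must be remembered for
 * the page, or 0 if the page is (or was already) all-visible.
 */
uint32_t
toy_vacuum_page(VacPage *page, bool got_cleanup_lock)
{
    uint32_t oldest = 0;
    int      kept = 0;

    assert(page->ntuples >= 0 && page->ntuples <= VAC_MAX_TUPLES);

    /* step 1: all-visible pages are skipped entirely */
    if (page->all_visible)
        return 0;

    if (!got_cleanup_lock)
    {
        /* step 3: no cleanup possible, remember the oldest xmin */
        for (int i = 0; i < page->ntuples; i++)
        {
            if (oldest == 0 || page->xmin[i] < oldest)
                oldest = page->xmin[i];
        }
        return oldest;
    }

    /* step 4: purge dead tuples; step 2: hint every survivor */
    for (int i = 0; i < page->ntuples; i++)
    {
        if (page->dead[i])
            continue;
        page->xmin[kept] = page->xmin[i];
        page->dead[kept] = false;
        page->hinted[kept] = true;
        kept++;
    }
    page->ntuples = kept;
    page->all_visible = true;
    return 0;
}
```

The remembered xmin values are what would feed a relation-level horizon; how that interacts with truncating pg_clog is the open question the mail itself raises.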

We don't even necessarily need to log the hint bits for all items since
the redo for all_visible could make sure all items are hinted. The only
problem is knowing up to where we can truncate pg_clog...

Makes sense?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-23 18:11:46
Message-ID: 20130523181146.GB29374@alap2.anarazel.de

On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
> I think that the existence of hint bits and the crash-safe visibility
> maps should provide sufficient tooling to make freezing unnecessary
> without losing much information for debugging if we modify the way
> vacuum works a bit.

> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.

One case that would make this problematic is row level locks on
tuples. We would need to unset all visible for them, otherwise we might
do the wrong thing when looking at xmax...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-23 19:03:29
Message-ID: 20130523190329.GC29374@alap2.anarazel.de

On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
> We currently need to make sure we scanned the whole relation and have
> frozen everything to have a sensible relfrozenxid for a relation.
>
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
> anti-wraparound
> 2) When we can set all-visible guarantee that all tuples on the page are
> fully hinted. During recovery do the same, so we don't need to log
> all hint bits.
> We can do this with only an exclusive lock on the buffer, we don't
> need a cleanup lock.
> 3) When we cannot mark a page all-visible or we cannot get the cleanup
> lock, remember the oldest xmin on that page. We could set all visible
> in the former case, but we want the page to be cleaned up sometime
> soonish.
> 4) If we can get the cleanup lock, purge dead tuples from the page and
> the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.

Heikki noticed that I made quite the omission here which is that you
would need to mark tuples as all visible as well. I was thinking about
using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.
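
As an illustration of the idea, a tuple-level marker built from the two old VACUUM FULL bits could look like the sketch below. The bit values are my reading of `htup_details.h` and should be checked against the tree; the helper names are hypothetical and do not exist in PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values of the t_infomask bits; verify against htup_details.h. */
#define HEAP_MOVED_OFF 0x4000
#define HEAP_MOVED_IN  0x8000
#define HEAP_MOVED     (HEAP_MOVED_OFF | HEAP_MOVED_IN)

/*
 * Hypothetical test: both bits set at once means "known all-visible".
 * The old VACUUM FULL only ever set one of the two on a tuple, so the
 * combination is free to carry a new meaning.
 */
bool
toy_tuple_is_marked_all_visible(uint16_t infomask)
{
    return (infomask & HEAP_MOVED) == HEAP_MOVED;
}

/* Hypothetical setter: returns the infomask with the marker applied. */
uint16_t
toy_mark_tuple_all_visible(uint16_t infomask)
{
    return (uint16_t) (infomask | HEAP_MOVED);
}
```

Note that a tuple carrying only one of the bits must still be treated as a leftover from the old VACUUM FULL, which is why the test compares against the full mask rather than checking for any bit.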

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 02:09:02
Message-ID: CA+TgmoZMAPbJ554JuT68jGM4Ye3TeMUJGE3=VaCBDGKxAdh0Jw@mail.gmail.com

On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
> anti-wraparound

Check. We might want an option to force a scan of the whole relation.

> 2) When we can set all-visible guarantee that all tuples on the page are
> fully hinted. During recovery do the same, so we don't need to log
> all hint bits.
> We can do this with only an exclusive lock on the buffer, we don't
> need a cleanup lock.

I don't think this works. Emitting XLOG_HEAP_VISIBLE for a heap page
does not emit an FPI for the heap page, only (if needed) for the
visibility map page. So a subsequent crash that tears the page could
keep XLOG_HEAP_VISIBLE but lose other changes on the page - i.e. the
hint bits.

> 3) When we cannot mark a page all-visible or we cannot get the cleanup
> lock, remember the oldest xmin on that page. We could set all visible
> in the former case, but we want the page to be cleaned up sometime
> soonish.

I think you mean "in the latter case" not "in the former case". If
not, then I'm confused.

> 4) If we can get the cleanup lock, purge dead tuples from the page and
> the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.
>
> We don't even necessarily need to log the hint bits for all items since
> the redo for all_visible could make sure all items are hinted. The only
> problem is knowing up to where we can truncate pg_clog...

The redo for all_visible cannot make sure all items are hinted.
Again, there's no FPI on the heap page. The heap page could in fact
contain dead tuples at the time we mark it all-visible. Consider, for
example:

0. Checkpoint.
1. The buffer becomes all visible.
2. A tuple is inserted, making the buffer not-all-visible.
3. The page is written by the OS.
4. Crash.

Now, recovery will first find the record marking the buffer
all-visible, and will mark it all-visible. Now the all-visible bit on
the page is flat-out wrong, but it doesn't matter because we haven't
reached consistency. Next we'll find the heap-insert record, which
will have an FPI, since it's the first WAL-logged change to the buffer
since the last checkpoint. Now the FPI fixes everything and we're
back in a sane state.

Now in this particular case it wouldn't hurt anything if the redo
routine that set the all-visible bit also hinted all the tuples,
because the FPI is going to overwrite it anyway. But suppose in lieu
of steps (3) and (4) we write half of the page and then crash, leaving
behind a torn page. Now it's pretty crazy to think about trying to
hint tuples; the page may be in a completely insane state.
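
The replay sequence above can be modelled with a small toy. Everything here is an assumption for illustration (`RedoPage`, `RedoRecord`, `toy_replay`); real WAL replay is far more involved. The sketch only shows that the record setting all-visible flips a bit on whatever is on disk, while a later record with a full page image replaces the page wholesale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page state. */
typedef struct RedoPage
{
    int  ntuples;
    bool all_visible;
} RedoPage;

typedef enum { REDO_SET_VISIBLE, REDO_INSERT } RedoRecType;

typedef struct RedoRecord
{
    RedoRecType type;
    bool        has_fpi; /* record carries a full page image */
    RedoPage    fpi;     /* only meaningful when has_fpi is true */
} RedoRecord;

/* Replay nrecs records on top of whatever survived on disk. */
RedoPage
toy_replay(RedoPage on_disk, const RedoRecord *recs, int nrecs)
{
    RedoPage page = on_disk;

    assert(recs != NULL || nrecs == 0);

    for (int i = 0; i < nrecs; i++)
    {
        if (recs[i].has_fpi)
            page = recs[i].fpi;        /* image overwrites the page */
        else if (recs[i].type == REDO_SET_VISIBLE)
            page.all_visible = true;   /* blind flip, contents untrusted */
        else
        {
            page.ntuples++;
            page.all_visible = false;
        }
    }
    return page;
}
```

Stopping after the first record leaves a page that claims to be all-visible while holding the newly inserted tuple, which is harmless only because consistency has not been reached yet. Anything smarter than a bit flip at that point, such as hinting tuples, would be operating on contents that may be torn.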

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 03:49:37
Message-ID: 519EE351.3090408@krosing.net

On 05/23/2013 10:03 PM, Andres Freund wrote:
> On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
>> We currently need to make sure we scanned the whole relation and have
>> frozen everything to have a sensible relfrozenxid for a relation.
>>
>> So, what I propose instead is basically:
>> 1) only vacuum non-all-visible pages, even when doing it for
>> anti-wraparound
>> 2) When we can set all-visible guarantee that all tuples on the page are
>> fully hinted. During recovery do the same, so we don't need to log
>> all hint bits.
>> We can do this with only an exclusive lock on the buffer, we don't
>> need a cleanup lock.
>> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>> lock, remember the oldest xmin on that page. We could set all visible
>> in the former case, but we want the page to be cleaned up sometime
>> soonish.
>> 4) If we can get the cleanup lock, purge dead tuples from the page and
>> the indexes, just as today. Set the page as all-visible.
>>
>> That way we know that any page that is all-visible doesn't ever need to
>> look at xmin/xmax since we are sure to have set all relevant hint
>> bits.
> Heikki noticed that I made quite the omission here which is that you
> would need to mark tuples as all visible as well. I was thinking about
> using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.
We could have a "vacuum_less=true" mode, where instead of marking tuples
all visible here you actually freeze them, that is, set the xid to frozen.
You will get less forensic capability in exchange for less vacuuming.

Maybe also add an "early_freeze" hint bit to mark this situation.

Or maybe set the tuple's frozenxid when un-marking the page as all
visible, to delay the effects a little?

Hannu
>
> Greetings,
>
> Andres Freund
>


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 14:53:18
Message-ID: 20130524145318.GE29374@alap2.anarazel.de

On 2013-05-23 22:09:02 -0400, Robert Haas wrote:
> On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > So, what I propose instead is basically:
> > 1) only vacuum non-all-visible pages, even when doing it for
> > anti-wraparound
>
> Check. We might want an option to force a scan of the whole relation.

Yea, thought of that as well. VACUUM (DEEP) ;).

> > 3) When we cannot mark a page all-visible or we cannot get the cleanup
> > lock, remember the oldest xmin on that page. We could set all visible
> > in the former case, but we want the page to be cleaned up sometime
> > soonish.

> I think you mean "in the latter case" not "in the former case". If
> not, then I'm confused.

Uh. Yes.

> > We don't even necessarily need to log the hint bits for all items since
> > the redo for all_visible could make sure all items are hinted. The only
> > problem is knowing up to where we can truncate pg_clog...

> [all-visible cannot restore hint bits without FPI because of torn pages]

I haven't thought about this sufficiently yet. I think we might have
a chance of working around this, let me ponder a bit.

But even if that means needing a full page write via the usual mechanism
for all visible if any hint bits needed to be set we are still out far
ahead of the current state imo.
* cleanup would quite possibly do an FPI shortly after in vacuum
anyway. If we do it for all visible, it possibly does not need to be
done for it.
* freezing would FPI almost guaranteedly since we do it so much
later.
* Not having to rescan the whole heap will be a bigger cost saving...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 15:01:09
Message-ID: 519F80B5.2070509@nasby.net

On 5/24/13 9:53 AM, Andres Freund wrote:
>>> We don't even necessarily need to log the hint bits for all items since
>>> the redo for all_visible could make sure all items are hinted. The only
>>> problem is knowing up to where we can truncate pg_clog...
>> [all-visible cannot restore hint bits without FPI because of torn pages]
> I haven't thought about this sufficiently yet. I think we might have
> a chance of working around this, let me ponder a bit.
>
> But even if that means needing a full page write via the usual mechanism
> for all visible if any hint bits needed to be set we are still out far
> ahead of the current state imo.
> * cleanup would quite possibly do an FPI shortly after in vacuum
> anyway. If we do it for all visible, it possibly does not need to be
> done for it.
> * freezing would FPI almost guaranteedly since we do it so much
> later.
> * Not having to rescan the whole heap will be a bigger cost saving...

Would we only set all the hint bits within vacuum? If so I don't think the WAL hit matters at all, because vacuum is almost always a background, throttled process.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 15:29:10
Message-ID: CA+Tgmobn4qjsT5esEOhSaUSxyk3wMBFtM60SbARbEEMw-Ud9LA@mail.gmail.com

On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> [all-visible cannot restore hint bits without FPI because of torn pages]
>
> I haven't thought about this sufficiently yet. I think we might have
> a chance of working around this, let me ponder a bit.

Yeah. I too feel like there might be a solution. But I don't
have something specific in mind, yet anyway.

> But even if that means needing a full page write via the usual mechanism
> for all visible if any hint bits needed to be set we are still out far
> ahead of the current state imo.
> * cleanup would quite possibly do an FPI shortly after in vacuum
> anyway. If we do it for all visible, it possibly does not need to be
> done for it.
> * freezing would FPI almost guaranteedly since we do it so much
> later.
> * Not having to rescan the whole heap will be a bigger cost saving...

The basic problem is that if the data is going to be removed before it
would have gotten frozen, then the extra FPIs are just overhead. In
effect, we're just deciding to freeze a lot sooner. And while that
might well be beneficial in some use cases (e.g. the data's already in
cache) it might also not be so beneficial (the table is larger than
cache and would have been dropped before freezing kicked in).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 15:52:19
Message-ID: 20130524155219.GG29374@alap2.anarazel.de

On 2013-05-24 11:29:10 -0400, Robert Haas wrote:
> > But even if that means needing a full page write via the usual mechanism
> > for all visible if any hint bits needed to be set we are still out far
> > ahead of the current state imo.
> > * cleanup would quite possibly do an FPI shortly after in vacuum
> > anyway. If we do it for all visible, it possibly does not need to be
> > done for it.
> > * freezing would FPI almost guaranteedly since we do it so much
> > later.
> > * Not having to rescan the whole heap will be a bigger cost saving...
>
> The basic problem is that if the data is going to be removed before it
> would have gotten frozen, then the extra FPIs are just overhead. In
> effect, we're just deciding to freeze a lot sooner.

Well, freezing without removing information for debugging.

> And while that
> might well be beneficial in some use cases (e.g. the data's already in
> cache) it might also not be so beneficial (the table is larger than
> cache and would have been dropped before freezing kicked in).

Not sure how caching comes into play here? At this point we know the
page to be in cache already since vacuum is looking at it anyway?

I think it's not really comparable since in those situations we a)
already do an XLogInsert() and b) already dirty the page, so the only
change is that we possibly write an additional full page image. If
there is actually near-future DML write activity that would make the
all-visible superfluous, that would likely have to FPI anyway.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 16:00:00
Message-ID: CA+TgmoZz8H8LJgKdAf61L+XefU+pYnXqMoUpzYSEhwStAuJqFw@mail.gmail.com

On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>
>> I haven't thought about this sufficiently yet. I think we might have
>> a chance of working around this, let me ponder a bit.
>
> Yeah. I too feel like there might be a solution. But I don't
> have something specific in mind, yet anyway.

One thought I had is that it might be beneficial to freeze when a page
ceases to be all-visible, rather than when it becomes all-visible.
Any operation that makes the page not-all-visible is going to emit an
FPI anyway, so we don't have to worry about torn pages in that case.
Under such a scheme, we'd have to enforce the rule that xmin and xmax
are ignored for any page that is all-visible; and when a page ceases
to be all-visible, we have to go back and really freeze the
pre-existing tuples. I think we might be able to use the existing
all_visible_cleared/new_all_visible_cleared flags to trigger this
behavior, without adding anything new to WAL at all.
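
A rough sketch of that scheme, under loud assumptions: `FzPage`, `fz_tuple_visible` and `fz_insert` are invented names, xmax is left out, and "visible" is reduced to "xmin is frozen or older than the oldest running xid". Only the value 2 for the frozen xid mirrors PostgreSQL's `FrozenTransactionId`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FZ_FROZEN_XID 2   /* same value as FrozenTransactionId */
#define FZ_MAX_TUPLES 8

/* Hypothetical, simplified page. */
typedef struct FzPage
{
    int      ntuples;
    uint32_t xmin[FZ_MAX_TUPLES];
    bool     all_visible;
} FzPage;

/* Rule: on an all-visible page, xmin is never looked at. */
bool
fz_tuple_visible(const FzPage *page, int idx, uint32_t oldest_running_xid)
{
    if (page->all_visible)
        return true;
    return page->xmin[idx] == FZ_FROZEN_XID ||
           page->xmin[idx] < oldest_running_xid;
}

/*
 * Insert a tuple. If this clears the all-visible bit, the pre-existing
 * tuples are really frozen first, since their xmin becomes relevant
 * again from now on.
 */
void
fz_insert(FzPage *page, uint32_t xid)
{
    assert(page->ntuples < FZ_MAX_TUPLES);

    if (page->all_visible)
    {
        for (int i = 0; i < page->ntuples; i++)
            page->xmin[i] = FZ_FROZEN_XID;
        page->all_visible = false;
    }
    page->xmin[page->ntuples++] = xid;
}
```

The freeze piggybacks on a change that dirties and logs the page anyway, which is the point of the suggestion; the cost is that it happens in the inserting backend rather than in vacuum.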

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 16:11:49
Message-ID: CA+TgmobqDh+ruU5gX0cXbakUF9kw7PK5OfukoA=b9xqxEb-X_g@mail.gmail.com

On Fri, May 24, 2013 at 11:52 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> The basic problem is that if the data is going to be removed before it
>> would have gotten frozen, then the extra FPIs are just overhead. In
>> effect, we're just deciding to freeze a lot sooner.
>
> Well, freezing without removing information for debugging.

Sure, but what I'm trying to avoid is incurring the WAL cost of
freezing. If we didn't mind paying that sooner, we could just drop
vacuum_freeze_min/table_age. But we do mind that.

>> And while that
>> might well be beneficial in some use cases (e.g. the data's already in
>> cache) it might also not be so beneficial (the table is larger than
>> cache and would have been dropped before freezing kicked in).
>
> Not sure how caching comes into play here? At this point we know the
> page to be in cache already since vacuum is looking at it anyway?

OK, true.

> I think it's not really comparable since in those situations we a)
> already do an XLogInsert() and b) already dirty the page, so the only
> change is that we possibly write an additional full page image. If
> there is actually near-future DML write activity that would make the
> all-visible superfluous, that would likely have to FPI anyway.

Well, if there's near-future write activity, then freezing is pretty
worthless anyway. What I'm trying to avoid is adding WAL overhead in
the case where there *isn't* any near-future write activity, like
inserting 100MB of data into an existing table.
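
To put a number on that example: a back-of-the-envelope sketch, assuming the default 8 kB block size, one extra full page image per page, and no compression or hole-skipping of the images. The function names are made up for the calculation.

```c
#include <stdint.h>

#define ASSUMED_BLCKSZ 8192 /* default PostgreSQL block size, an assumption here */

/* Number of heap pages needed to hold nbytes of data, rounded up. */
uint64_t
toy_pages_for_bytes(uint64_t nbytes)
{
    return (nbytes + ASSUMED_BLCKSZ - 1) / ASSUMED_BLCKSZ;
}

/* Upper bound on extra WAL if every one of those pages gets one FPI. */
uint64_t
toy_worst_case_fpi_bytes(uint64_t nbytes)
{
    return toy_pages_for_bytes(nbytes) * ASSUMED_BLCKSZ;
}
```

For 100 MB (104857600 bytes) this gives 12800 pages, so in the worst case the extra full page images roughly double the WAL volume of the load itself.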

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 19:49:31
Message-ID: 519FC44B.3000902@agliodbs.com

Andres,

If I understand your solution correctly, though, this doesn't really
help the pathological case for freezing, which is the time-oriented
append-only table. For data which isn't being used, allvisible won't be
set either because it won't have been read, no? Is it still cheaper to
set allvisible than vacuum freeze even in that case?

Don't get me wrong, I'm in favor of this if it fixes the other (more
common) cases. I just want to be clear on the limitations.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 20:03:05
Message-ID: 20130524200305.GJ29374@alap2.anarazel.de

On 2013-05-24 15:49:31 -0400, Josh Berkus wrote:
> If I understand your solution correctly, though, this doesn't really
> help the pathological case for freezing, which is the time-oriented
> append-only table. For data which isn't being used, allvisible won't be
> set either because it won't have been read, no? Is it still cheaper to
> set allvisible than vacuum freeze even in that case?

all visible is only set in vacuum and it determines which parts of a
table will be scanned in a non full table vacuum. So, since we won't
regularly start vacuum in the insert only case there will still be a
batch of work at once. But nearly all of that work is *already*
performed. We would just keep the details of that around for a
bit. *But* since we now would only need to vacuum the non all-visible
part that would get noticeably cheaper as well.

I think for that case we should run vacuum more regularly for insert
only tables, since we currently don't do so regularly enough, which
a) increases the amount of work needed at once and b) prevents index
only scans from working there.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-25 06:23:15
Message-ID: 51A058D3.8070001@2ndQuadrant.com

On 05/24/2013 07:00 PM, Robert Haas wrote:
> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>> I haven't thought about this sufficiently yet. I think we might have
>>> a chance of working around this, let me ponder a bit.
>> Yeah. I too feel like there might be a solution. But I don't
>> have something specific in mind, yet anyway.
> One thought I had is that it might be beneficial to freeze when a page
> ceases to be all-visible, rather than when it becomes all-visible.
That's what I aimed to describe in my mail earlier, but your
description is much clearer :)
> Any operation that makes the page not-all-visible is going to emit an
> FPI anyway, so we don't have to worry about torn pages in that case.
> Under such a scheme, we'd have to enforce the rule that xmin and xmax
> are ignored for any page that is all-visible;
Agreed. We already rely on all-visible pages enough that we
can trust it to be correct. Making that a universal rule should not
add any risks.
The rule "page all-visible ==> assume all tuples frozen" would
also enable VACUUM FREEZE to work only on the
non-all-visible pages.
> and when a page ceases
> to be all-visible, we have to go back and really freeze the
> pre-existing tuples.
We can do this unconditionally, or in milder case use vacuum_freeze_min_age
if we want to retain xids for forensic purposes.
> I think we might be able to use the existing
> all_visible_cleared/new_all_visible_cleared flags to trigger this
> behavior, without adding anything new to WAL at all.
This seems to be the easiest.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: getting rid of freezing
Date: 2013-05-25 10:14:29
Message-ID: CA+U5nMLGmJsumftGphnsmKptNa67CJn7dT_2=9BaQ43e0vFChw@mail.gmail.com

On 24 May 2013 17:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>>
>>> I haven't thought about this sufficiently yet. I think we might have
>>> a chance of working around this, let me ponder a bit.
>>
>> Yeah. I too feel like there might be a solution. But I don't
>> have something specific in mind, yet anyway.
>
> One thought I had is that it might be beneficial to freeze when a page
> ceases to be all-visible, rather than when it becomes all-visible.
> Any operation that makes the page not-all-visible is going to emit an
> FPI anyway, so we don't have to worry about torn pages in that case.
> Under such a scheme, we'd have to enforce the rule that xmin and xmax
> are ignored for any page that is all-visible; and when a page ceases
> to be all-visible, we have to go back and really freeze the
> pre-existing tuples. I think we might be able to use the existing
> all_visible_cleared/new_all_visible_cleared flags to trigger this
> behavior, without adding anything new to WAL at all.

I like the idea but it would mean we'd have to freeze in the
foreground path rather than in a background path.

Have we given up on the double buffering idea to remove FPIs
completely? If we did that, then this wouldn't work.

Anyway, I take it the direction of this idea is that "we don't need a
separate freezemap, just use the vismap". That seems to be forcing
ideas down a particular route we may regret. I'd rather just keep
those things separate, even if we manage to merge the WAL actions for
most of the time.

Some other related thoughts:

ISTM that if we really care about keeping xids for debug purposes that
it could be a parameter. For the mainline, we just freeze blocks at
the same time we do page pruning.

I think the right way is actually to rethink and simplify all this
complexity of Freezing/Pruning/Hinting/Visibility

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-25 22:37:13
Message-ID: 51A13D19.1070706@agliodbs.com

Andres,

> all visible is only set in vacuum and it determines which parts of a
> table will be scanned in a non full table vacuum. So, since we won't
> regularly start vacuum in the insert only case there will still be a
> batch of work at once. But nearly all of that work is *already*
> performed. We would just keep the details of that around for a
> bit. *But* since we now would only need to vacuum the non all-visible
> part that would get noticeably cheaper as well.

Yeah, I can see that. Seems worthwhile, then.

> I think for that case we should run vacuum more regularly for insert
> only tables, since we currently don't do so regularly enough, which a) increases
> the amount of work needed at once and b) prevents index only scans from
> working there.

Yes. I'm not sure how we would set this though; I think it's another
example of how autovacuum's parameters for when to vacuum etc. are too
simple-minded for the real world. Doing an all-visible scan on an
insert-only table, for example, should be based on XID age and not on %
inserted, no?
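
A trigger based on XID age rather than on the fraction of inserted rows might be sketched like this. Both function names and the `age_threshold` parameter are hypothetical, not existing autovacuum settings, and the special low-numbered xids that real PostgreSQL skips on wraparound are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Age of an xid in transactions. Unsigned subtraction is modulo 2^32,
 * so the result stays correct after the counter wraps around.
 */
uint32_t
toy_xid_age(uint32_t current_xid, uint32_t oldest_unvacuumed_xid)
{
    return current_xid - oldest_unvacuumed_xid;
}

/* Hypothetical trigger: scan once the oldest unvacuumed xid is old enough. */
bool
toy_needs_visibility_scan(uint32_t current_xid,
                          uint32_t oldest_unvacuumed_xid,
                          uint32_t age_threshold)
{
    return toy_xid_age(current_xid, oldest_unvacuumed_xid) >= age_threshold;
}
```

Unlike a percentage of inserted rows, this fires on a large, slowly growing table as soon as enough transactions have passed, regardless of how small the new portion is relative to the whole.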

Speaking of which, I need to get on revamping the math for autoanalyze.

Mind you, in the real-world insert-only table case, this does create
extra IO -- real insert-only tables often have a few rows ( < 5% ) which
are updated/deleted. Vacuum would see these and want to clean the pages
up, which would create much more substantial IO. It might still be a
good tradeoff, but we should be aware of it.

Unless we want a special VACUUM ALL VISIBLE mode. I vote no, unless we
demonstrate some really convincing case for it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: getting rid of freezing
Date: 2013-05-26 12:15:14
Message-ID: 51A1FCD2.4070300@2ndQuadrant.com

On 05/25/2013 01:14 PM, Simon Riggs wrote:
> On 24 May 2013 17:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>>> I haven't thought about this sufficiently yet. I think we might have
>>>> a chance of working around this, let me ponder a bit.
>>> Yeah. I too feel like there might be a solution. But I don't
>>> have something specific in mind, yet anyway.
>> One thought I had is that it might be beneficial to freeze when a page
>> ceases to be all-visible, rather than when it becomes all-visible.
>> Any operation that makes the page not-all-visible is going to emit an
>> FPI anyway, so we don't have to worry about torn pages in that case.
>> Under such a scheme, we'd have to enforce the rule that xmin and xmax
>> are ignored for any page that is all-visible; and when a page ceases
>> to be all-visible, we have to go back and really freeze the
>> pre-existing tuples. I think we might be able to use the existing
>> all_visible_cleared/new_all_visible_cleared flags to trigger this
>> behavior, without adding anything new to WAL at all.
> I like the idea but it would mean we'd have to freeze in the
> foreground path rather than in a background path.
>
> Have we given up on the double buffering idea to remove FPIs
> completely? If we did that, then this wouldn't work.
>
> Anyway, I take it the direction of this idea is that "we don't need a
> separate freezemap, just use the vismap". That seems to be forcing
> ideas down a particular route we may regret. I'd rather just keep
> those things separate, even if we manage to merge the WAL actions for
> most of the time.
>
>
> Some other related thoughts:
>
> ISTM that if we really care about keeping xids for debug purposes that
> it could be a parameter. For the mainline, we just freeze blocks at
> the same time we do page pruning.
>
> I think the right way is actually to rethink and simplify all this
> complexity of Freezing/Pruning/Hinting/Visibility
I think that this xmin, xmax business is mainly a leftover from the time when
PostgreSQL was a full history database. If we are happy to decide that we
do not want to resurrect this feature, at least not in the same way, then
freezing at the earliest or most convenient opportunity seems the way to go.

The "forensic" part has always been just a nice side effect of this design and
not the main design consideration.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-26 23:58:58
Message-ID: 51A2A1C2.40809@agliodbs.com
Lists: pgsql-hackers

Andres,

I was talking this over with Jeff on the plane, and we wanted to be
clear on your goals here: are you looking to eliminate the *write* cost
of freezing, or just the *read* cost of re-reading already frozen pages?

If just the latter, what about just adding a bit to the visibility map
to indicate that the page is frozen? That seems simpler than what
you're proposing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: getting rid of freezing
Date: 2013-05-28 14:15:10
Message-ID: CA+TgmoZbsaYAwd3hrp+WzL9ExxuQ3+g3Bu0ef37+q_ZBJoyj_A@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 25, 2013 at 6:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> One thought I had is that it might be beneficial to freeze when a page
>> ceases to be all-visible, rather than when it becomes all-visible.
>> Any operation that makes the page not-all-visible is going to emit an
>> FPI anyway, so we don't have to worry about torn pages in that case.
>> Under such a scheme, we'd have to enforce the rule that xmin and xmax
>> are ignored for any page that is all-visible; and when a page ceases
>> to be all-visible, we have to go back and really freeze the
>> pre-existing tuples. I think we might be able to use the existing
>> all_visible_cleared/new_all_visible_cleared flags to trigger this
>> behavior, without adding anything new to WAL at all.
>
> I like the idea but it would mean we'd have to freeze in the
> foreground path rather than in a background path.

That's true, but I think with this approach it would be really cheap.
The overhead of setting a few bits in a page is very small compared to
the overhead of emitting a WAL record. We'd have to test it, but I
wouldn't be surprised to find the cost is too small to measure.
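
As an illustration of the "freeze when the page ceases to be all-visible" idea, here is a toy model. It is a sketch under heavy assumptions, not heap code: ToyPage, toy_clear_all_visible and toy_insert are invented names, only xmin is modelled (xmax is left out), and no WAL or locking is represented. The one borrowed fact is that PostgreSQL's FrozenTransactionId has the value 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FROZEN_XID 2u       /* same value as PostgreSQL's FrozenTransactionId */
#define TOY_MAX_TUPLES 8

/* Hypothetical, heavily simplified heap page. */
typedef struct ToyPage
{
    bool     all_visible;           /* stand-in for PD_ALL_VISIBLE */
    size_t   ntuples;
    uint32_t xmin[TOY_MAX_TUPLES];  /* inserting XID of each tuple */
} ToyPage;

/*
 * Clear the all-visible flag. While the flag was set, xmin was being
 * ignored, so the pre-existing tuples must be really frozen first.
 * Returns the number of tuples frozen.
 */
static size_t
toy_clear_all_visible(ToyPage *page)
{
    size_t nfrozen = 0;

    assert(page != NULL);
    if (!page->all_visible)
        return 0;

    for (size_t i = 0; i < page->ntuples; i++)
    {
        if (page->xmin[i] != TOY_FROZEN_XID)
        {
            page->xmin[i] = TOY_FROZEN_XID;
            nfrozen++;
        }
    }
    page->all_visible = false;
    return nfrozen;
}

/* Foreground insert: the operation that ends all-visibility for the page. */
static bool
toy_insert(ToyPage *page, uint32_t xid)
{
    assert(page != NULL);
    if (page->ntuples >= TOY_MAX_TUPLES)
        return false;           /* page full, nothing changes */

    toy_clear_all_visible(page);
    page->xmin[page->ntuples++] = xid;
    return true;
}
```

The extra work in the foreground path is the loop over existing tuples, which only touches memory on a page that is about to be modified anyway.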

> Have we given up on the double buffering idea to remove FPIs
> completely? If we did that, then this wouldn't work.

I don't see why those things are mutually exclusive. What is the relationship?

> Anyway, I take it the direction of this idea is that "we don't need a
> separate freezemap, just use the vismap". That seems to be forcing
> ideas down a particular route we may regret. I'd rather just keep
> those things separate, even if we manage to merge the WAL actions for
> most of the time.

Hmm. To me it seems highly desirable to merge those things, because
they're basically the same thing. The earliest time at which we can
freeze a tuple is when it's all-visible, and the only argument I've
ever heard for waiting longer is to preserve the original xmin for
forensic purposes, which I think we can do anyway. I have posted a
patch for that on another thread. I don't like having two separate
concepts where one will do; I think the fact that it is structured
that way today is mostly an artifact of one setting being page-level
and the other tuple-level, which is a thin excuse for so much
complexity.

> I think the right way is actually to rethink and simplify all this
> complexity of Freezing/Pruning/Hinting/Visibility

I agree, but I think that's likely to have to wait until we get a
pluggable storage API, and then a few years beyond that for someone to
develop the technology to enable the new and better way. In the
meantime, if we can eliminate or even reduce the impact of freezing in
the near term, I think that's worth doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-28 14:17:52
Message-ID: 20130528141752.GD4274@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> I was talking this over with Jeff on the plane, and we wanted to be
> clear on your goals here: are you looking to eliminate the *write* cost
> of freezing, or just the *read* cost of re-reading already frozen pages?

Both. The latter is what I have seen causing more hurt, but the former
alone is painful enough.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-28 16:29:26
Message-ID: 51A4DB66.70004@agliodbs.com
Lists: pgsql-hackers

On 05/28/2013 07:17 AM, Andres Freund wrote:
> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>> I was talking this over with Jeff on the plane, and we wanted to be
>> clear on your goals here: are you looking to eliminate the *write* cost
>> of freezing, or just the *read* cost of re-reading already frozen pages?
>
> Both. The latter is what I have seen causing more hurt, but the former
> alone is painful enough.

I guess I don't see how your proposal is reducing the write cost for
most users then?

- for users with frequently, randomly updated data, pdallvisible would
not be ever set, so they still need to be rewritten to freeze
- for users with append-only tables, allvisible would never be set since
those pages don't get vacuumed
- it would prevent us from getting rid of allvisible, which has a
documented and known write overhead

This means that your optimization would benefit only users whose pages
get updated occasionally (enough to trigger vacuum) but not too
frequently (which would unset allvisible). While we lack statistics,
intuition suggests that this is a minority of databases.

If we just wanted to reduce read cost, why not just take a simpler
approach and give the visibility map an "isfrozen" bit? Then we'd know
which pages didn't need rescanning without nearly as much complexity.
That would also make it more effective to do precautionary vacuum freezing.
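
A rough sketch of what an "isfrozen" bit next to the all-visible bit could look like, assuming a map that stores two bits per heap page. The layout, the constants, and the function names are all assumptions made for this example; nothing here describes the actual visibility map format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two status bits per heap page, four pages per byte. */
#define TOY_VM_ALL_VISIBLE   0x01
#define TOY_VM_ALL_FROZEN    0x02
#define TOY_VM_BITS_PER_PAGE 2
#define TOY_VM_PAGES_PER_BYTE (8 / TOY_VM_BITS_PER_PAGE)

/* Set the given status bits for heap block blkno. */
static void
toy_vm_set(uint8_t *map, size_t blkno, uint8_t flags)
{
    size_t   byte = blkno / TOY_VM_PAGES_PER_BYTE;
    unsigned shift = (unsigned) (blkno % TOY_VM_PAGES_PER_BYTE) * TOY_VM_BITS_PER_PAGE;

    assert(map != NULL);
    map[byte] |= (uint8_t) (flags << shift);
}

/* True if all of the given status bits are set for heap block blkno. */
static bool
toy_vm_test(const uint8_t *map, size_t blkno, uint8_t flags)
{
    size_t   byte = blkno / TOY_VM_PAGES_PER_BYTE;
    unsigned shift = (unsigned) (blkno % TOY_VM_PAGES_PER_BYTE) * TOY_VM_BITS_PER_PAGE;

    assert(map != NULL);
    return ((map[byte] >> shift) & flags) == flags;
}

/* Count the pages a freeze scan would still have to read. */
static size_t
toy_pages_needing_freeze_scan(const uint8_t *map, size_t nblocks)
{
    size_t n = 0;

    for (size_t blk = 0; blk < nblocks; blk++)
        if (!toy_vm_test(map, blk, TOY_VM_ALL_FROZEN))
            n++;
    return n;
}
```

This only addresses the read side: a freeze scan can skip pages whose frozen bit is set, but the pages still have to be written once to freeze them in the first place.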

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-28 23:22:42
Message-ID: 20130528232242.GA818@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-28 09:29:26 -0700, Josh Berkus wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
> > On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> >> I was talking this over with Jeff on the plane, and we wanted to be
> >> clear on your goals here: are you looking to eliminate the *write* cost
> >> of freezing, or just the *read* cost of re-reading already frozen pages?
> >
> > Both. The latter is what I have seen causing more hurt, but the former
> > alone is painful enough.
>
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
>
> - for users with frequently, randomly updated data, pdallvisible would
> not be ever set, so they still need to be rewritten to freeze

If they update all data they simply never need to get frozen since they
are not old enough.
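
The "not old enough" argument can be sketched with a toy age check. This is an illustration only: the function names and the threshold parameter are hypothetical, and the special XIDs below 3 are simply excluded. It assumes 32-bit XIDs whose age is computed with modular arithmetic, so the result stays correct across a counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FIRST_NORMAL_XID 3u  /* XIDs 0, 1 and 2 are reserved in PostgreSQL */

/* Age of xid relative to next_xid; unsigned subtraction wraps modulo 2^32. */
static uint32_t
toy_xid_age(uint32_t next_xid, uint32_t xid)
{
    assert(xid >= TOY_FIRST_NORMAL_XID);
    return next_xid - xid;
}

/* A tuple is a freeze candidate only once its xmin is older than the threshold. */
static bool
toy_needs_freeze(uint32_t next_xid, uint32_t xmin, uint32_t freeze_min_age)
{
    return toy_xid_age(next_xid, xmin) > freeze_min_age;
}
```

A row that keeps getting updated is replaced by a new version with a recent xmin each time, so under this check it never crosses the threshold; only versions that sit untouched for a long time do.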

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

They do get vacuumed at least every autovacuum_freeze_max_age even
now. And we should vacuum them more often to make index only scan work
without manual intervention.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Aha.

> This means that your optimization would benefit only users whose pages
> get updated occasionally (enough to trigger vacuum) but not too
> frequently (which would unset allvisible). While we lack statistics,
> intuition suggests that this is a minority of databases.

I don't think that follows.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit? Then we'd know
> which pages didn't need rescanning without nearly as much complexity.
> That would also make it more effective to do precautionary vacuum freezing.

Because we would still write/dirty/xlog the changes three times?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-28 23:51:57
Message-ID: CA+TgmoZcq9+C6FAD_R-GdTHbyoNUwMk7LaJqt7ZK8iZLAu6gLw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 28, 2013 at 12:29 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
>> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>>> I was talking this over with Jeff on the plane, and we wanted to be
>>> clear on your goals here: are you looking to eliminate the *write* cost
>>> of freezing, or just the *read* cost of re-reading already frozen pages?
>>
>> Both. The latter is what I have seen causing more hurt, but the former
>> alone is painful enough.
>
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
>
> - for users with frequently, randomly updated data, pdallvisible would
> not be ever set, so they still need to be rewritten to freeze

Do these users never run vacuum? As of 9.3, vacuum phase 2 will
typically set PD_ALL_VISIBLE on each relevant page. The only time
that this WON'T happen is if an insert, update, or delete hits the
page after phase 1 of vacuum and before phase 2 of vacuum. I don't
think that's going to be the common case.

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

There's no good solution for append-only tables. Eventually, they
will get vacuumed, and when that happens, PD_ALL_VISIBLE will be set,
and freezing will also happen. I don't think anything that is being
proposed here is going to make that a whole lot better, but it
shouldn't make it any worse than it is now, either. Since it's
probably not solvable without a rewrite of the heap AM, I'm not going
to feel too bad about that.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Again, I think this is going to be much less of an issue with 9.3, for
the reason explained above. In 9.2 and prior, we'd scan a page with
dead tuples, prune them to line pointers, vacuum the indexes, and then
mark the dead pointers as unused. Then, the NEXT vacuum would revisit
the same page and dirty it again ONLY to mark it all-visible. But in
9.3, the first vacuum will mark the page all-visible at the same time
it marks the dead line pointers unused. So the write overhead of
PD_ALL_VISIBLE should basically be gone. If it's not, it would be
good to know why.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit? Then we'd know
> which pages didn't need rescanning without nearly as much complexity.

That would break pg_upgrade, which would have to remove visibility map
forks when upgrading. More importantly, it would require another
round of complex changes to the write-ahead logging in this area.
It's not obvious to me that we'd end up ahead of where we are today,
although perhaps I am a pessimist.

> That would also make it more effective to do precautionary vacuum freezing.

But wouldn't it be a whole lot nicer if we just didn't have to do
vacuum freezing AT ALL? The point here is to absorb freezing into
some other operation that we already have to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-29 17:11:13
Message-ID: 1369847473.9884.7.camel@jdavis-laptop
Lists: pgsql-hackers

On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
> > If we just wanted to reduce read cost, why not just take a simpler
> > approach and give the visibility map a "isfrozen" bit? Then we'd know
> > which pages didn't need rescanning without nearly as much complexity.
>
> That would break pg_upgrade, which would have to remove visibility map
> forks when upgrading. More importantly, it would require another
> round of complex changes to the write-ahead logging in this area.
> It's not obvious to me that we'd end up ahead of where we are today,
> although perhaps I am a pessimist.

If we removed PD_ALL_VISIBLE, then this would be very simple, right? We
would just follow normal logging rules for setting the visible or frozen
bit.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-29 17:18:35
Message-ID: 1369847915.9884.10.camel@jdavis-laptop
Lists: pgsql-hackers

On Tue, 2013-05-28 at 09:29 -0700, Josh Berkus wrote:
> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

It would? I don't think these proposals are necessarily in conflict.
It's not entirely clear to me how they fit together in detail, but it
seems like it may be possible -- it may even simplify things.

Regards,
Jeff Davis


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: getting rid of freezing
Date: 2013-06-01 17:56:34
Message-ID: CA+U5nM+KroxQOggRm7Mv3sap-ZG4Zr_G1yT5r8h-C2jq8Ecu7g@mail.gmail.com
Lists: pgsql-hackers

On 28 May 2013 15:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, May 25, 2013 at 6:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>> I think the right way is actually to rethink and simplify all this
>> complexity of Freezing/Pruning/Hinting/Visibility
>
> I agree, but I think that's likely to have to wait until we get a
> pluggable storage API, and then a few years beyond that for someone to
> develop the technology to enable the new and better way. In the
> meantime, if we can eliminate or even reduce the impact of freezing in
> the near term, I think that's worth doing.

I think we can do better more quickly than that.

Andres' basic idea of skipping freeze completely was a valuable one
and is the right way forwards. And it looks like the epoch based
approach that Heikki and I have come up with seems likely to end up
somewhere workable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services