Re: removing PD_ALL_VISIBLE

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 02:46:58
Message-ID: CA+TgmoYd4UQhq0Vopu-7X2JB7TcsSX1z0=OJDUVRikWabaST1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 29, 2013 at 1:11 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
>> > If we just wanted to reduce read cost, why not just take a simpler
>> > approach and give the visibility map a "isfrozen" bit? Then we'd know
>> > which pages didn't need rescanning without nearly as much complexity.
>>
>> That would break pg_upgrade, which would have to remove visibility map
>> forks when upgrading. More importantly, it would require another
>> round of complex changes to the write-ahead logging in this area.
>> It's not obvious to me that we'd end up ahead of where we are today,
>> although perhaps I am a pessimist.
>
> If we removed PD_ALL_VISIBLE, then this would be very simple, right? We
> would just follow normal logging rules for setting the visible or frozen
> bit.

I don't see how that makes Josh's proposal any simpler. His proposal
was to change, in a backward-incompatible fashion, the contents of the
visibility map. Getting rid of PD_ALL_VISIBLE will not eliminate that
backward-incompatibility. Neither will it eliminate the need to keep
the visibility/freeze map in sync with the heap itself. Whether we
get rid of PD_ALL_VISIBLE or not, we'll still have to go look at every
type of WAL record that clears the visibility map bit and make it
clear both the visibility and freeze bits. We'll still need a WAL
record to set the visibility map bit, just as we do today, and we'll
also need a new WAL record type (or a change to the existing WAL
record type) to set the all-frozen bit, when applicable.

Now, independently of Josh's proposal, we could change PD_ALL_VISIBLE
to emit FPIs for every heap page it touches. For pages that have been
hit by updates or deletes, this would be pretty much free, in 9.3,
since the PD_ALL_VISIBLE bit will probably be set at the same time
we're setting dead line pointers to unused, which is a WAL-logged
operation anyway. However, for pages that have been hit only by
inserts, this would emit many extra FPIs.

Again independently of Josh's proposal, we could eliminate
PD_ALL_VISIBLE. This would require either surrendering the
optimization whereby sequential scans can skip visibility checks on
individual tuples within the page, or referring to the visibility map
to get the bit that way. I know you tested this and couldn't measure
an impact, but with all respect I find that result hard to accept.
Contention around buffer locks and pins is very real; why should it
matter on other workloads and not matter on this one? It would also
require page modifications prior to consistency to clear the
visibility map bit unconditionally, instead of only when
PD_ALL_VISIBLE is set on the page (though I think it'd be OK to pay
that price if it ended there).

AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we
eliminate one write cycle; that is, we won't dirty the page once to
hint it and then again to mark it all-visible. But as of 9.3, that
should really only be a problem in the insert-only case. And in that
case, my proposal to consider all-visible pages as frozen would be a
huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every
page in the heap, rather than XLOG_HEAP_FREEZE.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 04:06:59
Message-ID: 1369886819.23418.10.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2013-05-29 at 22:46 -0400, Robert Haas wrote:
> Again independently of Josh's proposal, we could eliminate
> PD_ALL_VISIBLE. This would require either surrendering the
> optimization whereby sequential scans can skip visibility checks on
> individual tuples within the page, or referring to the visibility map
> to get the bit that way. I know you tested this and couldn't measure
> an impact, but with all respect I find that result hard to accept.
> Contention around buffer locks and pins is very real; why should it
> matter on other workloads and not matter on this one?

The number of pins required during a sequential scan without my patch
is:

PAGES_IN_HEAP

The number of pins required during a sequential scan with my patch is:

PAGES_IN_HEAP + ceil(PAGES_IN_HEAP/HEAP_PAGES_PER_VM_PAGE)

Because HEAP_PAGES_PER_VM_PAGE is huge, the second term only matters
when N is very small and the "ceil" is significant. So, I simply elected
not to use the VM if the table is less than 32 pages in size. For such
small tables, the benefit of using a page-at-a-time visibility check was
not apparent in my tests anyway.

> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we
> eliminate one write cycle; that is, we won't dirty the page once to
> hint it and then again to mark it all-visible. But as of 9.3, that
> should really only be a problem in the insert-only case. And in that
> case, my proposal to consider all-visible pages as frozen would be a
> huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every
> page in the heap, rather than XLOG_HEAP_FREEZE.

Agreed.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 11:54:38
Message-ID: CA+TgmoavRRxVx6K3Qqmsbt-vuXsDD1Mxsn_PBw3U1gccCaRncQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 30, 2013 at 12:06 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we
>> eliminate one write cycle; that is, we won't dirty the page once to
>> hint it and then again to mark it all-visible. But as of 9.3, that
>> should really only be a problem in the insert-only case. And in that
>> case, my proposal to consider all-visible pages as frozen would be a
>> huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every
>> page in the heap, rather than XLOG_HEAP_FREEZE.
>
> Agreed.

Just to quantify that a bit more, I ran this command a couple of times:

dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n;
sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum'

And also this one:

dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n;
sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum freeze'

In the first one, the vacuum at the end takes about 25 seconds. In
the second one, it takes about 15 minutes, during which time there's
one CPU core running at about 10%; the remainder of the time is spent
waiting for disk I/O. A little follow-up testing shows that the
vacuum emits 88MB of WAL, while the vacuum freeze emits 13GB of WAL.

This is on the 16-core, 64-thread IBM POWER box with the following
non-default configuration settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
log_line_prefix = '%t [%p] '

Andres' proposal for freezing at the same time we mark pages
all-visible relies on emitting FPIs when we mark pages all-visible,
but I hope that the test above is convincing evidence that it would be
*really* expensive for some users. My proposal to consider
all-visible pages as frozen avoids that cost, but as far as I can see,
it also requires PD_ALL_VISIBLE to stick around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 12:12:08
Message-ID: 20130530121208.GA7466@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-05-30 07:54:38 -0400, Robert Haas wrote:
> On Thu, May 30, 2013 at 12:06 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we
> >> eliminate one write cycle; that is, we won't dirty the page once to
> >> hint it and then again to mark it all-visible. But as of 9.3, that
> >> should really only be a problem in the insert-only case. And in that
> >> case, my proposal to consider all-visible pages as frozen would be a
> >> huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every
> >> page in the heap, rather than XLOG_HEAP_FREEZE.
> >
> > Agreed.
>
> Just to quantify that a bit more, I ran this command a couple of times:
>
> dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n;
> sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum'
>
> And also this one:
>
> dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n;
> sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum freeze'
>
> In the first one, the vacuum at the end takes about 25 seconds. In
> the second one, it takes about 15 minutes, during which time there's
> one CPU core running at about 10%; the remainder of the time is spent
> waiting for disk I/O. A little follow-up testing shows that the
> vacuum emits 88MB of WAL, while the vacuum freeze emits 13GB of WAL.
>
> This is on the 16-core, 64-thread IBM POWER box with the following
> non-default configuration settings:
>
> shared_buffers = 8GB
> maintenance_work_mem = 1GB
> synchronous_commit = off
> checkpoint_segments = 300
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> log_line_prefix = '%t [%p] '
>
> Andres' proposal for freezing at the same time we mark pages
> all-visible relies on emitting FPIs when we mark pages all-visible,
> but I hope that the test above is convincing evidence that it would be
> *really* expensive for some users. My proposal to consider
> all-visible pages as frozen avoids that cost

I think I basically suggested treating all visible as frozen, didn't I?
If not, I had lost sync between my fingers and my thoughts which happens
too often ;).
You had noticed that my proposed was lacking a bit around when we omit
FPIs for the page while setting all-visible, but we both thought that we
may find a workaround that - which looking at the page level flag first
basically is.

As far as I understand the trick basically is that we can rely on an FPI
being logged when an action unsetting ALL_VISIBLE is performed. That
all-visible would then make sure the hint-bits marking indvidual tuples
as frozen would hit disk. For that we need to add some more work though,
consider:

1) write tuples on a page
2) "freeze" page by setting ALL_VISIBLE and setting hint
bits. Setting ALL_VISIBLE is wall logged
3) crash
4) replay ALL_VISIBLE, set it on the page level. The individual tuples
are *not* guaranteed to be marked frozen.
5) update tuple on the page unsetting all visible. Emits an FPI which
does *not* have the tuples marked as frozen.

Easy enough and fairly cheap to fix by having a function that checks
that updates the hint bits on a page when unsetting all visible since we
can just set it for all pre-existing tuples.

> but as far as I can see, it also requires PD_ALL_VISIBLE to stick
> around.

Now, I am far from being convinced its a good idea to get rid of
PD_ALL_VISIBLE, but I don't think it does. Except that it currently is
legal for the page level ALL_VISIBLE being set while the corresponding
visibilitymap one isn't there's not much prohibiting us fundamentally
from looking in the vm when we need to know whether the page is all
visible, is there?
To the contrary, this actually seems to be a pretty good case for Jeff's
proposed behaviour since it would allow freezing while only writing the
vm?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 12:34:04
Message-ID: 51A7473C.6070208@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30.05.2013 15:12, Andres Freund wrote:
> Now, I am far from being convinced its a good idea to get rid of
> PD_ALL_VISIBLE, but I don't think it does. Except that it currently is
> legal for the page level ALL_VISIBLE being set while the corresponding
> visibilitymap one isn't there's not much prohibiting us fundamentally
> from looking in the vm when we need to know whether the page is all
> visible, is there?

Hmm, so you're suggesting that the visibility map would be *required* to
interpret the pages correctly. Ie. if you just zapped the visibility
map, you'd lose critical information and the heap would appear to be
corrupt. I guess that's possible, but it makes me quite uneasy. At the
moment, it's relieving to know that it's always safe to just truncate
the visibility map in case of emergency.

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 12:39:30
Message-ID: 20130530123930.GD14029@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-05-30 15:34:04 +0300, Heikki Linnakangas wrote:
> On 30.05.2013 15:12, Andres Freund wrote:
> >Now, I am far from being convinced its a good idea to get rid of
> >PD_ALL_VISIBLE, but I don't think it does. Except that it currently is
> >legal for the page level ALL_VISIBLE being set while the corresponding
> >visibilitymap one isn't there's not much prohibiting us fundamentally
> >from looking in the vm when we need to know whether the page is all
> >visible, is there?
>
> Hmm, so you're suggesting that the visibility map would be *required* to
> interpret the pages correctly. Ie. if you just zapped the visibility map,
> you'd lose critical information and the heap would appear to be corrupt. I
> guess that's possible, but it makes me quite uneasy. At the moment, it's
> relieving to know that it's always safe to just truncate the visibility map
> in case of emergency.

I didn't say its a good idea, just that I don't think Robert's
conclusion is necessarily valid. But requiring only the few kbytes of
the vm to be written instead of all of the heap during freezeing (or
whatever we would call it) has quite some allure, I admit that.

But I think that should be a separate project to reeingineering how
freezing works.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-30 13:47:22
Message-ID: CA+Tgmob4O4Gx4w0nCobXxe4Gzs+BdHbJ5MEKD+H_KCYZEpVo7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 30, 2013 at 8:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> As far as I understand the trick basically is that we can rely on an FPI
> being logged when an action unsetting ALL_VISIBLE is performed. That
> all-visible would then make sure the hint-bits marking indvidual tuples
> as frozen would hit disk. For that we need to add some more work though,
> consider:
>
> 1) write tuples on a page
> 2) "freeze" page by setting ALL_VISIBLE and setting hint
> bits. Setting ALL_VISIBLE is wall logged
> 3) crash
> 4) replay ALL_VISIBLE, set it on the page level. The individual tuples
> are *not* guaranteed to be marked frozen.
> 5) update tuple on the page unsetting all visible. Emits an FPI which
> does *not* have the tuples marked as frozen.
>
> Easy enough and fairly cheap to fix by having a function that checks
> that updates the hint bits on a page when unsetting all visible since we
> can just set it for all pre-existing tuples.

Basically, yes, though I would say "infomask bits" rather than "hint
bits", since not all of them are only hints, and this case would not
be merely a hint.

>> but as far as I can see, it also requires PD_ALL_VISIBLE to stick
>> around.
>
> Now, I am far from being convinced its a good idea to get rid of
> PD_ALL_VISIBLE, but I don't think it does. Except that it currently is
> legal for the page level ALL_VISIBLE being set while the corresponding
> visibilitymap one isn't there's not much prohibiting us fundamentally
> from looking in the vm when we need to know whether the page is all
> visible, is there?
> To the contrary, this actually seems to be a pretty good case for Jeff's
> proposed behaviour since it would allow freezing while only writing the
> vm?

Well, as Heikki points out, I think that's unacceptably dangerous.
Loss or corruption of a single visibility map page means possible loss
of half a gigabyte of data.

Also, if we go that route, looking at the visibility map is no longer
an optimization; it's essential for correctness. We can't decide to
skip it when it seems expensive, for example, as Jeff was proposing.

There's another thing that's bothering me about this whole discussion,
too. If looking at another page for the information we need to make
visibility decisions is so cheap that we need not be concerned with
it, then why do we need hint bits? I realize that it's not quite the
same thing, because CLOG doesn't have as much locality of access as
the visibility map; you're guaranteed to find all the information you
need for a single heap page on a single VM page. Also, CLOG is
per-tuple, not per-page, and we get a decent speed-up from checking
once for the whole page rather than for each tuple. Nonetheless, I
think the contrast between Jeff's tests, which aren't showing much
impact from the increased visibility map traffic, and previous
hint-bit removal tests, which have crashed and burned, may be caused
in part by the fact that our algorithms and locking regimen for
shared_buffers are much more sophisticated than for SLRU. I'm not
eager to have our design decisions driven by that gap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 17:14:13
Message-ID: 20130531171413.GA1728@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> Well, as Heikki points out, I think that's unacceptably dangerous.
> Loss or corruption of a single visibility map page means possible loss
> of half a gigabyte of data.
>
> Also, if we go that route, looking at the visibility map is no longer
> an optimization; it's essential for correctness. We can't decide to
> skip it when it seems expensive, for example, as Jeff was proposing.

Isn't the visibility map already required for proper return results as
we use it for index-only scans. I think the optimization-only ship has
sailed.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 17:17:09
Message-ID: 20130531171709.GE4606@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-05-31 13:14:13 -0400, Bruce Momjian wrote:
> On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> > Well, as Heikki points out, I think that's unacceptably dangerous.
> > Loss or corruption of a single visibility map page means possible loss
> > of half a gigabyte of data.
> >
> > Also, if we go that route, looking at the visibility map is no longer
> > an optimization; it's essential for correctness. We can't decide to
> > skip it when it seems expensive, for example, as Jeff was proposing.
>
> Isn't the visibility map already required for proper return results as
> we use it for index-only scans. I think the optimization-only ship has
> sailed.

At the moment we can remove it without causing corruption. If we were to
use it for freezing we couldn't anymore. So there's a difference - how
big it is I am not sure.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 17:28:12
Message-ID: 51A8DDAC.3040109@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> Isn't the visibility map already required for proper return results as
>> we use it for index-only scans. I think the optimization-only ship has
>> sailed.
>
> At the moment we can remove it without causing corruption. If we were to
> use it for freezing we couldn't anymore. So there's a difference - how
> big it is I am not sure.

Depends on your definition of corruption, really.

But yes, right now, the vismap can lose bits without causing any
corruption, and making all-frozen depend on it would eliminate that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 17:44:58
Message-ID: 20130531174458.GC1728@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 31, 2013 at 10:28:12AM -0700, Josh Berkus wrote:
>
> >> Isn't the visibility map already required for proper return results as
> >> we use it for index-only scans. I think the optimization-only ship has
> >> sailed.
> >
> > At the moment we can remove it without causing corruption. If we were to
> > use it for freezing we couldn't anymore. So there's a difference - how
> > big it is I am not sure.
>
> Depends on your definition of corruption, really.
>
> But yes, right now, the vismap can lose bits without causing any
> corruption, and making all-frozen depend on it would eliminate that.

Roberts statement was:

> Loss or corruption of a single visibility map page means possible loss
> of half a gigabyte of data.

Certainly unidentified corruption of a visibility map page could easily
cause incorrect results. So, technically, _adding_ bits would cause
corruption.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 18:00:19
Message-ID: 51A8E533.5080109@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce,

> Roberts statement was:
>
>> Loss or corruption of a single visibility map page means possible loss
>> of half a gigabyte of data.

I fail to be alarmed at this; currently losing a single page of the clog
causes just as widespread corruption (worse, actually, since it's not
confined to one table). It does point to the eventual need to checksum
these things, though.

> Certainly unidentified corruption of a visibility map page could easily
> cause incorrect results. So, technically, _adding_ bits would cause
> corruption.

Yes, that's already true. I'm pointing out that if we depend on the
vismap for all-frozen, then losing bits *also* causes corruption, so
that's something we need to test for. Right now, there is no possible
corruption from losing bits; we simply end up scannning more pages than
we have to.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 18:05:36
Message-ID: 20130531180536.GD1728@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 31, 2013 at 11:00:19AM -0700, Josh Berkus wrote:
> Bruce,
>
> > Roberts statement was:
> >
> >> Loss or corruption of a single visibility map page means possible loss
> >> of half a gigabyte of data.
>
> I fail to be alarmed at this; currently losing a single page of the clog
> causes just as widespread corruption (worse, actually, since it's not
> confined to one table). It does point to the eventual need to checksum
> these things, though.
>
> > Certainly unidentified corruption of a visibility map page could easily
> > cause incorrect results. So, technically, _adding_ bits would cause
> > corruption.
>
> Yes, that's already true. I'm pointing out that if we depend on the
> vismap for all-frozen, then losing bits *also* causes corruption, so
> that's something we need to test for. Right now, there is no possible
> corruption from losing bits; we simply end up scannning more pages than
> we have to.

Right, and it is hard to see that losing and adding are somehow
more/less likely, so it seems we already realy on the visibility map
being correct.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 19:22:57
Message-ID: 20130531192256.GX6434@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> Right, and it is hard to see that losing and adding are somehow
> more/less likely, so it seems we already realy on the visibility map
> being correct.

Yes, but there's also a way to get *back* to a valid state if things go
south with the visibility map. It's similar to an index today- sure,
you might get corruption, and that might give you wrong results, but if
you detect it, it's easy to correct- rebuild the index, drop the
visibility map, etc. With CRCs, it'll be even easier to detect and
discover when you need to do such a thing.

Corruption in the heap is certainly bad too- you may lose some records,
but you could zero those pages out and move on with only losing 8k or
what-have-you. Clog corruption is certainly a bad thing, but if nearly
everything in your DB is already frozen, it's not as bad as it *could*
be. I wonder if you could rebuild a portion of clog from the WAL..

There are a lot of different tradeoffs here, certainly. It's certainly
nice that single bit or single page corruption can often be recovered
from by rebuilding an index or rebuilding the visbility map or just
nuking a few records from a table. CRCs help with identifying when
corruption has happened, but they do nothing for what to *do* when it
happens, and "restore completely from backup" isn't a great answer when
you've got terrabytes of data.

Where I'm going with this whole thing is simply that I do worry a bit
about using a bitmap for freeze, or similar, information and not being
able to reconstruct that bitmap from the heap. Perhaps that's overly
paranoid, but, well, we also write the same data out to disk in multiple
places multiple times- some might call that paranoid too. ;)

Thanks,

Stephen


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 19:45:57
Message-ID: 51A8FDF5.7050404@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/31/2013 12:22 PM, Stephen Frost wrote:
> Where I'm going with this whole thing is simply that I do worry a bit
> about using a bitmap for freeze, or similar, information and not being
> able to reconstruct that bitmap from the heap. Perhaps that's overly
> paranoid, but, well, we also write the same data out to disk in multiple
> places multiple times- some might call that paranoid too. ;)

On the other hand, we could combine Heikki's proposal (epoch numbers in
the page header) together with using the visibility map for pages we
don't need to vacuum freeze, and get vastly better behavior without
increasing corruption risk that I can see.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-05-31 20:49:09
Message-ID: 20130531204909.GY6434@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> On 05/31/2013 12:22 PM, Stephen Frost wrote:
> > Where I'm going with this whole thing is simply that I do worry a bit
> > about using a bitmap for freeze, or similar, information and not being
> > able to reconstruct that bitmap from the heap. Perhaps that's overly
> > paranoid, but, well, we also write the same data out to disk in multiple
> > places multiple times- some might call that paranoid too. ;)
>
> On the other hand, we could combine Heikki's proposal (epoch numbers in
> the page header) together with using the visibility map for pages we
> don't need to vacuum freeze, and get vastly better behavior without
> increasing corruption risk that I can see.

That was actually my thinking as well..

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-06-01 02:48:50
Message-ID: CA+TgmoYt=Y1G+d4Nbbmo1PM6XLL2NM5Y598WavknM=zTSxdScQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 31, 2013 at 1:44 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Fri, May 31, 2013 at 10:28:12AM -0700, Josh Berkus wrote:
>> >> Isn't the visibility map already required for proper return results as
>> >> we use it for index-only scans. I think the optimization-only ship has
>> >> sailed.
>> >
>> > At the moment we can remove it without causing corruption. If we were to
>> > use it for freezing we couldn't anymore. So there's a difference - how
>> > big it is I am not sure.
>>
>> Depends on your definition of corruption, really.
>>
>> But yes, right now, the vismap can lose bits without causing any
>> corruption, and making all-frozen depend on it would eliminate that.
>
> Roberts statement was:
>
>> Loss or corruption of a single visibility map page means possible loss
>> of half a gigabyte of data.
>
> Certainly unidentified corruption of a visibility map page could easily
> cause incorrect results. So, technically, _adding_ bits would cause
> corruption.

Adding bits could cause tuples that ought to be invisible to be
treated as visible. Currently, removing bits is harmless (except to
performance), but if we used the VM bit to indicate whether the page
was frozen in lieu of actually freezing it, a cleared bit would
potentially cause vacuum to nuke everything on that page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: removing PD_ALL_VISIBLE
Date: 2013-06-01 03:30:29
Message-ID: CA+TgmoaGzsViLnzQOAYDG1bMdqmNUU44aE=7UVpQzxZWnQbjaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 31, 2013 at 3:45 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 05/31/2013 12:22 PM, Stephen Frost wrote:
>> Where I'm going with this whole thing is simply that I do worry a bit
>> about using a bitmap for freeze, or similar, information and not being
>> able to reconstruct that bitmap from the heap. Perhaps that's overly
>> paranoid, but, well, we also write the same data out to disk in multiple
>> places multiple times- some might call that paranoid too. ;)
>
> On the other hand, we could combine Heikki's proposal (epoch numbers in
> the page header) together with using the visibility map for pages we
> don't need to vacuum freeze, and get vastly better behavior without
> increasing corruption risk that I can see.

Yeah, I was thinking about that as well. In fact, under either
Heikki's proposal or Andres's proposal or my variants of either one of
them, anti-wraparound vacuums no longer need to scan all-visible
pages. Under Andres's proposal (and variants), all-visible pages are
ipso facto frozen and therefore need not be scanned for freezing. But
under Heikki's proposal (and variants), anti-wraparound vacuums only
need to remove dead tuples; freezing live ones is a no-op. And
all-visible pages don't contain any dead tuples, so we're right back
in the same place.[1]

Where things diverge a little is what you when an anti-wraparound scan
encounters a page that isn't all-visible and can't be marked
all-visible. Under the "XID epoch" family of proposals, we need to
truncate any dead tuples on the page to line pointers, and that's it.
Under the "treat all-visible as frozen" family of proposals, we *also*
need to do old-style freezing on any aged but live tuples on the page.
So the "XID epoch" saves write I/O in this case, because we don't
dirty pages or write WAL just because we see a tuple with a high XID
age. Only pages that contain actual dead tuples get dirtied.

So on further reflection, I'm not seeing any advantage to combining
the two proposals. The "XID epoch" approach has the complications of
requiring page format changes and consuming space on every page,
although I think I may have worked out a variant approach that will
avoid that (which no one has commented on, so maybe I'm
hallucinating). But in other respects it seems better all around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] I said upthread that Heikki's idea would require a separate freeze
map, but now I think I was wrong.