Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

Lists: pgsql-hackers
From: Alexey Bashtanov <bashtanov(at)imap(dot)cc>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 10:34:17
Message-ID: 543664A9.2000908@imap.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello!

Autovacuum daemon performs vacuum when the number of rows
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly it performs analyze when the number of rows changed in any way
(incl. inserted).
When a table is mostly insert-only, its visibility map is not updated as
vacuum threshold is almost never reached, but analyze does not update
visibility map.

Why could it be a bad idea to run vacuum after some number of any
changes including inserts, like analyze?
Or at least make it tunable by user (add a second bunch of paramters to
control second vacuum threshold, disabled by default)?

Best regards,
Alexey Bashtanov


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alexey Bashtanov <bashtanov(at)imap(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 20:58:51
Message-ID: 20141009205851.GC19877@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 9, 2014 at 02:34:17PM +0400, Alexey Bashtanov wrote:
> Hello!
>
> Autovacuum daemon performs vacuum when the number of rows
> updated/deleted (n_dead_tuples) reaches some threshold.
> Similarly it performs analyze when the number of rows changed in any
> way (incl. inserted).
> When a table is mostly insert-only, its visibility map is not
> updated as vacuum threshold is almost never reached, but analyze
> does not update visibility map.
>
> Why could it be a bad idea to run vacuum after some number of any
> changes including inserts, like analyze?
> Or at least make it tunable by user (add a second bunch of paramters
> to control second vacuum threshold, disabled by default)?

I agree this is a serious problem. We have discussed various options,
but have not decided on anything. The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:03:00
Message-ID: 20141009210300.GI7043@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:

> I agree this is a serious problem. We have discussed various options,
> but have not decided on anything. The TODO list has:
>
> https://wiki.postgresql.org/wiki/Todo
>
> Improve setting of visibility map bits for read-only and insert-only
> workloads
>
> http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set. (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alexey Bashtanov <bashtanov(at)imap(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:10:42
Message-ID: 1412889042.34580.YahooMailNeo@web122305.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Bruce Momjian wrote:

>> I agree this is a serious problem. We have discussed various options,
>> but have not decided on anything. The TODO list has:
>>
>> https://wiki.postgresql.org/wiki/Todo
>>
>> Improve setting of visibility map bits for read-only and insert-only
>> workloads
>>
>> http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze. In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set. (Of course, this idea
> needs refinement to avoid running over and over when the bit cannot be
> set on some pages for whatever reason.)

Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold? I mean, it unless
the table is due for wraparound prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right? And
how much work would that do beyond what you're describing if none
of the tuples are dead?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:11:26
Message-ID: 20141009211125.GI29124@awork2.int
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
> > I agree this is a serious problem. We have discussed various options,
> > but have not decided on anything. The TODO list has:
> >
> > https://wiki.postgresql.org/wiki/Todo
> >
> > Improve setting of visibility map bits for read-only and insert-only
> > workloads
> >
> > http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze. In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:16:46
Message-ID: 20141009211646.GJ7043@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:
> On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> >
> > > I agree this is a serious problem. We have discussed various options,
> > > but have not decided on anything. The TODO list has:
> > >
> > > https://wiki.postgresql.org/wiki/Todo
> > >
> > > Improve setting of visibility map bits for read-only and insert-only
> > > workloads
> > >
> > > http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> >
> > I hate to repeat myself, but I think autovacuum could be modified to run
> > actions other than vacuum and analyze. In this specific case we could
> > be running a table scan that checks only pages that don't have the
> > all-visible bit set, and see if it can be set.
>
> Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references. I'm thinking in something
very lightweight. Otherwise, why don't we just reduce the
vacuum_scale_factor default to something very small, so that vacuum is
triggered more often?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:18:29
Message-ID: 20141009211829.GK7043@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:

> Wouldn't we get substantially the same thing just by counting tuple
> inserts toward the autovacuum vacuum threshold? I mean, it unless
> the table is due for wraparound prevention autovacuum, it will only
> visit pages that don't have the all-visible bit set, right? And
> how much work would that do beyond what you're describing if none
> of the tuples are dead?

The problem is precisely what happens if there are some dead tuples, but
not enough to reach the 20% threshold: this vacuum now has to scan the
table twice and has to clean up indexes also.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 21:19:41
Message-ID: 20141009211941.GJ29124@awork2.int
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > >
> > > > I agree this is a serious problem. We have discussed various options,
> > > > but have not decided on anything. The TODO list has:
> > > >
> > > > https://wiki.postgresql.org/wiki/Todo
> > > >
> > > > Improve setting of visibility map bits for read-only and insert-only
> > > > workloads
> > > >
> > > > http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> > >
> > > I hate to repeat myself, but I think autovacuum could be modified to run
> > > actions other than vacuum and analyze. In this specific case we could
> > > be running a table scan that checks only pages that don't have the
> > > all-visible bit set, and see if it can be set.
> >
> > Isn't that *precisely* what a plain vacuum run does?
>
> Well, it also scans for dead tuples, removes them, and needs to go
> through indexes to remove their references.

IIRC it doesn't do most of that if that there's no need. And if it's a
insert only table without rollbacks. I *do* think there's some
optimizations we could make in general.

> I'm thinking in something
> very lightweight. Otherwise, why don't we just reduce the
> vacuum_scale_factor default to something very small, so that vacuum is
> triggered more often?

The problem here is that that doesn't trigger for inserts. Just for
updates/deletes or rollbacks.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alexey Bashtanov <bashtanov(at)imap(dot)cc>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-09 23:15:29
Message-ID: 54371711.1060304@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/9/14, 4:03 PM, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
>> I agree this is a serious problem. We have discussed various options,
>> but have not decided on anything. The TODO list has:
>>
>> https://wiki.postgresql.org/wiki/Todo
>>
>> Improve setting of visibility map bits for read-only and insert-only
>> workloads
>>
>> http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze. In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set. (Of course, this idea
> needs refinement to avoid running over and over when the bit cannot be
> set on some pages for whatever reason.)

If we go down that road we should also think about having it proactively set hint bits...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-19 02:36:48
Message-ID: 544323C0.4000601@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/9/14, 4:19 PM, Andres Freund wrote:
> On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
>> >Andres Freund wrote:
>>> > >On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
>>>> > > >Bruce Momjian wrote:
>>>> > > >
>>>>> > > > >I agree this is a serious problem. We have discussed various options,
>>>>> > > > >but have not decided on anything. The TODO list has:
>>>>> > > > >
>>>>> > > > > https://wiki.postgresql.org/wiki/Todo
>>>>> > > > >
>>>>> > > > > Improve setting of visibility map bits for read-only and insert-only
>>>>> > > > > workloads
>>>>> > > > >
>>>>> > > > > http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>>>> > > >
>>>> > > >I hate to repeat myself, but I think autovacuum could be modified to run
>>>> > > >actions other than vacuum and analyze. In this specific case we could
>>>> > > >be running a table scan that checks only pages that don't have the
>>>> > > >all-visible bit set, and see if it can be set.
>>> > >
>>> > >Isn't that*precisely* what a plain vacuum run does?
>> >
>> >Well, it also scans for dead tuples, removes them, and needs to go
>> >through indexes to remove their references.
> IIRC it doesn't do most of that if that there's no need. And if it's a
> insert only table without rollbacks. I*do* think there's some
> optimizations we could make in general.

No, it always attempts dead tuple removal. The "weird" part is that if it's not doing a freeze it will just punt on a page if it can't get the cleanup lock. I have to believe that could seriously screw up autovacuum scheduling.

Now that we have forks, I'm wondering if it would be best to come up with a per-page system that could be used to determine when a table needs background work to be done. The visibility map could serve a lot of this purpose, but I'm not sure if it would work for getting hint bits set in the background. I think it would also be a win if we had a way to advance relfrozenxid and relminmxid. Perhaps something that simply remembered the last XID that touched each page...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-19 16:41:34
Message-ID: 20141019164134.GK22660@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
> On 10/9/14, 4:19 PM, Andres Freund wrote:
> >On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
> >>>Andres Freund wrote:
> >>>> >On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> >>>>> > >Bruce Momjian wrote:
> >>>>> > >
> >>>>>> > > >I agree this is a serious problem. We have discussed various options,
> >>>>>> > > >but have not decided on anything. The TODO list has:
> >>>>>> > > >
> >>>>>> > > > https://wiki.postgresql.org/wiki/Todo
> >>>>>> > > >
> >>>>>> > > > Improve setting of visibility map bits for read-only and insert-only
> >>>>>> > > > workloads
> >>>>>> > > >
> >>>>>> > > > http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> >>>>> > >
> >>>>> > >I hate to repeat myself, but I think autovacuum could be modified to run
> >>>>> > >actions other than vacuum and analyze. In this specific case we could
> >>>>> > >be running a table scan that checks only pages that don't have the
> >>>>> > >all-visible bit set, and see if it can be set.
> >>>> >
> >>>> >Isn't that*precisely* what a plain vacuum run does?
> >>>
> >>>Well, it also scans for dead tuples, removes them, and needs to go
> >>>through indexes to remove their references.

> >IIRC it doesn't do most of that if that there's no need. And if it's a
> >insert only table without rollbacks. I*do* think there's some
> >optimizations we could make in general.
>
> No, it always attempts dead tuple removal.

I said some steps, not all steps. Check it out:

/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
if (vacrelstats->num_dead_tuples > 0)
{
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);

/* Remove index entries */
for (i = 0; i < nindexes; i++)
lazy_vacuum_index(Irel[i],
&indstats[i],
vacrelstats);
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
vacrelstats->num_index_scans++;
}

There's rub here though. We unconditionally do:
/* Do post-vacuum cleanup and statistics update for each index */
for (i = 0; i < nindexes; i++)
lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

and that's not particularly cheap. Maybe we should make that conditional
when there's been no lazy_vacuum_index/heap calls at all?

> The "weird" part is that if it's not doing a freeze it will just punt
> on a page if it can't get the cleanup lock.

I don't think that's particularly wierd. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.

> I have to believe that could seriously screw up autovacuum scheduling.

Why?

> Now that we have forks, I'm wondering if it would be best to come up
> with a per-page system that could be used to determine when a table
> needs background work to be done. The visibility map could serve a lot
> of this purpose, but I'm not sure if it would work for getting hint
> bits set in the background.

It would. Per definition, all tuples that are 'all visible' need to be
fully hint bitted.

> I think it would also be a win if we had a way to advance relfrozenxid
> and relminmxid. Perhaps something that simply remembered the last XID
> that touched each page...

Not sure what you're getting at here?

I think the big missing piece lest something like Heikki's xid lsn
ranges thing gets finished is a freeze map.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-19 16:50:30
Message-ID: 23152.1413737430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> There's rub here though. We unconditionally do:
> /* Do post-vacuum cleanup and statistics update for each index */
> for (i = 0; i < nindexes; i++)
> lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

> and that's not particularly cheap. Maybe we should make that conditional
> when there's been no lazy_vacuum_index/heap calls at all?

Absolutely not. If the cleanup step is skippable, it would be the
province of the index AM to make that decision.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-19 16:54:59
Message-ID: 20141019165459.GJ9267@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-19 12:50:30 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > There's rub here though. We unconditionally do:
> > /* Do post-vacuum cleanup and statistics update for each index */
> > for (i = 0; i < nindexes; i++)
> > lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
>
> > and that's not particularly cheap. Maybe we should make that conditional
> > when there's been no lazy_vacuum_index/heap calls at all?
>
> Absolutely not. If the cleanup step is skippable, it would be the
> province of the index AM to make that decision.

Fair point. At the moment we're doing a full of nbtree indexes everytime
we do a vacuum. Even when the heap vacuum only scanned a couple hundred
pages of a huge table. That makes partial vacuum noticeably less
useful. So I do think we need to do something to improve upon the
situation.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-20 01:43:29
Message-ID: 544468C1.6050101@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19/14, 11:41 AM, Andres Freund wrote:
> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>> On 10/9/14, 4:19 PM, Andres Freund wrote:
>>> On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
>>>>> Andres Freund wrote:
>>>>>>> On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
>>>>>>>>> Bruce Momjian wrote:
>>>>>>>>>
>>>>>>>>>>> I agree this is a serious problem. We have discussed various options,
>>>>>>>>>>> but have not decided on anything. The TODO list has:
>>>>>>>>>>>
>>>>>>>>>>> https://wiki.postgresql.org/wiki/Todo
>>>>>>>>>>>
>>>>>>>>>>> Improve setting of visibility map bits for read-only and insert-only
>>>>>>>>>>> workloads
>>>>>>>>>>>
>>>>>>>>>>> http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>>>>>>>>>
>>>>>>>>> I hate to repeat myself, but I think autovacuum could be modified to run
>>>>>>>>> actions other than vacuum and analyze. In this specific case we could
>>>>>>>>> be running a table scan that checks only pages that don't have the
>>>>>>>>> all-visible bit set, and see if it can be set.
>>>>>>>
>>>>>>> Isn't that*precisely* what a plain vacuum run does?
>>>>>
>>>>> Well, it also scans for dead tuples, removes them, and needs to go
>>>>> through indexes to remove their references.
>
>>> IIRC it doesn't do most of that if that there's no need. And if it's a
>>> insert only table without rollbacks. I*do* think there's some
>>> optimizations we could make in general.
>>
>> No, it always attempts dead tuple removal.
>
> I said some steps, not all steps. Check it out:
>
> /* If any tuples need to be deleted, perform final vacuum cycle */
> /* XXX put a threshold on min number of tuples here? */
> if (vacrelstats->num_dead_tuples > 0)
> {
> /* Log cleanup info before we touch indexes */
> vacuum_log_cleanup_info(onerel, vacrelstats);
>
> /* Remove index entries */
> for (i = 0; i < nindexes; i++)
> lazy_vacuum_index(Irel[i],
> &indstats[i],
> vacrelstats);
> /* Remove tuples from heap */
> lazy_vacuum_heap(onerel, vacrelstats);
> vacrelstats->num_index_scans++;
> }
>
> There's rub here though. We unconditionally do:
> /* Do post-vacuum cleanup and statistics update for each index */
> for (i = 0; i < nindexes; i++)
> lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
>
> and that's not particularly cheap. Maybe we should make that conditional
> when there's been no lazy_vacuum_index/heap calls at all?

We could possibly pass in to lazy_cleanup_index whether we actually removed any tuples.

>> The "weird" part is that if it's not doing a freeze it will just punt
>> on a page if it can't get the cleanup lock.
>
> I don't think that's particularly wierd. Otherwise vacuum can get stuck
> behind a single very hot page - leading to much, much more bloat.
>
>> I have to believe that could seriously screw up autovacuum scheduling.
>
> Why?

I'm worried there could be some pathological cases where we'd skip a large number of pages, perhaps if a vacuum scan and a seqscan ended up running alongside each other.

Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't have any logging for how many pages we skipped because we couldn't lock them.

Also, if this really is that big a deal for heap pages, how come we don't get screwed by it on Btree index pages, where we mandate that we acquire a cleanup lock?

>> Now that we have forks, I'm wondering if it would be best to come up
>> with a per-page system that could be used to determine when a table
>> needs background work to be done. The visibility map could serve a lot
>> of this purpose, but I'm not sure if it would work for getting hint
>> bits set in the background.
>
> It would. Per definition, all tuples that are 'all visible' need to be
> fully hint bitted.
>
>> I think it would also be a win if we had a way to advance relfrozenxid
>> and relminmxid. Perhaps something that simply remembered the last XID
>> that touched each page...
>
> Not sure what you're getting at here?

That ultimately, our current method for determining when and what to vacuum is rather crude, and likely results in wasted effort during scans as well as not firing autovac often enough. Keep in mind that autovac started as a user-space utility and the best it could possibly do was to keep a table of stats counters.

The visibility map obviously helps cut down on extra work during a scan, but it only goes so far in that regard.

Instead of relying on the crude methods, if we reliably tracked certain txids on a per-block basis in a fork, we could cheaply scan the fork and make an extremely informed decision on how much a vacuum would gain us, and exactly what blocks it should hit.

Let me use freezing as an example. If we had a reliable list of the lowest txid for each block of a relation that would allow us to do a freeze scan by hitting only blocks with minimum txid within our freeze range. The same could be done for multixacts.

That's just one example. We could do something similar for background hinting (oldest xmin/xmax of all non-hinted tuples), and whether it's worth trying to vacuum (I think that'd be a combination of oldest non-locking xmax and seeing how much room the FSM has listed for the page).

If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in each fork block. So in a database with 680G of heap data, we could fully determine every *block* (not table) we needed to vacuum by scanning just 1GB of data. That would allow for far better autovacuum scheduling than what we do today.

> I think the big missing piece lest something like Heikki's xid lsn
> ranges thing gets finished is a freeze map.

The problem with a simple freeze map is when do you actually set the bit? If you do it while the transaction that created all the tuples is still running then any attempt to use the map prior to those tuples becoming all visible is pointless. Presumably this is why pd_prune_xid stores a txid and isn't just a boolean.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-20 20:11:14
Message-ID: 20141020201114.GG7176@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
> On 10/19/14, 11:41 AM, Andres Freund wrote:
> >On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
> >>The "weird" part is that if it's not doing a freeze it will just punt
> >>on a page if it can't get the cleanup lock.
> >
> >I don't think that's particularly wierd. Otherwise vacuum can get stuck
> >behind a single very hot page - leading to much, much more bloat.
> >
> >>I have to believe that could seriously screw up autovacuum scheduling.
> >
> >Why?
>
> I'm worried there could be some pathological cases where we'd skip a
> large number of pages, perhaps if a vacuum scan and a seqscan ended up
> running alongside each other.

I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.

> Perhaps this is just paranoia, but we have no idea how bad things
> might be, because we don't have any logging for how many pages we
> skipped because we couldn't lock them.

But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.

> Also, if this really is that big a deal for heap pages, how come we
> don't get screwed by it on Btree index pages, where we mandate that we
> acquire a cleanup lock?

Because we never hold pins for btree pages for very long. Whereas we do
that for heap pages. If you e.g. run a cursor forward you can hold a pin
for essentially unbounded time.

> >>Now that we have forks, I'm wondering if it would be best to come up
> >>with a per-page system that could be used to determine when a table
> >>needs background work to be done. The visibility map could serve a lot
> >>of this purpose, but I'm not sure if it would work for getting hint
> >>bits set in the background.
> >
> >It would. Per definition, all tuples that are 'all visible' need to be
> >fully hint bitted.
> >
> >>I think it would also be a win if we had a way to advance relfrozenxid
> >>and relminmxid. Perhaps something that simply remembered the last XID
> >>that touched each page...
> >
> >Not sure what you're getting at here?
>
> That ultimately, our current method for determining when and what to
> vacuum is rather crude, and likely results in wasted effort during
> scans as well as not firing autovac often enough. Keep in mind that
> autovac started as a user-space utility and the best it could possibly
> do was to keep a table of stats counters.

I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...

> The visibility map obviously helps cut down on extra work during a
> scan, but it only goes so far in that regard.

Aha.

> Instead of relying on the crude methods, if we reliably tracked
> certain txids on a per-block basis in a fork, we could cheaply scan
> the fork and make an extremely informed decision on how much a vacuum
> would gain us, and exactly what blocks it should hit.

> Let me use freezing as an example. If we had a reliable list of the
> lowest txid for each block of a relation that would allow us to do a
> freeze scan by hitting only blocks with minimum txid within our freeze
> range. The same could be done for multixacts.

It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.

> If we stored 3 txids for each block in a fork, we could fit
> information for ~680 heap blocks in each fork block. So in a database
> with 680G of heap data, we could fully determine every *block* (not
> table) we needed to vacuum by scanning just 1GB of data. That would
> allow for far better autovacuum scheduling than what we do today.

It's not that simple. Wraparounds and locking complicate it
significantly.

> >I think the big missing piece lest something like Heikki's xid lsn
> >ranges thing gets finished is a freeze map.
>
> The problem with a simple freeze map is when do you actually set the
> bit?

There's precisely one place where you can set it for normal
operation. During vacuum's scan.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-21 00:39:16
Message-ID: 5445AB34.6030301@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/20/14, 3:11 PM, Andres Freund wrote:
> On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
>> On 10/19/14, 11:41 AM, Andres Freund wrote:
>>> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>>>> The "weird" part is that if it's not doing a freeze it will just punt
>>>> on a page if it can't get the cleanup lock.
>>>
>>> I don't think that's particularly wierd. Otherwise vacuum can get stuck
>>> behind a single very hot page - leading to much, much more bloat.
>>>
>>>> I have to believe that could seriously screw up autovacuum scheduling.
>>>
>>> Why?
>>
>> I'm worried there could be some pathological cases where we'd skip a
>> large number of pages, perhaps if a vacuum scan and a seqscan ended up
>> running alongside each other.
>
> I've seen little evidence of that. The reverse, a stuck autovacuum, is
> imo much more likely. For this to be an actual problem you'd need to
> encounter many pages that are not locked, but are pinned. That state
> doesn't exist for very long.

How would you actually get evidence of this... we don't log it. :) (See my proposal at http://www.postgresql.org/message-id/54446C10.2080203@BlueTreble.com)

>> Perhaps this is just paranoia, but we have no idea how bad things
>> might be, because we don't have any logging for how many pages we
>> skipped because we couldn't lock them.
>
> But so what? If we skip individual pages it won't be too bad - and very
> likely waiting very long is going to be more painful. The page won't be
> marked 'all visible' so the next vacuum will come around to it
> again. And it'll also get cleaned up by opportunistic hot pruning.

Probably true. Hopefully we can start logging it and then we'll know for sure.

>> That ultimately, our current method for determining when and what to
>> vacuum is rather crude, and likely results in wasted effort during
>> scans as well as not firing autovac often enough. Keep in mind that
>> autovac started as a user-space utility and the best it could possibly
>> do was to keep a table of stats counters.
>
> I agree that we should trigger autovacuum more often. It's
> *intentionally* not triggered *at all* for insert only workloads (if you
> discount anti wraparound vacuums). I think it's time to change that. For
> that we'd need to make vacuums that don't delete any tuples cheaper. We
> already rescan only the changed parts of the heaps - but we always scan
> indexes fully...

Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff, but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have* to do anything other than set hint bits and possibly freeze xmin.

>> Instead of relying on the crude methods, if we reliably tracked
>> certain txids on a per-block basis in a fork, we could cheaply scan
>> the fork and make an extremely informed decision on how much a vacuum
>> would gain us, and exactly what blocks it should hit.
>
>> Let me use freezing as an example. If we had a reliable list of the
>> lowest txid for each block of a relation that would allow us to do a
>> freeze scan by hitting only blocks with minimum txid within our freeze
>> range. The same could be done for multixacts.
>
> It'd also become a prime contention point because you'd need to
> constantly update it. In contrast to a simple 'is frozen' bit (akin to
> is_visible) which only changes infrequently, and only in one direction.

Actually, the contention on freeze would very possibly be minimal, because it probably doesn't change very often. Even if it did, it's OK if the value isn't 100% accurate, so long as the recorded XID is guaranteed older than what's actually on the page.

>> If we stored 3 txids for each block in a fork, we could fit
>> information for ~680 heap blocks in each fork block. So in a database
>> with 680G of heap data, we could fully determine every *block* (not
>> table) we needed to vacuum by scanning just 1GB of data. That would
>> allow for far better autovacuum scheduling than what we do today.
>
> It's not that simple. Wraparounds and locking complicate it
> significantly.

I realize what I'm talking about isn't trivial (though, I'm confused by your comment about wraparound since presumably TransactionIdPrecedes() and it's ilk solve that problem...)

My ultimate point here is that we're using what are (today) very crude methods to control what gets vacuumed when, and I think that now that we have resource forks would could do *much* better without a tremendous amount of work. But to make a big advancement here we'll need to take a step back and rethink some things (like vacuum is the only way to handle these problems).

Let me put some thought into this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-21 00:43:26
Message-ID: 5445AC2E.7030302@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/20/2014 05:39 PM, Jim Nasby wrote:
> Or maybe vacuum isn't the right way to handle some of these scenarios.
> It's become the catch-all for all of this stuff, but maybe that doesn't
> make sense anymore. Certainly when it comes to dealing with inserts
> there's no reason we *have* to do anything other than set hint bits and
> possibly freeze xmin.

+1

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-21 00:46:47
Message-ID: 20141021004647.GL7176@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
> On 10/20/2014 05:39 PM, Jim Nasby wrote:
> > Or maybe vacuum isn't the right way to handle some of these scenarios.
> > It's become the catch-all for all of this stuff, but maybe that doesn't
> > make sense anymore. Certainly when it comes to dealing with inserts
> > there's no reason we *have* to do anything other than set hint bits and
> > possibly freeze xmin.
>
> +1

A page read is a page read. What's the point of heaving another process
do it? Vacuum doesn't dirty pages if they don't have to be
dirtied. Especially stuff like freezing cannot really be dealt with
outside of vacuum unless you make already complex stuff more complex for
a marginal benefit.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-21 21:36:58
Message-ID: CAMkU=1zf1Yo0dYJzJ-pk9o4mwLuMD4Uzw6Jck7u1nC_Xb2gYWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
> > On 10/20/2014 05:39 PM, Jim Nasby wrote:
> > > Or maybe vacuum isn't the right way to handle some of these scenarios.
> > > It's become the catch-all for all of this stuff, but maybe that doesn't
> > > make sense anymore. Certainly when it comes to dealing with inserts
> > > there's no reason we *have* to do anything other than set hint bits and
> > > possibly freeze xmin.
> >
> > +1
>
> A page read is a page read. What's the point of heaving another process
> do it?

It is only a page read if you have to read the page. It would seem optimal
to have bgwriter adventitiously set hint bits and vm bits, because that is
the last point at which the page can be changed without risking that it be
written out twice. At that point, it has been given the maximum amount of
time it can be given for the interested transactions to have committed and
to have aged past the xmin horizon. I seem to recall that the main problem
with that, though, is that you must be attached to a database in order to
determine visibility, and bgwriter is not attached to a database.

Cheers,

Jeff


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-21 22:25:58
Message-ID: 5446DD76.8050905@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/21/14, 4:36 PM, Jeff Janes wrote:
> On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund <andres(at)2ndquadrant(dot)com <mailto:andres(at)2ndquadrant(dot)com>> wrote:
>
> On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
> > On 10/20/2014 05:39 PM, Jim Nasby wrote:
> > > Or maybe vacuum isn't the right way to handle some of these scenarios.
> > > It's become the catch-all for all of this stuff, but maybe that doesn't
> > > make sense anymore. Certainly when it comes to dealing with inserts
> > > there's no reason we *have* to do anything other than set hint bits and
> > > possibly freeze xmin.
> >
> > +1
>
> A page read is a page read. What's the point of heaving another process
> do it?
>
>
> It is only a page read if you have to read the page. It would seem optimal to have bgwriter adventitiously set hint bits and vm bits, because that is the last point at which the page can be changed without risking that it be written out twice. At that point, it has been given the maximum amount of time it can be given for the interested transactions to have committed and to have aged past the xmin horizon. I seem to recall that the main problem with that, though, is that you must be attached to a database in order to determine visibility, and bgwriter is not attached to a database.

It's also a bit more complex than a simple question of "is the page still in shared buffers". Our *real* last chance is when the page is about to be evicted from the filesystem cache; after that reading it back it will be extremely expensive (relatively speaking).

I think it's worth considering this, because if you have any moderate length transactions on a busy database bgwriter won't be able to help much; you'll be burning through shared buffers too quickly.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-27 21:51:44
Message-ID: 20141027215144.GX1791@alvin.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes wrote:

> It is only a page read if you have to read the page. It would seem optimal
> to have bgwriter adventitiously set hint bits and vm bits, because that is
> the last point at which the page can be changed without risking that it be
> written out twice. At that point, it has been given the maximum amount of
> time it can be given for the interested transactions to have committed and
> to have aged past the xmin horizon. I seem to recall that the main problem
> with that, though, is that you must be attached to a database in order to
> determine visibility, and bgwriter is not attached to a database.

Regarding tuple hint bits, I couldn't find any such limitation in
SetHintBits, other than in MarkBufferDirtyHint there being some code
that would cause trouble: it accesses MyPgXact, which bgwriter would
obviously not have. Maybe worth some experimentation ...

I'm not sure about vm bits, though. That's a whole different topic.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date: 2014-10-29 17:18:23
Message-ID: CA+TgmoZQkHzQPAM=3Uw=egw-TFzhq9A5wPuKfXoc97h0c+yyow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 27, 2014 at 5:51 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Jeff Janes wrote:
>> It is only a page read if you have to read the page. It would seem optimal
>> to have bgwriter adventitiously set hint bits and vm bits, because that is
>> the last point at which the page can be changed without risking that it be
>> written out twice. At that point, it has been given the maximum amount of
>> time it can be given for the interested transactions to have committed and
>> to have aged past the xmin horizon. I seem to recall that the main problem
>> with that, though, is that you must be attached to a database in order to
>> determine visibility, and bgwriter is not attached to a database.
>
> Regarding tuple hint bits, I couldn't find any such limitation in
> SetHintBits, other than in MarkBufferDirtyHint there being some code
> that would cause trouble: it accesses MyPgXact, which bgwriter would
> obviously not have. Maybe worth some experimentation ...
>
> I'm not sure about vm bits, though. That's a whole different topic.

From a theoretical point of view, hint bits and VM bits present
mostly-similar issues; they are basically dependent on the state of
the transaction machinery, which for the most part is cluster-wide, so
it should be fine for a process to do that sort of work without being
bound to a specific database. I think the biggest problem for either
is that the bgwriter can't look at the system catalogs to find out
whether a particular buffer contains a heap page or an index page (or
something else) and it needs to know that, which probably requires
another buffer flag.

From a practical point of view, there's a major performance difference
between the two operations: setting a VM bit requires emitting a WAL
record, whereas setting hint bits does not. That might make it too
expensive to be worth doing, at least in some situations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company