Re: [PERFORM] encouraging index-only scans

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2013-09-06 22:22:41
Message-ID: 20130906222241.GB626072@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 2013-09-06 13:01:59 -0400, Bruce Momjian wrote:
> On Fri, Sep 6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
> > On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
> > > > I am not sure I understand this though. What would be the point to go
> > > > and set all visible and not do the rest of the vacuuming work?
> > > >
> > > > I think triggering vacuuming by scanning the visibility map for the
> > > > number of unset bits and use that as another trigger is a good idea. The
> > > > vm should ensure we're not doing superflous work.
> > >
> > > Yes, I think it might be hard to justify a separate VM-set-only scan of
> > > the table. If you are already reading the table, and already checking
> > > to see if you can set the VM bit, I am not sure why you would not also
> > > remove old rows, especially since removing those rows might be necessary
> > > to allow setting VM bits.
> >
> > Yep. Although adding the table back into the fsm will lead to it being
> > used for new writes again...
>
> You mean adding _pages_ back into the table's FSM? Yes, that is going
> to cause those pages to get dirty, but it is better than expanding the
> table size. I don't see why you would not update the FSM.

You're right, we should add them, I wasn't really questioning that. I
was, quietly so you couldn't hear it, wondering whether we should
priorize the target buffer selection differently.

> > > Another problem I thought of is that while automatic vacuuming only
> > > happens with high update/delete load, index-only scans are best on
> > > mostly non-write tables, so we have bad behavior where the ideal case
> > > (static data) doesn't get vm-bits set, while update/delete has the
> > > vm-bits set, but then cleared as more update/deletes occur.
> >
> > Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
> > at some point they will get vacuumed and the vm bits will get set.
>
> Hmm, good point. That would help with an insert-only workload, as long
> as you can chew through 200M transactions. That doesn't help with a
> read-only workload as we don't consume transction IDs for SELECT.

It's even 150mio. For the other workloads, its pretty "common" wisdom to
VACUUM after bulk data loading. I think we even document that.

> > > The more I look at this the worse it appears. How has this gone
> > > unaddressed for over a year?
> >
> > It's been discussed several times including during the introduction of
> > the feature. I am a bit surprised about the panickey tone in this
> > thread.
>
> This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
> manual VACUUM was required for index-only scans. That thread ended with
> us realizing that pg_upgrade's ANALYZE runs will populate
> pg_class.relallvisible.

> What I didn't see in that thread is an analysis of what cases are going
> to require manual vacuum, and I have seen no work in 9.3 to improve
> that. I don't even see it on the TODO list.

Yes, TODO maybe missing.

> It bothers me that we spent time developing index-only scans, but have
> significant workloads where it doesn't work, no efforts on improving it,
> and no documentation on manual workarounds. I have not even seen
> discussion on how we are going to improve this. I would like to have
> that discussion now.

It's not like the feature is useless in this case. You just need to
perform an extra operation to activate it. I am not saying we shouldn't
document it better, but it seriously worries me that a useful feature is
depicted as useless because it requires a manual VACUUM in some
circumstances.

> > Yes, we need to overhaul the way vacuum works (to reduce the frequency
> > of rewriting stuff repeatedly) and the way it's triggered (priorization,
> > more trigger conditions) but all these are known things and "just" need
> > somebody with time.

> Based on the work needed to improve this, I would have thought someone
> would have taken this on during 9.3 development.

There has been some discussion about it indirectly via the freezing
stuff. That also would require more "advanced" scheduling.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-09-06 22:26:23 Re: [PERFORM] encouraging index-only scans
Previous Message Tom Lane 2013-09-06 21:47:11 Re: Fix picksplit with nan values

Browse pgsql-performance by date

  From Date Subject
Next Message Andres Freund 2013-09-06 22:26:23 Re: [PERFORM] encouraging index-only scans
Previous Message Jim Nasby 2013-09-06 20:10:06 Re: [PERFORM] encouraging index-only scans