Re: [PERFORM] encouraging index-only scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2013-09-09 03:49:03
Message-ID: CAA4eK1+SNfiDaVPEB_xv4-kWP8LBPXdh+C2WVC277Y-djT=qdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
>>> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> > Actually, I now realize it is more complex than that, and worse. There
>>> > are several questions to study to understand when pg_class.relallvisible
>>> > is updated (which is used to determine if index-only scans are a good
>>> > optimization choice), and when VM all-visible bits are set so heap pages
>>> > can be skipped during index-only scans:
>>> >
>>> > 1) When are VM bits set:
>>> > vacuum (non-full)
>>> > analyze (only some random pages)
>>>
>>> Analyze doesn't set visibility-map bits. It only updates statistics
>>> about how many are set.
>>
>> Sorry, yes you are correct.
>>
>>> > The calculus we should use to determine when we need to run vacuum has
>>> > changed with index-only scans, and I am not sure we ever fully addressed
>>> > this.
>>>
>>> Yeah, we didn't. I think the hard part is figuring out what behavior
>>> would be best. Counting inserts as well as updates and deletes would
>>> be a simple approach, but I don't have much confidence in it. My
>>> experience is that having vacuum or analyze kick in during a bulk-load
>>> operation is a disaster. We'd kinda like to come up with a way to
>>> make vacuum run after the bulk load is complete, maybe, but how would
>>> we identify that time, and there are probably cases where that's not
>>> right either.
>>
>> I am unsure how we have gone a year with index-only scans and I am just
>> now learning that it only works well with update/delete workloads or by
>> running vacuum manually. I only found this out going back over January
>> emails. Did other people know this? Was it not considered a serious
>> problem?
>
> I thought it was well known, but maybe I was overly optimistic. I've
> considered IOS to be mostly useful for data mining work on read-mostly
> tables, which you would probably vacuum manually after a bulk load.
>
> For transactional tables, I think that trying to keep the vm set-bit
> density high enough would be a losing battle. If we redefined the
> nature of the vm so that doing a HOT update would not clear the
> visibility bit, perhaps that would change the outcome of this battle.

Wouldn't it make the Vacuum bit in-efficient in the sense that it will
skip some of the pages in which there are only
HOT updates for cleaning dead rows.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-09-09 05:21:47 INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Tomas Vondra 2013-09-08 23:19:31 Re: [rfc] overhauling pgstat.stat

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2013-09-09 16:03:19 Re: [PERFORM] encouraging index-only scans
Previous Message Jeff Janes 2013-09-08 21:05:00 Re: [PERFORM] encouraging index-only scans