Re: Minor optimizations in lazy_scan_heap

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor optimizations in lazy_scan_heap
Date: 2012-12-04 17:32:26
Message-ID: CA+TgmobXdhS+-xt=knjYz0QoaqLKv1zEpEm1WbS2ctpFOe0N2g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 3, 2012 at 1:23 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> I was looking at the code in lazy_scan_heap() and I realized there are
> a couple of low-hanging optimizations that we can do there.
>
> 1. The for-loop walks through each block of the relation. But if scan_all is
> set to false, which would be the case most often, we can jump over to the
> next not-all-visible block directly (after considering the
> SKIP_PAGES_THRESHOLD etc). I understand that the cost of looping with no-op
> may not be considerable, but it looks unnecessary. And it can matter when
> there are thousands and millions of consecutive all-visible blocks in a
> large table.
>
> 2. We also do a visibilitymap_test() for each block. I think it will be more
> prudent to have a visibilitymap API, say visibilitymap_test_range(), which
> can take a range of blocks and return the first not-all-visible block from
> the range. Internally, the function can then test several blocks at a time.
> We can still do this without holding a lock on the VM buffer because when
> scan_all is false, we don't care much about the correctness of the
> visibility check anyway. Also, this function can later be optimized if we
> start saving some summary information about visibility maps, in which case
> we can more efficiently find first not-all-visible block.
>
> 3. I also thought that the call to vacuum_delay_point() for every visibility
> check is not required and a simple CHECK_FOR_INTERRUPTS would be good
> enough. Later I realized that maybe we need that because the visibility map
> check can do an IO for the VM page. But if we do 2, then we can at least
> limit calling vacuum_delay_point() once for every VM page, instead of one
> per bit. I concede that the cost of calling vacuum_delay_point() may not be
> too high, but it again looks unnecessary and can be taken care of by a slight
> re-factoring of the code.
>
> Comments? Does anyone think any or all of the above is useful?
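
For concreteness, the range test proposed in point 2 might look roughly
like the sketch below. This is a standalone toy, not PostgreSQL code: it
assumes a plain in-memory bitmap with one all-visible bit per heap block,
and toy_vm_test_range() and BLOCKS_PER_BYTE are hypothetical names made up
for illustration. The real visibility map lives in shared buffers and is
reached through visibilitymap_test(), which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one all-visible bit per heap block, eight blocks per byte. */
#define BLOCKS_PER_BYTE 8

/*
 * Hypothetical helper: return the first block in [start, end) whose
 * all-visible bit is clear, or "end" if every block in the range is
 * marked all-visible.
 */
static uint32_t
toy_vm_test_range(const uint8_t *map, uint32_t start, uint32_t end)
{
    uint32_t    blk = start;

    assert(start <= end);

    while (blk < end)
    {
        /*
         * Fast path: on a byte boundary, with a whole byte left in the
         * range and all eight bits set, skip eight blocks at once.
         */
        if ((blk % BLOCKS_PER_BYTE) == 0 &&
            end - blk >= BLOCKS_PER_BYTE &&
            map[blk / BLOCKS_PER_BYTE] == 0xFF)
        {
            blk += BLOCKS_PER_BYTE;
            continue;
        }

        /* Slow path: test the single bit for this block. */
        if ((map[blk / BLOCKS_PER_BYTE] & (1 << (blk % BLOCKS_PER_BYTE))) == 0)
            return blk;
        blk++;
    }

    return end;
}
```

With such a helper, the loop from point 1 could set its block counter to
the returned value instead of stepping over all-visible blocks one at a
time. Whether that saves anything measurable is a separate question.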

I doubt that any of these things make enough difference to be worth
bothering with, but if you have benchmark results suggesting otherwise
I'm all ears.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
