Minor optimizations in lazy_scan_heap

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Minor optimizations in lazy_scan_heap
Date: 2012-12-03 06:23:30
Message-ID: CABOikdP05GmaV=3vA7isJA7KO=iWCXGaBSnq7J+Hgqwq9WkFkg@mail.gmail.com
Lists: pgsql-hackers

I was looking at the code in lazy_scan_heap() and I realized there are
a couple of low-hanging optimizations that we can do there.

1. The for-loop walks through each block of the relation. But if scan_all
is set to false, which would be the case most often, we can jump directly
to the next not-all-visible block (after considering the
SKIP_PAGES_THRESHOLD etc). I understand that the cost of no-op loop
iterations may not be considerable, but it looks unnecessary. And it can
matter when there are thousands or millions of consecutive all-visible
blocks in a large table.
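
A minimal sketch of the jump described in point 1, written as a standalone
toy rather than as PostgreSQL code. The names ToyVisibilityMap,
toy_next_block_to_scan and SKETCH_SKIP_THRESHOLD are hypothetical, the
threshold value is an assumption, and the toy map stores one bool per block
instead of packed bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SKIP_PAGES_THRESHOLD; the value is an assumption. */
#define SKETCH_SKIP_THRESHOLD 32

/* Toy visibility map: one bool per heap block (a real map packs bits). */
typedef struct
{
    const bool *all_visible;
    uint32_t    nblocks;
} ToyVisibilityMap;

/*
 * Return the next block at or after blkno that should be scanned.
 *
 * With scan_all set, every block is scanned, so blkno is returned as-is.
 * Otherwise a run of all-visible blocks starting at blkno is skipped in one
 * step, but only if the run is at least SKETCH_SKIP_THRESHOLD blocks long.
 * The result is vm->nblocks when the run extends to the end of the relation.
 */
static uint32_t
toy_next_block_to_scan(const ToyVisibilityMap *vm, uint32_t blkno, bool scan_all)
{
    uint32_t run_end = blkno;

    assert(blkno <= vm->nblocks);

    if (scan_all)
        return blkno;

    /* Measure the run of consecutive all-visible blocks starting at blkno. */
    while (run_end < vm->nblocks && vm->all_visible[run_end])
        run_end++;

    /* Short runs (including an empty one) are not worth skipping. */
    if (run_end - blkno < SKETCH_SKIP_THRESHOLD)
        return blkno;

    return run_end;
}
```

The caller would then assign the result to its loop variable instead of
incrementing it once per all-visible block.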

2. We also do a visibilitymap_test() for each block. I think it would be
better to have a visibilitymap API, say visibilitymap_test_range(),
which can take a range of blocks and return the first not-all-visible block
from the range. Internally, the function can then test several blocks at a
time. We can still do this without holding a lock on the VM buffer because
when scan_all is false, we don't care much about the correctness of the
visibility check anyway. Also, this function can later be optimized if we
start saving some summary information about visibility maps, in which case
we can find the first not-all-visible block more efficiently.
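
One way the proposed range test could look, sketched over a plain in-memory
bitmap. This is not the visibilitymap API: toy_vm_test_range is a
hypothetical name, the layout of one bit per heap block (least significant
bit first) is an assumption, and buffer access and locking are left out
entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the first block in [start, end) whose all-visible bit is clear,
 * or end if every block in the range is all-visible.
 *
 * Assumed layout: bit (blk % 8) of map[blk / 8] is set when block blk is
 * all-visible. Whole bytes equal to 0xFF are skipped eight blocks at a time.
 */
static uint32_t
toy_vm_test_range(const uint8_t *map, uint32_t start, uint32_t end)
{
    uint32_t blk = start;

    assert(start <= end);

    while (blk < end)
    {
        /* Fast path: a byte-aligned group of eight all-visible blocks. */
        if (blk % 8 == 0 && end - blk >= 8 && map[blk / 8] == 0xFF)
        {
            blk += 8;
            continue;
        }

        /* Slow path: test a single bit. */
        if ((map[blk / 8] & (1u << (blk % 8))) == 0)
            return blk;

        blk++;
    }

    return end;
}
```

The byte-at-a-time fast path is only the simplest form of "several blocks at
a time"; comparing wider words would follow the same pattern.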

3. I also thought that the call to vacuum_delay_point() for every
visibility check is not required and a simple CHECK_FOR_INTERRUPTS would be
good enough. Later I realized that maybe we need that because a visibility
map check can do an IO for the VM page. But if we do (2), then we can at
least limit calling vacuum_delay_point() to once for every VM page, instead
of once per bit. I concede that the cost of calling vacuum_delay_point() may
not be too high, but it again looks unnecessary and can be taken care of by
a slight re-factoring of the code.
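
A rough sketch of the re-factoring in point 3: the delay hook runs once each
time the walk enters a new VM page, not once per bit. Everything here is
illustrative. TOY_HEAPBLOCKS_PER_VM_PAGE is an assumed value, and the
callback stands in for vacuum_delay_point() so the sketch stays standalone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed number of heap blocks covered by one VM page (not the real value). */
#define TOY_HEAPBLOCKS_PER_VM_PAGE 1024

/* Stand-in for the delay point; arg is passed through untouched. */
typedef void (*toy_delay_fn) (void *arg);

/* Example hook that just counts how often it was called. */
static void
toy_count_delay(void *arg)
{
    (*(int *) arg)++;
}

/*
 * Walk blocks [0, nblocks) of a packed bitmap (bit set = all-visible) and
 * return how many blocks are not all-visible. The delay hook is invoked
 * only on the first block of each VM page.
 */
static uint32_t
toy_count_blocks_to_scan(const uint8_t *map, uint32_t nblocks,
                         toy_delay_fn delay, void *arg)
{
    uint32_t to_scan = 0;

    assert(delay != NULL);

    for (uint32_t blk = 0; blk < nblocks; blk++)
    {
        /* Entering a new VM page: the only place an IO could happen. */
        if (blk % TOY_HEAPBLOCKS_PER_VM_PAGE == 0)
            delay(arg);

        if ((map[blk / 8] & (1u << (blk % 8))) == 0)
            to_scan++;
    }

    return to_scan;
}
```

With 2500 blocks and the assumed page size, the hook fires three times
instead of 2500.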

Comments? Does anyone think any or all of the above is useful?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
