Minor optimizations in lazy_scan_heap

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Minor optimizations in lazy_scan_heap
Date: 2012-12-03 06:23:30
Message-ID: CABOikdP05GmaV=3vA7isJA7KO=iWCXGaBSnq7J+Hgqwq9WkFkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I was looking at the code in lazy_scan_heap() and I realized there are
couple of low-hanging optimizations that we can do there.

1. The for-loop walks through each block of the relation. But if scan_all
is set to false, which would be the case most often, we can jump over to
the next not-all-visible block directly (after considering the
SKIP_PAGES_THRESHOLD etc). I understand that the cost of looping with no-op
may not be considerable, but it looks unnecessary. And it can matter when
there are thousands and millions of consecutive all-visible blocks in a
large table.

2. We also do a visibilitymap_test() for each block. I think it will be
more prudent to have a visibilitymap API, say visibilitymap_test_range(),
which can take a range of blocks and return the first not-all-visible block
from the range. Internally, the function can then test several blocks at a
time. We can still do this without holding a lock on the VM buffer because
when scan_all is false, we don't care much about the correctness of the
visibility check anyway. Also, this function can later be optimized if we
start saving some summary information about visibility maps, in which case
we can more efficiently find first not-all-visible block.

3. I also thought that the call to vacuum_delay_point() for every
visibility check is not required and a simple CHECK_FOR_INTERRUPTS would be
good enough. Later I realized that may be we need that because visibility
map check can do an IO for the VM page. But if we do 2, then we can at
least limit calling vacuum_delay_point() once for every VM page, instead of
one per bit. I concede that the cost of calling vacuum_delay_point() may
not be too high, but it again looks unnecessary and can be taken care by a
slight re-factoring of the code.

Comments ? Anyone thinks any/all of above is useful ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor optimizations in lazy_scan_heap
Date: 2012-12-04 17:32:26
Message-ID: CA+TgmobXdhS+-xt=knjYz0QoaqLKv1zEpEm1WbS2ctpFOe0N2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 3, 2012 at 1:23 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> I was looking at the code in lazy_scan_heap() and I realized there are
> couple of low-hanging optimizations that we can do there.
>
> 1. The for-loop walks through each block of the relation. But if scan_all is
> set to false, which would be the case most often, we can jump over to the
> next not-all-visible block directly (after considering the
> SKIP_PAGES_THRESHOLD etc). I understand that the cost of looping with no-op
> may not be considerable, but it looks unnecessary. And it can matter when
> there are thousands and millions of consecutive all-visible blocks in a
> large table.
>
> 2. We also do a visibilitymap_test() for each block. I think it will be more
> prudent to have a visibilitymap API, say visibilitymap_test_range(), which
> can take a range of blocks and return the first not-all-visible block from
> the range. Internally, the function can then test several blocks at a time.
> We can still do this without holding a lock on the VM buffer because when
> scan_all is false, we don't care much about the correctness of the
> visibility check anyway. Also, this function can later be optimized if we
> start saving some summary information about visibility maps, in which case
> we can more efficiently find first not-all-visible block.
>
> 3. I also thought that the call to vacuum_delay_point() for every visibility
> check is not required and a simple CHECK_FOR_INTERRUPTS would be good
> enough. Later I realized that may be we need that because visibility map
> check can do an IO for the VM page. But if we do 2, then we can at least
> limit calling vacuum_delay_point() once for every VM page, instead of one
> per bit. I concede that the cost of calling vacuum_delay_point() may not be
> too high, but it again looks unnecessary and can be taken care by a slight
> re-factoring of the code.
>
> Comments ? Anyone thinks any/all of above is useful ?

I doubt that any of these things make enough difference to be worth
bothering with, but if you have benchmark results suggesting otherwise
I'm all ears.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor optimizations in lazy_scan_heap
Date: 2012-12-08 07:11:06
Message-ID: CABOikdN2i2HVGB2V-S-pgmMysbgBtidcxXxi4eVoWhFR0Tyjnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 4, 2012 at 11:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Dec 3, 2012 at 1:23 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
> >
> > Comments ? Anyone thinks any/all of above is useful ?
>
> I doubt that any of these things make enough difference to be worth
> bothering with,

You're right. These are not big ticket optimisations, still I felt they are
worth doing because tiny bits add up over a time and also because the code
may become little simpler. The benchmarks don't show anything interesting
though. The time taken to scan 100K+ bits is sub-second. So even when I
tried with the attached patch, the numbers did not show any noticeable
difference. It might be worth trying with a table with 1M or 10M data
blocks, but I don't have such a hardware to test.

The patch itself can be improved further, especially we can possibly
optimise the loop and test 32-bits at a time, instead of 8 I am doing
currently. Not sure its worth though.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachment Content-Type Size
vm_test_range-v2.patch application/octet-stream 7.4 KB