speedup tidbitmap patch: cache page

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: speedup tidbitmap patch: cache page
Date: 2014-10-22 11:52:37
Message-ID: 54479A85.8060309@sigaev.ru
Lists: pgsql-hackers

On some workloads Postgres can spend a lot of time in tbm_add_tuples, up to
50% of the query time. The hash_search call is expensive and is made twice
for each ItemPointer inserted. The suggested patch caches the last
PagetableEntry pointer in the hope that the next ItemPointer will fall on the
same relation page.

The patch is rather trivial and I don't believe it can cause a performance
degradation, but it can improve performance on some workloads.
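In rough outline the loop becomes something like this (a simplified sketch,
not the patch itself; parameters and the bit-setting code are elided):

    PagetableEntry *page = NULL;    /* entry found for the previous tid */
    int         i;

    for (i = 0; i < ntids; i++)
    {
        BlockNumber blk = ItemPointerGetBlockNumber(tids + i);

        /* look the page up only when it differs from the cached one */
        if (page == NULL || page->blockno != blk)
        {
            if (tbm_page_is_lossy(tbm, blk))
                continue;       /* whole page is already marked */
            page = tbm_get_pageentry(tbm, blk);
        }

        /* ... set the bit for this tid's offset in page->words ... */
    }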

An example:
# create extension btree_gin;
# select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
# create index idx on t using gin (i);
# set enable_seqscan = off;

# explain analyze select * from t where i >= 0;
without patch: Execution time: 2427.692 ms
with patch: Execution time: 2058.718 ms

# explain analyze select * from t where i >= 100 and i <= 100;
without patch: Execution time: 524.441 ms
with patch: Execution time: 195.381 ms
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_cachepage-1.patch.gz application/x-gzip 637 bytes

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-14 11:58:07
Message-ID: CAApHDvp_+c1hPRiVfKcT1ty__6q5wnz2cMc_tNyVx6mhbbzSAg@mail.gmail.com
Lists: pgsql-hackers

On 23 October 2014 at 00:52, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>
> On some workloads Postgres can spend a lot of time in tbm_add_tuples, up to
> 50% of the query time. The hash_search call is expensive and is made twice
> for each ItemPointer inserted. The suggested patch caches the last
> PagetableEntry pointer in the hope that the next ItemPointer will fall on
> the same relation page.
>
> The patch is rather trivial and I don't believe it can cause a performance
> degradation, but it can improve performance on some workloads.
>
> An example:
> # create extension btree_gin;
> # select (v / 10)::int4 as i into t from generate_series(1, 5000000) as v;
> # create index idx on t using gin (i);
> # set enable_seqscan = off;
>
> # explain analyze select * from t where i >= 0;
> without patch: Execution time: 2427.692 ms
> with patch: Execution time: 2058.718 ms
>
> # explain analyze select * from t where i >= 100 and i <= 100;
> without patch: Execution time: 524.441 ms
> with patch: Execution time: 195.381 ms
>
>
This seems like quite a promising performance boost.

I've been having a look at this and I'm wondering about a certain scenario:

In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block,
and on the next iteration of the loop we have the same block again, have you
benchmarked any caching code that stores whether tbm_page_is_lossy() returned
true for that block on the previous iteration? That would save calling
tbm_page_is_lossy() again for the same block. Or are you just expecting that
tbm_page_is_lossy() returns true so rarely that you'll end up caching the
page most of the time, and gain by skipping both hash lookups on the next
loop, since page will be set in this case?
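Something like this is what I have in mind (untested, just to illustrate;
lossy_page would be a hypothetical local next to the cached page pointer):

    BlockNumber lossy_page = InvalidBlockNumber;   /* last block seen lossy */

    /* inside the loop, before the cached-page check: */
    if (blk == lossy_page)
        continue;           /* already known lossy; skip both lookups */

    if (page == NULL || page->blockno != blk)
    {
        if (tbm_page_is_lossy(tbm, blk))
        {
            lossy_page = blk;   /* remember it, so repeats skip the lookup */
            continue;
        }
        page = tbm_get_pageentry(tbm, blk);
    }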

It would be nice to see a comment to explain why it might be a good idea to
cache the page lookup. Perhaps something like:

+ /*
+  * We'll cache this page as it's quite likely that on the next loop
+  * we'll be seeing the same page again. This saves us having to look
+  * up the page in the hashtable again.
+  */
+ page = tbm_get_pageentry(tbm, blk);

I also wondered if there might be a small slowdown in the case where the
index only finds 1 matching tuple. So I tried the following:

select v::int4 as i into t1 from generate_series(1, 5000000) as v;
create index t1_idx on t1 using gin (i);

select count(*) from t1 where i >= 0;

In this case tbm_add_tuples is called with ntids == 1, so only one loop
iteration is performed per call. I wondered if the page == NULL check would
add much overhead to the function.

Times are in milliseconds:

     Patch      Master
  2413.183    2339.979
  2352.564    2428.025
  2369.118    2359.498
  2338.442    2350.974
  2373.330    2400.942
  2383.883    2342.461
  2393.804    2359.490
  2334.998    2359.884
  2428.156    2520.842
  2336.978    2356.995
avg. 2372.4456  2381.909   (99.6%)
med. 2371.224   2359.494  (100.5%)

It appears that if it does, then it's not very much.

Regards

David Rowley


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-16 16:25:02
Message-ID: 54905CDE.5020500@sigaev.ru
Lists: pgsql-hackers

> I've been having a look at this and I'm wondering about a certain scenario:
>
> In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block,
> and on the next iteration of the loop we have the same block again, have
> you benchmarked any caching code that stores whether tbm_page_is_lossy()
> returned true for that block on the previous iteration? That would save
> calling tbm_page_is_lossy() again for the same block. Or are you just
> expecting that tbm_page_is_lossy() returns true so rarely that you'll end
> up caching the page most of the time, and gain by skipping both hash
> lookups on the next loop, since page will be set in this case?
I believe that if we fall into lossy pages then the tidbitmap will not have
a significant impact on performance, because Postgres will spend a lot of
time rechecking tuples on the page. If work_mem is too small to keep the
tidbitmap exact, Postgres will slow down significantly anyway. I implemented
it (v2.1 in the attachments), but I don't think it is an improvement, at
least not a significant one.

>
> It would be nice to see a comment to explain why it might be a good idea to
> cache the page lookup. Perhaps something like:
>

Added, see attachment (v2).

>
> I also wondered if there might be a small slowdown in the case where the index
> only finds 1 matching tuple. So I tried the following:
> avg. 2372.4456  2381.909   (99.6%)
> med. 2371.224   2359.494  (100.5%)
>
> It appears that if it does, then it's not very much.

I believe that's unmeasurable, because the standard deviation of your tests
is about 2%, which is greater than the difference between the patched and
master versions.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_cachepage-2.patch.gz application/x-gzip 742 bytes
tbm_cachepage-2.1.patch.gz application/x-gzip 841 bytes

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-17 11:28:43
Message-ID: CAApHDvrzZgb6SXQXd4xAoOGRQBsP-t1X-Li8LahBCUJxwLYmvg@mail.gmail.com
Lists: pgsql-hackers

On 17 December 2014 at 05:25, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> I've been having a look at this and I'm wondering about a certain scenario:
>>
>> In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block,
>> and on the next iteration of the loop we have the same block again, have
>> you benchmarked any caching code that stores whether tbm_page_is_lossy()
>> returned true for that block on the previous iteration? That would save
>> calling tbm_page_is_lossy() again for the same block. Or are you just
>> expecting that tbm_page_is_lossy() returns true so rarely that you'll end
>> up caching the page most of the time, and gain by skipping both hash
>> lookups on the next loop, since page will be set in this case?
>
> I believe that if we fall into lossy pages then the tidbitmap will not have
> a significant impact on performance, because Postgres will spend a lot of
> time rechecking tuples on the page. If work_mem is too small to keep the
> tidbitmap exact, Postgres will slow down significantly anyway. I implemented
> it (v2.1 in the attachments), but I don't think it is an improvement, at
> least not a significant one.
>
>
You could well be right, but it would be good to compare the numbers just
so we know this for sure.

There seems to be a problem with v2.1. The code does not look quite right;
I would have expected something more like:

    if (lossy_page == blk)
        continue;           /* whole page is already marked */

    if (page == NULL || page->blockno != blk)

whereas you have the lossy_page check within the second if test. I'd imagine
it will never be true there, since page->blockno != blk; or perhaps it'll
only be true when tbm_lossify() is called and you set the page to NULL.
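In other words, as far as I can tell v2.1 effectively does something like
this (paraphrasing, so the details may differ):

    if (page == NULL || page->blockno != blk)
    {
        /* can only fire when page == NULL, i.e. right after tbm_lossify() */
        if (lossy_page == blk)
            continue;
        if (tbm_page_is_lossy(tbm, blk))
        ...
    }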

There's also something weird going on using your original test case:

postgres=# set work_mem = '256MB';
SET
Time: 0.450 ms
postgres=# select count(*) from t where i >= 0;
count
---------
5000000
(1 row)

That's ok

Time: 1369.562 ms
postgres=# set work_mem = '64kB';
SET
Time: 0.425 ms
postgres=# select count(*) from t where i >= 0;
count
---------
4999998
(1 row)

Time: 892.726 ms

Notice the row counts are not the same.

create table t_result (i int not null);

postgres=# select a.a, a.b, b.a, b.b
           from (select i a, count(*) b from t group by i) a
           full outer join (select i a, count(*) b from t_result group by i) b
           on a.a = b.a where b.b <> a.b;
   a    | b  |   a    | b
--------+----+--------+---
 315744 | 10 | 315744 | 8
(1 row)

postgres=# select ctid from t where i = 315744;
ctid
-------------
(13970,220)
(13970,221)
(13970,222)
(13970,223)
(13970,224)
(13970,225)
(13970,226)
(13971,1)
(13971,2)
(13971,3)
(10 rows)

This only seems to be broken in v2.1.

Are you seeing the same?

>> It would be nice to see a comment to explain why it might be a good idea
>> to cache the page lookup. Perhaps something like:
>
> Added, see attachment (v2).
>
>
Thanks. That looks better. It would be good to compare performance of v2
and v2.1, but I think the problem with v2.1 needs to be fixed before that
can happen.

>> I also wondered if there might be a small slowdown in the case where the
>> index only finds 1 matching tuple. So I tried the following:
>>
>> avg. 2372.4456  2381.909   (99.6%)
>> med. 2371.224   2359.494  (100.5%)
>>
>> It appears that if it does, then it's not very much.
>
> I believe that's unmeasurable, because the standard deviation of your tests
> is about 2%, which is greater than the difference between the patched and
> master versions.
>
>
Agreed.

Regards

David Rowley


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-17 15:56:48
Message-ID: 5491A7C0.20708@sigaev.ru
Lists: pgsql-hackers

> You could well be right, but it would be good to compare the numbers just
> so we know this for sure.

I wasn't right :(

# set work_mem='64kB';
# set enable_seqscan = off;
Patched: 1194.094 ms
Master: 1765.338 ms

> Are you seeing the same?

Fixed too; the mistake was in the supposition that the current page becomes
lossy in tbm_lossify(). That's not always true, because tbm_lossify() may
lossify only some of the pages, and we don't know which. Thank you very much.
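Now the caches are simply invalidated after tbm_lossify() is called, roughly
like this (simplified; see the attached patch for the real code):

    if (tbm->nentries > tbm->maxentries)
    {
        tbm_lossify(tbm);
        /* tbm_lossify() may have lossified (and freed) our cached page */
        page = NULL;
        lossy_page = InvalidBlockNumber;
    }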

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_cachepage-2.2.patch.gz application/x-gzip 865 bytes

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-18 10:52:35
Message-ID: CAApHDvp9u9KDhWUF-sP-H7FQG6Ei=yaJnoLF8dBromy6CogNTg@mail.gmail.com
Lists: pgsql-hackers

On 18 December 2014 at 04:56, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> You could well be right, but it would be good to compare the numbers just
>> so we know this for sure.
>
> I wasn't right :(
>
> # set work_mem='64kB';
> # set enable_seqscan = off;
> Patched: 1194.094 ms
> Master: 1765.338 ms
>
>> Are you seeing the same?
>
> Fixed too; the mistake was in the supposition that the current page becomes
> lossy in tbm_lossify(). That's not always true, because tbm_lossify() may
> lossify only some of the pages, and we don't know which. Thank you very much.
>
>
Oh, that makes sense. Though I wonder if you need to clear the caches at all
when calling tbm_lossify(). Surely a page never becomes un-lossified; and
besides, at least for lossy_page, it would never be set to the current page
anyway: it's either going to be set to InvalidBlockNumber or to some other,
previous page that was lossy. I also can't quite see the need to set page to
NULL. Surely doing this would just mean we'd have to look up the page again
after tbm_lossify() is called, if the next loop happened to be on the same
page? I think this would only be needed if the hash lookup was going to
return a new instance of the page after we've lossified it, which from what
I can tell won't happen.

I've made these small changes and tweaked the comments a little (attached).

I've also attached some benchmark results using your original table from
up-thread. It seems that caching whether the page was seen as lossy is not
much of a help in this test case. Did you find another one where you saw
better gains?

Regards

David Rowley

Attachment Content-Type Size
tbm_cache_benchmark.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 10.2 KB
tbm_cachepage-2.2_dr.patch application/octet-stream 1.7 KB

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-23 11:24:43
Message-ID: 549950FB.2050004@sigaev.ru
Lists: pgsql-hackers

> Oh, that makes sense. Though I wonder if you need to clear the caches at
> all when calling tbm_lossify(). Surely a page never becomes un-lossified;
> and besides, at least for lossy_page, it would never be set to the current
> page anyway: it's either going to be set to InvalidBlockNumber or to some
> other, previous page that was lossy.

Agreed, fixed.

> I also can't quite see the need to set page to NULL. Surely doing this
> would just mean we'd have to look up the page again after tbm_lossify() is
> called, if the next loop happened to be on the same page? I think this
> would only be needed if the hash lookup was going to return a new instance
> of the page after we've lossified it, which from what I can tell won't
> happen.

The page pointer could become invalid, because the entry could be freed
during tbm_mark_page_lossy(), called from tbm_lossify().

> I've also attached some benchmark results using your original table from
> up-thread. It seems that caching whether the page was seen as lossy is not
> much of a help in this test case. Did you find another one where you saw
> better gains?

All I found is about 0.5%... v3 contains your comments but doesn't use the
lossy_page cache.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
tbm_cachepage-2.3.patch.gz application/x-gzip 892 bytes
tbm_cachepage-3.patch.gz application/x-gzip 751 bytes

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2014-12-24 12:26:53
Message-ID: CAApHDvr8f9UnfWZpVY0OEfd_Eoxa7Br_29j+fkZQ3-Qi=fU36w@mail.gmail.com
Lists: pgsql-hackers

On 24 December 2014 at 00:24, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

>> I've also attached some benchmark results using your original table from
>> up-thread. It seems that caching whether the page was seen as lossy is not
>> much of a help in this test case. Did you find another one where you saw
>> better gains?
>
> All I found is about 0.5%... v3 contains your comments but doesn't use the
> lossy_page cache.
>
>
Ok, I've performed some more benchmarks (attached) using your original
table. I'm thinking the v2.3 version is not worth the extra complexity:
going by my benchmarks, the extra caching in v2.3 is more likely to add
overhead than to save on hash lookups.

With the query used in my tests, with v2.3 I didn't even see a speedup with
5 million records and 64kB of work_mem.

So I think v3 is the one to go with, and I can't see any problems with it,
so I'm marking it as ready for committer.

Nice work

Kind Regards

David Rowley

Attachment Content-Type Size
tidbench2.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 10.3 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2015-01-16 16:50:37
Message-ID: 20150116165037.GG21581@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-12-25 01:26:53 +1300, David Rowley wrote:
> So I think v3 is the one to go with, and I can't see any problems with it,
> so I'm marking it as ready for committer.

And committed.

Thanks Teodor and David.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2015-01-16 17:15:35
Message-ID: 29303.1421428535@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-12-25 01:26:53 +1300, David Rowley wrote:
>> So I think v3 is the one to go with, and I can't see any problems with it,
>> so I'm marking it as ready for committer.

> And committed.

It strikes me that this patch leaves some lookups on the table,
specifically that it fails to avoid repeated hash_search lookups
inside tbm_page_is_lossy() in the situation where we're adding
new TIDs to an already-lossified page. Is it worth adding a few
more lines to handle that case as well?

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2015-01-16 17:20:42
Message-ID: 20150116172042.GH21581@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-01-16 12:15:35 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-12-25 01:26:53 +1300, David Rowley wrote:
> >> So I think v3 is the one to go with, and I can't see any problems with it,
> >> so I'm marking it as ready for committer.
>
> > And committed.
>
> It strikes me that this patch leaves some lookups on the table,
> specifically that it fails to avoid repeated hash_search lookups
> inside tbm_page_is_lossy() in the situation where we're adding
> new TIDs to an already-lossified page. Is it worth adding a few
> more lines to handle that case as well?

There was an alternative version of the patch (v2.3 in
549950FB(dot)2050004(at)sigaev(dot)ru) that cached the lossiness of a page,
but Teodor and David didn't find it to be beneficial in their benchmarking.

Teodor's argument was basically that it's completely lost in the noise
due to the much bigger overhead of rechecks.

I thought it better to get this improvement committed rather than wait for
someone to find an even bigger improvement for some case.

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speedup tidbitmap patch: cache page
Date: 2015-01-16 17:35:34
Message-ID: 29899.1421429734@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2015-01-16 12:15:35 -0500, Tom Lane wrote:
>> It strikes me that this patch leaves some lookups on the table,
>> specifically that it fails to avoid repeated hash_search lookups
>> inside tbm_page_is_lossy() in the situation where we're adding
>> new TIDs to an already-lossified page. Is it worth adding a few
>> more lines to handle that case as well?

> There was an alternative version of the patch (v2.3 in
> 549950FB(dot)2050004(at)sigaev(dot)ru) that cached the lossiness of a page,
> but Teodor and David didn't find it to be beneficial in their benchmarking.

> Teodor's argument was basically that it's completely lost in the noise
> due to the much bigger overhead of rechecks.

That's a fair point, but on reflection it seems like a patch that covered
this case too wouldn't actually be any more complicated, so why not?
v2.3 is pretty brute-force and I agree it's not very attractive, but
I was thinking about something like

    BlockNumber cached_blkno = InvalidBlockNumber;
    PagetableEntry *page = NULL;

inside loop:

    /* look up the target page unless we already have it */
    if (blk != cached_blkno)
    {
        if (tbm_page_is_lossy(tbm, blk))
            page = NULL;
        else
            page = tbm_get_pageentry(tbm, blk);
        cached_blkno = blk;
    }
    if (page == NULL)
        continue;           /* page is already marked lossy */

The "reset" after calling tbm_lossify() would just need to be
"cached_blkno = InvalidBlockNumber".

regards, tom lane