Re: Patch for fast gin cache performance improvement

Lists: pgsql-hackers
From: Ian Link <ian(at)ilink(dot)io>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch for fast gin cache performance improvement
Date: 2013-06-18 04:42:37
Message-ID: 51BFE53D.9050008@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

*

This patch contains a performance improvement for the fast gin cache. As
you may know, the performance of the fast gin cache decreases with its
size. Currently, the size of the fast gin cache is tied to work_mem. The
size of work_mem can often be quite high. The large size of work_mem is
inappropriate for the fast gin cache size. Therefore, we created a
separate cache size called gin_fast_limit. This global variable controls
the size of the fast gin cache, independently of work_mem. Currently,
the default gin_fast_limit is set to 128kB. However, that value could
need tweaking. 64kB may work better, but it's hard to say with only my
single machine to test on.

On my machine, this patch results in a nice speed up. Our test queries
improve from about 0.9 ms to 0.030 ms. Please feel free to use the test
case yourself: it should be attached. I can look into additional test
cases (tsvectors) if anyone is interested.

In addition to the global limit, we have provided a per-index limit:
fast_cache_size. This per-index limit begins at -1, which means that it
is disabled. If the user does not specify a per-index limit, the index
will simply use the global limit.

I would like to thank Andrew Gierth for all his help on this patch. As
this is my first patch he was extremely helpful. The idea for this
performance improvement was entirely his. I just did the implementation.
Thanks for reading and considering this patch!*

Ian Link

Attachment Content-Type Size
gin-fast-perf.sql text/plain 1.5 KB
fastgin_mem.patch text/plain 9.5 KB

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Ian Link <ian(at)ilink(dot)io>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-18 16:09:59
Message-ID: 1371571799.93981.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ian Link <ian(at)ilink(dot)io> wrote:

> This patch contains a performance improvement for the fast gin
> cache.

> Our test queries improve from about 0.9 ms to 0.030 ms.

Impressive!

> Thanks for reading and considering this patch!

Congratulations on your first PostgreSQL patch!  To get it
scheduled for review, please add it to this page:

https://commitfest.postgresql.org/action/commitfest_view/open

You will need to get a community login (if you don't already have
one), but that is a quick and painless process.  Choose an
appropriate topic (like "Performance") and reference the message ID
of the email to which you attached the patch.  Don't worry about
the fields for reviewers, committer, or date closed.

Sorry for the administrative overhead, but without it things can
fall through the cracks.  You can find an overview of the review
process with links to more detail here:

http://wiki.postgresql.org/wiki/CommitFest

Thanks for contributing!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Ian Link <ian(at)ilink(dot)io>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-18 16:15:58
Message-ID: 1371572158.11590.YahooMailNeo@web162904.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> You will need to get a community login (if you don't already have
> one), but that is a quick and painless process.

Oops -- we seem to have a problem with new community logins at the
moment, which will hopefully be straightened out soon.  You might
want to wait a few days if you don't already have a login.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Ian Link <ian(at)ilink(dot)io>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-18 18:41:36
Message-ID: 51C0A9E0.5080400@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


No worries! I'll just wait until it's up again.
Thanks
Ian
> Kevin Grittner <mailto:kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:15 AM
>
> Oops -- we seem to have a problem with new community logins at the
> moment, which will hopefully be straightened out soon. You might
> want to wait a few days if you don't already have a login.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> Kevin Grittner <mailto:kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:09 AM
> Ian Link<ian(at)ilink(dot)io> wrote:
>
>> This patch contains a performance improvement for the fast gin
>> cache.
>
>> Our test queries improve from about 0.9 ms to 0.030 ms.
>
> Impressive!
>
>> Thanks for reading and considering this patch!
>
> Congratulations on your first PostgreSQL patch! To get it
> scheduled for review, please add it to this page:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> You will need to get a community login (if you don't already have
> one), but that is a quick and painless process. Choose an
> appropriate topic (like "Performance") and reference the message ID
> of the email to which you attached the patch. Don't worry about
> the fields for reviewers, committer, or date closed.
>
> Sorry for the administrative overhead, but without it things can
> fall through the cracks. You can find an overview of the review
> process with links to more detail here:
>
> http://wiki.postgresql.org/wiki/CommitFest
>
> Thanks for contributing!
>
> Ian Link <mailto:ian(at)ilink(dot)io>
> Monday, June 17, 2013 9:42 PM
> *
>
> This patch contains a performance improvement for the fast gin cache.
> As you may know, the performance of the fast gin cache decreases with
> its size. Currently, the size of the fast gin cache is tied to
> work_mem. The size of work_mem can often be quite high. The large size
> of work_mem is inappropriate for the fast gin cache size. Therefore,
> we created a separate cache size called gin_fast_limit. This global
> variable controls the size of the fast gin cache, independently of
> work_mem. Currently, the default gin_fast_limit is set to 128kB.
> However, that value could need tweaking. 64kB may work better, but
> it's hard to say with only my single machine to test on.
>
>
> On my machine, this patch results in a nice speed up. Our test queries
> improve from about 0.9 ms to 0.030 ms. Please feel free to use the
> test case yourself: it should be attached. I can look into additional
> test cases (tsvectors) if anyone is interested.
>
>
> In addition to the global limit, we have provided a per-index limit:
> fast_cache_size. This per-index limit begins at -1, which means that
> it is disabled. If the user does not specify a per-index limit, the
> index will simply use the global limit.
>
>
> I would like to thank Andrew Gierth for all his help on this patch. As
> this is my first patch he was extremely helpful. The idea for this
> performance improvement was entirely his. I just did the
> implementation. Thanks for reading and considering this patch!*
>
>
> Ian Link


From: ian link <ian(at)ilink(dot)io>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-23 02:03:00
Message-ID: CAOOwM5+AuRtPDfyXL9z-9TqmFun_=hdMA+iSs3HrcU4JcTtvLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Looks like my community login is still not working. No rush, just wanted to
let you know. Thanks!

Ian

On Tue, Jun 18, 2013 at 11:41 AM, Ian Link <ian(at)ilink(dot)io> wrote:

>
> No worries! I'll just wait until it's up again.
> Thanks
> Ian
>
> Kevin Grittner <kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:15 AM
>
> Oops -- we seem to have a problem with new community logins at the
> moment, which will hopefully be straightened out soon. You might
> want to wait a few days if you don't already have a login.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> Kevin Grittner <kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:09 AM
>
> Ian Link <ian(at)ilink(dot)io> <ian(at)ilink(dot)io> wrote:
>
>
> This patch contains a performance improvement for the fast gin
> cache.
>
> Our test queries improve from about 0.9 ms to 0.030 ms.
>
> Impressive!
>
>
> Thanks for reading and considering this patch!
>
>
> Congratulations on your first PostgreSQL patch! To get it
> scheduled for review, please add it to this page:
> https://commitfest.postgresql.org/action/commitfest_view/open
>
>
> You will need to get a community login (if you don't already have
> one), but that is a quick and painless process. Choose an
> appropriate topic (like "Performance") and reference the message ID
> of the email to which you attached the patch. Don't worry about
> the fields for reviewers, committer, or date closed.
>
> Sorry for the administrative overhead, but without it things can
> fall through the cracks. You can find an overview of the review
> process with links to more detail here:
>
> http://wiki.postgresql.org/wiki/CommitFest
>
> Thanks for contributing!
>
>
> Ian Link <ian(at)ilink(dot)io>
> Monday, June 17, 2013 9:42 PM
> *
>
> This patch contains a performance improvement for the fast gin cache. As
> you may know, the performance of the fast gin cache decreases with its
> size. Currently, the size of the fast gin cache is tied to work_mem. The
> size of work_mem can often be quite high. The large size of work_mem is
> inappropriate for the fast gin cache size. Therefore, we created a separate
> cache size called gin_fast_limit. This global variable controls the size of
> the fast gin cache, independently of work_mem. Currently, the default
> gin_fast_limit is set to 128kB. However, that value could need tweaking.
> 64kB may work better, but it's hard to say with only my single machine to
> test on.
>
> On my machine, this patch results in a nice speed up. Our test queries
> improve from about 0.9 ms to 0.030 ms. Please feel free to use the test
> case yourself: it should be attached. I can look into additional test cases
> (tsvectors) if anyone is interested.
>
> In addition to the global limit, we have provided a per-index limit:
> fast_cache_size. This per-index limit begins at -1, which means that it is
> disabled. If the user does not specify a per-index limit, the index will
> simply use the global limit.
>
> I would like to thank Andrew Gierth for all his help on this patch. As
> this is my first patch he was extremely helpful. The idea for this
> performance improvement was entirely his. I just did the implementation.
> Thanks for reading and considering this patch!*
>
>
> Ian Link
>
>


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: ian link <ian(at)ilink(dot)io>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-23 09:05:35
Message-ID: 51C6BA5F.2060801@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/23/2013 04:03 AM, ian link wrote:
> Looks like my community login is still not working. No rush, just wanted
> to let you know. Thanks!

have you tried to log in once to the main website per:

http://www.postgresql.org/message-id/CABUevEyt9tQfcF7T2Uzcr8WeF9M=s8qSACuCmN5L2Et26=RTjw@mail.gmail.com

?

Stefan


From: Ian Link <ian(at)ilink(dot)io>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-06-23 09:28:04
Message-ID: 51C6BFA4.4040003@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just tried it and my account works now. Thanks!
Ian

> Stefan Kaltenbrunner <mailto:stefan(at)kaltenbrunner(dot)cc>
> Sunday, June 23, 2013 2:05 AM
>
> have you tried to log in once to the main website per:
>
> http://www.postgresql.org/message-id/CABUevEyt9tQfcF7T2Uzcr8WeF9M=s8qSACuCmN5L2Et26=RTjw@mail.gmail.com
>
> ?
>
>
> Stefan
> ian link <mailto:ian(at)ilink(dot)io>
> Saturday, June 22, 2013 7:03 PM
> Looks like my community login is still not working. No rush, just
> wanted to let you know. Thanks!
>
> Ian
>
>
>
> Ian Link <mailto:ian(at)ilink(dot)io>
> Tuesday, June 18, 2013 11:41 AM
>
> No worries! I'll just wait until it's up again.
> Thanks
> Ian
> Kevin Grittner <mailto:kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:15 AM
>
> Oops -- we seem to have a problem with new community logins at the
> moment, which will hopefully be straightened out soon. You might
> want to wait a few days if you don't already have a login.
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> Kevin Grittner <mailto:kgrittn(at)ymail(dot)com>
> Tuesday, June 18, 2013 9:09 AM
> Ian Link<ian(at)ilink(dot)io> wrote:
>
>> This patch contains a performance improvement for the fast gin
>> cache.
>
>> Our test queries improve from about 0.9 ms to 0.030 ms.
>
> Impressive!
>
>> Thanks for reading and considering this patch!
>
> Congratulations on your first PostgreSQL patch! To get it
> scheduled for review, please add it to this page:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> You will need to get a community login (if you don't already have
> one), but that is a quick and painless process. Choose an
> appropriate topic (like "Performance") and reference the message ID
> of the email to which you attached the patch. Don't worry about
> the fields for reviewers, committer, or date closed.
>
> Sorry for the administrative overhead, but without it things can
> fall through the cracks. You can find an overview of the review
> process with links to more detail here:
>
> http://wiki.postgresql.org/wiki/CommitFest
>
> Thanks for contributing!
>


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Ian Link'" <ian(at)ilink(dot)io>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-09-26 13:02:53
Message-ID: 006801cebab8$b6352c90$229f85b0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Ian,

> This patch contains a performance improvement for the fast gin cache. As you
> may know, the performance of the fast gin cache decreases with its size.
> Currently, the size of the fast gin cache is tied to work_mem. The size of
> work_mem can often be quite high. The large size of work_mem is inappropriate
> for the fast gin cache size. Therefore, we created a separate cache size
called
> gin_fast_limit. This global variable controls the size of the fast gin cache,
> independently of work_mem. Currently, the default gin_fast_limit is set to
128kB.
> However, that value could need tweaking. 64kB may work better, but it's hard
> to say with only my single machine to test on.

> On my machine, this patch results in a nice speed up. Our test queries improve
> from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
> it should be attached. I can look into additional test cases (tsvectors) if
> anyone is interested.

> In addition to the global limit, we have provided a per-index limit:
> fast_cache_size. This per-index limit begins at -1, which means that it is
> disabled. If the user does not specify a per-index limit, the index will
simply
> use the global limit.

I had a look over this patch. I think this patch is interesting and very
useful. Here are my review points:

1. Patch applies cleanly.
2. make, make install and make check is good.
3. I did performance evaluation using your test queries with 64kB and 128kB of
gin_fast_limit (or fast_cache_size), and saw that both values achieved the
performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
64kB improved from 1.057 ms to 0.075 ms. Great!
4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
to the increase in GIN search performance, which, however, leads to the decrease
in GIN update performance. Am I right? If so, I think the tradeoff should be
noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch. However, I have one question: how
this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
case, in my understanding, this patch inserts new entries into the pending list
temporarily and immediately moves them to the main GIN data structure using
ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry for the delay.

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Ian Link'" <ian(at)ilink(dot)io>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-09-27 09:24:43
Message-ID: 004001cebb63$669f85d0$33de9170$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I had a look over this patch. I think this patch is interesting and very
useful.
> Here are my review points:

> 8. I think there are no issues in this patch. However, I have one question:
> how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
> this case, in my understanding, this patch inserts new entries into the
pending
> list temporarily and immediately moves them to the main GIN data structure
using
> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.

Sorry, There are incorrect expressions. I mean gin_fast_limit > 0 and
fast_cache_size = 0.

Although I asked this question, I've reconsidered about these parameters, and it
seems that these parameters not only make code rather complex but are a little
confusing to users. So I'd like to propose to introduce only one parameter:
fast_cache_size. While users that give weight to update performance for the
fast update technique set this parameter to a large value, users that give
weight not only to update performance but to search performance set this
parameter to a small value. What do you think about this?

Thanks,

Best regards,
Etsuro Fujita


From: Ian Link <ian(at)ilink(dot)io>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-09-30 22:09:35
Message-ID: 5249F69F.10305@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Etsuro,
Sorry for the delay but I have been very busy with work. I have been
away from postgres for a while, so I will need a little time to review
the code and make sure I give you an informed response. I'll get back to
you as soon as I am able. Thanks for understanding.
Ian Link

> Etsuro Fujita <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
> Friday, September 27, 2013 2:24 AM
> I wrote:
>> I had a look over this patch. I think this patch is interesting and very
> useful.
>> Here are my review points:
>
>> 8. I think there are no issues in this patch. However, I have one question:
>> how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
>> this case, in my understanding, this patch inserts new entries into the
> pending
>> list temporarily and immediately moves them to the main GIN data structure
> using
>> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.
>
> Sorry, There are incorrect expressions. I mean gin_fast_limit> 0 and
> fast_cache_size = 0.
>
> Although I asked this question, I've reconsidered about these parameters, and it
> seems that these parameters not only make code rather complex but are a little
> confusing to users. So I'd like to propose to introduce only one parameter:
> fast_cache_size. While users that give weight to update performance for the
> fast update technique set this parameter to a large value, users that give
> weight not only to update performance but to search performance set this
> parameter to a small value. What do you think about this?
>
> Thanks,
>
> Best regards,
> Etsuro Fujita
>
>
> Etsuro Fujita <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
> Thursday, September 26, 2013 6:02 AM
> Hi Ian,
>
>> This patch contains a performance improvement for the fast gin cache. As you
>> may know, the performance of the fast gin cache decreases with its size.
>> Currently, the size of the fast gin cache is tied to work_mem. The size of
>> work_mem can often be quite high. The large size of work_mem is inappropriate
>> for the fast gin cache size. Therefore, we created a separate cache size
> called
>> gin_fast_limit. This global variable controls the size of the fast gin cache,
>> independently of work_mem. Currently, the default gin_fast_limit is set to
> 128kB.
>> However, that value could need tweaking. 64kB may work better, but it's hard
>> to say with only my single machine to test on.
>
>> On my machine, this patch results in a nice speed up. Our test queries improve
>> from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
>> it should be attached. I can look into additional test cases (tsvectors) if
>> anyone is interested.
>
>> In addition to the global limit, we have provided a per-index limit:
>> fast_cache_size. This per-index limit begins at -1, which means that it is
>> disabled. If the user does not specify a per-index limit, the index will
> simply
>> use the global limit.
>
> I had a look over this patch. I think this patch is interesting and very
> useful. Here are my review points:
>
> 1. Patch applies cleanly.
> 2. make, make install and make check is good.
> 3. I did performance evaluation using your test queries with 64kB and 128kB of
> gin_fast_limit (or fast_cache_size), and saw that both values achieved the
> performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
> 64kB improved from 1.057 ms to 0.075 ms. Great!
> 4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
> to the increase in GIN search performance, which, however, leads to the decrease
> in GIN update performance. Am I right? If so, I think the tradeoff should be
> noted in the documentation.
> 5. The following documents in Chapter 57. GIN Indexes need to be updated:
> * 57.3.1. GIN Fast Update Technique
> * 57.4. GIN Tips and Tricks
> 6. I would like to see the results for the additional test cases (tsvectors).
> 7. The commented-out elog() code should be removed.
> 8. I think there are no issues in this patch. However, I have one question: how
> this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
> case, in my understanding, this patch inserts new entries into the pending list
> temporarily and immediately moves them to the main GIN data structure using
> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.
>
> Sorry for the delay.
>
> Best regards,
> Etsuro Fujita
>
>
> Ian Link <mailto:ian(at)ilink(dot)io>
> Monday, June 17, 2013 9:42 PM
> *
>
> This patch contains a performance improvement for the fast gin cache.
> As you may know, the performance of the fast gin cache decreases with
> its size. Currently, the size of the fast gin cache is tied to
> work_mem. The size of work_mem can often be quite high. The large size
> of work_mem is inappropriate for the fast gin cache size. Therefore,
> we created a separate cache size called gin_fast_limit. This global
> variable controls the size of the fast gin cache, independently of
> work_mem. Currently, the default gin_fast_limit is set to 128kB.
> However, that value could need tweaking. 64kB may work better, but
> it's hard to say with only my single machine to test on.
>
>
> On my machine, this patch results in a nice speed up. Our test queries
> improve from about 0.9 ms to 0.030 ms. Please feel free to use the
> test case yourself: it should be attached. I can look into additional
> test cases (tsvectors) if anyone is interested.
>
>
> In addition to the global limit, we have provided a per-index limit:
> fast_cache_size. This per-index limit begins at -1, which means that
> it is disabled. If the user does not specify a per-index limit, the
> index will simply use the global limit.
>
>
> I would like to thank Andrew Gierth for all his help on this patch. As
> this is my first patch he was extremely helpful. The idea for this
> performance improvement was entirely his. I just did the
> implementation. Thanks for reading and considering this patch!*
>
>
> Ian Link


From: Ian Link <ian(at)ilink(dot)io>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-10-08 19:18:08
Message-ID: 52545A70.3030004@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Although I asked this question, I've reconsidered about these parameters, and it
> seems that these parameters not only make code rather complex but are a little
> confusing to users. So I'd like to propose to introduce only one parameter:
> fast_cache_size. While users that give weight to update performance for the
> fast update technique set this parameter to a large value, users that give
> weight not only to update performance but to search performance set this
> parameter to a small value. What do you think about this?
I think it makes sense to maintain this separation. If the user doesn't
need a per-index setting, they don't have to use the parameter. Since
the parameter is off by default, they don't even need to worry about it.
There might as well be one parameter for users that don't need
fine-grained control. We can document this and I don't think it will be
confusing to the user.

> 4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
> to the increase in GIN search performance, which, however, leads to the decrease
> in GIN update performance. Am I right? If so, I think the tradeoff should be
> noted in the documentation.
I believe this is correct.

> 5. The following documents in Chapter 57. GIN Indexes need to be
> updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and Tricks
Sure, I can add something.

> 6. I would like to see the results for the additional test cases (tsvectors).
I don't really have any good test cases for this available, and have
very limited time for postgres at the moment. Feel free to create a test
case, but I don't believe I can at the moment. Sorry!

> 7. The commented-out elog() code should be removed.
Sorry about that, I shouldn't have submitted the patch with those still
there.

I should have a new patch soonish, hopefully. Thanks for your feedback!
Ian

Ian Link wrote:
> 8. I think there are no issues in this patch. However, I have one question: how
> this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
> case, in my understanding, this patch inserts new entries into the pending list
> temporarily and immediately moves them to the main GIN data structure using
> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Ian Link'" <ian(at)ilink(dot)io>
Cc: "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-10-10 08:01:44
Message-ID: 006401cec58e$f658b680$e30a2380$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ian Link wrote:
> > Although I asked this question, I've reconsidered about these
> > parameters, and it seems that these parameters not only make code
> > rather complex but are a little confusing to users. So I'd like to propose
> to introduce only one parameter:
> > fast_cache_size. While users that give weight to update performance
> > for the fast update technique set this parameter to a large value,
> > users that give weight not only to update performance but to search
> > performance set this parameter to a small value. What do you think about
> this?
> I think it makes sense to maintain this separation. If the user doesn't need
> a per-index setting, they don't have to use the parameter. Since the parameter
> is off by default, they don't even need to worry about it.
> There might as well be one parameter for users that don't need fine-grained
> control. We can document this and I don't think it will be confusing to the
> user.

OK, though I'd like to hear the opinion of others.

> > 4. In my understanding, the small value of
> > gin_fast_limit/fast_cache_size leads to the increase in GIN search
> > performance, which, however, leads to the decrease in GIN update
> > performance. Am I right? If so, I think the tradeoff should be noted in
> the documentation.
> I believe this is correct.
>
> > 5. The following documents in Chapter 57. GIN Indexes need to be
> > updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
> > Tricks
> Sure, I can add something.
>
> > 6. I would like to see the results for the additional test cases
(tsvectors).
> I don't really have any good test cases for this available, and have very
limited
> time for postgres at the moment. Feel free to create a test case, but I don't
> believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing, though I think we
should do that. (In addition, we should do another performance test to make
clear an influence of fast update performance from introducing these parameters,
which would be required to determine the default values of these parameters.)

> > 7. The commented-out elog() code should be removed.
> Sorry about that, I shouldn't have submitted the patch with those still there.
>
> I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch should be resubmitted for the next
CommitFest for further review and testing mentioned above. So I'd like to mark
this patch as Returned with Feedback. Is it OK?

Thanks,

Best regards,
Etsuro Fujita


From: Ian Link <ian(at)ilink(dot)io>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fast gin cache performance improvement
Date: 2013-10-12 06:57:07
Message-ID: 5258F2C3.7020301@ilink.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I think it is desirable that this patch should be resubmitted for the next
> CommitFest for further review and testing mentioned above. So I'd like to mark
> this patch as Returned with Feedback. Is it OK?
Sounds like a good idea. Thanks for the review!
Ian Link

> Etsuro Fujita <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
> Thursday, October 10, 2013 1:01 AM
> Ian Link wrote:
>>> Although I asked this question, I've reconsidered about these
>>> parameters, and it seems that these parameters not only make code
>>> rather complex but are a little confusing to users. So I'd like to propose
>> to introduce only one parameter:
>>> fast_cache_size. While users that give weight to update performance
>>> for the fast update technique set this parameter to a large value,
>>> users that give weight not only to update performance but to search
>>> performance set this parameter to a small value. What do you think about
>> this?
>> I think it makes sense to maintain this separation. If the user doesn't need
>> a per-index setting, they don't have to use the parameter. Since the parameter
>> is off by default, they don't even need to worry about it.
>> There might as well be one parameter for users that don't need fine-grained
>> control. We can document this and I don't think it will be confusing to the
>> user.
>
> OK, though I'd like to hear the opinion of others.
>
>>> 4. In my understanding, the small value of
>>> gin_fast_limit/fast_cache_size leads to the increase in GIN search
>>> performance, which, however, leads to the decrease in GIN update
>>> performance. Am I right? If so, I think the tradeoff should be noted in
>> the documentation.
>> I believe this is correct.
>>
>>> 5. The following documents in Chapter 57. GIN Indexes need to be
>>> updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
>>> Tricks
>> Sure, I can add something.
>>
>>> 6. I would like to see the results for the additional test cases
> (tsvectors).
>> I don't really have any good test cases for this available, and have very
> limited
>> time for postgres at the moment. Feel free to create a test case, but I don't
>> believe I can at the moment. Sorry!
>
> Unfortunately, I don't have much time to do such a thing, though I think we
> should do that. (In addition, we should do another performance test to make
> clear an influence of fast update performance from introducing these parameters,
> which would be required to determine the default values of these parameters.)
>
>>> 7. The commented-out elog() code should be removed.
>> Sorry about that, I shouldn't have submitted the patch with those still there.
>>
>> I should have a new patch soonish, hopefully. Thanks for your feedback!
>
> I think it is desirable that this patch should be resubmitted for the next
> CommitFest for further review and testing mentioned above. So I'd like to mark
> this patch as Returned with Feedback. Is it OK?
>
> Thanks,
>
> Best regards,
> Etsuro Fujita
>
>
> Ian Link <mailto:ian(at)ilink(dot)io>
> Monday, September 30, 2013 3:09 PM
> Hi Etsuro,
> Sorry for the delay but I have been very busy with work. I have been
> away from postgres for a while, so I will need a little time to review
> the code and make sure I give you an informed response. I'll get back
> to you as soon as I am able. Thanks for understanding.
> Ian Link
>
> Etsuro Fujita <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
> Friday, September 27, 2013 2:24 AM
> I wrote:
>> I had a look over this patch. I think this patch is interesting and very
> useful.
>> Here are my review points:
>
>> 8. I think there are no issues in this patch. However, I have one question:
>> how this patch works in the case where gin_fast_limit/fast_cache_size = 0? In
>> this case, in my understanding, this patch inserts new entries into the
> pending
>> list temporarily and immediately moves them to the main GIN data structure
> using
>> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.
>
> Sorry, There are incorrect expressions. I mean gin_fast_limit> 0 and
> fast_cache_size = 0.
>
> Although I asked this question, I've reconsidered about these parameters, and it
> seems that these parameters not only make code rather complex but are a little
> confusing to users. So I'd like to propose to introduce only one parameter:
> fast_cache_size. While users that give weight to update performance for the
> fast update technique set this parameter to a large value, users that give
> weight not only to update performance but to search performance set this
> parameter to a small value. What do you think about this?
>
> Thanks,
>
> Best regards,
> Etsuro Fujita
>
>
> Etsuro Fujita <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
> Thursday, September 26, 2013 6:02 AM
> Hi Ian,
>
>> This patch contains a performance improvement for the fast gin cache. As you
>> may know, the performance of the fast gin cache decreases with its size.
>> Currently, the size of the fast gin cache is tied to work_mem. The size of
>> work_mem can often be quite high. The large size of work_mem is inappropriate
>> for the fast gin cache size. Therefore, we created a separate cache size
> called
>> gin_fast_limit. This global variable controls the size of the fast gin cache,
>> independently of work_mem. Currently, the default gin_fast_limit is set to
> 128kB.
>> However, that value could need tweaking. 64kB may work better, but it's hard
>> to say with only my single machine to test on.
>
>> On my machine, this patch results in a nice speed up. Our test queries improve
>> from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself:
>> it should be attached. I can look into additional test cases (tsvectors) if
>> anyone is interested.
>
>> In addition to the global limit, we have provided a per-index limit:
>> fast_cache_size. This per-index limit begins at -1, which means that it is
>> disabled. If the user does not specify a per-index limit, the index will
> simply
>> use the global limit.
>
> I had a look over this patch. I think this patch is interesting and very
> useful. Here are my review points:
>
> 1. Patch applies cleanly.
> 2. make, make install and make check is good.
> 3. I did performance evaluation using your test queries with 64kB and 128kB of
> gin_fast_limit (or fast_cache_size), and saw that both values achieved the
> performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB.
> 64kB improved from 1.057 ms to 0.075 ms. Great!
> 4. In my understanding, the small value of gin_fast_limit/fast_cache_size leads
> to the increase in GIN search performance, which, however, leads to the decrease
> in GIN update performance. Am I right? If so, I think the tradeoff should be
> noted in the documentation.
> 5. The following documents in Chapter 57. GIN Indexes need to be updated:
> * 57.3.1. GIN Fast Update Technique
> * 57.4. GIN Tips and Tricks
> 6. I would like to see the results for the additional test cases (tsvectors).
> 7. The commented-out elog() code should be removed.
> 8. I think there are no issues in this patch. However, I have one question: how
> this patch works in the case where gin_fast_limit/fast_cache_size = 0? In this
> case, in my understanding, this patch inserts new entries into the pending list
> temporarily and immediately moves them to the main GIN data structure using
> ginInsertCleanup(). Am I right? If so, that is obviously inefficient.
>
> Sorry for the delay.
>
> Best regards,
> Etsuro Fujita
>
>
> Ian Link <mailto:ian(at)ilink(dot)io>
> Monday, June 17, 2013 9:42 PM
> *
>
> This patch contains a performance improvement for the fast gin cache.
> As you may know, the performance of the fast gin cache decreases with
> its size. Currently, the size of the fast gin cache is tied to
> work_mem. The size of work_mem can often be quite high. The large size
> of work_mem is inappropriate for the fast gin cache size. Therefore,
> we created a separate cache size called gin_fast_limit. This global
> variable controls the size of the fast gin cache, independently of
> work_mem. Currently, the default gin_fast_limit is set to 128kB.
> However, that value could need tweaking. 64kB may work better, but
> it's hard to say with only my single machine to test on.
>
>
> On my machine, this patch results in a nice speed up. Our test queries
> improve from about 0.9 ms to 0.030 ms. Please feel free to use the
> test case yourself: it should be attached. I can look into additional
> test cases (tsvectors) if anyone is interested.
>
>
> In addition to the global limit, we have provided a per-index limit:
> fast_cache_size. This per-index limit begins at -1, which means that
> it is disabled. If the user does not specify a per-index limit, the
> index will simply use the global limit.
>
>
> I would like to thank Andrew Gierth for all his help on this patch. As
> this is my first patch he was extremely helpful. The idea for this
> performance improvement was entirely his. I just did the
> implementation. Thanks for reading and considering this patch!*
>
>
> Ian Link