Re: GSoC Proposal - Caching query results in pgpool-II

Lists: pgsql-hackers
From: Masanori Yamazaki <m(dot)yamazaki23(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: GSoC Proposal - Caching query results in pgpool-II
Date: 2011-04-06 00:09:53
Message-ID: BANLkTi=4h7WbAGRGgpVcDMpzCgs3DfmPNw@mail.gmail.com

Hello

My name is Masanori Yamazaki. I am sending my proposal for
Google Summer of Code 2011. It would be nice if you could give
me your opinion.

・Title

Caching query results in pgpool-II

・Synopsis

Pgpool-II has query caching functionality that uses storage provided by
a dedicated PostgreSQL instance (the "system database"). This has
several drawbacks, however: 1) it is slow because it needs to access
disk storage, and 2) it does not invalidate the cache automatically.

This proposal tries to solve these problems.

- To speed up cache access, the cache will be kept in memory rather
than in a database. It will live in shared memory or in an external
memory service such as memcached, so that it can be shared by
multiple sessions. Old cache entries will be evicted in LRU (least
recently used) order.

- The cache will be invalidated automatically when the relevant tables
are updated. Note that this is not always possible, because a query
result might come from multiple tables, views or even functions. In
such cases the cache entry will be invalidated by timeout (or the
result will not be cached at all).

- Tuning knobs need to be designed to control the cache behavior,
though it is not yet clear what they should be.
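
For illustration only, the three points above can be sketched in a few
lines of Python. This is a toy model under stated assumptions, not
pgpool-II code: the class name QueryCache, its methods, and the
max_entries and default_ttl parameters are all hypothetical, and a
plain in-process dictionary stands in for shared memory or memcached.

```python
import time
from collections import OrderedDict


class QueryCache:
    """Toy query-result cache: LRU eviction, per-table invalidation,
    and a timeout fallback. All names here are hypothetical."""

    def __init__(self, max_entries=1000, default_ttl=60.0, clock=time.monotonic):
        self.max_entries = max_entries    # hypothetical tuning knob
        self.default_ttl = default_ttl    # hypothetical tuning knob (seconds)
        self.clock = clock
        self._entries = OrderedDict()     # sql -> (result, tables, expires_at)
        self._by_table = {}               # table name -> set of cached sql

    def put(self, sql, result, tables=None):
        """Cache a result. tables=None means the dependencies are unknown
        (view, function, ...), so the entry can only expire by timeout."""
        self._drop(sql)
        if tables is None:
            tables, expires_at = frozenset(), self.clock() + self.default_ttl
        else:
            tables, expires_at = frozenset(tables), None
        self._entries[sql] = (result, tables, expires_at)
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)
        # Evict least recently used entries once the cache is over capacity.
        while len(self._entries) > self.max_entries:
            self._drop(next(iter(self._entries)))

    def get(self, sql):
        """Return the cached result, or None on a miss or an expired entry."""
        entry = self._entries.get(sql)
        if entry is None:
            return None
        result, _tables, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            self._drop(sql)
            return None
        self._entries.move_to_end(sql)    # mark as most recently used
        return result

    def invalidate_table(self, table):
        """Drop every cached result known to depend on this table."""
        for sql in list(self._by_table.get(table, ())):
            self._drop(sql)

    def _drop(self, sql):
        entry = self._entries.pop(sql, None)
        if entry is None:
            return
        for table in entry[1]:
            keys = self._by_table.get(table)
            if keys is not None:
                keys.discard(sql)
                if not keys:
                    del self._by_table[table]
```

In this sketch a result stored with a known table set stays cached until
invalidate_table() is called for one of those tables, while a result
stored without one lives for at most default_ttl seconds. Not caching
such results at all, the other option mentioned above, would simply
mean skipping put() for them.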

・Benefits to the PostgreSQL Community:

Query caching will improve the performance of PostgreSQL, and this
project will help increase the number of PostgreSQL users who need a
higher-performance database system.

Note that implementing the query cache in pgpool-II will benefit not
only the latest version of PostgreSQL but also previous releases.

・Project Schedule

-April
preparation

-May 1 - May 22
write a specification

-May 23 - June 19
coding

-June 20 - July 22
testing

-July 23 - August 12
complete coding and testing, commit

・Personal Data and Biographical Information

Name : Masanori Yamazaki
Born : 23.1.1981
School : I currently study contemporary philosophy, culture and literature
at Waseda University in Japan.
Coding :
1. About five years of work as a web application programmer (PHP, Java).
2. I have worked on projects using frameworks such as Symfony, Zend Framework,
CakePHP, and Struts.
3. I am interested in OSS and like coding.

Regards


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Masanori Yamazaki <m(dot)yamazaki23(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Proposal - Caching query results in pgpool-II
Date: 2011-04-06 10:04:41
Message-ID: BANLkTi=_onbVNxEKX-sc3oxoqhOdUGLmMg@mail.gmail.com

How does this relate to the existing pqc project (
http://code.google.com/p/pqc/)? Seems the goals are fairly similar, and both
are based off pgpool?

/Magnus


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: magnus(at)hagander(dot)net
Cc: m(dot)yamazaki23(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Proposal - Caching query results in pgpool-II
Date: 2011-04-07 01:34:16
Message-ID: 20110407.103416.843046270922110098.t-ishii@sraoss.co.jp

In my understanding, pqc is not designed to work with pgpool.
Thus if a user wants to use both the query cache and the query
dispatching, replication, failover etc. that pgpool provides, it seems
that is not possible. For this purpose the user could perhaps *cascade*
pqc and pgpool, but I'm not sure. Even if it's possible, it will bring
a huge performance penalty.

Another point is cache invalidation. Masanori's proposal includes a
cache invalidation technique based on looking at write queries, which
pqc lacks, in my understanding.
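
As a rough illustration of what "looking at write queries" could mean,
here is a minimal Python sketch. It is an assumption-laden toy, not how
pgpool-II or pqc actually work: the function name written_table is
hypothetical, and the regular expression only recognizes simple
INSERT, UPDATE, DELETE and TRUNCATE statements on unquoted table names.
A real implementation would need a proper SQL parser to cope with
quoted identifiers, CTEs, triggers, functions with side effects, and
so on.

```python
import re

# Matches the leading keyword(s) of a simple write statement and captures
# an unquoted, optionally schema-qualified table name.
_WRITE_RE = re.compile(
    r"""^\s*(?:
        INSERT\s+INTO
      | UPDATE(?:\s+ONLY)?
      | DELETE\s+FROM(?:\s+ONLY)?
      | TRUNCATE(?:\s+TABLE)?(?:\s+ONLY)?
    )\s+(?P<table>[A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)?)""",
    re.IGNORECASE | re.VERBOSE,
)


def written_table(sql):
    """Return the table a simple write statement targets, or None if the
    statement does not look like a write (hypothetical helper)."""
    match = _WRITE_RE.match(sql)
    if match is None:
        return None
    # PostgreSQL folds unquoted identifiers to lower case.
    return match.group("table").lower()


for statement in (
    "UPDATE accounts SET balance = 0",
    "DELETE FROM ONLY logs WHERE id = 1",
    "SELECT * FROM accounts",
):
    print(statement, "->", written_table(statement))
```

A proxy sitting between client and server could call such a function on
every statement it forwards and discard the cached results that depend
on the returned table.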
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: m(dot)yamazaki23(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC Proposal - Caching query results in pgpool-II
Date: 2011-04-07 08:57:18
Message-ID: BANLkTim-TUjuDsMivACFYJmiwmD5ZQ3pBA@mail.gmail.com

2011/4/7 Tatsuo Ishii <ishii(at)postgresql(dot)org>:
> In my understanding, pqc is not designed to work with pgpool.
> Thus if a user wants to use both the query cache and the query
> dispatching, replication, failover etc. that pgpool provides, it seems
> that is not possible. For this purpose the user could perhaps *cascade*
> pqc and pgpool, but I'm not sure. Even if it's possible, it will bring
> a huge performance penalty.
>
> Another point is cache invalidation. Masanori's proposal includes a
> cache invalidation technique based on looking at write queries, which
> pqc lacks, in my understanding.

Probably. My question wasn't necessarily "hasn't this already been
done in pqc", more "should this perhaps build on or integrate with pqc
in order not to duplicate effort". I think at the very least, any
overlap should be researched and identified - because if it can
integrate parts of pqc, or work with it, more effort can be spent on
the new parts rather than redoing something that's already been done.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/