Re: Priority table or Cache table

Lists: pgsql-hackers
From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Priority table or Cache table
Date: 2014-02-20 00:34:28
Message-ID: CAJrrPGdz=RvBKWY8-hZWzOgjH5uHCiNFyQbLxiPVRhBDKPmi3g@mail.gmail.com

Hi,

I want to propose a new feature called "priority table" or "cache table".
It is the same as a regular table, except that its pages have a higher
priority in the buffer pool than the pages of normal tables. Such tables are
useful where faster query processing on a few particular tables is expected.

The same speedup can be achieved by placing the tables in a tablespace on a
RAM disk, but then data is lost when the system shuts down. Avoiding that
requires continuous backup of the tablespace and of the WAL files. The
priority table feature solves these problems while providing similar
functionality.

The user needs to decide carefully which tables require faster access and
declare only those as priority tables. These tables should also be small,
both in number of columns and in size.

New syntax:

CREATE [PRIORITY] TABLE ...;

or

CREATE TABLE .. [ buffer_pool = priority | default ];

The second form adds a new storage parameter, buffer_pool, to specify the
type of buffer pool the table uses.

The same can be extended to indexes as well.

Solution - 1:

This solution may not be a proper one, but it is simple. When a page of a
priority table is placed into the buffer pool, its usage count is set to
double the maximum buffer usage count, instead of 1 as for normal tables. As
a result, these pages are less likely to be moved out of the buffer pool.
Queries that operate on these tables will be faster because of less I/O. If
the tables are not used for a long time, only the first query on the table
will be slow and the rest of the queries will be fast.

Just for testing, a new bool member can be added to the RelFileNode structure
to indicate whether the table is a priority table. Using this flag, the usage
count can be set when the page is loaded.

The pg_buffercache output of a priority table:

postgres=# select * from pg_buffercache where relfilenode=16385;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
      270 |       16385 |          1663 |       12831 |             0 |              0 | t       |         10
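
To illustrate solution 1, here is a minimal Python sketch of a clock sweep in which priority pages start with double the maximum usage count. It is a simplified model, not PostgreSQL's buffer manager: pin counts and locking are ignored, and the names Buffer, initial_usage_count and clock_sweep are assumptions made for this example.

```python
from dataclasses import dataclass

BM_MAX_USAGE_COUNT = 5  # the value quoted later in this thread


@dataclass
class Buffer:
    page: str
    usage_count: int


def initial_usage_count(is_priority):
    # Solution 1: priority pages start at double the maximum, normal pages at 1.
    return 2 * BM_MAX_USAGE_COUNT if is_priority else 1


def clock_sweep(buffers, hand):
    """Return (victim index, new hand position).

    Each buffer the hand passes has its usage count decremented; the first
    buffer found with a usage count of 0 becomes the victim.
    """
    while True:
        buf = buffers[hand]
        if buf.usage_count == 0:
            return hand, (hand + 1) % len(buffers)
        buf.usage_count -= 1
        hand = (hand + 1) % len(buffers)


pool = [
    Buffer("priority:0", initial_usage_count(True)),
    Buffer("normal:0", initial_usage_count(False)),
    Buffer("normal:1", initial_usage_count(False)),
]
victim, hand = clock_sweep(pool, 0)
print(pool[victim].page, pool[0].usage_count)
```

In this model the priority page survives ten passes of the hand without being touched, while a normal page survives only one, which is the effect the proposal relies on.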

Solution - 2:

Keep an extra flag in the buffer to record whether the buffer is used for a
priority table. When replacing a buffer used for a priority table, this flag
triggers some extra rules, such as:
1. Only another page of a priority table can replace this priority page.
2. A normal table page can replace it only after at least two complete
cycles of the clock sweep.

In this case the priority buffers stay in memory for a long time, similar to
solution 1, but this is not always guaranteed.
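
The two rules of solution 2 can be sketched as a replacement check. This is only one reading of the proposal: the field sweeps_survived and the functions touch, sweep_passes and can_replace are hypothetical names, and "two complete cycles" is interpreted here as two passes of the clock hand over a buffer whose usage count is already 0.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    is_priority: bool          # the extra flag proposed in solution 2
    usage_count: int = 0
    sweeps_survived: int = 0   # hypothetical: passes survived at usage count 0


def touch(buf):
    """A backend accessed the page: bump usage and restart the grace period."""
    buf.usage_count += 1
    buf.sweeps_survived = 0


def sweep_passes(buf):
    """The clock hand passed this buffer without evicting it."""
    if buf.usage_count > 0:
        buf.usage_count -= 1
    else:
        buf.sweeps_survived += 1


def can_replace(buf, incoming_is_priority):
    if buf.usage_count > 0:
        return False               # still recently used
    if not buf.is_priority:
        return True                # normal buffers follow the usual rule
    if incoming_is_priority:
        return True                # rule 1: priority replaces priority
    return buf.sweeps_survived >= 2  # rule 2: normal page must wait two sweeps


buf = Buffer(is_priority=True)
print(can_replace(buf, incoming_is_priority=False))
```

The counter is reset on every access, so a priority page that is touched at least once every two sweeps is never handed to a normal table in this model.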

Solution - 3:

Create another buffer pool, called the "priority buffer pool", similar to the
shared buffer pool, to hold the priority table pages. A new GUC parameter
called "priority_buffers" can be added to get the priority buffer pool size
from the user. The maximum limit for these buffers can be kept small so that
the pool is used properly.

As an extra precaution, a warning is issued whenever a page has to be moved
out of the priority buffer pool, so that the user can check whether the
configured priority_buffers size is too small or the priority tables have
grown more than expected.

In this case all the pages are always kept in memory, so queries on these
tables are processed faster.
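
A rough Python sketch of solution 3 follows: two independent pools, with a warning raised whenever the priority pool has to evict a page. LRU is used here only as a stand-in for the real replacement policy, and the class BufferPool, the function read_page and the pool sizes are assumptions for illustration.

```python
import warnings
from collections import OrderedDict


class BufferPool:
    """A tiny LRU-managed pool; LRU stands in for PostgreSQL's clock sweep."""

    def __init__(self, name, capacity, warn_on_evict=False):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.name = name
        self.capacity = capacity
        self.warn_on_evict = warn_on_evict
        self.pages = OrderedDict()  # page id -> True, least recently used first

    def access(self, page):
        """Return True on a hit, False if the page had to be loaded."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        if len(self.pages) >= self.capacity:
            evicted, _ = self.pages.popitem(last=False)
            if self.warn_on_evict:
                warnings.warn(
                    f"{self.name}: evicted {evicted}; the pool may be too "
                    "small or the priority tables may have grown"
                )
        self.pages[page] = True
        return False


# Hypothetical sizes; "priority_buffers" is the GUC name proposed above.
pools = {
    "default": BufferPool("shared_buffers", capacity=4),
    "priority": BufferPool("priority_buffers", capacity=2, warn_on_evict=True),
}


def read_page(table_pool, page):
    # Route the access by the table's buffer_pool setting.
    return pools[table_pool].access(page)
```

Accesses to normal tables can never evict a priority page in this model, because the two pools do not share any buffers.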

IBM DB2 has a facility for creating one or more buffer pools and assigning
specific tables and indexes to them. Oracle also has a facility to specify
the buffer pool option as KEEP or RECYCLE.

I prefer syntax 2 and solution 3. Please provide your
suggestions/improvements.

Regards,
Hari Babu
Fujitsu Australia


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-20 00:38:20
Message-ID: 19195.1392856700@sss.pgh.pa.us

Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> I want to propose a new feature called "priority table" or "cache table".
> This is same as regular table except the pages of these tables are having
> high priority than normal tables. These tables are very useful, where a
> faster query processing on some particular tables is expected.

Why exactly does the existing LRU behavior of shared buffers not do
what you need?

I am really dubious that letting DBAs manage buffers is going to be
an improvement over automatic management.

regards, tom lane


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-20 00:54:58
Message-ID: CAJrrPGejxVZ5-tavN95PWeQx+WPSf733wdVXSbs7w2JsGyuYLQ@mail.gmail.com

On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
> > I want to propose a new feature called "priority table" or "cache table".
> > This is same as regular table except the pages of these tables are having
> > high priority than normal tables. These tables are very useful, where a
> > faster query processing on some particular tables is expected.
>
> Why exactly does the existing LRU behavior of shared buffers not do
> what you need?
>

Let's assume a database with 3 tables that are all accessed regularly. The
user expects faster query results on one of them. With the LRU behavior,
that does not always happen. So if we separate that table's pages into
another buffer pool, all the pages of that table stay in memory and queries
on it are processed faster.

Regards,
Hari Babu
Fujitsu Australia


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-20 03:26:09
Message-ID: CAA4eK1JHx8AthJ56sUSJdV3C3XJkLQEm8XHYX1SC7VfH9sTopw@mail.gmail.com

On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > I want to propose a new feature called "priority table" or "cache
>> > table".
>> > This is same as regular table except the pages of these tables are
>> > having
>> > high priority than normal tables. These tables are very useful, where a
>> > faster query processing on some particular tables is expected.
>>
>> Why exactly does the existing LRU behavior of shared buffers not do
>> what you need?
>
>
> Lets assume a database having 3 tables, which are accessed regularly. The
> user is expecting a faster query results on one table.
> Because of LRU behavior which is not happening some times.

I think this will not be a problem for regularly accessed tables (pages),
as with the current algorithm they get more priority before being
flushed out of the shared buffer cache.
Have you come across any case where regularly accessed pages
get lower priority than non-regularly accessed pages?

However, it might be required for cases where the user wants to control
such behaviour and pass such hints through a table-level option or some
other way, to indicate that he wants more priority for certain tables
irrespective of their usage w.r.t. other tables.

Now I think the important thing to find out here is how helpful this is for
users, or why they would want to control such behaviour even when the
database already takes care of it based on the access pattern.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-20 04:53:57
Message-ID: CAJrrPGew0wh4ZWDuLrAbJ5X4LLqO0ZoddtK-CFgPCPWr92rMvg@mail.gmail.com

On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> > On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> > I want to propose a new feature called "priority table" or "cache
> >> > table".
> >> > This is same as regular table except the pages of these tables are
> >> > having
> >> > high priority than normal tables. These tables are very useful, where
> a
> >> > faster query processing on some particular tables is expected.
> >>
> >> Why exactly does the existing LRU behavior of shared buffers not do
> >> what you need?
> >
> >
> > Lets assume a database having 3 tables, which are accessed regularly. The
> > user is expecting a faster query results on one table.
> > Because of LRU behavior which is not happening some times.
>
> I think this will not be a problem for regularly accessed tables(pages),
> as per current algorithm they will get more priority before getting
> flushed out of shared buffer cache.
> Have you come across any such case where regularly accessed pages
> get lower priority than non-regularly accessed pages?
>

Because of the other regularly accessed tables, queries on the table that
needs faster results are sometimes delayed.

> However it might be required for cases where user wants to control
> such behaviour and pass such hints through table level option or some
> other way to indicate that he wants more priority for certain tables
> irrespective
> of their usage w.r.t other tables.
>
> Now I think here important thing to find out is how much helpful it is for
> users or why do they want to control such behaviour even when Database
> already takes care of such thing based on access pattern.
>

Yes, it is useful in cases where the application always expects fast
results, whether or not the table is used regularly.

Regards,
Hari Babu
Fujitsu Australia


From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-20 11:06:46
Message-ID: CAFjFpRdz-6XB=XuU5CPSMe6MgQowmhwjcOPPdYWYpi3bXhchrg@mail.gmail.com

On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:

> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>wrote:
>
>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> > On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >> > I want to propose a new feature called "priority table" or "cache
>> >> > table".
>> >> > This is same as regular table except the pages of these tables are
>> >> > having
>> >> > high priority than normal tables. These tables are very useful,
>> where a
>> >> > faster query processing on some particular tables is expected.
>> >>
>> >> Why exactly does the existing LRU behavior of shared buffers not do
>> >> what you need?
>> >
>> >
>> > Lets assume a database having 3 tables, which are accessed regularly.
>> The
>> > user is expecting a faster query results on one table.
>> > Because of LRU behavior which is not happening some times.
>>
>> I think this will not be a problem for regularly accessed tables(pages),
>> as per current algorithm they will get more priority before getting
>> flushed out of shared buffer cache.
>> Have you come across any such case where regularly accessed pages
>> get lower priority than non-regularly accessed pages?
>>
>
> Because of other regularly accessed tables, some times the table which
> expects faster results is getting delayed.
>

The solution involving buffer pools partitions the buffer cache into separate
pools explicitly. The way the PostgreSQL buffer manager works, for a regular
pattern of table accesses the buffer cache automatically reaches a stable
point where the number of buffers containing pages belonging to a
particular table stabilizes. Thus, at the equilibrium point for a
given access pattern, the buffer cache is automatically partitioned among
the tables, each using its share of buffers. So, the solution using buffer
pools seems useless.

PFA some scripts that I used to verify this behaviour. One script
(buffer_usage_objects.sql) creates two tables, one large and the other half
its size. The other script contains a few queries that simulate a simple
table access pattern by running select count(*) on these tables N times. The
same script contains a query on the pg_buffercache view provided by the
pg_buffercache extension. This query counts the number of buffers used by
each of these tables. So, if you run three sessions in parallel, two each
querying one of the tables and the third taking snapshots of buffer usage per
table, you will be able to see this partitioning.
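
The per-table buffer count described above can be sketched in Python as a simple aggregation over pg_buffercache-style rows. The attached SQL scripts are not reproduced here; the function buffers_per_table and the sample rows are hypothetical, and the only fact taken from pg_buffercache is that unused buffers report a NULL relfilenode.

```python
from collections import Counter


def buffers_per_table(snapshot):
    """Count buffers per relfilenode.

    snapshot: iterable of (bufferid, relfilenode) pairs, as they might be
    fetched from the pg_buffercache view. Unused buffers have relfilenode
    None (NULL in SQL) and are skipped.
    """
    return Counter(rel for _, rel in snapshot if rel is not None)


# Made-up snapshot: relfilenode 16385 holds three buffers, 16390 holds two.
snapshot = [(1, 16385), (2, 16390), (3, 16385), (4, None), (5, 16385), (6, 16390)]
print(buffers_per_table(snapshot))
```

Taking such a snapshot repeatedly while the two query sessions run would show whether the per-table counts settle, which is the partitioning effect described in the mail.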

>
>
>> However it might be required for cases where user wants to control
>> such behaviour and pass such hints through table level option or some
>> other way to indicate that he wants more priority for certain tables
>> irrespective
>> of their usage w.r.t other tables.
>>
>> Now I think here important thing to find out is how much helpful it is for
>> users or why do they want to control such behaviour even when Database
>> already takes care of such thing based on access pattern.
>>
>
> Yes it is useful in cases where the application always expects the faster
> results whether the table is used regularly or not.
>

In such a case, it might be valuable to see whether we should play with the
maximum usage count, which is currently set to 5:
#define BM_MAX_USAGE_COUNT 5

> Regards,
> Hari Babu
> Fujitsu Australia
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment                 Content-Type               Size
buffer_usage_objects.sql   application/octet-stream   838 bytes
buffer_usage_queries.sql   application/octet-stream   1.2 KB

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-02-21 01:02:54
Message-ID: CAJrrPGf9c6MJ6BdPDADdOzyPYQ4mh3jHuDPivj-TWS62WcP-zQ@mail.gmail.com

On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com
> > wrote:
>
>> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>wrote:
>>
>>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>> > On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> >> > I want to propose a new feature called "priority table" or "cache
>>> >> > table".
>>> >> > This is same as regular table except the pages of these tables are
>>> >> > having
>>> >> > high priority than normal tables. These tables are very useful,
>>> where a
>>> >> > faster query processing on some particular tables is expected.
>>> >>
>>> >> Why exactly does the existing LRU behavior of shared buffers not do
>>> >> what you need?
>>> >
>>> >
>>> > Lets assume a database having 3 tables, which are accessed regularly.
>>> The
>>> > user is expecting a faster query results on one table.
>>> > Because of LRU behavior which is not happening some times.
>>>
>>> I think this will not be a problem for regularly accessed tables(pages),
>>> as per current algorithm they will get more priority before getting
>>> flushed out of shared buffer cache.
>>> Have you come across any such case where regularly accessed pages
>>> get lower priority than non-regularly accessed pages?
>>>
>>
>> Because of other regularly accessed tables, some times the table which
>> expects faster results is getting delayed.
>>
>
> The solution involving buffer pools partitions the buffer cache in
> separate pools explicitly. The way PostgreSQL buffer manager works, for a
> regular pattern table accesses the buffer cache automatically reaches a
> stable point where the number of buffers containing pages belonging to a
> particular table starts to stabilize. Thus at an equilibrium point for
> given access pattern, the buffer cache automatically gets partitioned by
> the tables, each using its share of buffers. So, solution using buffer
> pools seems useless.
>

I checked some performance reports on Oracle's multiple buffer pool concept;
they show an increase in cache hit ratio compared to a single buffer pool.
Only after that did I propose this split pool solution. I don't know how
well it will really work for PostgreSQL. The Oracle performance report is
attached to this mail.

> PFA some scripts, which I used to verify the behaviour. The scripts create
> two tables, one large and other half it's size (buffer_usage_objects.sql).
> The other script contains few queries which will simulate a simple table
> access pattern by running select count(*) on these tables N times. The same
> script contains query of pg_buffercache view provided by pg_buffercache
> extension. This query counts the number of buffers uses by either of these
> tables. So, if you run three session in parallel, two querying either of
> the tables and the third taking snapshot of buffer usage per table, you
> would be able to see this partitioning.
>

Thanks for the scripts. I will check them.

>>> However it might be required for cases where user wants to control
>>> such behaviour and pass such hints through table level option or some
>>> other way to indicate that he wants more priority for certain tables
>>> irrespective
>>> of their usage w.r.t other tables.
>>>
>>> Now I think here important thing to find out is how much helpful it is
>>> for
>>> users or why do they want to control such behaviour even when Database
>>> already takes care of such thing based on access pattern.
>>>
>>
>> Yes it is useful in cases where the application always expects the faster
>> results whether the table is used regularly or not.
>>
>
> In such case, it might be valuable to see if we should play with the
> maximum usage parameter, which is set to 5 currently.
> 54 #define BM_MAX_USAGE_COUNT 5
>

This is the first solution that I described in my first mail. Thanks,
I will look into it further.

Regards,
Hari Babu
Fujitsu Australia

Attachment                  Content-Type      Size
oracle9i_buffer_pools.pdf   application/pdf   51.2 KB

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-03-17 04:16:43
Message-ID: CAJrrPGfHA_XzcaH4vTJKd0yQMws5JgirH9jicU2GYxFJNf0Qfg@mail.gmail.com

On Fri, Feb 21, 2014 at 12:02 PM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>
>> On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>
>>> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>>> wrote:
>>>>
>>>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>> > On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> >> > I want to propose a new feature called "priority table" or "cache
>>>> >> > table".
>>>> >> > This is same as regular table except the pages of these tables are
>>>> >> > having
>>>> >> > high priority than normal tables. These tables are very useful,
>>>> >> > where a
>>>> >> > faster query processing on some particular tables is expected.
>>>> >>
>>>> >> Why exactly does the existing LRU behavior of shared buffers not do
>>>> >> what you need?
>>>> >
>>>> >
>>>> > Lets assume a database having 3 tables, which are accessed regularly.
>>>> > The
>>>> > user is expecting a faster query results on one table.
>>>> > Because of LRU behavior which is not happening some times.

I implemented a proof-of-concept patch to see whether splitting the buffer
pool can improve performance.

Summary of the changes:
1. The priority buffers are allocated contiguous to the shared buffers.
2. Added a new reloption called "buffer_pool" to specify the buffer pool
the user wants the table to use.
3. Two free lists are created to store the information for the two buffer
pools.
4. When allocating a buffer, the buffer is taken from the pool that
corresponds to the table type.
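
The four changes above can be pictured with a small Python model. This is not the attached patch: the sizes and the names make_free_lists and get_free_buffer are hypothetical, and the model only shows how one contiguous buffer array can be served by two free lists selected through the buffer_pool reloption.

```python
from collections import deque

SHARED_BUFFERS = 4     # hypothetical tiny sizes, for illustration only
PRIORITY_BUFFERS = 2


def make_free_lists():
    """Build one free list per pool over a single contiguous buffer array."""
    return {
        # Buffer ids 0..SHARED_BUFFERS-1 belong to the default pool.
        "default": deque(range(SHARED_BUFFERS)),
        # Priority buffers follow directly after the shared buffers.
        "priority": deque(range(SHARED_BUFFERS, SHARED_BUFFERS + PRIORITY_BUFFERS)),
    }


def get_free_buffer(free_lists, reloptions):
    """Pick a free buffer from the pool named by the table's reloption.

    Returns None when that pool's free list is empty, in which case a real
    buffer manager would have to run its replacement algorithm.
    """
    pool = reloptions.get("buffer_pool", "default")
    free = free_lists[pool]
    return free.popleft() if free else None


free_lists = make_free_lists()
print(get_free_buffer(free_lists, {"buffer_pool": "priority"}))
print(get_free_buffer(free_lists, {}))
```

A table without the reloption falls back to the default pool, so existing tables behave as before in this model.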

The performance test was carried out as follows:
1. Create all the pgbench tables and indexes on the new buffer pool.
2. Initialize the pgbench test with a scale factor of 75, which equals a
size of 1GB.
3. Create another load test table with a size of 1GB on the default
buffer pool.
4. In parallel with the performance test, select and update operations
are carried out on the load test table (single thread).

Configuration changes:
Head: shared_buffers - 1536MB
Patched: shared_buffers - 512MB, priority_buffers - 1024MB
synchronous_commit - off, wal_buffers - 16MB, checkpoint_segments - 255,
checkpoint_timeout - 15min.

Threads   Head   Patched   Diff
1         25     25        0%
2         35     59        68%
4         52     79        51%
8         79     150       89%
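
As a quick check of the Diff column, the relative improvement can be recomputed from the Head and Patched values. The figures are consistent with an integer percentage that is truncated rather than rounded (for example, 59 versus 35 is about 68.6%); that reading, and the function name improvement_percent, are assumptions of this sketch.

```python
# (head, patched) results per thread count, copied from the table above.
results = {1: (25, 25), 2: (35, 59), 4: (52, 79), 8: (79, 150)}


def improvement_percent(head, patched):
    # Relative gain of patched over head, truncated to a whole percent.
    return (patched - head) * 100 // head


for threads, (head, patched) in results.items():
    print(threads, f"{improvement_percent(head, patched)}%")
```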

In my testing it shows a very good improvement in performance.

The POC patch and the test script used for the performance testing are
attached to this mail.
The modified pgbench.c code, which uses the newly created buffer pool
instead of the default one for test purposes, is also attached.
Copy the test script to the installation folder and execute it as
./rub_bg.sh ./run_reading.sh 1 1

Please let me know your suggestions.

Regards,
Hari Babu
Fujitsu Australia

Attachment              Content-Type               Size
test_script.zip         application/zip            25.1 KB
cache_table_poc.patch   application/octet-stream   69.0 KB

From: Sameer Thakur <samthakur74(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Priority table or Cache table
Date: 2014-05-16 10:29:01
Message-ID: 1400236141628-5804200.post@n5.nabble.com

Hello,
I applied the patch to current HEAD. There was one failure (attached),
freelist.rej
<http://postgresql.1045698.n5.nabble.com/file/n5804200/freelist.rej>

I compiled the provided pgbench.c and added the following to the .conf file:
shared_buffers = 128MB # min 128kB
Shared_buffers=64MB
Priority_buffers=128MB

I was planning to run performance tests later, hence the different values.

But while executing pgbench, the following assertion failure occurs:

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
TRAP: FailedAssertion("!(strategy_delta >= 0)", File: "bufmgr.c", Line: 1435)
LOG: background writer process (PID 10274) was terminated by signal 6:
Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

Is there a way to avoid it? Am I making some mistake?
regards
Sameer



From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-16 13:15:57
Message-ID: 156928B0-02FF-4AD9-85B7-7FE69D40C4B2@cybertec.at


On 20 Feb 2014, at 01:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> writes:
>> I want to propose a new feature called "priority table" or "cache table".
>> This is same as regular table except the pages of these tables are having
>> high priority than normal tables. These tables are very useful, where a
>> faster query processing on some particular tables is expected.
>
> Why exactly does the existing LRU behavior of shared buffers not do
> what you need?
>
> I am really dubious that letting DBAs manage buffers is going to be
> an improvement over automatic management.
>
> regards, tom lane

the reason for a feature like that is to define an area of the application which needs more predictable runtime behaviour.
not all tables are created equal in terms of importance.

example: user authentication should always be supersonic fast while some reporting tables might gladly be forgotten even if they happened to be in use recently.

i am not saying that we should have this feature.
however, there are definitely use cases which would justify some more control here.
otherwise people will fall back and use dirty tricks such as “SELECT count(*)” or so to emulate what we got here.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-20 11:46:48
Message-ID: CAHGQGwHFU6TJpRmuFe=B5wfM6tm0ZnQ_Q01-4gnjFrJX9_ij1Q@mail.gmail.com

On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> On Fri, Feb 21, 2014 at 12:02 PM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> On Thu, Feb 20, 2014 at 10:06 PM, Ashutosh Bapat
>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>
>>> On Thu, Feb 20, 2014 at 10:23 AM, Haribabu Kommi
>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>>
>>>> On Thu, Feb 20, 2014 at 2:26 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>>>> wrote:
>>>>>
>>>>> On Thu, Feb 20, 2014 at 6:24 AM, Haribabu Kommi
>>>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>>> > On Thu, Feb 20, 2014 at 11:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> >> > I want to propose a new feature called "priority table" or "cache
>>>>> >> > table".
>>>>> >> > This is same as regular table except the pages of these tables are
>>>>> >> > having
>>>>> >> > high priority than normal tables. These tables are very useful,
>>>>> >> > where a
>>>>> >> > faster query processing on some particular tables is expected.
>>>>> >>
>>>>> >> Why exactly does the existing LRU behavior of shared buffers not do
>>>>> >> what you need?
>>>>> >
>>>>> >
>>>>> > Lets assume a database having 3 tables, which are accessed regularly.
>>>>> > The
>>>>> > user is expecting a faster query results on one table.
>>>>> > Because of LRU behavior which is not happening some times.
>
> I Implemented a proof of concept patch to see whether the buffer pool
> split can improve the performance or not.
>
> Summary of the changes:
> 1. The priority buffers are allocated as continuous to the shared buffers.
> 2. Added new reloption parameter called "buffer_pool" to specify the
> buffer_pool user wants the table to use.

I'm not sure that storing the "priority table" information in the database
is a good idea, because it then gets replicated to the standby, and the same
table will be treated as high priority on the standby server as well. I can
imagine some users wanting to set different tables as high priority on the
master and on the standby.

Regards,

--
Fujii Masao


From: Jim Nasby <jim(at)nasby(dot)net>
To: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-24 17:58:30
Message-ID: 5380DDC6.8060903@nasby.net

On 5/16/14, 8:15 AM, Hans-Jürgen Schönig wrote:

> On 20 Feb 2014, at 01:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I am really dubious that letting DBAs manage buffers is going to be
>> an improvement over automatic management.
>
> the reason for a feature like that is to define an area of the application which needs more predictable runtime behaviour.
> not all tables are created equals in term of importance.
>
> example: user authentication should always be supersonic fast while some reporting tables might gladly be forgotten even if they happened to be in use recently.
>
> i am not saying that we should have this feature.
> however, there are definitely use cases which would justify some more control here.
> otherwise people will fall back and use dirty tricks sucks as “SELECT count(*)” or so to emulate what we got here.

Which is really just an extension of a larger problem: many applications do not care one iota about ideal performance; they care about *always* having some minimum level of performance. This frequently comes up with the issue of a query plan that is marginally faster 99% of the time but sucks horribly for the remaining 1%. Frequently it's far better to choose a less optimal query that doesn't have a degenerate case.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-25 09:52:47
Message-ID: 5381BD6F.7000209@2ndQuadrant.com
Lists: pgsql-hackers

On 05/20/2014 01:46 PM, Fujii Masao wrote:
> On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> ...
>> I Implemented a proof of concept patch to see whether the buffer pool
>> split can improve the performance or not.
>>
>> Summary of the changes:
>> 1. The priority buffers are allocated as continuous to the shared buffers.
>> 2. Added new reloption parameter called "buffer_pool" to specify the
>> buffer_pool user wants the table to use.
> I'm not sure if storing the information of "priority table" into
> database is good
> because this means that it's replicated to the standby and the same table
> will be treated with high priority even in the standby server. I can imagine
> some users want to set different tables as high priority ones in master and
> standby.
There might be a possibility to override this in postgresql.conf for
optimising what you described but for most uses it is best to be in
the database, at least to get started.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-26 14:16:55
Message-ID: CAHGQGwF_y=XjWMy2Fypeqi6PE7AUg1e+4KECOCZFjQDqpEZSwQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 25, 2014 at 6:52 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> On 05/20/2014 01:46 PM, Fujii Masao wrote:
>> On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>> ...
>>> I Implemented a proof of concept patch to see whether the buffer pool
>>> split can improve the performance or not.
>>>
>>> Summary of the changes:
>>> 1. The priority buffers are allocated as continuous to the shared buffers.
>>> 2. Added new reloption parameter called "buffer_pool" to specify the
>>> buffer_pool user wants the table to use.
>> I'm not sure if storing the information of "priority table" into
>> database is good
>> because this means that it's replicated to the standby and the same table
>> will be treated with high priority even in the standby server. I can imagine
>> some users want to set different tables as high priority ones in master and
>> standby.
> There might be a possibility to override this in postgresql.conf for
> optimising what you described but for most uses it is best to be in
> the database, at least to get started.

Overriding the database setting from postgresql.conf might confuse users,
because it is the opposite of the usual GUC precedence, where a
database-level setting overrides postgresql.conf.

Or, what about storing the setting in a flat file, as is done for replication slots?

Regards,

--
Fujii Masao


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-05-26 16:11:04
Message-ID: 53836798.7000001@2ndQuadrant.com
Lists: pgsql-hackers

On 05/26/2014 04:16 PM, Fujii Masao wrote:
> On Sun, May 25, 2014 at 6:52 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> On 05/20/2014 01:46 PM, Fujii Masao wrote:
>>> On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi
>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>> ...
>>>> I Implemented a proof of concept patch to see whether the buffer pool
>>>> split can improve the performance or not.
>>>>
>>>> Summary of the changes:
>>>> 1. The priority buffers are allocated as continuous to the shared buffers.
>>>> 2. Added new reloption parameter called "buffer_pool" to specify the
>>>> buffer_pool user wants the table to use.
>>> I'm not sure if storing the information of "priority table" into
>>> database is good
>>> because this means that it's replicated to the standby and the same table
>>> will be treated with high priority even in the standby server. I can imagine
>>> some users want to set different tables as high priority ones in master and
>>> standby.
>> There might be a possibility to override this in postgresql.conf for
>> optimising what you described but for most uses it is best to be in
>> the database, at least to get started.
> Overriding the setting in postgresql.conf rather than that in database might
> confuse users because it's opposite order of the priority of the GUC setting.
>
> Or, what about storing the setting into flat file like replication slot?
seems like a good time to introduce a notion of non-replicated tables :)

should be a good fit with logical replication.

Cheers
Hannu
>
> Regards,
>

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Sameer Thakur <samthakur74(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-06-03 04:20:10
Message-ID: CAJrrPGeMY=chgBP3SP6TgGvF27d_YPDes6OcGEZiLVx7VjhKqA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 16, 2014 at 8:29 PM, Sameer Thakur <samthakur74(at)gmail(dot)com> wrote:
> Hello,
> I applied the patch to current HEAD. There was one failure (attached),
> freelist.rej
> <http://postgresql.1045698.n5.nabble.com/file/n5804200/freelist.rej>
>
> Compiled the provided pgbench.c and added following in .conf
> shared_buffers = 128MB # min 128kB
> Shared_buffers=64MB
> Priority_buffers=128MB
>
> I was planning to performance test later hence different values.
>
> But while executing pgbench the following assertion occurs
>
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> TRAP: FailedAssertion("!(strategy_delta >= 0)", File: "bufmgr.c", Line:
> 1435)
> LOG: background writer process (PID 10274) was terminated by signal 6:
> Aborted
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
>
> Is there a way to avoid it? Am i making some mistake?

Sorry for the late reply. Thanks for the test.
Please find attached the rebased patch with a temporary fix for the problem.
I will submit a proper fix later.

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
cache_table_poc_v2.patch application/octet-stream 67.8 KB

From: Beena Emerson <memissemerson(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2014-06-30 13:08:15
Message-ID: CAOG9ApE4Qu7fuoOEFRUiG0xFZYqNtscNgH2s2O8Q2iGgeoO66A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 3, 2014 at 9:50 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> Sorry for the late reply. Thanks for the test.
> Please find the re-based patch with a temp fix for correcting the problem.
> I will submit a proper patch fix later.
>
>
Please note that the new patch still triggers an assertion failure:

TRAP: FailedAssertion("!(buf->freeNext != (-2))", File: "freelist.c", Line: 178)
psql:load_test.sql:5: connection to server was lost

Hence, the patch was installed with assertions off.

I also ran the test script after making the same configuration changes that
you have specified. I found that I was not able to get the same performance
difference that you have reported.

Following table lists the tps in each scenario and the % increase in
performance.

Threads  Head  Patched  Diff
1        1669  1718      3%
2        2844  3195     12%
4        3909  4915     26%
8        7332  8329     14%
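
For reference, a minimal sketch of how the Diff column can be recomputed from the tps figures, assuming Diff means (Patched - Head) / Head; the helper name `pct_gain` is made up for this illustration. Under that assumption the rounded values match the table.

```python
# (threads, head_tps, patched_tps) copied from the table in this message.
results = [(1, 1669, 1718), (2, 2844, 3195), (4, 3909, 4915), (8, 7332, 8329)]

def pct_gain(head, patched):
    """Percentage increase of the patched tps over the head tps."""
    return (patched - head) / head * 100.0

for threads, head, patched in results:
    print(f"{threads:>7} {head:>6} {patched:>8} {pct_gain(head, patched):6.1f}%")
```
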

Kindly let me know if I am missing something.

--
Beena Emerson


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Beena Emerson <memissemerson(at)gmail(dot)com>
Cc: Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2015-08-06 06:54:06
Message-ID: CAJrrPGdPfXYdh3cSoarn352RgW=2s_KxLvPdLy8DZeUZ-_qpvw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>
> I also ran the test script after making the same configuration changes that
> you have specified. I found that I was not able to get the same performance
> difference that you have reported.
>
> Following table lists the tps in each scenario and the % increase in
> performance.
>
> Threads Head Patched Diff
> 1 1669 1718 3%
> 2 2844 3195 12%
> 4 3909 4915 26%
> 8 7332 8329 14%
>

Coming back to this old thread.

I just tried a new approach for the priority table: instead of an
entirely separate buffer pool, reserve a portion of shared buffers for
priority tables, using a GUC variable "buffer_cache_ratio" (0-75) to
specify what percentage of shared buffers is to be used.

Syntax:

create table tbl(f1 int) with(buffer_cache=true);

Compared with the earlier approach, I thought this one would be easier to implement.
But during the performance run, it didn't show much improvement in
performance.
Here are the test results.

Threads  Head   Patched  Diff
1         3123   3238    3.68%
2         5997   6261    4.40%
4        11102  11407    2.75%

I suspect that buffer locks may be causing the problem,
whereas in the older approach of separate buffer pools, each buffer pool
has its own locks.
I will try to collect profile output and analyze it.

Any better ideas?

I have attached a proof-of-concept patch.

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
cache_table_poc.patch application/octet-stream 12.1 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2015-08-10 05:09:38
Message-ID: CAA4eK1L3HkZ8-M=ksVYTc98A4tOi0=Tg-HW_6bHj9EJo81f_6A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:
>
> On Mon, Jun 30, 2014 at 11:08 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
> >
> > I also ran the test script after making the same configuration changes that
> > you have specified. I found that I was not able to get the same performance
> > difference that you have reported.
> >
> > Following table lists the tps in each scenario and the % increase in
> > performance.
> >
> > Threads Head Patched Diff
> > 1 1669 1718 3%
> > 2 2844 3195 12%
> > 4 3909 4915 26%
> > 8 7332 8329 14%
> >
>
>
> coming back to this old thread.
>
> I just tried a new approach for this priority table, instead of a
> entirely separate buffer pool,
> Just try to use a some portion of shared buffers to priority tables
> using some GUC variable
> "buffer_cache_ratio"(0-75) to specify what percentage of shared
> buffers to be used.
>
> Syntax:
>
> create table tbl(f1 int) with(buffer_cache=true);
>
> Comparing earlier approach, I though of this approach is easier to implement.
> But during the performance run, it didn't showed much improvement in
> performance.
> Here are the test results.
>

What is the configuration for test (RAM of m/c, shared_buffers,
scale_factor, etc.)?

> Threads Head Patched Diff
> 1 3123 3238 3.68%
> 2 5997 6261 4.40%
> 4 11102 11407 2.75%
>
> I am suspecting that, this may because of buffer locks that are
> causing the problem.
> where as in older approach of different buffer pools, each buffer pool
> have it's own locks.
> I will try to collect the profile output and analyze the same.
>
> Any better ideas?
>

I think you should try to find out, during the test, for how many pages
it needs to perform a clock sweep (add a new counter such as
numBufferBackendClocksweep in BufferStrategyControl to find that
out). In theory, your patch should reduce the number of times it needs
to perform a clock sweep.

I think in this approach, even if you mark some buffers as non-replaceable
(buffers for which BM_BUFFER_CACHE_PAGE is set), the clock sweep still
needs to visit all the buffers. I think we might want to find some way to
reduce that if this idea helps.
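
A toy model of this point, as a rough sketch only: it is not PostgreSQL code, and the names `Buf`, `ClockSweep` and `visited` are made up for illustration (`visited` plays the role of a counter like numBufferBackendClocksweep, and `cache_page` stands in for BM_BUFFER_CACHE_PAGE). Locking, pinning and the freelist are ignored.

```python
from dataclasses import dataclass

MAX_USAGE = 5  # upper bound on usage_count assumed by this toy model

@dataclass
class Buf:
    usage_count: int = 0
    cache_page: bool = False  # stand-in for the BM_BUFFER_CACHE_PAGE flag

class ClockSweep:
    def __init__(self, buffers):
        self.buffers = buffers
        self.hand = 0     # next buffer the sweep will look at
        self.visited = 0  # hypothetical counter of buffers examined

    def get_victim(self):
        """Return the index of the next replaceable buffer."""
        # Enough passes to drain every usage_count, then one more to pick.
        limit = len(self.buffers) * (MAX_USAGE + 1)
        for _ in range(limit):
            idx = self.hand
            buf = self.buffers[idx]
            self.hand = (self.hand + 1) % len(self.buffers)
            self.visited += 1
            if buf.cache_page:
                continue              # protected, but still visited
            if buf.usage_count > 0:
                buf.usage_count -= 1  # give the buffer another chance
                continue
            return idx
        raise RuntimeError("no replaceable buffer found")

# Six protected buffers followed by two ordinary ones.
bufs = [Buf(cache_page=True) for _ in range(6)] + [Buf(), Buf()]
sweep = ClockSweep(bufs)
print(sweep.get_victim(), sweep.visited)
```

In this model the sweep has to step over every protected buffer before it reaches a replaceable one, so the counter grows even though those buffers can never be chosen.
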

Another thing is that this idea looks somewhat similar (although not the same)
to the current ring buffer concept, where certain types of scans
use buffers from a ring. I think it is okay to prototype as you have done
in the patch, and we can consider doing something along those lines if
this patch's idea helps at all.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2015-08-11 06:01:57
Message-ID: CAJrrPGcAOz1FdKE4PEscKZ2yQWvgHBU5tnKEr97hg8QVxguk7g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>
> What is the configuration for test (RAM of m/c, shared_buffers,
> scale_factor, etc.)?

Here are the details:

CPU - 16 core, RAM - 252 GB

shared_buffers - 1700MB, buffer_cache_ratio - 70
wal_buffers - 16MB, synchronous_commit - off
checkpoint_timeout - 15min, max_wal_size - 5GB.

pgbench scale factor - 75 (1GB)

Load test table size - 1GB

>> Threads Head Patched Diff
>> 1 3123 3238 3.68%
>> 2 5997 6261 4.40%
>> 4 11102 11407 2.75%
>>
>> I am suspecting that, this may because of buffer locks that are
>> causing the problem.
>> where as in older approach of different buffer pools, each buffer pool
>> have it's own locks.
>> I will try to collect the profile output and analyze the same.
>>
>> Any better ideas?
>>
>
> I think you should try to find out during test, for how many many pages,
> it needs to perform clocksweep (add some new counter like
> numBufferBackendClocksweep in BufferStrategyControl to find out the
> same). By theory your patch should reduce the number of times it needs
> to perform clock sweep.
>
> I think in this approach even if you make some buffers as non-replaceable
> (buffers for which BM_BUFFER_CACHE_PAGE is set), still clock sweep
> needs to access all the buffers. I think we might want to find some way to
> reduce that if this idea helps.
>
> Another thing is that, this idea looks somewhat similar (although not same)
> to current Ring Buffer concept, where Buffers for particular types of scan
> uses buffers from Ring. I think it is okay to prototype as you have done
> in patch and we can consider to do something on those lines if at all
> this patch's idea helps.

Thanks for the details. I will try the same.

Regards,
Hari Babu
Fujitsu Australia


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2015-08-11 06:43:25
Message-ID: CAA4eK1LRuqvJcEJZQ1c1vKL2XVhPSJwZMVid0C+n=4z15TDePA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> > wrote:
> >
> > What is the configuration for test (RAM of m/c, shared_buffers,
> > scale_factor, etc.)?
>
> Here are the details:
>
> CPU - 16 core, RAM - 252 GB
>
> shared_buffers - 1700MB, buffer_cache_ratio - 70
> wal_buffers - 16MB, synchronous_commit - off
> checkpoint_timeout - 15min, max_wal_size - 5GB.
>
> pgbench scale factor - 75 (1GB)
>
> Load test table size - 1GB
>

It seems that the test table can easily fit in shared buffers. I am not sure
this patch will be of benefit in such cases; why do you think it can be
beneficial for such cases?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Priority table or Cache table
Date: 2015-08-11 07:56:02
Message-ID: CAJrrPGe9twTtDOr64fhRBrL5YKFqomZeqhOioDBQoquAGS27hA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 11, 2015 at 4:43 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Aug 11, 2015 at 11:31 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>>
>> On Mon, Aug 10, 2015 at 3:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Thu, Aug 6, 2015 at 12:24 PM, Haribabu Kommi
>> > <kommi(dot)haribabu(at)gmail(dot)com>
>> > wrote:
>> >
>> > What is the configuration for test (RAM of m/c, shared_buffers,
>> > scale_factor, etc.)?
>>
>> Here are the details:
>>
>> CPU - 16 core, RAM - 252 GB
>>
>> shared_buffers - 1700MB, buffer_cache_ratio - 70
>> wal_buffers - 16MB, synchronous_commit - off
>> checkpoint_timeout - 15min, max_wal_size - 5GB.
>>
>> pgbench scale factor - 75 (1GB)
>>
>> Load test table size - 1GB
>
>
> It seems that test table can fit easily in shared buffers, I am not sure
> this patch will be of benefit for such cases, why do you think it can be
> beneficial for such cases?

Yes. This configuration may not be the best for the test.

The idea behind these settings is to provide enough shared buffers for the cache
tables by tuning buffer_cache_ratio anywhere from 0 to 70% of shared buffers,
so that the cache tables have enough shared buffers and the rest of the shared
buffers can be used for normal tables, i.e. the load test table.
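
The arithmetic behind this split can be sketched as follows. This is a rough illustration that assumes buffer_cache_ratio is applied as a plain percentage of the buffer count and that the block size is the default 8kB; the function names `split_buffers` and `buffers_to_mb` are made up here, and the actual patch may round or reserve buffers differently.

```python
BLCKSZ = 8192  # PostgreSQL's default block size in bytes

def split_buffers(shared_buffers_mb, buffer_cache_ratio):
    """Split shared buffers into (cache_table_buffers, normal_buffers).

    Assumes the ratio is a plain percentage of the total buffer count.
    """
    if not 0 <= buffer_cache_ratio <= 75:
        raise ValueError("buffer_cache_ratio must be between 0 and 75")
    nbuffers = shared_buffers_mb * 1024 * 1024 // BLCKSZ
    cache_buffers = nbuffers * buffer_cache_ratio // 100
    return cache_buffers, nbuffers - cache_buffers

def buffers_to_mb(nbuffers):
    """Convert a buffer count back to megabytes."""
    return nbuffers * BLCKSZ / (1024 * 1024)

# Settings from this thread: shared_buffers = 1700MB, buffer_cache_ratio = 70.
cache, normal = split_buffers(1700, 70)
print(f"cache tables:  {cache} buffers ({buffers_to_mb(cache):.0f} MB)")
print(f"normal tables: {normal} buffers ({buffers_to_mb(normal):.0f} MB)")
```

Under these assumptions the cache tables get about 1190 MB and the normal tables about 510 MB, so a 1GB cache table fits in its share while a 1GB load test table does not fit entirely in the remainder.
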

I will try to run some more performance tests with different shared
buffers settings and loads.

Regards,
Hari Babu
Fujitsu Australia