Re: [rfc] overhauling pgstat.stat

From: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [rfc] overhauling pgstat.stat
Date: 2013-09-04 10:09:25
Message-ID: 522706D5.8040008@uptime.jp
Lists: pgsql-hackers

(2013/09/04 15:23), Atri Sharma wrote:
>
>
> Sent from my iPad
>
> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
>
>> Hi,
>>
>> (2013/09/04 12:52), Atri Sharma wrote:
>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
>>>> Hi,
>>>>
>>>> I'm considering overhauling pgstat.stat, and would like to know how many
>>>> people are interested in this topic.
>>>>
>>>> As you may know, this file can be hundreds of MB in size, because
>>>> pgstat.stat holds all access statistics for each database, and the
>>>> entire file needs to be read and written frequently.
>>>>
>>>> As a result, pgstat.stat often generates massive I/O,
>>>> particularly when the database contains a large number of tables.
>>>>
>>>> To support multi-tenancy, or simply a large number of tables (up to
>>>> 10k tables in a single database), I think pgstat.stat needs to be
>>>> overhauled.
>>>>
>>>> I think using a heap and a btree within pgstat.stat would be
>>>> preferable, to reduce reads/writes and to allow updating the access
>>>> statistics of specific tables in the pgstat.stat file in place.
>>>>
>>>> Is this good for us?
>>>
>>> Hi,
>>>
>>> Nice thought. I/O reduction in pgstat can be really helpful.
>>>
>>> I am trying to think of our aim here. Would we be looking to split
>>> pgstat per table, so that the I/O write happens for only a portion of
>>> pgstat? Or reduce the I/O in general?
>>
>> I prefer the latter.
>>
>> Under the current implementation, the DBA needs to split a single
>> database into many smaller databases, taking the access locality of
>> the tables into account. That is difficult, and the locality could
>> change in the future.
>>
>> And splitting the statistics data into many files (per table,
>> for example) would cause another performance issue when
>> collecting/showing statistics at once. Just my guess though.
>>
>> So, I'm looking for a new way to reduce I/O for the statistics data
>> in general.
>>
>> Regards,
>>
>>>
>>> If the latter, how would using a BTree help us? I would rather go
>>> for a range tree or something. But again, I may be completely wrong.
>>>
>>> Please elaborate a bit more on the solution we are trying to
>>> achieve. It seems really interesting.
>>>
>>> Regards,
>>>
>>> Atri
>>>
>>>
>>
>>
>
> Right, thanks.
>
> How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats?

For example, when you read even a single block from one of your tables,
all of the values in your database statistics need to be written out
next. That often generates a large amount of I/O.

However, if random access within the statistics file is allowed, you
can update only a single record for the specific table you read. That
record would be less than 100 bytes per table.
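To make the contrast concrete, here is a minimal Python sketch (PostgreSQL
itself is C, and the record layout, field names, and 32-byte record size
below are hypothetical, not the real pgstat.stat format): with fixed-size
per-table records, bumping one table's counters means seeking to its slot
and rewriting only those bytes, instead of rewriting the entire file.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size per-table statistics record:
# table oid, blocks read, blocks hit, tuples returned (4 x 8-byte ints).
REC = struct.Struct("<qqqq")

def write_full_stats(path, stats):
    """Current scheme (simplified): rewrite every record on each flush."""
    with open(path, "wb") as f:
        for oid in sorted(stats):
            f.write(REC.pack(oid, *stats[oid]))

def update_one(path, slot, oid, counters):
    """Proposed scheme (sketch): random access to a single record.
    'slot' would come from a btree-like index mapping oid -> position."""
    with open(path, "r+b") as f:
        f.seek(slot * REC.size)            # jump straight to the record
        f.write(REC.pack(oid, *counters))  # rewrite ~32 bytes, not the file

# Demo: 10,000 tables, then a single counter update.
stats = {oid: (0, 0, 0) for oid in range(16384, 16384 + 10000)}
path = os.path.join(tempfile.mkdtemp(), "pgstat.stat")
write_full_stats(path, stats)              # full rewrite: ~320 kB of I/O

update_one(path, slot=0, oid=16384, counters=(1, 1, 1))  # in place: 32 bytes

with open(path, "rb") as f:
    print(REC.unpack(f.read(REC.size)))    # → (16384, 1, 1, 1)
```

The btree part of the proposal would play the role of the `slot` lookup
here, so a flush touches only the records that actually changed.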

I have no idea yet how a priority queue could work here.
However, if the statistics file is overhauled, PostgreSQL would be
able to host a much larger number of customers more easily.

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC. http://www.uptime.jp
