[PATCH] Statistics collection for CLUSTER command

Lists: pgsql-hackers
From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-08 11:52:47
Message-ID: 5203868F.3020606@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

As part of routine maintenance monitoring, it is interesting for us to
have statistics on the CLUSTER command (timestamp of last run, and
number of runs since stat reset) like we have for (auto)ANALYZE and
(auto)VACUUM. Patch against today's HEAD attached.

I would add this to the next commitfest but I seem to be unable to log
in with my community account (I can log in to the wiki). Help appreciated.

Attachment Content-Type Size
clusterstats.patch text/x-patch 28.5 KB

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-08 12:26:00
Message-ID: alpine.DEB.2.02.1308081425390.9285@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> As part of routine maintenance monitoring, it is interesting for us to
> have statistics on the CLUSTER command (timestamp of last run, and
> number of runs since stat reset) like we have for (auto)ANALYZE and
> (auto)VACUUM. Patch against today's HEAD attached.
>
> I would add this to the next commitfest but I seem to be unable to log
> in with my community account (I can log in to the wiki). Help appreciated.

Done.

--
Fabien.


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-08 17:57:33
Message-ID: 5203DC0D.3060906@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/08/2013 01:52 PM, Vik Fearing wrote:
> As part of routine maintenance monitoring, it is interesting for us to
> have statistics on the CLUSTER command (timestamp of last run, and
> number of runs since stat reset) like we have for (auto)ANALYZE and
> (auto)VACUUM. Patch against today's HEAD attached.
>
> I would add this to the next commitfest but I seem to be unable to log
> in with my community account (I can log in to the wiki). Help appreciated.

whould be a bit easier to diagnose if we knew your community account name ;)

Stefan


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-08 22:02:14
Message-ID: 52041566.4080701@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/08/2013 07:57 PM, Stefan Kaltenbrunner wrote:

> On 08/08/2013 01:52 PM, Vik Fearing wrote:
>> I would add this to the next commitfest but I seem to be unable to log
>> in with my community account (I can log in to the wiki). Help appreciated.
> whould be a bit easier to diagnose if we knew your community account name

Sorry, it's "glaucous".

Vik


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-09 07:44:28
Message-ID: 52049DDC.8020008@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/08/2013 02:26 PM, Fabien COELHO wrote:
>
>> As part of routine maintenance monitoring, it is interesting for us to
>> have statistics on the CLUSTER command (timestamp of last run, and
>> number of runs since stat reset) like we have for (auto)ANALYZE and
>> (auto)VACUUM. Patch against today's HEAD attached.
>>
>> I would add this to the next commitfest but I seem to be unable to log
>> in with my community account (I can log in to the wiki). Help
>> appreciated.
>
> Done.
>

Thank you, but it seems you've duplicated the title from the other patch
(and thanks for adding that one, too!).

https://commitfest.postgresql.org/action/patch_view?id=1190

Vik


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-09 08:04:11
Message-ID: alpine.DEB.2.02.1308091001490.9285@localhost6.localdomain6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Thank you, but it seems you've duplicated the title from the other patch
> (and thanks for adding that one, too!).

Indeed, possibly a wrong copy paste. Fixed.

--
Fabien.


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-09 20:37:21
Message-ID: 52055301.2080901@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/09/2013 12:02 AM, Vik Fearing wrote:
> On 08/08/2013 07:57 PM, Stefan Kaltenbrunner wrote:
>
>> On 08/08/2013 01:52 PM, Vik Fearing wrote:
>>> I would add this to the next commitfest but I seem to be unable to log
>>> in with my community account (I can log in to the wiki). Help appreciated.
>> whould be a bit easier to diagnose if we knew your community account name
>
> Sorry, it's "glaucous".

hmm looks like your account may be affected by one of the buglets
introduced (and fixed shortly afterwards) of the main infrastructure to
debian wheezy - please try logging in to the main website and change
your password at least once. That should make it working again for the
commitfest app...

Stefan


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-08-09 22:50:29
Message-ID: 52057235.9040809@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/09/2013 10:37 PM, Stefan Kaltenbrunner wrote:
>>> On 08/08/2013 01:52 PM, Vik Fearing wrote:
>>>> I would add this to the next commitfest but I seem to be unable to log
>>>> in with my community account (I can log in to the wiki). Help appreciated.
> hmm looks like your account may be affected by one of the buglets
> introduced (and fixed shortly afterwards) of the main infrastructure to
> debian wheezy - please try logging in to the main website and change
> your password at least once. That should make it working again for the
> commitfest app...

That worked. Thank you.

Vik


From: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-09-16 06:26:00
Message-ID: 5236A478.4020803@uptime.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2013/08/08 20:52), Vik Fearing wrote:
> As part of routine maintenance monitoring, it is interesting for us to
> have statistics on the CLUSTER command (timestamp of last run, and
> number of runs since stat reset) like we have for (auto)ANALYZE and
> (auto)VACUUM. Patch against today's HEAD attached.
>
> I would add this to the next commitfest but I seem to be unable to log
> in with my community account (I can log in to the wiki). Help appreciated.

I have reviewed the patch.

Succeeded to build with the latest HEAD, and passed the regression
tests.

Looks good enough, and I'd like to add a test case here, not only
for the view definition, but also working correctly.

Please take a look at attached one.

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC. http://www.uptime.jp

Attachment Content-Type Size
clusterstats_regress.patch text/plain 3.2 KB

From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-10-15 11:52:42
Message-ID: 525D2C8A.2030702@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/16/2013 08:26 AM, Satoshi Nagayasu wrote:
> (2013/08/08 20:52), Vik Fearing wrote:
>> As part of routine maintenance monitoring, it is interesting for us to
>> have statistics on the CLUSTER command (timestamp of last run, and
>> number of runs since stat reset) like we have for (auto)ANALYZE and
>> (auto)VACUUM. Patch against today's HEAD attached.
>>
>> I would add this to the next commitfest but I seem to be unable to log
>> in with my community account (I can log in to the wiki). Help
>> appreciated.
>
> I have reviewed the patch.

Thank you for your review.

> Succeeded to build with the latest HEAD, and passed the regression
> tests.
>
> Looks good enough, and I'd like to add a test case here, not only
> for the view definition, but also working correctly.
>
> Please take a look at attached one.

Looks good to me. Attached is a rebased patch with those tests added.

--
Vik

Attachment Content-Type Size
clusterstats.v2.patch text/x-patch 32.2 KB

From: Noah Misch <noah(at)leadboat(dot)com>
To: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
Cc: Satoshi Nagayasu <snaga(at)uptime(dot)jp>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-10-20 05:37:02
Message-ID: 20131020053702.GA383291@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > (2013/08/08 20:52), Vik Fearing wrote:
> >> As part of routine maintenance monitoring, it is interesting for us to
> >> have statistics on the CLUSTER command (timestamp of last run, and
> >> number of runs since stat reset) like we have for (auto)ANALYZE and
> >> (auto)VACUUM. Patch against today's HEAD attached.

Adding new fields to PgStat_StatTabEntry imposes a substantial distributed
cost, because every database stats file write-out grows by the width of those
fields times the number of tables in the database. Associated costs have been
and continue to be a pain point with large table counts:

http://www.postgresql.org/message-id/flat/1718942738eb65c8407fcd864883f4c8(at)fuzzy(dot)cz
http://www.postgresql.org/message-id/flat/52268887(dot)9010509(at)uptime(dot)jp

In that light, I can't justify widening PgStat_StatTabEntry by 9.5% for this.
I recommend satisfying this monitoring need in your application by creating a
cluster_table wrapper function that issues CLUSTER and then updates statistics
you store in an ordinary table. Issue all routine CLUSTERs by way of that
wrapper function. A backend change that would help here is to extend event
triggers to cover the CLUSTER command, permitting you to inject monitoring
after plain CLUSTER and dispense with the wrapper.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, Satoshi Nagayasu <snaga(at)uptime(dot)jp>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-10-20 20:04:45
Message-ID: m2ppqzu2si.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> wrapper function. A backend change that would help here is to extend event
> triggers to cover the CLUSTER command, permitting you to inject monitoring
> after plain CLUSTER and dispense with the wrapper.

I didn't look in any level of details, but it might be as simple as
moving the T_ClusterStmt case from standard_ProcessUtility() down to the
Event Trigger friendly part known as ProcessUtilitySlow().

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>, Satoshi Nagayasu <snaga(at)uptime(dot)jp>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Statistics collection for CLUSTER command
Date: 2013-10-22 16:36:17
Message-ID: CA+TgmoZ0hw=Z5ihJ7Onhmy1trk-+ZM2zgKernHoHmAH=cjmeOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 20, 2013 at 1:37 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > (2013/08/08 20:52), Vik Fearing wrote:
>> >> As part of routine maintenance monitoring, it is interesting for us to
>> >> have statistics on the CLUSTER command (timestamp of last run, and
>> >> number of runs since stat reset) like we have for (auto)ANALYZE and
>> >> (auto)VACUUM. Patch against today's HEAD attached.
>
> Adding new fields to PgStat_StatTabEntry imposes a substantial distributed
> cost, because every database stats file write-out grows by the width of those
> fields times the number of tables in the database. Associated costs have been
> and continue to be a pain point with large table counts:
>
> http://www.postgresql.org/message-id/flat/1718942738eb65c8407fcd864883f4c8(at)fuzzy(dot)cz
> http://www.postgresql.org/message-id/flat/52268887(dot)9010509(at)uptime(dot)jp
>
> In that light, I can't justify widening PgStat_StatTabEntry by 9.5% for this.
> I recommend satisfying this monitoring need in your application by creating a
> cluster_table wrapper function that issues CLUSTER and then updates statistics
> you store in an ordinary table. Issue all routine CLUSTERs by way of that
> wrapper function. A backend change that would help here is to extend event
> triggers to cover the CLUSTER command, permitting you to inject monitoring
> after plain CLUSTER and dispense with the wrapper.

I unfortunately have to agree with this, but I think it points to the
need for further work on the pgstat infrastructure. We used to have
one file; now we have one per database. That's better for people with
lots of databases, but many people just have one big database. We
need a solution here that relieves the pain for those people.

(I can't help thinking that the root of the problem here is that we're
rewriting the whole file, and that any solution that doesn't somehow
facilitate updates of individual records will be only a small
improvement.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company