Re: [GENERAL] autoanalyze criteria

Lists: pgsql-generalpgsql-hackers
From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: autoanalyze criteria
Date: 2013-02-22 13:57:28
Message-ID: 51277948.8060005@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Hi,

If I understand
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
correctly, the autovacuum threshold in could be estimated like this in
PostgreSQL 9.1:

SELECT pg_stat_user_tables.relname,
pg_stat_user_tables.n_dead_tup,
CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
+ (CAST(current_setting('autovacuum_vacuum_scale_factor') AS
numeric)
* pg_class.reltuples) AS av_threshold
FROM pg_stat_user_tables
JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
ORDER BY 1;

If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
kick in. Obviously, that does rely on up-to-date statistics. Is that how
it is actually done?

2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
by ANALYZE it cannot be used as a criterion for the next autoanalyze
run, I think. Is there any way to query, whether a table currently
qualifies for autoanalyze?

Thanks and Regards,
Stefan


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-22 17:27:18
Message-ID: CAMkU=1x3fWE_tro87xB4MZmqVkR0xKUX+47Nwcu-M2YRA+f_vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
<s(dot)andreatta(at)synedra(dot)com>wrote:

> Hi,
>
> If I understand http://www.postgresql.org/**docs/9.1/static/routine-**
> vacuuming.html<http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html>correctly, the autovacuum threshold in could be estimated like this in
> PostgreSQL 9.1:
>
> SELECT pg_stat_user_tables.relname,
> pg_stat_user_tables.n_dead_**tup,
> CAST(current_setting('**autovacuum_vacuum_threshold') AS bigint)
> + (CAST(current_setting('**autovacuum_vacuum_scale_**factor') AS
> numeric)
> * pg_class.reltuples) AS av_threshold
> FROM pg_stat_user_tables
> JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
> ORDER BY 1;
>
> If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum should
> kick in. Obviously, that does rely on up-to-date statistics. Is that how it
> is actually done?
>

Pretty much, yes. With the caveat that table storage settings can override
the global settings.

>
> 2nd question: because pg_stat_user_tables.n_dead_tup is itself estimated
> by ANALYZE it cannot be used as a criterion for the next autoanalyze run, I
> think.

n_dead_tup is updated by the stats collector, not by ANALYZE.

Cheers,

Jeff


From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-23 13:11:16
Message-ID: 5128BFF4.7030406@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 02/22/2013 06:27 PM, Jeff Janes wrote:
> On Fri, Feb 22, 2013 at 5:57 AM, Stefan Andreatta
> <s(dot)andreatta(at)synedra(dot)com <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>
> Hi,
>
> If I understand
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html
> correctly, the autovacuum threshold in could be estimated like
> this in PostgreSQL 9.1:
>
> SELECT pg_stat_user_tables.relname,
> pg_stat_user_tables.n_dead_tup,
> CAST(current_setting('autovacuum_vacuum_threshold') AS bigint)
> + (CAST(current_setting('autovacuum_vacuum_scale_factor')
> AS numeric)
> * pg_class.reltuples) AS av_threshold
> FROM pg_stat_user_tables
> JOIN pg_class on pg_stat_user_tables.relid = pg_class.oid
> ORDER BY 1;
>
> If pg_stat_user_tables.n_dead_tup exceeds av_threshold autovacuum
> should kick in. Obviously, that does rely on up-to-date
> statistics. Is that how it is actually done?
>
>
> Pretty much, yes. With the caveat that table storage settings can
> override the global settings.
>
>
> 2nd question: because pg_stat_user_tables.n_dead_tup is itself
> estimated by ANALYZE it cannot be used as a criterion for the next
> autoanalyze run, I think.
>
>
> n_dead_tup is updated by the stats collector, not by ANALYZE.
>
Thanks Jeff, that helped a lot (as did a careful rereading of
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)

However, to estimate whether autoanalyze should be triggered, I am still
missing something: the analyze threshold is compared to the "total
number of tuples inserted, updated, or deleted since the last ANALYZE."
(according to
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).

pg_stat_user_tables.n_live tup - pg_class.reltuples should give
something like the sum of rows inserted minus rows deleted since the
last ANALYZE. But according to the documentation we would need the sum
of those values. And we are still missing a number for rows updated
since the last analyze. pg_stat_usert_tables. n_dead_tup, on the other
hand, is only set back by successful VACUUM. autoanalyzing a table with
more than 10% dead rows would therefore keep autoanalyze in a loop until
the ratio rises beyond 20% (default configuration) and autovacuum kicks
in. So that wouldn't make a lot of sense.

Regards,
Stefan


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: autoanalyze criteria
Date: 2013-02-23 16:10:36
Message-ID: CAMkU=1ysn9rJC=WEuiAQWP90mYa25hvPhWW6jtQ_J-uZVaOBGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Saturday, February 23, 2013, Stefan Andreatta wrote:

>
>
> Thanks Jeff, that helped a lot (as did a careful rereading of
> http://www.postgresql.org/docs/9.1/static/monitoring-stats.html and
> http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)
>
> However, to estimate whether autoanalyze should be triggered, I am still
> missing something: the analyze threshold is compared to the "total number
> of tuples inserted, updated, or deleted since the last ANALYZE."
> (according to
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> pg_stat_user_tables.n_live tup - pg_class.reltuples should give something
> like the sum of rows inserted minus rows deleted since the last ANALYZE.
> But according to the documentation we would need the sum of those values.
> And we are still missing a number for rows updated since the last analyze.
> pg_stat_usert_tables. n_dead_tup, on the other hand, is only set back by
> successful VACUUM. autoanalyzing a table with more than 10% dead rows would
> therefore keep autoanalyze in a loop until the ratio rises beyond 20%
> (default configuration) and autovacuum kicks in. So that wouldn't make a
> lot of sense.
>
>
Hi Stefan,

Sorry, I got tunnel vision about the how the threshold was computed, and
forgot about the thing it was compared to. There is a "secret" data point
in the stats collector called changes_since_analyze. This is not exposed
in the pg_stat_user_tables. But I think it should be as I often have
wanted to see it.

Cheers,

Jeff


From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-23 18:41:09
Message-ID: 51290D45.30609@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On 02/23/2013 05:10 PM, Jeff Janes wrote:
> On Saturday, February 23, 2013, Stefan Andreatta wrote:
>
>>
> Thanks Jeff, that helped a lot (as did a careful rereading of
> http://www.postgresql.org/docs/9.1/static/monitoring-stats.html
> and
> http://www.postgresql.org/docs/9.1/static/catalog-pg-class.html ;-)
>
> However, to estimate whether autoanalyze should be triggered, I am
> still missing something: the analyze threshold is compared to the
> "total number of tuples inserted, updated, or deleted since the
> last ANALYZE." (according to
> http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html).
>
> pg_stat_user_tables.n_live tup - pg_class.reltuples should give
> something like the sum of rows inserted minus rows deleted since
> the last ANALYZE. But according to the documentation we would need
> the sum of those values. And we are still missing a number for
> rows updated since the last analyze. pg_stat_usert_tables.
> n_dead_tup, on the other hand, is only set back by successful
> VACUUM. autoanalyzing a table with more than 10% dead rows would
> therefore keep autoanalyze in a loop until the ratio rises beyond
> 20% (default configuration) and autovacuum kicks in. So that
> wouldn't make a lot of sense.
>
>
> Hi Stefan,
>
> Sorry, I got tunnel vision about the how the threshold was computed,
> and forgot about the thing it was compared to. There is a "secret"
> data point in the stats collector called changes_since_analyze. This
> is not exposed in the pg_stat_user_tables. But I think it should be
> as I often have wanted to see it.
>
>
> Cheers,
>
> Jeff

Sounds like a very good idea to me - any way I could help to make such a
thing happen?

Stefan


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] autoanalyze criteria
Date: 2013-02-23 20:30:22
Message-ID: CAMkU=1yrf7Fukn-xF_LfLZYqJdXnYXXVD773c105wETtSRCNrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Moved discussion from General To Hackers.

On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
<s(dot)andreatta(at)synedra(dot)com>wrote:

>
> On 02/23/2013 05:10 PM, Jeff Janes wrote:
>
>
> Sorry, I got tunnel vision about the how the threshold was computed, and
> forgot about the thing it was compared to. There is a "secret" data point
> in the stats collector called changes_since_analyze. This is not exposed
> in the pg_stat_user_tables. But I think it should be as I often have
> wanted to see it.
>
>
>
> Sounds like a very good idea to me - any way I could help to make such a
> thing happen?
>

It should be fairly easy to implement because the other columns are already
there to show you the way, and if you want to try your hand at hacking
pgsql it would be a good introduction to doing so.

Look at each instance in the code of n_dead_dup and
pg_stat_get_dead_tuples, and those are the places where
changes_since_analyze also need to be addressed, in an analogous manner
(assuming it is isn't already there.)

git grep 'n_dead_tup'

It looks like we would need to add an SQL function to retrieve the data,
then incorporate that function into the view definitions that make up the
pg_stat_user_tables etc. views. and of course update the regression test
and the documentation.

Other than implementing it, we would need to convince other hackers that
this is desirable to have. I'm not sure how hard that would be. I've
looked in the archives to see if this idea was already considered but
rejected, but I don't see any indication that it was previously considered.

(http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).

Cheers,

Jeff


From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] autoanalyze criteria
Date: 2013-02-23 21:12:03
Message-ID: 512930A3.7090301@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On 02/23/2013 09:30 PM, Jeff Janes wrote:
> Moved discussion from General To Hackers.
>
> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
> <s(dot)andreatta(at)synedra(dot)com <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>
>
> On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>
>> Sorry, I got tunnel vision about the how the threshold was
>> computed, and forgot about the thing it was compared to. There
>> is a "secret" data point in the stats collector
>> called changes_since_analyze. This is not exposed in the
>> pg_stat_user_tables. But I think it should be as I often have
>> wanted to see it.
>>
>>
>
> Sounds like a very good idea to me - any way I could help to make
> such a thing happen?
>
>
>
> It should be fairly easy to implement because the other columns are
> already there to show you the way, and if you want to try your hand at
> hacking pgsql it would be a good introduction to doing so.
>
> Look at each instance in the code of n_dead_dup and
> pg_stat_get_dead_tuples, and those are the places where
> changes_since_analyze also need to be addressed, in an analogous
> manner (assuming it is isn't already there.)
>
> git grep 'n_dead_tup'
>
> It looks like we would need to add an SQL function to retrieve the
> data, then incorporate that function into the view definitions that
> make up the pg_stat_user_tables etc. views. and of course update the
> regression test and the documentation.
>
> Other than implementing it, we would need to convince other hackers
> that this is desirable to have. I'm not sure how hard that would be.
> I've looked in the archives to see if this idea was already considered
> but rejected, but I don't see any indication that it was previously
> considered.
>
> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>
> Cheers,
>
> Jeff

Not being a developer, I am afraid, I will not be going to implement it
myself - nor would anybody wish so ;-)

I also searched the archives, but the closest I found is a discussion on
the Admin List starting here:
http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com

On the other hand, there is quite a lot of discussion about making
autoanalyze more (or less) aggressive - which seems a difficult task to
me, when you cannot even check what's triggering your autoanalyze.

Anybody else interested?

Regards,
Stefan


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] autoanalyze criteria
Date: 2013-02-23 21:51:08
Message-ID: 512939CC.6060009@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 24/02/13 10:12, Stefan Andreatta wrote:
>
> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>> Moved discussion from General To Hackers.
>>
>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>> <s(dot)andreatta(at)synedra(dot)com <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>>
>>
>> On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>
>>> Sorry, I got tunnel vision about the how the threshold was
>>> computed, and forgot about the thing it was compared to. There
>>> is a "secret" data point in the stats collector
>>> called changes_since_analyze. This is not exposed in the
>>> pg_stat_user_tables. But I think it should be as I often have
>>> wanted to see it.
>>>
>>>
>>
>> Sounds like a very good idea to me - any way I could help to make
>> such a thing happen?
>>
>>
>>
>> It should be fairly easy to implement because the other columns are
>> already there to show you the way, and if you want to try your hand at
>> hacking pgsql it would be a good introduction to doing so.
>>
>> Look at each instance in the code of n_dead_dup and
>> pg_stat_get_dead_tuples, and those are the places where
>> changes_since_analyze also need to be addressed, in an analogous
>> manner (assuming it is isn't already there.)
>>
>> git grep 'n_dead_tup'
>>
>> It looks like we would need to add an SQL function to retrieve the
>> data, then incorporate that function into the view definitions that
>> make up the pg_stat_user_tables etc. views. and of course update the
>> regression test and the documentation.
>>
>> Other than implementing it, we would need to convince other hackers
>> that this is desirable to have. I'm not sure how hard that would be.
>> I've looked in the archives to see if this idea was already considered
>> but rejected, but I don't see any indication that it was previously
>> considered.
>>
>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>
>> Cheers,
>>
>> Jeff
>
> Not being a developer, I am afraid, I will not be going to implement it
> myself - nor would anybody wish so ;-)
>
> I also searched the archives, but the closest I found is a discussion on
> the Admin List starting here:
> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>
> On the other hand, there is quite a lot of discussion about making
> autoanalyze more (or less) aggressive - which seems a difficult task to
> me, when you cannot even check what's triggering your autoanalyze.
>
> Anybody else interested?
>

I was asked about this exact thing the other day - it would be very nice
to have the information visible. I may take a look at doing it (I've
done some hacking on the stats system previously). However don't let
that put anyone else off - as I'll have to find the time to start :-)

Regards

Mark


From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-24 11:52:16
Message-ID: 7B8C1C0C-DA7E-4C4B-A8DB-CE63830F502C@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Feb 23, 2013, at 14:11, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com> wrote:

> And we are still missing a number for rows updated since the last analyse.

In MVCC an update is an insert + delete, so you already got those numbers.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-25 06:23:40
Message-ID: 512B036C.4060601@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 02/24/2013 12:52 PM, Alban Hertroys wrote:
> On Feb 23, 2013, at 14:11, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com
> <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>
>> And we are still missing a number for rows updated since the last
>> analyse.
>
> In MVCC an update is an insert + delete, so you already got those numbers.
>
Good point. But because they are an update and a delete, they cancel
each other out and do not show up in pg_stat_user_tables.n_live_tup -
and that's the only value for which we have a reference value from the
time of the last analyze (pg_class.reltuples).

On the other hand, I might again miss something out. I would be most
happy if anybody could come up with a query to estimate autoanalyze
trigger conditions from the values we have available now.

Stefan


From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-25 08:00:13
Message-ID: C69D2798-07FA-4E82-94FB-03EC4A62A479@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Feb 25, 2013, at 7:23, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com> wrote:

> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>> On Feb 23, 2013, at 14:11, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com> wrote:
>>
>>> And we are still missing a number for rows updated since the last analyse.
>>
>> In MVCC an update is an insert + delete, so you already got those numbers.
>>
> Good point. But because they are an update and a delete, they cancel each other out and do not show up in pg_stat_user_tables.n_live_tup - and that's the only value for which we have a reference value from the time of the last analyze (pg_class.reltuples).

I'm pretty sure that an update results in 1 live + 1 dead tuple, so they don't cancel each other out - they end up adding to different statistics. Assuming those statistics are both since last vacuum, added together they are the total number of changed records since last vacuum.
What gain do you expect from a number of updated tuples?

And it seems to me those numbers are since last vacuum, not since last analyse - analyse doesn't change the amount of dead tuples (it just updates them to closer match reality), but vacuum does.

Disclaimer: I'm not intimately familiar with the planner statistics, but knowing what vacuum and analyse do in an MVCC database, like I described above it makes sense to me. I might be wrong though.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-25 11:07:19
Message-ID: 512B45E7.1000007@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On 02/25/2013 09:00 AM, Alban Hertroys wrote:
> On Feb 25, 2013, at 7:23, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com
> <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>
>> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>>> On Feb 23, 2013, at 14:11, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com
>>> <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>>>
>>>> And we are still missing a number for rows updated since the last
>>>> analyse.
>>>
>>> In MVCC an update is an insert + delete, so you already got those
>>> numbers.
>>>
>> Good point. But because they are an update and a delete, they cancel
>> each other out and do not show up in pg_stat_user_tables.n_live_tup -
>> and that's the only value for which we have a reference value from
>> the time of the last analyze (pg_class.reltuples).
>
> I'm pretty sure that an update results in 1 live + 1 dead tuple, so
> they don't cancel each other out - they end up adding to different
> statistics. Assuming those statistics are both since last vacuum,
> added together they are the total number of changed records since last
> vacuum.
> What gain do you expect from a number of updated tuples?
>
> And it seems to me those numbers are since last vacuum, not since last
> analyse - analyse doesn't change the amount of dead tuples (it just
> updates them to closer match reality), but vacuum does.
>
> Disclaimer: I'm not intimately familiar with the planner statistics,
> but knowing what vacuum and analyse do in an MVCC database, like I
> described above it makes sense to me. I might be wrong though.
1 update = 1 insert + 1 delete cancel each other out with respect to
pg_stat_user_tables.n_live_tup. Naturally, they dont't cancel each other
out with pg_stat_user_tables.n_tup_ins or n_tup_del - they don't even
show up in those values, presumably because that's what n_tup_upd is
there for. However the update adds to n_dead_tup.

VACUUM does not reset *any* of the statistics values that can be
accessed via pg_stat_user_tables, apart from n_dead_tup (hopefully ;-)
Anyway, to estimate the autoanalyze trigger, I would need statistics
that get reset by autoanalyze not autovacuum.

I wrote a test script to show the behaviour. Be sure to wait a second
each time before accessing pg_stat_user_tables as there is a delay in
getting those data:

CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
SELECT count(*) FROM test_stat;
ANALYZE test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DELETE FROM test_stat WHERE id > 10000;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

UPDATE test_stat set some_number = 1 where id > 9100;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

ANALYZE test_stat;

SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

VACUUM test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DROP TABLE test_stat;

Output from a postgres 9.2 database:
--------------------------------------------------

test=# CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
CREATE TABLE
test=# INSERT INTO test_stat (SELECT generate_series(1,10000) AS i,
random() AS r);
INSERT 0 10000
test=# SELECT count(*) FROM test_stat;
count
-------
10000
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10000 | 0 |
0 | 0
(1 row)

test=# INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);
INSERT 0 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10900 | 0 | 10900 | 0 |
0 | 0
(1 row)

test=# DELETE FROM test_stat WHERE id > 10000;
DELETE 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 900 | 10900 | 0 | 900
| 0
(1 row)

test=# UPDATE test_stat set some_number = 1 where id > 9100;
UPDATE 900
test=# -- wait here (0.5 s) for statistics collector to catch up
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=#
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# VACUUM test_stat;
VACUUM
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10900 | 900 | 900
| 10
(1 row)

test=# DROP TABLE test_stat;
DROP TABLE

Regards,
Stefan


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] autoanalyze criteria
Date: 2013-05-15 00:33:22
Message-ID: 5192D7D2.8020605@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 24/02/13 10:51, Mark Kirkwood wrote:
> On 24/02/13 10:12, Stefan Andreatta wrote:
>>
>> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>>> Moved discussion from General To Hackers.
>>>
>>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>>> <s(dot)andreatta(at)synedra(dot)com <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>>>
>>>
>>> On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>>
>>>> Sorry, I got tunnel vision about the how the threshold was
>>>> computed, and forgot about the thing it was compared to. There
>>>> is a "secret" data point in the stats collector
>>>> called changes_since_analyze. This is not exposed in the
>>>> pg_stat_user_tables. But I think it should be as I often have
>>>> wanted to see it.
>>>>
>>>>
>>>
>>> Sounds like a very good idea to me - any way I could help to make
>>> such a thing happen?
>>>
>>>
>>>
>>> It should be fairly easy to implement because the other columns are
>>> already there to show you the way, and if you want to try your hand at
>>> hacking pgsql it would be a good introduction to doing so.
>>>
>>> Look at each instance in the code of n_dead_dup and
>>> pg_stat_get_dead_tuples, and those are the places where
>>> changes_since_analyze also need to be addressed, in an analogous
>>> manner (assuming it is isn't already there.)
>>>
>>> git grep 'n_dead_tup'
>>>
>>> It looks like we would need to add an SQL function to retrieve the
>>> data, then incorporate that function into the view definitions that
>>> make up the pg_stat_user_tables etc. views. and of course update the
>>> regression test and the documentation.
>>>
>>> Other than implementing it, we would need to convince other hackers
>>> that this is desirable to have. I'm not sure how hard that would be.
>>> I've looked in the archives to see if this idea was already considered
>>> but rejected, but I don't see any indication that it was previously
>>> considered.
>>>
>>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>>
>>> Cheers,
>>>
>>> Jeff
>>
>> Not being a developer, I am afraid, I will not be going to implement it
>> myself - nor would anybody wish so ;-)
>>
>> I also searched the archives, but the closest I found is a discussion on
>> the Admin List starting here:
>> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>>
>>
>> On the other hand, there is quite a lot of discussion about making
>> autoanalyze more (or less) aggressive - which seems a difficult task to
>> me, when you cannot even check what's triggering your autoanalyze.
>>
>> Anybody else interested?
>>
>
> I was asked about this exact thing the other day - it would be very
> nice to have the information visible. I may take a look at doing it
> (I've done some hacking on the stats system previously). However don't
> let that put anyone else off - as I'll have to find the time to start :-)
>
>
>

I happened to be looking at the whole autovacuum/analyze setup in
another context - which reminded me about volunteering to take a look at
a patch for adding changes_since_analyze. So with probably impeccably
poor timing (smack in the middle of 9.3 beta), here is a patch that does
that (so it is probably an early 9.4 addition).

I've called the column "n_changes_since_analyze" - I can sense that
there might be discussion about how to maybe shorten that :-) , and
added a doc line for the view + updated the regression test expected input.

Regards

Mark

Attachment Content-Type Size
pgstat-changes-since-analyze.1.patch text/x-patch 9.6 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] autoanalyze criteria
Date: 2013-07-05 13:12:44
Message-ID: CABUevEyWWOscPjVLhPh2NK4f2xNwKPbr=g+Y7Zhs0r8F67oFGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Wed, May 15, 2013 at 2:33 AM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> On 24/02/13 10:51, Mark Kirkwood wrote:
>>
>> On 24/02/13 10:12, Stefan Andreatta wrote:
>>>
>>>
>>> On 02/23/2013 09:30 PM, Jeff Janes wrote:
>>>>
>>>> Moved discussion from General To Hackers.
>>>>
>>>> On Sat, Feb 23, 2013 at 10:41 AM, Stefan Andreatta
>>>> <s(dot)andreatta(at)synedra(dot)com <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>>>>
>>>>
>>>> On 02/23/2013 05:10 PM, Jeff Janes wrote:
>>>>>
>>>>>
>>>>> Sorry, I got tunnel vision about the how the threshold was
>>>>> computed, and forgot about the thing it was compared to. There
>>>>> is a "secret" data point in the stats collector
>>>>> called changes_since_analyze. This is not exposed in the
>>>>> pg_stat_user_tables. But I think it should be as I often have
>>>>> wanted to see it.
>>>>>
>>>>>
>>>>
>>>> Sounds like a very good idea to me - any way I could help to make
>>>> such a thing happen?
>>>>
>>>>
>>>>
>>>> It should be fairly easy to implement because the other columns are
>>>> already there to show you the way, and if you want to try your hand at
>>>> hacking pgsql it would be a good introduction to doing so.
>>>>
>>>> Look at each instance in the code of n_dead_dup and
>>>> pg_stat_get_dead_tuples, and those are the places where
>>>> changes_since_analyze also need to be addressed, in an analogous
>>>> manner (assuming it is isn't already there.)
>>>>
>>>> git grep 'n_dead_tup'
>>>>
>>>> It looks like we would need to add an SQL function to retrieve the
>>>> data, then incorporate that function into the view definitions that
>>>> make up the pg_stat_user_tables etc. views. and of course update the
>>>> regression test and the documentation.
>>>>
>>>> Other than implementing it, we would need to convince other hackers
>>>> that this is desirable to have. I'm not sure how hard that would be.
>>>> I've looked in the archives to see if this idea was already considered
>>>> but rejected, but I don't see any indication that it was previously
>>>> considered.
>>>>
>>>> (http://www.postgresql.org/message-id/4823.1262132964@sss.pgh.pa.us).
>>>>
>>>> Cheers,
>>>>
>>>> Jeff
>>>
>>>
>>> Not being a developer, I am afraid, I will not be going to implement it
>>> myself - nor would anybody wish so ;-)
>>>
>>> I also searched the archives, but the closest I found is a discussion on
>>> the Admin List starting here:
>>>
>>> http://www.postgresql.org/message-id/626919622.7634700.1351695913466.JavaMail.root@alaloop.com
>>>
>>> On the other hand, there is quite a lot of discussion about making
>>> autoanalyze more (or less) aggressive - which seems a difficult task to
>>> me, when you cannot even check what's triggering your autoanalyze.
>>>
>>> Anybody else interested?
>>>
>>
>> I was asked about this exact thing the other day - it would be very nice
>> to have the information visible. I may take a look at doing it (I've done
>> some hacking on the stats system previously). However don't let that put
>> anyone else off - as I'll have to find the time to start :-)
>>
>>
>>
>
> I happened to be looking at the whole autovacuum/analyze setup in another
> context - which reminded me about volunteering to take a look at a patch for
> adding changes_since_analyze. So with probably impeccably poor timing (smack
> in the middle of 9.3 beta), here is a patch that does that (so it is
> probably an early 9.4 addition).
>
> I've called the column "n_changes_since_analyze" - I can sense that there
> might be discussion about how to maybe shorten that :-) , and added a doc
> line for the view + updated the regression test expected input.

Applied, with the changs suggested by Laurenz Albe in his review.

Thanks!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/