Re: autoanalyze criteria

From: Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: autoanalyze criteria
Date: 2013-02-25 11:07:19
Message-ID: 512B45E7.1000007@synedra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


On 02/25/2013 09:00 AM, Alban Hertroys wrote:
> On Feb 25, 2013, at 7:23, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com
> <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>
>> On 02/24/2013 12:52 PM, Alban Hertroys wrote:
>>> On Feb 23, 2013, at 14:11, Stefan Andreatta <s(dot)andreatta(at)synedra(dot)com
>>> <mailto:s(dot)andreatta(at)synedra(dot)com>> wrote:
>>>
>>>> And we are still missing a number for rows updated since the last
>>>> analyse.
>>>
>>> In MVCC an update is an insert + delete, so you already got those
>>> numbers.
>>>
>> Good point. But because they are an update and a delete, they cancel
>> each other out and do not show up in pg_stat_user_tables.n_live_tup -
>> and that's the only value for which we have a reference value from
>> the time of the last analyze (pg_class.reltuples).
>
> I'm pretty sure that an update results in 1 live + 1 dead tuple, so
> they don't cancel each other out - they end up adding to different
> statistics. Assuming those statistics are both since last vacuum,
> added together they are the total number of changed records since last
> vacuum.
> What gain do you expect from a number of updated tuples?
>
> And it seems to me those numbers are since last vacuum, not since last
> analyse - analyse doesn't change the amount of dead tuples (it just
> updates them to closer match reality), but vacuum does.
>
> Disclaimer: I'm not intimately familiar with the planner statistics,
> but knowing what vacuum and analyse do in an MVCC database, like I
> described above it makes sense to me. I might be wrong though.
1 update = 1 insert + 1 delete cancel each other out with respect to
pg_stat_user_tables.n_live_tup. Naturally, they dont't cancel each other
out with pg_stat_user_tables.n_tup_ins or n_tup_del - they don't even
show up in those values, presumably because that's what n_tup_upd is
there for. However the update adds to n_dead_tup.

VACUUM does not reset *any* of the statistics values that can be
accessed via pg_stat_user_tables, apart from n_dead_tup (hopefully ;-)
Anyway, to estimate the autoanalyze trigger, I would need statistics
that get reset by autoanalyze not autovacuum.

I wrote a test script to show the behaviour. Be sure to wait a second
each time before accessing pg_stat_user_tables as there is a delay in
getting those data:

CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
INSERT INTO test_stat (SELECT generate_series(1,10000) AS i, random() AS r);
SELECT count(*) FROM test_stat;
ANALYZE test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DELETE FROM test_stat WHERE id > 10000;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

UPDATE test_stat set some_number = 1 where id > 9100;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

ANALYZE test_stat;

SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

VACUUM test_stat;

-- wait here (0.5 s) for statistics collector to catch up
SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del,
n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test_stat';

DROP TABLE test_stat;

Output from a postgres 9.2 database:
--------------------------------------------------

test=# CREATE TABLE test_stat (id BIGINT, some_number BIGINT);
CREATE TABLE
test=# INSERT INTO test_stat (SELECT generate_series(1,10000) AS i,
random() AS r);
INSERT 0 10000
test=# SELECT count(*) FROM test_stat;
count
-------
10000
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10000 | 0 |
0 | 0
(1 row)

test=# INSERT INTO test_stat (SELECT generate_series(10001,10900) AS i,
random() AS r);
INSERT 0 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10900 | 0 | 10900 | 0 |
0 | 0
(1 row)

test=# DELETE FROM test_stat WHERE id > 10000;
DELETE 900
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 900 | 10900 | 0 | 900
| 0
(1 row)

test=# UPDATE test_stat set some_number = 1 where id > 9100;
UPDATE 900
test=# -- wait here (0.5 s) for statistics collector to catch up
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# ANALYZE test_stat;
ANALYZE
test=#
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 1800 | 10900 | 900 | 900
| 10
(1 row)

test=# VACUUM test_stat;
VACUUM
test=# SELECT relname, reltuples FROM pg_class WHERE relname = 'test_stat';
relname | reltuples
-----------+-----------
test_stat | 10000
(1 row)

test=# SELECT relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_upd,
n_tup_del, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname =
'test_stat';
relname | n_live_tup | n_dead_tup | n_tup_ins | n_tup_upd |
n_tup_del | n_tup_hot_upd
-----------+------------+------------+-----------+-----------+-----------+---------------
test_stat | 10000 | 0 | 10900 | 900 | 900
| 10
(1 row)

test=# DROP TABLE test_stat;
DROP TABLE

Regards,
Stefan

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ali Pouya 2013-02-25 12:44:47 Partitionning by trigger
Previous Message Alban Hertroys 2013-02-25 08:00:13 Re: autoanalyze criteria

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-02-25 11:44:14 Re: Why do we still perform a check for pre-sorted input within qsort variants?
Previous Message Bernd Helmle 2013-02-25 10:50:46 PGXS contrib builds broken?