pg_upgrade and statistics

From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_upgrade and statistics
Date: 2012-03-13 01:38:30
Message-ID: CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com

As noted by the manual, pg_statistic is not ported in any way when
performing pg_upgrade. I have been investigating what it would take
to (even via just a connected SQL superuser client running UPDATE or
INSERT against pg_statistic) get at least some baseline statistics
into the database as quickly as possible, since in practice the
underlying implementation of the statistics and cost estimation does
not change so dramatically between releases as to make the old
statistics useless (AFAIK). I eventually used a few contortions to be
able to update the anyarray elements in pg_statistic:

-- Placeholders: {thearrayliteral} is the old stavaluesN array literal,
-- concrete_type is the column's actual element type, and atttypemod,
-- attnum, and therelation come from the corresponding pg_attribute row.
UPDATE pg_statistic
SET stavalues1 = array_in(anyarray_out('{thearrayliteral}'::concrete_type[]),
                          'concrete_type'::regtype, atttypemod)
WHERE staattnum = attnum AND starelid = therelation;

Notably, the type analysis phase is a bit too smart for me to simply
cast to "anyarray" from a concrete type, so I run it through a
deparse/reparse phase instead to fool it.
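For concreteness, a hypothetical round trip with integer[] (the real
statement, as above, uses the column's concrete element type and
typmod):

-- A direct cast to the pseudotype is rejected by the parser:
--   SELECT '{1,2,3}'::integer[]::anyarray;
--   ERROR:  cannot cast type integer[] to anyarray
-- Round-tripping through the output/input functions sidesteps it:
SELECT array_in(anyarray_out('{1,2,3}'::integer[]),
                'integer'::regtype, -1);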

Now I'm stuck trying to ensure that autoanalyze will run at least once
after we have committed the old statistics to the new catalogs,
regardless of how much activity has taken place on the table and
regardless of how cold it is (thus, tuning the GUC thresholds is not
attractive: at what point would I tune them back to normal settings?).
One idea I had was to jigger pg_stat to indicate that a lot of tuples
have changed since the last analyze (which would be corrected
automatically once autoanalyze on the relation completes), but because
pg_stat is not a regular table this doesn't look easy unless I break
out a new C extension.
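A per-table variant of the threshold-tuning idea would at least confine
the change and give a natural point to undo it, though it still only
fires once the table sees some activity. A hedged sketch using the
autovacuum storage parameters (available since 8.4; the table name is
hypothetical):

-- With both parameters zeroed, a single modified tuple is enough
-- to make autoanalyze pick this table up.
ALTER TABLE some_table SET (autovacuum_analyze_threshold = 0,
                            autovacuum_analyze_scale_factor = 0);

-- Once autoanalyze has run, revert to the global defaults:
ALTER TABLE some_table RESET (autovacuum_analyze_threshold,
                              autovacuum_analyze_scale_factor);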

You're probably going to ask: "why not just run ANALYZE and be done
with it?" The reasons are:

* ANALYZE can take a sufficiently long time on large databases that
the downtime of switching versions is not attractive (though a coarse
first pass, sketched after this list, could shorten it)

* If we don't run ANALYZE and have no old statistics, then the plans
can be disastrously bad for the user

* If we do run the ANALYZE statement on a user's behalf as part of
the upgrade, any compatibility fixups that require an exclusive lock
(such as some ALTER TABLE statements) would have to block on this
relatively long ANALYZE. autoanalyze/autovacuum, by comparison, backs
off frequently, so disaster is averted.
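(The coarse first pass mentioned above would look roughly like the
sketch below: a low-resolution ANALYZE to get baseline row estimates
quickly, leaving refinement to a later full-resolution pass. The
target value is illustrative; the default is 100.)

-- Lower the per-column statistics target for this session only,
-- analyze everything, then restore the default.
SET default_statistics_target = 10;
ANALYZE;
RESET default_statistics_target;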

If anyone has any insightful comments as to how to meet these
requirements, I'd appreciate them; otherwise I can consider it an
interesting area for improvement and will eat the ANALYZE and salt the
documentation with caveats.

--
fdr
