Re: analyze after a database restore?

Lists: pgsql-hackers
From: mlw <pgsql(at)mohawksoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: analyze after a database restore?
Date: 2003-02-27 17:27:44
Message-ID: 3E5E4A90.1020605@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just dumped and restored a rather large database, I upgraded from
7.2.x to 7.3.x. When I went to test my application against the new
database, it was dog slow. It had all the indexes, and looked fine.

Then it dawned on me, Doh! ANALYZE!

Should pg_dump appened an ANALYZE for each table?

On small tables, this shouldn't take too long. On large tables, you're
gonna have to do it anyway. I guess it could be an option as well.

It just seems like on of the tasks that is required for a "restored"
database to work properly, and as such, should probably be specified in
the backup procedure.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mlw <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 17:55:42
Message-ID: 13176.1046368542@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mlw <pgsql(at)mohawksoft(dot)com> writes:
> Should pg_dump appened an ANALYZE for each table?

A single ANALYZE at the end of the script would be sufficient. I'm not
sure that pg_dump should do this automatically though. If you're not
done restoring then it's mostly a waste of cycles, and how is pg_dump to
know that?

I do note that the docs are rather stingy with this important bit of
knowhow :-( Neither of the obvious places that I looked in (pg_dump
reference page and admin guide's backup/restore chapter) mention the
need to issue an ANALYZE after completing a restore. I'm pretty sure it
is mentioned *somewhere* ;-) ... but it needs to be more prominent.

regards, tom lane


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: mlw <pgsql(at)mohawksoft(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: analyze after a database restore?
Date: 2003-02-27 18:01:09
Message-ID: 1046368869.91396.80.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2003-02-27 at 12:27, mlw wrote:
> I just dumped and restored a rather large database, I upgraded from
> 7.2.x to 7.3.x. When I went to test my application against the new
> database, it was dog slow. It had all the indexes, and looked fine.
>
> Then it dawned on me, Doh! ANALYZE!
>
> Should pg_dump appened an ANALYZE for each table?
>
> On small tables, this shouldn't take too long. On large tables, you're
> gonna have to do it anyway. I guess it could be an option as well.
>
> It just seems like on of the tasks that is required for a "restored"
> database to work properly, and as such, should probably be specified in
> the backup procedure.

It's been debated before whether pg_dump should append ANALYZE to the
end of the load script. It has always been determined that it shouldn't
(see archives for arguments).

However, an 'Auto-vacuum' process should watch stats and re-analyze the
table when the larger of 30% or 1000 rows has been affected (inserts, or
deletes mostly). That is probably a better solution overall.

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: mlw <pgsql(at)mohawksoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 18:12:29
Message-ID: 3E5E550D.3090305@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>mlw <pgsql(at)mohawksoft(dot)com> writes:
>
>
>>Should pg_dump appened an ANALYZE for each table?
>>
>>
>
>A single ANALYZE at the end of the script would be sufficient. I'm not
>sure that pg_dump should do this automatically though. If you're not
>done restoring then it's mostly a waste of cycles, and how is pg_dump to
>know that?
>
>I do note that the docs are rather stingy with this important bit of
>knowhow :-( Neither of the obvious places that I looked in (pg_dump
>reference page and admin guide's backup/restore chapter) mention the
>need to issue an ANALYZE after completing a restore. I'm pretty sure it
>is mentioned *somewhere* ;-) ... but it needs to be more prominent.
>
>
While these are all comforting points, I *know* about analyze and I
occasionally forget. It just seems like a nessisary step after restoring
a backup. Conceptually, one could consider it just as important as an
index, i.e. the system will perform poorly without it.

From an "ease of use" perspective, it would be one less step.


From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: mlw <pgsql(at)mohawksoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 18:58:47
Message-ID: 200302271358.47967.darcy@druid.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 27 February 2003 13:12, mlw wrote:
> Tom Lane wrote:
> >A single ANALYZE at the end of the script would be sufficient. I'm not
> >sure that pg_dump should do this automatically though. If you're not
> >done restoring then it's mostly a waste of cycles, and how is pg_dump to
> >know that?
> [...]
> From an "ease of use" perspective, it would be one less step.

Why not have pg_dump emit a friendly reminder?

--
D'Arcy J.M. Cain <darcy(at){druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.


From: mlw <pgsql(at)mohawksoft(dot)com>
To: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 19:45:46
Message-ID: 3E5E6AEA.9070106@mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

D'Arcy J.M. Cain wrote:

>On Thursday 27 February 2003 13:12, mlw wrote:
>
>
>>Tom Lane wrote:
>>
>>
>>>A single ANALYZE at the end of the script would be sufficient. I'm not
>>>sure that pg_dump should do this automatically though. If you're not
>>>done restoring then it's mostly a waste of cycles, and how is pg_dump to
>>>know that?
>>>
>>>
>>[...]
>> From an "ease of use" perspective, it would be one less step.
>>
>>
>
>Why not have pg_dump emit a friendly reminder?
>
>
>
The reminder won't work, because the backup may be happening in an
automatic fashion, and anything but error messages will be lost. I
dislike having to have an "expert" be present at the database "restore"
phase of operation.

Suppose a company loses the PG admin and a reasonably experienced person
takes his or her place temporarily, This scenario happens all the time
with all sorts of projects. A reasonably experienced person will be able
to accomplish a DB restore but will probably not know about performing
an analyze. Under the pressure of restoring after a crash on a live
system, even a reasonably experienced PG admin may forget, hell I forgot
and I've been using PG since 1997.

The "correct" view of a database backup should be to include the
statistics of the database as it existed at the time backup, these
statistics are part of this state "snapshot" because the directly affect
the operation of the database. I do not want to evoke the name of
Larry's evil product, but it saves its statistics when the data is exported.

Short of including the relevant statistics, there should be an option on
pg_dump to emit an "ANALYZE;" at the end of a database dump. This will
allow a "knowledgeable" admin to selectively add the vacuum so that
someone possibly less qualified than he can do the restore.

Does anyone disagree that a query's "explain" should look the same or
better after a successful restore? From a product QA point of view, if a
valid backup set, when restored, does not recreate the system in a state
at least as efficient and workable as the system when it was backed up,
you did not have a successful restore. Any QA department would rate this
as a serious bug.

Are there any reasons why it should not be an option on pg_dump?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mlw <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 20:12:36
Message-ID: 27615.1046376756@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mlw <pgsql(at)mohawksoft(dot)com> writes:
> From an "ease of use" perspective, it would be one less step.

There is something to be said for that. As Rod notes, this has been
considered and rejected before --- but I think that was back when
ANALYZE (a) could only be done as part of VACUUM, and (b) insisted on
scanning the whole table. The current implementation is vastly
lighter-weight than what we were looking at back then. Perhaps it's
time to reconsider.

Although I suggested doing a single unconditional ANALYZE at the end
of the script, second thought leads me to think the per-table ANALYZE
(probably issued right after the table's data-load step) might be
better. That way you'd not have any side-effects on already-existing
tables in the database you are loading to. OTOH, that way would leave
the system catalogs un-analyzed, which might be bad.

regards, tom lane


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 22:25:03
Message-ID: 20030227172503.R17141@mail.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 27, 2003 at 02:45:46PM -0500, mlw wrote:
>
> Are there any reasons why it should not be an option on pg_dump?

I wonder whether that mightn't be the best answer. Maybe it should
even be the default, and --noanalyse an option.

I agree that from the point of view of simplifying administration,
it'd be a nice addition.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mlw <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 22:44:11
Message-ID: 20030227224411.GA10724@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 27, 2003 at 03:12:36PM -0500, Tom Lane wrote:

> Although I suggested doing a single unconditional ANALYZE at the end
> of the script, second thought leads me to think the per-table ANALYZE
> (probably issued right after the table's data-load step) might be
> better. That way you'd not have any side-effects on already-existing
> tables in the database you are loading to. OTOH, that way would leave
> the system catalogs un-analyzed, which might be bad.

Huh... is there a way to analyze a specific schema? Maybe that can
solve the problem of system catalogs being left un-analyzed, by having
pg_dump emit an "ANALYZE SCHEMA pg_catalog" or something.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)


From: Justin Clift <justin(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mlw <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-02-27 23:46:56
Message-ID: 3E5EA370.30109@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> mlw <pgsql(at)mohawksoft(dot)com> writes:
>
>> From an "ease of use" perspective, it would be one less step.
>
>
> There is something to be said for that. As Rod notes, this has been
> considered and rejected before --- but I think that was back when
> ANALYZE (a) could only be done as part of VACUUM, and (b) insisted on
> scanning the whole table. The current implementation is vastly
> lighter-weight than what we were looking at back then. Perhaps it's
> time to reconsider.
>
> Although I suggested doing a single unconditional ANALYZE at the end
> of the script, second thought leads me to think the per-table ANALYZE
> (probably issued right after the table's data-load step) might be
> better. That way you'd not have any side-effects on already-existing
> tables in the database you are loading to. OTOH, that way would leave
> the system catalogs un-analyzed, which might be bad.

Analyzing the impact a bit further, which has the more positives/negatives?

o Adding an ANALYZE (per table or not)

+ Restoring from backup and getting decent performance is more
reliably, especially for people new to PostgreSQL.

+ Reduces the number of things an admin has to remember or
checklist

- Could be redundant if the restore is part of a larger group of
restores

- Adds a bit more time to the restoration process (not much through)

o Leaving it the way it is (no ANALYZE included)

+ Stops the potential problem of it being redundant if the restore
is part of a larger group of restores

- The SysAdmin/DBA has to know/remember to do this. Especially bad
for people new to PostgreSQL

It seems like the maximum benefits would be on the side of including it,
and the only real negative from including an ANALYZE would be in the
extra time taken for it.

Of course we may be forgetting Other Big Consequences. ;-)

As a thought, we could introduce an option to explicitly not include it,
but that doesn't feel very worthwhile.

Regards and best wishes,

Justin Clift

> regards, tom lane

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "mlw" <pgsql(at)mohawksoft(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: analyze after a database restore?
Date: 2003-02-28 02:05:48
Message-ID: 061601c2decd$e9a14f00$6500a8c0@fhp.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Although I suggested doing a single unconditional ANALYZE at the end
> of the script, second thought leads me to think the per-table ANALYZE
> (probably issued right after the table's data-load step) might be
> better. That way you'd not have any side-effects on already-existing
> tables in the database you are loading to. OTOH, that way would leave
> the system catalogs un-analyzed, which might be bad.

How about adding an ANALYZE SCHEMA pg_catalog; option :)

Chris


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mlw <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-03-18 00:01:47
Message-ID: 200303180001.h2I01lA26847@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Attached is a committed patch to add a recommendation for ANALYZE after
restore. It is a shame we only have vacuumdb -a to do analyze _and_
vacuum, and no analyze-only option.

---------------------------------------------------------------------------

Tom Lane wrote:
> mlw <pgsql(at)mohawksoft(dot)com> writes:
> > From an "ease of use" perspective, it would be one less step.
>
> There is something to be said for that. As Rod notes, this has been
> considered and rejected before --- but I think that was back when
> ANALYZE (a) could only be done as part of VACUUM, and (b) insisted on
> scanning the whole table. The current implementation is vastly
> lighter-weight than what we were looking at back then. Perhaps it's
> time to reconsider.
>
> Although I suggested doing a single unconditional ANALYZE at the end
> of the script, second thought leads me to think the per-table ANALYZE
> (probably issued right after the table's data-load step) might be
> better. That way you'd not have any side-effects on already-existing
> tables in the database you are loading to. OTOH, that way would leave
> the system catalogs un-analyzed, which might be bad.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 3.9 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mlw <pgsql(at)mohawksoft(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: analyze after a database restore?
Date: 2003-03-18 14:34:53
Message-ID: Pine.LNX.4.44.0303180145180.1975-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian writes:

> Attached is a committed patch to add a recommendation for ANALYZE after
> restore. It is a shame we only have vacuumdb -a to do analyze _and_
> vacuum, and no analyze-only option.

Isn't one copy of the text per reference page enough?

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mlw <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: analyze after a database restore?
Date: 2003-03-18 17:05:08
Message-ID: 200303181705.h2IH58403689@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut wrote:
> Bruce Momjian writes:
>
> > Attached is a committed patch to add a recommendation for ANALYZE after
> > restore. It is a shame we only have vacuumdb -a to do analyze _and_
> > vacuum, and no analyze-only option.
>
> Isn't one copy of the text per reference page enough?

Thanks. Fixed. There were two in pg_dump by accident.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073