Re: ANALYZE after restore

Lists: pgsql-hackers
From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: ANALYZE after restore
Date: 2002-04-03 01:40:13
Message-ID: GNELIHDDFBOCMGBFGEFOOEOPCBAA.chriskl@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Would it be an idea to have pg_dump append an ANALYZE; command to the end of
its dumps to assist newbies / inexperienced admins?

Reason being is that I noticed that when I just restored a 50MB dump that
the pg_statistic table had no contents...

I think it'd be an idea...

Chris


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE after restore
Date: 2002-04-03 01:51:47
Message-ID: 20020402205147.3960fdae.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 3 Apr 2002 09:40:13 +0800
"Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> wrote:
> Hi,
>
> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> its dumps to assist newbies / inexperienced admins?

That strikes me as a good idea; a lot of the questions we get on
-general and on IRC are solved by suggesting "have you run ANALYZE?"
And that is only the sub-section of the user community that takes the
time to track down the problem and posts about it to the mailing
list -- I shudder to think how many people have never taken the time
to tune their database at all.

Given that ANALYZE is now a separate command, so there is no need to
run a VACUUM (which could be much more expensive); furthermore, since
ANALYZE now only takes a statistical sampling of the full table, it
shouldn't take very long, even on large tables. However, I'd say we
should make this behavior optional, controlled by a command-line
switch, but it should be enabled by default.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 01:52:45
Message-ID: Pine.LNX.4.21.0204031149210.7670-100000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:

> Hi,
>
> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> its dumps to assist newbies / inexperienced admins?

I do not think this is desired behaviour. Firstly, pg_dump is not just for
restoring data to the system. Presumably another flag would need to be
added to pg_dump to prevent an ANALYZE being appended. This is messing
and, in my opinion, it goes against the 'does what it says it does' nature
of Postgres. Secondly, in experienced admins are not going to get
experienced with database management unless they see that their database
runs like a dog and they have to read the manual.

Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 04:09:55
Message-ID: 6520.1017806995@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
>> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
>> its dumps to assist newbies / inexperienced admins?

> I do not think this is desired behaviour.

I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
will just get in the way of people who know what they're doing, and it's
not at all clear that it will help people who do not.

regards, tom lane


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 07:59:16
Message-ID: 1017820759.2058.10.camel@rh72.home.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote:
> On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
>
> > Hi,
> >
> > Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > its dumps to assist newbies / inexperienced admins?
>
> I do not think this is desired behaviour. Firstly, pg_dump is not just for
> restoring data to the system. Presumably another flag would need to be
> added to pg_dump to prevent an ANALYZE being appended.

Yes.

> This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres.

What does pg_dump say it does ?

Or should pg_dump append ANALYZE only if it determines that ANALYZE has
been run on the database being dumped ?

Do you have any tools that will break when ANALYZE is added, (and which
don't break on the weird way of dumping foreign keys ;) ?

> Secondly, in experienced admins are not going to get
> experienced with database management unless they see that their database
> runs like a dog and they have to read the manual.

Rather they think that the database is indeed designed to run like a
dog.

For _forcing_ them newbies to learn we could append a new UNANALYZE
command that inserts delibarately bogus info into pg_statistic to make
it perform even worse by default ;)

In general, I'd prefer a database that has no need to be explicitly
maintained. How many experienced file-system managers do you know ?

---------------------
Hannu


From: Justin Clift <justin(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 08:16:11
Message-ID: 3CAABA4B.52BCD2F4@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
>
> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> >> its dumps to assist newbies / inexperienced admins?
>
> > I do not think this is desired behaviour.
>
> I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
> will just get in the way of people who know what they're doing, and it's
> not at all clear that it will help people who do not.

Sorry Tom and Gavin, but I feel it really comes down to our idea of what
we're
trying to do here :

a) A database which is very self-maintaining, so people DON'T HAVE to
learn it's intricacies in order to be getting decent performance.
(They'll have to learn the intricacies if they want *better*
performance)

b) A database which works. But if you want decent performance, you'd
better
take the time and effort to learn it.
(This is the approach the commercial vendors take)

I feel we should always target a) where it's possible to without it
seriously
getting in the way of people who've take the time to learn the skills.

The far majority of people who use PostgreSQL are in the category which
will
benefit from a) so they can put their time to other uses instead of
having to
learn and keep-up-to-date-with PostgreSQL. This will *always* be the
case.

Having decent performance by default should definitely be an important
objective, so having an ANALYZE command run at the end of a restore - by
default only - is a good idea.

Regards and best wishes,

Justin Clift

> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: Justin Clift <justin(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 16:41:19
Message-ID: 200204031641.g33GfJr32202@saturn.janwieck.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Justin Clift wrote:
> Tom Lane wrote:
> >
> > Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> > >> Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > >> its dumps to assist newbies / inexperienced admins?
> >
> > > I do not think this is desired behaviour.
> >
> > I agree with Gavin here ... a forced VACUUM or ANALYZE after a restore
> > will just get in the way of people who know what they're doing, and it's
> > not at all clear that it will help people who do not.
>
> Sorry Tom and Gavin, but I feel it really comes down to our idea of what
> we're
> trying to do here :
>
> a) A database which is very self-maintaining, so people DON'T HAVE to
> learn it's intricacies in order to be getting decent performance.
> (They'll have to learn the intricacies if they want *better*
> performance)

The defaults after a restore should result in index scans
most of the time, resulting in some medium decent
performance. And PostgreSQL needs some frequent VACUUM
anyway, so after a while this problem solves itself for the
average user.

A database wide forced VACUUM on the other hand can make
things worse. I have seen scenarios, where you have to
explicitly leave out ANALYZE for specific tables in order to
keep them index-scanned. So what you're proposing is to force
professional PostgreSQL users to wait after restore for a
useless ANALYZE to complete, before they can reset things
with a normal VACUUM to get their required performance back?
And all that just to make dummies happier?

Jan

> b) A database which works. But if you want decent performance, you'd
> better
> take the time and effort to learn it.
> (This is the approach the commercial vendors take)
>
> I feel we should always target a) where it's possible to without it
> seriously
> getting in the way of people who've take the time to learn the skills.
>
> The far majority of people who use PostgreSQL are in the category which
> will
> benefit from a) so they can put their time to other uses instead of
> having to
> learn and keep-up-to-date-with PostgreSQL. This will *always* be the
> case.
>
> Having decent performance by default should definitely be an important
> objective, so having an ANALYZE command run at the end of a restore - by
> default only - is a good idea.
>
> Regards and best wishes,
>
> Justin Clift
>
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
> --
> "My grandfather once told me that there are two kinds of people: those
> who work and those who take the credit. He told me to try to be in the
> first group; there was less competition there."
> - Indira Gandhi
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: Justin Clift <justin(at)postgresql(dot)org>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 17:06:02
Message-ID: 3CAB367A.4E1DD48D@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Jan,

Jan Wieck wrote:
>
<snip>
> The defaults after a restore should result in index scans
> most of the time, resulting in some medium decent
> performance. And PostgreSQL needs some frequent VACUUM
> anyway, so after a while this problem solves itself for the
> average user.
>
> A database wide forced VACUUM on the other hand can make
> things worse. I have seen scenarios, where you have to
> explicitly leave out ANALYZE for specific tables in order to
> keep them index-scanned. So what you're proposing is to force
> professional PostgreSQL users to wait after restore for a
> useless ANALYZE to complete, before they can reset things
> with a normal VACUUM to get their required performance back?
> And all that just to make dummies happier?
>
> Jan

Nope, I'm figuring that if it's an option, and the option is on by
default, then for the majority of people that will be a good thing.

Anyone that's a professional PostgreSQL user will know about to turn the
option off i.e. pg_dump --something (etc). Sure, we all make mistakes
and will forget now and again, but I don't think that should stop us
from taking into account that the majority of users out there are fairly
PostgreSQL clue-less.

If we can make it easy without much inconvenience and without
sacrificing the power of the database, we should.

:-)

Regards and best wishes,

Justin Clift

<snip>

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: Justin Clift <justin(at)postgresql(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 17:56:06
Message-ID: 10511.1017856566@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jan Wieck <janwieck(at)yahoo(dot)com> writes:
> ... And PostgreSQL needs some frequent VACUUM
> anyway, so after a while this problem solves itself for the
> average user.

Yes, that's the key point for me too. Anyone who doesn't set up for
routine vacuums/analyzes is going to have performance problems anyway.
Attacking that by making pg_dump force a vacuum is attacking the wrong
place.

There's been discussion of adding automatic background vacuums to
Postgres; that seems like a more useful response to the issue.

regards, tom lane


From: Justin Clift <justin(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <janwieck(at)yahoo(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 19:19:57
Message-ID: 3CAB55DD.4DF16B9F@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
>
> Jan Wieck <janwieck(at)yahoo(dot)com> writes:
> > ... And PostgreSQL needs some frequent VACUUM
> > anyway, so after a while this problem solves itself for the
> > average user.
>
> Yes, that's the key point for me too. Anyone who doesn't set up for
> routine vacuums/analyzes is going to have performance problems anyway.
> Attacking that by making pg_dump force a vacuum is attacking the wrong
> place.

Hi Tom,

Good point. Although I also think we're talking about two different
things here.

No-one is proposing running a VACCUM after the load, but instead getting
some accurate statistics about the data which was loaded.

I agree adding an automatic background vacuum thread/process/something
will be really, really useful too.
Should we instead have this proposed automatic background something also
update the statistics every now and again?

If so, I think this will all be a moot point.

:-)

Regards and best wishes,

Justin Clift


> There's been discussion of adding automatic background vacuums to
> Postgres; that seems like a more useful response to the issue.
>
> regards, tom lane

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Clift <justin(at)postgresql(dot)org>
Cc: Jan Wieck <janwieck(at)yahoo(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 19:30:10
Message-ID: 11785.1017862210@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Justin Clift <justin(at)postgresql(dot)org> writes:
> I agree adding an automatic background vacuum thread/process/something
> will be really, really useful too.
> Should we instead have this proposed automatic background something also
> update the statistics every now and again?

Yes, I had always assumed that would be part of the feature ...

regards, tom lane


From: Justin Clift <justin(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <janwieck(at)yahoo(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-03 19:41:18
Message-ID: 3CAB5ADE.B31DCBB0@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
>
> Justin Clift <justin(at)postgresql(dot)org> writes:
> > I agree adding an automatic background vacuum thread/process/something
> > will be really, really useful too.
> > Should we instead have this proposed automatic background something also
> > update the statistics every now and again?
>
> Yes, I had always assumed that would be part of the feature ...

Hi Tom,

Cool. I wasn't sure of that (probably haven't been following the
correct threads).

That makes way more sense then.

:-)

Regards and best wishes,

Justin Clift

>
> regards, tom lane

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Scott Marlowe <smarlowe(at)ihs(dot)com>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE after restore
Date: 2002-04-04 18:27:55
Message-ID: Pine.LNX.4.33.0204041118180.13845-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 Apr 2002, Hannu Krosing wrote:

> On Wed, 2002-04-03 at 06:52, Gavin Sherry wrote:
> > On Wed, 3 Apr 2002, Christopher Kings-Lynne wrote:
> >
> > > Hi,
> > >
> > > Would it be an idea to have pg_dump append an ANALYZE; command to the end of
> > > its dumps to assist newbies / inexperienced admins?
> >
> > I do not think this is desired behaviour. Firstly, pg_dump is not just for
> > restoring data to the system. Presumably another flag would need to be
> > added to pg_dump to prevent an ANALYZE being appended.
>
> Yes.
>
> > This is messing and, in my opinion, it goes against the 'does what it says> it does' nature of Postgres.
>
> What does pg_dump say it does ?

from man pg_dump:

pg_dump - extract a PostgreSQL database into a script file or other
archive file

Pretty simple really.

I've been using postgresql for about three years now, and it only took me
about 15 minutes of reading the docs to find the vacuum and vacuum
analyze command. It was far harder to figure out subselects,
transactions, outer joins, unions, and a dozen other things than vacuum.
I was a total database newbie back then, by the way.

One of the things I liked about postgresql was that it wasn't stuffed full
of marketing fluff to try and impress the PHBs at the top of the corporate
ladder, but was full of useful extensibility and was very much a "do what
it said it would" database.

while I agree that postgresql could do with some automated housekeeping
routines that would allow joe sixpack to grab it and go, no database that
has real power is going to run very well without some administration,
period.

The last place to put house keeping is in the end of my data dumps.
pg_dump's job is to dump the data from my database in a format that is as
transportable as possible. not to hold my hand the next time I need to
load data into my own database.

While I fully support a switch like -z on pg_dump that puts an analyze on
the end of my dumps if I so choose, I don't want them showing up
automatically and me wondering if the data feeds I make for other will
work.

I can see junior dbas who don't understand vacuum and analyze recommending
to people that they need to dump / restore their whole database once a
week to get good performance if we add aht analyze switch to the end of
the pg_dump file. NOT a good thing. :-)

anywho, I don't post much here, cause I don't hack postgresql that much,
but I love this database, and I don't want it filled up with useless
marketing cruft like analyze being haphazardly tacked onto the pg_dump
output, so my vote is a great big NO.