Re: [PATCHES] Autovacuum launcher doesn't notice death of postmaster immediately

Lists: pgsql-hackers pgsql-patches
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-02 20:19:15
Message-ID: 200706022219.16212.peter_e@gmx.net

I notice that in 8.3, when I kill the postmaster process with SIGKILL or
SIGSEGV, the child processes writer and stats collector go away
immediately, but the autovacuum launcher hangs around for up to a
minute. (I suppose this has to do with the periodic wakeups?). When
you try to restart the postmaster before that it fails with a complaint
that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this
could cause some problems with people's control scripts of the
sort, "it crashed, let's try to restart it".

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-04 15:04:26
Message-ID: 20070604150426.GI4779@alvh.no-ip.org

Peter Eisentraut wrote:
> I notice that in 8.3, when I kill the postmaster process with SIGKILL or
> SIGSEGV, the child processes writer and stats collector go away
> immediately, but the autovacuum launcher hangs around for up to a
> minute. (I suppose this has to do with the periodic wakeups?). When
> you try to restart the postmaster before that it fails with a complaint
> that someone is still attached to the shared memory segment.
>
> These are obviously not normal modes of operation, but I fear that this
> could cause some problems with people's control scripts of the
> sort, "it crashed, let's try to restart it".

The launcher is set up to wake up in autovacuum_naptime seconds at most.
So if the user configures a ridiculous time (for example 86400 seconds,
which I've seen) then the launcher would not detect the postmaster death
for a very long time, which is probably bad. (You measured a one minute
delay because that's the default naptime).

Maybe this is not such a hot idea, and we should wake the launcher up
every 10 seconds (or less?). I picked 10 seconds because that's the
time the bgwriter sleeps if there is no activity configured. Does this
sound acceptable? The only problem with waking it up too frequently is
that it would be waking the system up (for gettimeofday()) even if
nothing is happening.

I also just noticed that the launcher will check if postmaster is alive,
then sleep, and then possibly do some work. So if the postmaster died
in the sleep period, the launcher might try to do some work. Should we
add a check for postmaster liveness after the sleep?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
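
The two ideas in the message above (cap each sleep at 10 seconds, and re-check the postmaster after waking) can be sketched in Python roughly as follows. The function name `launcher_iteration` and the `parent_alive`, `do_work`, and `sleep` callables are hypothetical stand-ins chosen for illustration; the real launcher is C code inside the PostgreSQL backend and is not reproduced here.

```python
import time
from typing import Callable

SLEEP_CAP_S = 10.0  # assumed cap, taken from the 10 seconds suggested above


def launcher_iteration(seconds_until_due: float,
                       parent_alive: Callable[[], bool],
                       do_work: Callable[[], None],
                       sleep: Callable[[float], None] = time.sleep,
                       cap_s: float = SLEEP_CAP_S) -> str:
    """Run one pass of a hypothetical launcher loop.

    Returns "exit" if the parent is gone, "worked" if work was due and done,
    or "napped" if the capped sleep ended before the work was due.
    """
    if not parent_alive():            # check before sleeping
        return "exit"
    nap = max(0.0, min(seconds_until_due, cap_s))
    sleep(nap)
    if not parent_alive():            # re-check after sleeping, before any work
        return "exit"
    if seconds_until_due <= cap_s:
        do_work()
        return "worked"
    return "napped"
```

With a naptime of 86400 seconds, a caller would invoke this repeatedly, subtracting the elapsed time on each pass, so a dead parent would be noticed within roughly one cap interval rather than after a day.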


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-07 15:50:36
Message-ID: 20070607155036.GH92628@nasby.net

On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> The launcher is set up to wake up in autovacuum_naptime seconds at most.
> So if the user configures a ridiculuos time (for example 86400 seconds,
> which I've seen) then the launcher would not detect the postmaster death

Yeah, I've seen people set that up with the intention of "now autovacuum
will only run during our slow time!". I'm thinking it'd be worth
mentioning in the docs that this won't work, and instead suggesting that
they run vacuumdb -a or equivalent at that time instead. Thoughts?
--
Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-07 19:13:09
Message-ID: 5a0a9d6f0706071213g6a3a984bm247ebfbc80c95444@mail.gmail.com

On 6/7/07, Jim C. Nasby <decibel(at)decibel(dot)org> wrote:
> On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> > The launcher is set up to wake up in autovacuum_naptime seconds at most.
> > So if the user configures a ridiculuos time (for example 86400 seconds,
> > which I've seen) then the launcher would not detect the postmaster death

Is there some threshold after which we should have PostgreSQL emit a
warning to the effect of "autovacuum_naptime is very large. Are you
sure you know what you're doing?"

> Yeah, I've seen people set that up with the intention of "now autovacuum
> will only run during our slow time!". I'm thinking it'd be worth
> mentioning in the docs that this won't work, and instead suggesting that
> they run vacuumdb -a or equivalent at that time instead. Thoughts?

Hmmm... it seems to me that points new users towards not using
autovacuum, which doesn't seem like the best idea. I think it'd be
better to say that setting the naptime really high is a Bad Idea.
Instead, if they want to shift maintenance to "off hours" they should
consider using a cron job that bonks around the
pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
they don't want vacuumed during "operational hours" (set them really
high at the start of operational hours, then to normal during off
hours). Tweaking the enable column would work too, but they presumably
don't want to disable ANALYZE, although it's entirely likely that new
users don't know what ANALYZE does, in which case they _really_ don't
want to disable it.

This should probably be very close to a section that says something
about how insufficient maintenance can be expected to lead to greater
performance issues than using autovacuum with default settings.
Assuming we believe that to be the case, which I think is reasonable
given that we are now defaulting to having autovacuum enabled.

Andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-07 19:27:34
Message-ID: 20701.1181244454@sss.pgh.pa.us

"Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> Hmmm... it seems to me that points new users towards not using
> autovacuum, which doesn't seem like the best idea. I think it'd be
> better to say that setting the naptime really high is a Bad Idea.

It seems like we should have an upper limit on the GUC variable that's
less than INT_MAX ;-). Would an hour be sane? 10 minutes?

This is independent of the problem at hand, though, which is that we
probably want the launcher to notice postmaster death in less time
than autovacuum_naptime, for reasonable values of same.

regards, tom lane


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-07 20:24:58
Message-ID: 4668699A.5030906@zeut.net

Tom Lane wrote:
> "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
>> Hmmm... it seems to me that points new users towards not using
>> autovacuum, which doesn't seem like the best idea. I think it'd be
>> better to say that setting the naptime really high is a Bad Idea.
>
> It seems like we should have an upper limit on the GUC variable that's
> less than INT_MAX ;-). Would an hour be sane? 10 minutes?
>
> This is independent of the problem at hand, though, which is that we
> probably want the launcher to notice postmaster death in less time
> than autovacuum_naptime, for reasonable values of same.

Do we need a configurable autovacuum naptime at all? I know I put it in
the original contrib autovacuum because I had no idea what knobs might
be needed. I can't see a good reason to ever have a naptime longer than
the default 60 seconds, but I suppose one might want a smaller naptime
for a very active system?


From: Michael Paesold <mpaesold(at)gmx(dot)at>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 07:54:09
Message-ID: 46690B21.60104@gmx.at

Matthew T. O'Connor wrote:
> Tom Lane wrote:
>> "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
>>> Hmmm... it seems to me that points new users towards not using
>>> autovacuum, which doesn't seem like the best idea. I think it'd be
>>> better to say that setting the naptime really high is a Bad Idea.
>>
>> It seems like we should have an upper limit on the GUC variable that's
>> less than INT_MAX ;-). Would an hour be sane? 10 minutes?
>>
>> This is independent of the problem at hand, though, which is that we
>> probably want the launcher to notice postmaster death in less time
>> than autovacuum_naptime, for reasonable values of same.
>
> Do we need a configurable autovacuum naptime at all? I know I put it in
> the original contrib autovacuum because I had no idea what knobs might
> be needed. I can't see a good reason to ever have a naptime longer than
> the default 60 seconds, but I suppose one might want a smaller naptime
> for a very active system?

A PostgreSQL database on my laptop for testing. It should use as little
resources as possible while being idle. That would be a scenario for
naptime greater than 60 seconds, wouldn't it?

Best Regards
Michael Paesold


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 08:12:52
Message-ID: E1539E0ED7043848906A8FF995BDA579021B3417@m0143.s-mxs.net


> > > The launcher is set up to wake up in autovacuum_naptime seconds at most.

Imho the fix is usually to have a sleep loop.

Andreas


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 13:27:26
Message-ID: 20070608132726.GD9071@alvh.no-ip.org

Zeugswetter Andreas ADI SD wrote:
>
> > > > The launcher is set up to wake up in autovacuum_naptime seconds at
> > > > most.
>
> Imho the fix is usually to have a sleep loop.

This is what we have. The sleep time depends on the schedule of next
vacuum for the closest database in time. If naptime is high, the sleep
time will be high (depending on number of databases needing attention).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Matthew O'Connor <matthew(at)zeut(dot)net>
To: Michael Paesold <mpaesold(at)gmx(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 13:49:56
Message-ID: 46695E84.90101@zeut.net

Michael Paesold wrote:
> Matthew T. O'Connor schrieb:
>> Do we need a configurable autovacuum naptime at all? I know I put it
>> in the original contrib autovacuum because I had no idea what knobs
>> might be needed. I can't see a good reason to ever have a naptime
>> longer than the default 60 seconds, but I suppose one might want a
>> smaller naptime for a very active system?
>
> A PostgreSQL database on my laptop for testing. It should use as little
> resources as possible while being idle. That would be a scenario for
> naptime greater than 60 seconds, wouldn't it?

Perhaps, but that isn't the use case PostgreSQL is being designed for.
If that is what you really need, then you should probably disable
autovacuum. Also, a very long naptime means that autovacuum will still
wake up at random times to do the work. At least with a short
naptime, it will do the work shortly after you updated your tables.


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 14:57:59
Message-ID: E1539E0ED7043848906A8FF995BDA579021B347A@m0143.s-mxs.net


> > > > > The launcher is set up to wake up in autovacuum_naptime seconds
> > > > > at most.
> >
> > Imho the fix is usually to have a sleep loop.
>
> This is what we have. The sleep time depends on the schedule
> of next vacuum for the closest database in time. If naptime
> is high, the sleep time will be high (depending on number of
> databases needing attention).

No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
exit" instead of "sleep longtime".

Andreas
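
The "sleep in short slices and check for exit" loop described above can be sketched in Python as follows. This is an illustration only: `split_sleep`, `parent_alive`, and the injectable `sleep` parameter are hypothetical names, not anything from the PostgreSQL source.

```python
import time
from typing import Callable


def split_sleep(total_seconds: float,
                parent_alive: Callable[[], bool],
                slice_seconds: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Sleep for up to total_seconds in slices of at most slice_seconds.

    The parent is checked before every slice and once more at the end.
    Returns True if the whole nap elapsed with the parent still alive,
    False as soon as the parent is found dead.
    """
    remaining = total_seconds
    while remaining > 0:
        if not parent_alive():
            return False              # give up early instead of sleeping on
        nap = min(slice_seconds, remaining)
        sleep(nap)
        remaining -= nap
    return parent_alive()
```

The worst-case detection delay becomes one slice instead of the full nap, at the cost of one extra wakeup per slice. In a real Unix child process, one possible liveness probe (an assumption, not taken from this thread) is to compare `os.getppid()` with the value recorded at startup.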


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 15:14:52
Message-ID: 20070608151452.GI9071@alvh.no-ip.org

Zeugswetter Andreas ADI SD wrote:
>
> > > > > > > The launcher is set up to wake up in autovacuum_naptime seconds
> > > > > > > at most.
> > >
> > > Imho the fix is usually to have a sleep loop.
> >
> > This is what we have. The sleep time depends on the schedule
> > of next vacuum for the closest database in time. If naptime
> > is high, the sleep time will be high (depending on number of
> > databases needing attention).
>
> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> exit" instead of "sleep longtime".

Ah; yes, what I was proposing (or thought about proposing, not sure if I
posted it or not) was putting an upper limit of 10 seconds in the sleep
(bgwriter sleeps 10 seconds if configured to not do anything). Though
10 seconds may seem like an eternity for systems like the ones Peter was
talking about, where there is a script trying to restart the server as
soon as the postmaster dies.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Limítate a mirar... y algun día veras"


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Matthew O'Connor <matthew(at)zeut(dot)net>
Cc: Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 21:44:27
Message-ID: 20070608214427.GS92628@nasby.net

On Fri, Jun 08, 2007 at 09:49:56AM -0400, Matthew O'Connor wrote:
> Michael Paesold wrote:
> >Matthew T. O'Connor schrieb:
> >>Do we need a configurable autovacuum naptime at all? I know I put it
> >>in the original contrib autovacuum because I had no idea what knobs
> >>might be needed. I can't see a good reason to ever have a naptime
> >>longer than the default 60 seconds, but I suppose one might want a
> >>smaller naptime for a very active system?
> >
> >A PostgreSQL database on my laptop for testing. It should use as little
> >resources as possible while being idle. That would be a scenario for
> >naptime greater than 60 seconds, wouldn't it?
>
> Perhaps, but that isn't the use case PostgresSQL is being designed for.
> If that is what you really need, then you should probably disable
> autovacuum. Also a very long naptime means that autovacuum will still
> wake up at random times and to do the work. At least with short
> naptime, it will do the work shortly after you updated your tables.

Agreed. Maybe 10 minutes might make sense, but the overhead of checking
to see if anything needs vacuuming is pretty tiny.

There *is* reason to allow setting the naptime smaller, though (or at
least there was; perhaps Alvaro's recent changes negate this need):
clusters that have a large number of databases. I've worked with folks
who are in a hosted environment and give each customer their own
database; it's not hard to get a couple hundred databases that way.
Setting the naptime higher than a second in such an environment would
mean it could be hours before a database is checked for vacuuming.
--
Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 21:47:46
Message-ID: 20070608214746.GT92628@nasby.net

On Thu, Jun 07, 2007 at 12:13:09PM -0700, Andrew Hammond wrote:
> On 6/7/07, Jim C. Nasby <decibel(at)decibel(dot)org> wrote:
> >On Mon, Jun 04, 2007 at 11:04:26AM -0400, Alvaro Herrera wrote:
> >> The launcher is set up to wake up in autovacuum_naptime seconds at most.
> >> So if the user configures a ridiculuos time (for example 86400 seconds,
> >> which I've seen) then the launcher would not detect the postmaster death
>
> Is there some threshold after which we should have PostgreSQL emit a
> warning to the effect of "autovacuum_naptime is very large. Are you
> sure you know what you're doing?"
>
> >Yeah, I've seen people set that up with the intention of "now autovacuum
> >will only run during our slow time!". I'm thinking it'd be worth
> >mentioning in the docs that this won't work, and instead suggesting that
> >they run vacuumdb -a or equivalent at that time instead. Thoughts?
>
> Hmmm... it seems to me that points new users towards not using
> autovacuum, which doesn't seem like the best idea. I think it'd be

I think we could easily word it so that it's clear that just letting
autovacuum do its thing is preferred.

> better to say that setting the naptime really high is a Bad Idea.
> Instead, if they want to shift maintenances to "off hours" they should
> consider using a cron job that bonks around the
> pg_autovacuum.vac_base_thresh or vac_scale_factor values for tables
> they don't want vacuumed during "operational hours" (set them really
> high at the start of operational hours, then to normal during off
> hours). Tweaking the enable column would work too, but they presumably
> don't want to disable ANALYZE, although it's entirely likely that new
> users don't know what ANALYZE does, in which case they _really_ don't
> want to disable it.

That sounds like a rather ugly solution, and one that would be hard to
implement; not something to be putting in the docs.
--
Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: "Matthew O'Connor" <matthew(at)zeut(dot)net>, Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 22:06:47
Message-ID: 20070608220647.GB23222@alvh.no-ip.org

Jim C. Nasby wrote:

> There *is* reason to allow setting the naptime smaller, though (or at
> least there was; perhaps Alvero's recent changes negate this need):
> clusters that have a large number of databases. I've worked with folks
> who are in a hosted environment and give each customer their own
> database; it's not hard to get a couple hundred databases that way.
> Setting the naptime higher than a second in such an environment would
> mean it could be hours before a database is checked for vacuuming.

Yes, the code in HEAD is different -- each database will be considered
separately. So the huge database taking all day to vacuum will not stop
the tiny databases from being vacuumed in a timely manner.

And the very huge table in that database will not stop the other tables
in the database from being vacuumed either. There can be more than one
worker in a single database.

The limit is autovacuum_max_workers.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 22:40:59
Message-ID: 4669DAFB.1060600@zeut.net

Alvaro Herrera wrote:
> Jim C. Nasby escribió:
>> There *is* reason to allow setting the naptime smaller, though (or at
>> least there was; perhaps Alvero's recent changes negate this need):
>> clusters that have a large number of databases. I've worked with folks
>> who are in a hosted environment and give each customer their own
>> database; it's not hard to get a couple hundred databases that way.
>> Setting the naptime higher than a second in such an environment would
>> mean it could be hours before a database is checked for vacuuming.
>
> Yes, the code in HEAD is different -- each database will be considered
> separately. So the huge database taking all day to vacuum will not stop
> the tiny databases from being vacuumed in a timely manner.
>
> And the very huge table in that database will not stop the other tables
> in the database from being vacuumed either. There can be more than one
> worker in a single database.

Ok, but I think the question posed is that in say a virtual hosting
environment there might be say 1,000 databases in the cluster. Am I
still going to have to wait a long time for my database to get vacuumed?
I don't think this has changed much, no?

(If default naptime is 1 minute, then autovacuum won't even look at a
given database but once every 1,000 minutes (16.67 hours) assuming that
there isn't enough work to keep all the workers busy.)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-08 23:19:40
Message-ID: 20070608231940.GE23222@alvh.no-ip.org

Matthew T. O'Connor wrote:

> Ok, but I think the question posed is that in say a virtual hosting
> environment there might be say 1,000 databases in the cluster. Am I
> still going to have to wait a long time for my database to get vacuumed?
> I don't think this has changed much no?

Depends on how much time it takes to vacuum the other 999 databases.
The default max workers is 3.

> (If default naptime is 1 minute, then autovacuum won't even look at a
> given database but once every 1,000 minutes (16.67 hours) assuming that
> there isn't enough work to keep all the workers busy.)

The naptime is per database. Which means if you have 1000 databases and
a naptime of 60 seconds, the launcher is going to wake up every 100
milliseconds to check things up. (This results from 60000 / 1000 = 60
ms, but there is a minimum of 100 ms just to keep things sane).

If there are 3 workers and each of the 1000 databases takes 10 seconds
on average to vacuum, there will be around 3000 seconds between autovac
runs of your database, assuming my math is right.

I hope those 1000 databases you put in your shared hosting are not very
big.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
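
The arithmetic in the message above can be reproduced with a short Python sketch. The function names are hypothetical, and the formulas are only the simplified model stated in the thread (naptime divided evenly across databases with a 100 ms floor, and work shared evenly among workers), not the actual scheduling code.

```python
def launcher_wake_interval_ms(naptime_s: float, n_databases: int,
                              floor_ms: float = 100) -> float:
    """Interval between launcher wakeups when naptime is spread over all databases."""
    per_database_ms = naptime_s * 1000 / n_databases
    return max(per_database_ms, floor_ms)


def seconds_between_runs(n_databases: int, avg_vacuum_s: float, workers: int) -> float:
    """Time for the workers to get through every database once, if all stay busy."""
    return n_databases * avg_vacuum_s / workers


# 60 s naptime over 1000 databases gives 60 ms, which the floor raises to 100 ms.
print(launcher_wake_interval_ms(60, 1000))
# 1000 databases at 10 s each, shared by 3 workers.
print(seconds_between_runs(1000, 10, 3))
```

Under this model the second figure comes out at about 3333 seconds, in line with the "around 3000 seconds" estimate. For comparison, the one-database-per-naptime model quoted earlier gives 1000 minutes, or about 16.67 hours.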


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-09 05:49:00
Message-ID: 466A3F4C.5040409@commandprompt.com

Alvaro Herrera wrote:
> Matthew T. O'Connor escribió:
>
>> Ok, but I think the question posed is that in say a virtual hosting
>> environment there might be say 1,000 databases in the cluster.

That is uhmmm insane... 1000 databases?

Joshua D. Drake

>> Am I
>> still going to have to wait a long time for my database to get vacuumed?
>> I don't think this has changed much no?
>
> Depends on how much time it takes to vacuum the other 999 databases.
> The default max workers is 3.
>
>> (If default naptime is 1 minute, then autovacuum won't even look at a
>> given database but once every 1,000 minutes (16.67 hours) assuming that
>> there isn't enough work to keep all the workers busy.)
>
> The naptime is per database. Which means if you have 1000 databases and
> a naptime of 60 seconds, the launcher is going to wake up every 100
> milliseconds to check things up. (This results from 60000 / 1000 = 60
> ms, but there is a minimum of 100 ms just to keep things sane).
>
> If there are 3 workers and each of the 1000 databases in average takes
> 10 seconds to vacuum, there will be around 3000 seconds between autovac
> runs of your database assuming my math is right.
>
> I hope those 1000 databases you put in your shared hosting are not very
> big.
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, "Michael Paesold" <mpaesold(at)gmx(dot)at>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-09 05:55:52
Message-ID: D425483C2C5C9F49B5B7A41F89441547010006FA@postal.corporate.connx.com

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Joshua D. Drake
> Sent: Friday, June 08, 2007 10:49 PM
> To: Alvaro Herrera
> Cc: Matthew T. O'Connor; Jim C. Nasby; Michael Paesold; Tom Lane; Andrew
> Hammond; Peter Eisentraut; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Autovacuum launcher doesn't notice death of
> postmaster immediately
>
> Alvaro Herrera wrote:
> > Matthew T. O'Connor escribió:
> >
> >> Ok, but I think the question posed is that in say a virtual hosting
> >> environment there might be say 1,000 databases in the cluster.
>
> That is uhmmm insane... 1000 databases?

Not in a test environment. We have several hundred databases here. Of course, only a few dozen (or at most ~100) are of any one type, but I can imagine that under certain circumstances 1000 databases would not be unreasonable.

[snip]


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-12 08:35:55
Message-ID: 20070612172451.6C41.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches


Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> > No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> > exit" instead of "sleep longtime".
>
> Ah; yes, what I was proposing (or thought about proposing, not sure if I
> posted it or not) was putting an upper limit of 10 seconds in the sleep
> (bgwriter sleeps 10 seconds if configured to not do anything). Though
> 10 seconds may seem like an eternity for systems like the ones Peter was
> talking about, where there is a script trying to restart the server as
> soon as the postmaster dies.

Here is a patch for split-sleep of autovacuum_naptime.

There are some other issues in CVS HEAD: we compute
{autovacuum_naptime * 1000000} in launcher_determine_sleep().
The result overflows a 32-bit int if autovacuum_naptime is set above 2147.

In another place we use {autovacuum_naptime * 1000}, so we should
set the upper bound to INT_MAX/1000 instead of INT_MAX.
Incidentally, we already have the same protection for
log_min_duration_statement and log_autovacuum.

I hope this patch fixes those large-autovacuum_naptime problems.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
autovacuum_naptime_overflow.patch application/octet-stream 1.2 KB
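
The arithmetic behind the two limits mentioned in this message can be checked with a small sketch. It assumes a 32-bit int (INT_MAX = 2147483647); the function names are hypothetical and are not taken from the PostgreSQL source or from the attached patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if value * factor still fits in an int.
 * Hypothetical helper; expects value >= 0 and factor > 0.
 */
static bool
fits_in_int_after_scaling(int value, int factor)
{
    assert(value >= 0 && factor > 0);
    /* Compare against INT_MAX / factor so the check itself cannot overflow. */
    return value <= INT_MAX / factor;
}

/*
 * One way to avoid the overflow: widen to 64 bits before multiplying.
 * Hypothetical helper, shown only to illustrate the arithmetic.
 */
static int64_t
seconds_to_usec(int seconds)
{
    return (int64_t) seconds * 1000000;
}
```

With a 32-bit int, INT_MAX / 1000000 is 2147, so a naptime of 2147 seconds still fits after scaling to microseconds and 2148 does not. The same check with a factor of 1000 gives the INT_MAX/1000 upper bound suggested for the millisecond case.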

From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-12 10:23:50
Message-ID: 466E7436.5070000@sun.com
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Zeugswetter Andreas ADI SD escribió:
>>>>>>> The launcher is set up to wake up in autovacuum_naptime seconds
>>>>>>> at most.
>>>> Imho the fix is usually to have a sleep loop.
>>> This is what we have. The sleep time depends on the schedule
>>> of next vacuum for the closest database in time. If naptime
>>> is high, the sleep time will be high (depending on number of
>>> databases needing attention).
>> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
>> exit" instead of "sleep longtime".
>
> Ah; yes, what I was proposing (or thought about proposing, not sure if I
> posted it or not) was putting an upper limit of 10 seconds in the sleep
> (bgwriter sleeps 10 seconds if configured to not do anything). Though
> 10 seconds may seem like an eternity for systems like the ones Peter was
> talking about, where there is a script trying to restart the server as
> soon as the postmaster dies.

There is also one "wild" solution: the postmaster and the bgwriter would
be connected by a socket or pipe, and select() would be used instead of
sleep. If the connection fails unexpectedly, select() returns immediately
and we can handle the problem right away. The socket could also be used
in special cases where we need to wake the process up sooner.

Zdenek
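
A minimal POSIX sketch of the pipe idea described above, assuming the child holds the read end of a pipe whose only write end is held by the parent. The function name is hypothetical; this is not how PostgreSQL implemented anything at the time of this thread, and it says nothing about the win32 behaviour of pipes.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Wait up to timeout_secs, or until watch_fd becomes readable.
 * A pipe's read end becomes readable (at EOF) once every write end is
 * closed, which also happens when the process holding it dies.
 *
 * Returns 1 if the descriptor became readable, 0 on timeout, -1 on error.
 */
static int
wait_for_timeout_or_peer_exit(int watch_fd, long timeout_secs)
{
    fd_set          readfds;
    struct timeval  tv;
    int             rc;

    assert(watch_fd >= 0 && timeout_secs >= 0);

    FD_ZERO(&readfds);
    FD_SET(watch_fd, &readfds);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;

    rc = select(watch_fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

The same call serves as both the nap and the liveness check: a normal wakeup is a timeout, and a closed pipe cuts the nap short. Writing a byte into the pipe would give the "wake it up faster" case.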


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-12 10:37:29
Message-ID: 20070612103729.GA3332@svr2.hagander.net
Lists: pgsql-hackers pgsql-patches

On Tue, Jun 12, 2007 at 12:23:50PM +0200, Zdenek Kotala wrote:
> Alvaro Herrera wrote:
> >Zeugswetter Andreas ADI SD escribió:
> >>>>>>>The launcher is set up to wake up in autovacuum_naptime seconds
> >>>>>>>at most.
> >>>>Imho the fix is usually to have a sleep loop.
> >>>This is what we have. The sleep time depends on the schedule
> >>>of next vacuum for the closest database in time. If naptime
> >>>is high, the sleep time will be high (depending on number of
> >>>databases needing attention).
> >>No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> >>exit" instead of "sleep longtime".
> >
> >Ah; yes, what I was proposing (or thought about proposing, not sure if I
> >posted it or not) was putting an upper limit of 10 seconds in the sleep
> >(bgwriter sleeps 10 seconds if configured to not do anything). Though
> >10 seconds may seem like an eternity for systems like the ones Peter was
> >talking about, where there is a script trying to restart the server as
> >soon as the postmaster dies.
>
> There is also one "wild" solution: the postmaster and the bgwriter would
> be connected by a socket or pipe, and select() would be used instead of
> sleep. If the connection fails unexpectedly, select() returns immediately
> and we can handle the problem right away. The socket could also be used
> in special cases where we need to wake the process up sooner.

Given the amount of problems we've had with pipes on win32, let's try to
avoid adding extra ones unless they're really necessary. If split-sleep
works, that seems a safer bet.

//Magnus
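
A rough sketch of what "split-sleep" means in this thread: instead of one long nap, take short slices and check between them whether the parent process is still there. The names are hypothetical, and comparing getppid() against a remembered parent PID is only an assumed stand-in for whatever liveness check the server really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* True once this process has been re-parented, i.e. the original parent exited. */
static bool
parent_is_gone(pid_t original_parent)
{
    return getppid() != original_parent;
}

/*
 * Nap for total_ms in slices of at most slice_ms, checking for parent
 * death before each slice.  Returns the milliseconds of nap requested
 * before the loop ended (a slice cut short by a signal still counts in full).
 */
static long
split_sleep_ms(long total_ms, long slice_ms, pid_t original_parent)
{
    long    slept_ms = 0;

    assert(total_ms >= 0 && slice_ms > 0);

    while (slept_ms < total_ms && !parent_is_gone(original_parent))
    {
        long            nap_ms = total_ms - slept_ms;
        struct timespec ts;

        if (nap_ms > slice_ms)
            nap_ms = slice_ms;

        ts.tv_sec = nap_ms / 1000;
        ts.tv_nsec = (nap_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);   /* may return early on a signal */

        slept_ms += nap_ms;
    }
    return slept_ms;
}
```

With a slice of 10 seconds, as discussed in this thread, the process would notice a dead parent at most about 10 seconds late, whatever the total nap length is.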


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-12 11:20:21
Message-ID: 466E8175.20003@sun.com
Lists: pgsql-hackers pgsql-patches

Magnus Hagander wrote:
> On Tue, Jun 12, 2007 at 12:23:50PM +0200, Zdenek Kotala wrote:
>> Alvaro Herrera wrote:
>>> Zeugswetter Andreas ADI SD escribió:
>>>>>>>>> The launcher is set up to wake up in autovacuum_naptime seconds
>>>>>>>>> at most.
>>>>>> Imho the fix is usually to have a sleep loop.
>>>>> This is what we have. The sleep time depends on the schedule
>>>>> of next vacuum for the closest database in time. If naptime
>>>>> is high, the sleep time will be high (depending on number of
>>>>> databases needing attention).
>>>> No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
>>>> exit" instead of "sleep longtime".
>>> Ah; yes, what I was proposing (or thought about proposing, not sure if I
>>> posted it or not) was putting an upper limit of 10 seconds in the sleep
>>> (bgwriter sleeps 10 seconds if configured to not do anything). Though
>>> 10 seconds may seem like an eternity for systems like the ones Peter was
>>> talking about, where there is a script trying to restart the server as
>>> soon as the postmaster dies.
>> There is also one "wild" solution: the postmaster and the bgwriter would
>> be connected by a socket or pipe, and select() would be used instead of
>> sleep. If the connection fails unexpectedly, select() returns immediately
>> and we can handle the problem right away. The socket could also be used
>> in special cases where we need to wake the process up sooner.
>
> Given the amount of problems we've had with pipes on win32, let's try to
> avoid adding extra ones unless they're really necessary. If split-sleep
> works, that seems a safer bet.

OK, that could be a problem. But I'm afraid split-sleep is not a good
solution either. It could cause a lot of race conditions in start/stop
scripts and monitoring tools. It would be much better to improve pg_ctl
so that it can clean up ("pg_ctl cleanup") after the postmaster fails.

I think we must offer packagers and integrators a deterministic way to
handle this issue.

Zdenek


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-13 17:29:11
Message-ID: 20070613172911.GC11499@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
>
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
> > > No, I meant a "while (sleep 1(or 10) and counter < longtime) check for
> > > exit" instead of "sleep longtime".
> >
> > Ah; yes, what I was proposing (or thought about proposing, not sure if I
> > posted it or not) was putting an upper limit of 10 seconds in the sleep
> > (bgwriter sleeps 10 seconds if configured to not do anything). Though
> > 10 seconds may seem like an eternity for systems like the ones Peter was
> > talking about, where there is a script trying to restart the server as
> > soon as the postmaster dies.
>
> Here is a patch for split-sleep of autovacuum_naptime.
>
> There are some other issues in CVS HEAD: we compute
> {autovacuum_naptime * 1000000} in launcher_determine_sleep().
> The result overflows a 32-bit int if autovacuum_naptime is set above 2147.

Ugh. How about this patch; this avoids the overflow issue altogether.
I am not sure that this works on Win32 but it seems we are already using
struct timeval elsewhere, so I don't see why it wouldn't work.

> In another place we use {autovacuum_naptime * 1000}, so we should
> set the upper bound to INT_MAX/1000 instead of INT_MAX.
> Incidentally, we already have the same protection for
> log_min_duration_statement and log_autovacuum.

Hmm, yes, the naptime should have an upper bound of INT_MAX/1000. It
doesn't seem worth the trouble of changing those places, when we know
that such a high value of naptime is uselessly high.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment Content-Type Size
av-naptime-overflow.patch text/x-diff 6.5 KB
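
The attached patch is not reproduced here. As a sketch of the general idea only: a struct timeval keeps whole seconds and microseconds in separate fields, so no single int ever has to hold naptime * 1000000. The helper names below are hypothetical, and the 10-second cap is just the figure floated earlier in this thread.

```c
#include <assert.h>
#include <sys/time.h>

/* Build a nap interval from whole seconds without scaling to microseconds. */
static struct timeval
nap_from_seconds(int naptime_secs)
{
    struct timeval nap;

    assert(naptime_secs >= 0);
    nap.tv_sec = naptime_secs;
    nap.tv_usec = 0;
    return nap;
}

/* Clamp a nap interval to at most max_secs whole seconds. */
static struct timeval
cap_nap_seconds(struct timeval nap, int max_secs)
{
    assert(max_secs >= 0);
    if (nap.tv_sec >= max_secs)
    {
        nap.tv_sec = max_secs;
        nap.tv_usec = 0;
    }
    return nap;
}
```

A naptime of 2148 seconds, which would overflow as a 32-bit microsecond count, is stored as tv_sec = 2148 and tv_usec = 0, and capping it at 10 yields a 10-second nap.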

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Autovacuum launcher doesn't notice death of postmaster immediately
Date: 2007-06-13 18:49:11
Message-ID: 20070613184911.GD11499@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:

> > Ah; yes, what I was proposing (or thought about proposing, not sure if I
> > posted it or not) was putting an upper limit of 10 seconds in the sleep
> > (bgwriter sleeps 10 seconds if configured to not do anything). Though
> > 10 seconds may seem like an eternity for systems like the ones Peter was
> > talking about, where there is a script trying to restart the server as
> > soon as the postmaster dies.

Peter, is 10 seconds good enough for you?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.