Re: [HACKERS] Stats processor not restarting

Lists: pgsql-hackerspgsql-patches
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Stats processor not restarting
Date: 2007-03-20 11:23:50
Message-ID: 20070320112350.GB24280@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I've noticed that if for example the autovacuum process dies (such as
with a kill -9 when testing my new shared mem implementation), only
autovac and bgwriter are restarted. The stats collector is terminated,
but not restarted. (Same goes for a regular backend, and not just
autovac)

Is there a reason for this, or is it a bug?

//Magnus


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Stats processor not restarting
Date: 2007-03-20 12:48:30
Message-ID: 20070320124830.GN24234@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Magnus Hagander wrote:
> I've noticed that if for example the autovacuum process dies (such as
> with a kill -9 when testing my new shared mem implementation), only
> autovac and bgwriter are restarted. The stats collector is terminated,
> but not restarted. (Same goes for a regular backend, and not just
> autovac)
>
> Is there a reason for this, or is it a bug?

I would say it is a bug, because the comments and code in ServerLoop()
and reaper() say different.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Stats processor not restarting
Date: 2007-03-20 14:50:33
Message-ID: 20070320145033.GE24280@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Mar 20, 2007 at 08:48:30AM -0400, Alvaro Herrera wrote:
> Magnus Hagander wrote:
> > I've noticed that if for example the autovacuum process dies (such as
> > with a kill -9 when testing my new shared mem implementation), only
> > autovac and bgwriter are restarted. The stats collector is terminated,
> > but not restarted. (Same goes for a regular backend, and not just
> > autovac)
> >
> > Is there a reason for this, or is it a bug?
>
> I would say it is a bug, because the comments and code in ServerLoop()
> and reaper() say different.

Bah, sorry about the noise. It was the effect of
PGSTAT_RESTART_INTERVAL.

Do we want to add some logging when we don't restart it due to repeated
failures?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Stats processor not restarting
Date: 2007-03-20 15:23:11
Message-ID: 13948.1174404191@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Magnus Hagander wrote:
>> I've noticed that if for example the autovacuum process dies (such as
>> with a kill -9 when testing my new shared mem implementation), only
>> autovac and bgwriter are restarted. The stats collector is terminated,
>> but not restarted. (Same goes for a regular backend, and not just
>> autovac)
>>
>> Is there a reason for this, or is it a bug?

> I would say it is a bug, because the comments and code in ServerLoop()
> and reaper() say different.

There is code in pgstat_start that limits the frequency with which new
stats collectors can be spawned --- maybe your test case is hitting that?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Stats processor not restarting
Date: 2007-03-20 15:51:04
Message-ID: 14392.1174405864@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Bah, sorry about the noise. It was the effect of
> PGSTAT_RESTART_INTERVAL.
> Do we want to add some logging when we don't restart it due to repeated
> failures?

Not really, but maybe it would be sensible to reset last_pgstat_start_time
when doing a database-wide restart? The motivation for the timeout was
to reduce cycle wastage if pgstat crashed by itself, but when you've
deliberately SIGQUITed it, that hardly seems to apply ...

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Stats processor not restarting
Date: 2007-03-22 02:04:09
Message-ID: 200703220204.l2M249127340@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Bah, sorry about the noise. It was the effect of
> > PGSTAT_RESTART_INTERVAL.
> > Do we want to add some logging when we don't restart it due to repeated
> > failures?
>
> Not really, but maybe it would be sensible to reset last_pgstat_start_time
> when doing a database-wide restart? The motivation for the timeout was
> to reduce cycle wastage if pgstat crashed by itself, but when you've
> deliberately SIGQUITed it, that hardly seems to apply ...

You mean like this, attached?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 571 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Stats processor not restarting
Date: 2007-03-22 04:33:43
Message-ID: 25538.1174538023@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> Not really, but maybe it would be sensible to reset last_pgstat_start_time
>> when doing a database-wide restart?

> You mean like this, attached?

I'd be fairly surprised if resetting the variable in the child process
had any effect on the postmaster...

I think a correct fix would involve exposing either the variable or a
routine to zero it from pgstat.c, and having postmaster.c do that at
the points where it intentionally SIGQUITs the stats collector.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Stats processor not restarting
Date: 2007-03-22 14:41:46
Message-ID: 200703221441.l2MEfkf05255@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> Not really, but maybe it would be sensible to reset last_pgstat_start_time
> >> when doing a database-wide restart?
>
> > You mean like this, attached?
>
> I'd be fairly surprised if resetting the variable in the child process
> had any effect on the postmaster...

Yep, me too. ;-)

> I think a correct fix would involve exposing either the variable or a
> routine to zero it from pgstat.c, and having postmaster.c do that at
> the points where it intentionally SIGQUITs the stats collector.

OK, patch attached. I just reset the value in all places where we were
killing the pgstat process and not immediately exiting ourselves.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 3.1 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Stats processor not restarting
Date: 2007-03-22 19:53:44
Message-ID: 200703221953.l2MJri200875@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Applied. I did the case where we exit right away too, just for
consistency.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Tom Lane wrote:
> > >> Not really, but maybe it would be sensible to reset last_pgstat_start_time
> > >> when doing a database-wide restart?
> >
> > > You mean like this, attached?
> >
> > I'd be fairly surprised if resetting the variable in the child process
> > had any effect on the postmaster...
>
> Yep, me too. ;-)
>
> > I think a correct fix would involve exposing either the variable or a
> > routine to zero it from pgstat.c, and having postmaster.c do that at
> > the points where it intentionally SIGQUITs the stats collector.
>
> OK, patch attached. I just reset the value in all places where we were
> killing the pgstat process and not immediately exiting ourselves.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +

> Index: src/backend/postmaster/pgstat.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.149
> diff -c -c -r1.149 pgstat.c
> *** src/backend/postmaster/pgstat.c 16 Mar 2007 17:57:36 -0000 1.149
> --- src/backend/postmaster/pgstat.c 22 Mar 2007 14:40:36 -0000
> ***************
> *** 572,577 ****
> --- 572,581 ----
> return 0;
> }
>
> + void allow_immediate_pgstat_restart(void)
> + {
> + last_pgstat_start_time = 0;
> + }
>
> /* ------------------------------------------------------------
> * Public functions used by backends follow
> Index: src/backend/postmaster/postmaster.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
> retrieving revision 1.526
> diff -c -c -r1.526 postmaster.c
> *** src/backend/postmaster/postmaster.c 7 Mar 2007 13:35:02 -0000 1.526
> --- src/backend/postmaster/postmaster.c 22 Mar 2007 14:40:37 -0000
> ***************
> *** 1896,1902 ****
> --- 1896,1905 ----
> signal_child(PgArchPID, SIGQUIT);
> /* Tell pgstat to shut down too; nothing left for it to do */
> if (PgStatPID != 0)
> + {
> signal_child(PgStatPID, SIGQUIT);
> + allow_immediate_pgstat_restart();
> + }
> /* Tell autovac launcher to shut down too */
> if (AutoVacPID != 0)
> signal_child(AutoVacPID, SIGTERM);
> ***************
> *** 1952,1958 ****
> --- 1955,1964 ----
> signal_child(PgArchPID, SIGQUIT);
> /* Tell pgstat to shut down too; nothing left for it to do */
> if (PgStatPID != 0)
> + {
> signal_child(PgStatPID, SIGQUIT);
> + allow_immediate_pgstat_restart();
> + }
> /* Tell autovac launcher to shut down too */
> if (AutoVacPID != 0)
> signal_child(AutoVacPID, SIGTERM);
> ***************
> *** 2241,2247 ****
> --- 2247,2256 ----
> signal_child(PgArchPID, SIGQUIT);
> /* Tell pgstat to shut down too; nothing left for it to do */
> if (PgStatPID != 0)
> + {
> signal_child(PgStatPID, SIGQUIT);
> + allow_immediate_pgstat_restart();
> + }
> /* Tell autovac launcher to shut down too */
> if (AutoVacPID != 0)
> signal_child(AutoVacPID, SIGTERM);
> ***************
> *** 2404,2409 ****
> --- 2413,2419 ----
> "SIGQUIT",
> (int) PgStatPID)));
> signal_child(PgStatPID, SIGQUIT);
> + allow_immediate_pgstat_restart();
> }
>
> /* We do NOT restart the syslogger */
> Index: src/include/pgstat.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/pgstat.h,v
> retrieving revision 1.55
> diff -c -c -r1.55 pgstat.h
> *** src/include/pgstat.h 16 Mar 2007 17:57:36 -0000 1.55
> --- src/include/pgstat.h 22 Mar 2007 14:40:38 -0000
> ***************
> *** 369,375 ****
> extern void pgstat_init(void);
> extern int pgstat_start(void);
> extern void pgstat_reset_all(void);
> !
> #ifdef EXEC_BACKEND
> extern void PgstatCollectorMain(int argc, char *argv[]);
> #endif
> --- 369,375 ----
> extern void pgstat_init(void);
> extern int pgstat_start(void);
> extern void pgstat_reset_all(void);
> ! extern void allow_immediate_pgstat_restart(void);
> #ifdef EXEC_BACKEND
> extern void PgstatCollectorMain(int argc, char *argv[]);
> #endif

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +