Re: pgstat: delayed write of stats file

Lists: pgsql-patches
From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-04-30 08:56:14
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA0F94A@algol.sollentuna.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

> >>> This was not ready to be applied, was it?
>
> > I don't recall any specific objections that weren't answered.
>
> How about the fact that it's already caused one buildfarm failure?
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006
-04-30%2003:05:01

Well, that objection certainly wasn't raised before since it wasn't on
buildfarm, and it passed the regression tests many times on my systems.

Oh, and apparantly it caused the same issue on firefly. I don't really
see the connection between these two platforms and why it should happen
just there, though. Any ideas on that part?

//Magnus


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-05-07 13:09:17
Message-ID: 200605071309.k47D9HM26331@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Magnus Hagander wrote:
> > >>> This was not ready to be applied, was it?
> >
> > > I don't recall any specific objections that weren't answered.
> >
> > How about the fact that it's already caused one buildfarm failure?
> >
> > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006
> -04-30%2003:05:01
>
>
> Well, that objection certainly wasn't raised before since it wasn't on
> buildfarm, and it passed the regression tests many times on my systems.
>
> Oh, and apparantly it caused the same issue on firefly. I don't really
> see the connection between these two platforms and why it should happen
> just there, though. Any ideas on that part?

Was this fixed somehow? I don't see any more buildfarm failures, but I
didn't see a fix go in either. It is very possible the pgport fix I did
yesterday is related, but I have not applied it yet.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-05-07 15:05:43
Message-ID: 24749.1147014343@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Was this fixed somehow? I don't see any more buildfarm failures, but I
> didn't see a fix go in either. It is very possible the pgport fix I did
> yesterday is related, but I have not applied it yet.

It's an intermittent failure; the fact you don't see any at this instant
doesn't mean it's fixed. I don't think your patch of yesterday is
related, since we're seeing the problem on non-WIN32 machines.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-05-08 03:15:58
Message-ID: 200605080315.k483Fwc00755@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Was this fixed somehow? I don't see any more buildfarm failures, but I
> > didn't see a fix go in either. It is very possible the pgport fix I did
> > yesterday is related, but I have not applied it yet.
>
> It's an intermittent failure; the fact you don't see any at this instant
> doesn't mean it's fixed. I don't think your patch of yesterday is
> related, since we're seeing the problem on non-WIN32 machines.

Have we seen any such failures since the first day they appeared?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-05-08 03:33:33
Message-ID: 16984.1147059213@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Have we seen any such failures since the first day they appeared?

agouti blew up about the same time you typed that, so yes it's still
a problem.

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=agouti&dt=2006-05-08%2003:15:01

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2006-05-30 02:34:34
Message-ID: 200605300234.k4U2YYa21180@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Have we seen any such failures since the first day they appeared?
>
> agouti blew up about the same time you typed that, so yes it's still
> a problem.
>
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=agouti&dt=2006-05-08%2003:15:01

Delay pgstat file write patch reverted.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 8.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: pgstat: delayed write of stats file
Date: 2007-02-08 18:53:54
Message-ID: 200702081853.l18IrsB12172@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


I assume we no longer need this stats patch from May of 2006.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Have we seen any such failures since the first day they appeared?
> >
> > agouti blew up about the same time you typed that, so yes it's still
> > a problem.
> >
> > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=agouti&dt=2006-05-08%2003:15:01
>
> Delay pgstat file write patch reverted.
>
> --
> Bruce Momjian http://candle.pha.pa.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +

> Index: src/backend/postmaster/pgstat.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
> retrieving revision 1.123
> retrieving revision 1.124
> diff -c -r1.123 -r1.124
> *** src/backend/postmaster/pgstat.c 20 Apr 2006 10:51:32 -0000 1.123
> --- src/backend/postmaster/pgstat.c 27 Apr 2006 00:06:58 -0000 1.124
> ***************
> ***************
> *** 28,33 ****
> --- 28,34 ----
> #include <arpa/inet.h>
> #include <signal.h>
> #include <time.h>
> + #include <sys/stat.h>
>
> #include "pgstat.h"
>
> ***************
> *** 66,77 ****
> * Timer definitions.
> * ----------
> */
> - #define PGSTAT_STAT_INTERVAL 500 /* How often to write the status file;
> - * in milliseconds. */
>
> ! #define PGSTAT_RESTART_INTERVAL 60 /* How often to attempt to restart a
> ! * failed statistics collector; in
> ! * seconds. */
>
> /* ----------
> * Amount of space reserved in pgstat_recvbuffer().
> --- 67,81 ----
> * Timer definitions.
> * ----------
> */
>
> ! /* How often to write the status file, in milliseconds. */
> ! #define PGSTAT_STAT_INTERVAL (5*60*1000)
> !
> ! /*
> ! * How often to attempt to restart a failed statistics collector; in ms.
> ! * Must be at least PGSTAT_STAT_INTERVAL.
> ! */
> ! #define PGSTAT_RESTART_INTERVAL (5*60*1000)
>
> /* ----------
> * Amount of space reserved in pgstat_recvbuffer().
> ***************
> *** 172,182 ****
> static void pgstat_write_statsfile(void);
> static void pgstat_read_statsfile(HTAB **dbhash, Oid onlydb,
> PgStat_StatBeEntry **betab,
> ! int *numbackends);
> static void backend_read_statsfile(void);
>
> static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype);
> static void pgstat_send(void *msg, int len);
>
> static void pgstat_recv_bestart(PgStat_MsgBestart *msg, int len);
> static void pgstat_recv_beterm(PgStat_MsgBeterm *msg, int len);
> --- 176,187 ----
> static void pgstat_write_statsfile(void);
> static void pgstat_read_statsfile(HTAB **dbhash, Oid onlydb,
> PgStat_StatBeEntry **betab,
> ! int *numbackends, bool rewrite);
> static void backend_read_statsfile(void);
>
> static void pgstat_setheader(PgStat_MsgHdr *hdr, StatMsgType mtype);
> static void pgstat_send(void *msg, int len);
> + static void pgstat_send_rewrite(void);
>
> static void pgstat_recv_bestart(PgStat_MsgBestart *msg, int len);
> static void pgstat_recv_beterm(PgStat_MsgBeterm *msg, int len);
> ***************
> *** 1449,1454 ****
> --- 1454,1477 ----
> #endif
> }
>
> + /*
> + * pgstat_send_rewrite() -
> + *
> + * Send a command to the collector to rewrite the stats file.
> + * ----------
> + */
> + static void
> + pgstat_send_rewrite(void)
> + {
> + PgStat_MsgRewrite msg;
> +
> + if (pgStatSock < 0)
> + return;
> +
> + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REWRITE);
> + pgstat_send(&msg, sizeof(msg));
> + }
> +
>
> /* ----------
> * PgstatBufferMain() -
> ***************
> *** 1549,1555 ****
> fd_set rfds;
> int readPipe;
> int len = 0;
> ! struct itimerval timeout;
> bool need_timer = false;
>
> MyProcPid = getpid(); /* reset MyProcPid */
> --- 1572,1578 ----
> fd_set rfds;
> int readPipe;
> int len = 0;
> ! struct itimerval timeout, canceltimeout;
> bool need_timer = false;
>
> MyProcPid = getpid(); /* reset MyProcPid */
> ***************
> *** 1604,1615 ****
> timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>
> /*
> * Read in an existing statistics stats file or initialize the stats to
> * zero.
> */
> pgStatRunningInCollector = true;
> ! pgstat_read_statsfile(&pgStatDBHash, InvalidOid, NULL, NULL);
>
> /*
> * Create the known backends table
> --- 1627,1641 ----
> timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
> timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
>
> + /* Values set to zero will cancel the active timer */
> + MemSet(&canceltimeout, 0, sizeof(struct itimerval));
> +
> /*
> * Read in an existing statistics stats file or initialize the stats to
> * zero.
> */
> pgStatRunningInCollector = true;
> ! pgstat_read_statsfile(&pgStatDBHash, InvalidOid, NULL, NULL, false);
>
> /*
> * Create the known backends table
> ***************
> *** 1764,1769 ****
> --- 1790,1801 ----
> pgstat_recv_analyze((PgStat_MsgAnalyze *) &msg, nread);
> break;
>
> + case PGSTAT_MTYPE_REWRITE:
> + need_statwrite = true;
> + /* Disable the timer - it will be restarted on next data update */
> + setitimer(ITIMER_REAL, &canceltimeout, NULL);
> + break;
> +
> default:
> break;
> }
> ***************
> *** 2344,2350 ****
> */
> static void
> pgstat_read_statsfile(HTAB **dbhash, Oid onlydb,
> ! PgStat_StatBeEntry **betab, int *numbackends)
> {
> PgStat_StatDBEntry *dbentry;
> PgStat_StatDBEntry dbbuf;
> --- 2376,2382 ----
> */
> static void
> pgstat_read_statsfile(HTAB **dbhash, Oid onlydb,
> ! PgStat_StatBeEntry **betab, int *numbackends, bool rewrite)
> {
> PgStat_StatDBEntry *dbentry;
> PgStat_StatDBEntry dbbuf;
> ***************
> *** 2363,2368 ****
> --- 2395,2465 ----
> MemoryContext use_mcxt;
> int mcxt_flags;
>
> +
> + if (rewrite)
> + {
> + /*
> + * To force a rewrite of the stats file from the collector, send
> + * a REWRITE message to the stats collector. Then wait for the file
> + * to change. On Unix, we wait for the inode to change (as the file
> + * is renamed into place from a different file). Win32 has no concept
> + * of inodes, so we wait for the date on the file to change instead.
> + * We can do this on win32 because we have high-res timing on the
> + * file dates, but we can't on unix, because it has 1sec resolution
> + * on the fields in struct stat.
> + */
> + int i;
> + #ifndef WIN32
> + struct stat st1, st2;
> +
> + if (stat(PGSTAT_STAT_FILENAME, &st1))
> + {
> + /* Assume no file there yet */
> + st1.st_ino = 0;
> + }
> + st2.st_ino = 0;
> + #else
> + WIN32_FILE_ATTRIBUTE_DATA fd1, fd2;
> +
> + if (!GetFileAttributesEx(PGSTAT_STAT_FILENAME, GetFileExInfoStandard, &fd1))
> + {
> + fd1.ftLastWriteTime.dwLowDateTime = 0;
> + fd1.ftLastWriteTime.dwHighDateTime = 0;
> + }
> + fd2.ftLastWriteTime.dwLowDateTime = 0;
> + fd2.ftLastWriteTime.dwHighDateTime = 0;
> + #endif
> +
> +
> + /* Send rewrite message */
> + pgstat_send_rewrite();
> +
> + /* Now wait for the file to change */
> + for (i=0; i < 50; i++)
> + {
> + #ifndef WIN32
> + if (!stat(PGSTAT_STAT_FILENAME, &st2))
> + {
> + if (st2.st_ino != st1.st_ino)
> + break;
> + }
> + #else
> + if (GetFileAttributesEx(PGSTAT_STAT_FILENAME, GetFileExInfoStandard, &fd2))
> + {
> + if (fd1.ftLastWriteTime.dwLowDateTime != fd2.ftLastWriteTime.dwLowDateTime ||
> + fd1.ftLastWriteTime.dwHighDateTime != fd2.ftLastWriteTime.dwHighDateTime)
> + break;
> + }
> + #endif
> +
> + pg_usleep(50000);
> + }
> + if (i >= 50)
> + ereport(WARNING,
> + (errmsg("pgstat update timeout")));
> + /* Fallthrough and read the old file anyway - old data better than no data */
> + }
> +
> /*
> * If running in the collector or the autovacuum process, we use the
> * DynaHashCxt memory context. If running in a backend, we use the
> ***************
> *** 2681,2687 ****
> return;
> Assert(!pgStatRunningInCollector);
> pgstat_read_statsfile(&pgStatDBHash, InvalidOid,
> ! &pgStatBeTable, &pgStatNumBackends);
> }
> else
> {
> --- 2778,2784 ----
> return;
> Assert(!pgStatRunningInCollector);
> pgstat_read_statsfile(&pgStatDBHash, InvalidOid,
> ! &pgStatBeTable, &pgStatNumBackends, true);
> }
> else
> {
> ***************
> *** 2691,2697 ****
> {
> Assert(!pgStatRunningInCollector);
> pgstat_read_statsfile(&pgStatDBHash, MyDatabaseId,
> ! &pgStatBeTable, &pgStatNumBackends);
> pgStatDBHashXact = topXid;
> }
> }
> --- 2788,2794 ----
> {
> Assert(!pgStatRunningInCollector);
> pgstat_read_statsfile(&pgStatDBHash, MyDatabaseId,
> ! &pgStatBeTable, &pgStatNumBackends, true);
> pgStatDBHashXact = topXid;
> }
> }
> Index: src/include/pgstat.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/pgstat.h,v
> retrieving revision 1.43
> retrieving revision 1.44
> diff -c -r1.43 -r1.44
> *** src/include/pgstat.h 6 Apr 2006 20:38:00 -0000 1.43
> --- src/include/pgstat.h 27 Apr 2006 00:06:59 -0000 1.44
> ***************
> *** 32,38 ****
> PGSTAT_MTYPE_RESETCOUNTER,
> PGSTAT_MTYPE_AUTOVAC_START,
> PGSTAT_MTYPE_VACUUM,
> ! PGSTAT_MTYPE_ANALYZE
> } StatMsgType;
>
> /* ----------
> --- 32,39 ----
> PGSTAT_MTYPE_RESETCOUNTER,
> PGSTAT_MTYPE_AUTOVAC_START,
> PGSTAT_MTYPE_VACUUM,
> ! PGSTAT_MTYPE_ANALYZE,
> ! PGSTAT_MTYPE_REWRITE
> } StatMsgType;
>
> /* ----------
> ***************
> *** 108,113 ****
> --- 109,123 ----
> } PgStat_MsgDummy;
>
> /* ----------
> + * PgStat_MsgRewrite Sent by backends to cause a rewrite of the stats file
> + * ----------
> + */
> + typedef struct Pgstat_MsgRewrite
> + {
> + PgStat_MsgHdr m_hdr;
> + } PgStat_MsgRewrite;
> +
> + /* ----------
> * PgStat_MsgBestart Sent by the backend on startup
> * ----------
> */

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +