Re: Location for pgstat.stat

Lists: pgsql-hackers
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Location for pgstat.stat
Date: 2008-07-01 18:48:41
Message-ID: 486A7C09.2060301@hagander.net

Per this thread:
http://archives.postgresql.org/pgsql-general/2007-12/msg00255.php

it was pretty much (again, IIRC) concluded that we want "some better
way" to transfer the stats data.

But until we have that, how about we just move it into its own
subdirectory? AFAICS that would be a simple change of two #defines
moving it from "global/pgstat.stat" to "pgstat/pgstat.stat" or something
like that. Might also need some code to create the directory if it
doesn't exist, but that shouldn't be hard.
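Here is a minimal C sketch of the change described above. It assumes POSIX mkdir(2) and stat(2). The directory name pgstat and the macros STATS_DIR, STATS_FILENAME and STATS_TMPFILE are hypothetical. The second #define is assumed to name a temporary file that is written first and then renamed into place. This is not the actual pgstat.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical macro names: both paths move from "global/" into a
 * dedicated subdirectory that an admin could symlink or mount elsewhere. */
#define STATS_DIR       "pgstat"
#define STATS_FILENAME  STATS_DIR "/pgstat.stat"
#define STATS_TMPFILE   STATS_DIR "/pgstat.tmp"

/*
 * Make sure the stats directory exists. An existing directory, or a
 * symlink that resolves to a directory, counts as success.
 * Returns 0 on success, -1 on failure.
 */
static int
ensure_stats_dir(const char *dir)
{
    struct stat st;

    assert(dir != NULL);
    if (mkdir(dir, 0700) == 0)
        return 0;
    if (errno != EEXIST)
    {
        fprintf(stderr, "could not create directory \"%s\": %s\n",
                dir, strerror(errno));
        return -1;
    }
    /* Something is already there. stat() follows symlinks, so a link
     * pointing at a directory on a ramdrive is accepted. */
    if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;
    fprintf(stderr, "\"%s\" exists but is not a directory\n", dir);
    return -1;
}
```

Calling stat() after an EEXIST failure means a symlink to a ramdrive directory is accepted. That is the case the proposal is meant to enable.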

This would make it possible to symlink or mount that directory off to a
ramdrive (for example).

It's not a perfect solution, but it would at least give a better tool
than we have today, no?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 18:56:02
Message-ID: 17634.1214938562@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> But until we have that, how about we just move it into its own
> subdirectory?
> This would make it possible to symlink or mount that directory off to a
> ramdrive (for example).

Hmm ... that would almost certainly result in the stats being lost over
a system shutdown. How much do we care?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 18:58:53
Message-ID: 486A7E6D.7020509@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> But until we have that, how about we just move it into its own
>> subdirectory?
>> This would make it possible to symlink or mount that directory off to a
>> ramdrive (for example).
>
> Hmm ... that would almost certainly result in the stats being lost over
> a system shutdown. How much do we care?

Only for those who put it on a ramdrive. The default, unless you
move/sync it off, would still be the same as it is today. While not
perfect, the performance difference of going to a ramdrive might easily
be enough to offset that in some cases, I think.

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 19:11:03
Message-ID: 17913.1214939463@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> Hmm ... that would almost certainly result in the stats being lost over
>> a system shutdown. How much do we care?

> Only for those who put it on a ramdrive. The default, unless you
> move/sync it off, would still be the same as it is today. While not
> perfect, the performance difference of going to a ramdrive might easily
> be enough to offset that in some cases, I think.

Well, what I was wondering about is whether it'd be worth adding logic
to copy the file to/from a "safer" location at startup/shutdown.
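Copying the file between the fast location and a safer one at startup and shutdown needs little more than a file copy. Below is a minimal C sketch of that step. The function name copy_stats_file is hypothetical. The sketch assumes POSIX rename() semantics, where an existing target is replaced atomically.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src to dst by writing "<dst>.tmp" and renaming it into place, so
 * no reader ever opens a partially copied dst. There is no fsync here,
 * so this does not protect against power loss.
 * Returns 0 on success, -1 on any error (including a missing src).
 */
static int
copy_stats_file(const char *src, const char *dst)
{
    char        tmp[1024];
    char        buf[8192];
    size_t      n;
    int         len;
    int         ok = 1;
    FILE       *in;
    FILE       *out;

    assert(src != NULL && dst != NULL);
    len = snprintf(tmp, sizeof(tmp), "%s.tmp", dst);
    if (len < 0 || (size_t) len >= sizeof(tmp))
        return -1;
    if ((in = fopen(src, "rb")) == NULL)
        return -1;              /* nothing to copy */
    if ((out = fopen(tmp, "wb")) == NULL)
    {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok || rename(tmp, dst) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

At shutdown, one would copy from the ramdrive path to the on-disk path. At startup, one would copy the other way. Which paths to use is left to the caller.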

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 19:18:24
Message-ID: 486A8300.1000909@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown. How much do we care?
>
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Oh, I see. I should think more before I answer sometimes :-)

Not sure. I guess my own personal concern would be how badly
autovacuum is affected by having to start off with a blank set of stats.
I think any other uses I have can cope with stats being reset to zero.

//Magnus


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 19:27:50
Message-ID: 20080701192750.GL18252@alvh.no-ip.org

Magnus Hagander wrote:

> Not sure. I guess my own personal concern would be how badly
> autovacuum is affected by having to start off with a blank set of stats.
> I think any other uses I have can cope with stats being reset to zero.

Well, it doesn't :-) No database or table will be processed until stat
entries are created, and then I think it will first wait until enough
activity gathers to take any actions at all.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 19:34:45
Message-ID: 486A86D5.7060501@hagander.net

Alvaro Herrera wrote:
> Magnus Hagander wrote:
>
>> Not sure. I guess my own personal concern would be how badly
>> autovacuum is affected by having to start off with a blank set of stats.
>> I think any other uses I have can cope with stats being reset to zero.
>
> Well, it doesn't :-) No database or table will be processed until stat
> entries are created, and then I think it will first wait until enough
> activity gathers to take any actions at all.

That's not actually "not affected", but it does seem like it wouldn't be
a very big issue. If one table was just about to be vacuumed or
analyzed, this would just push it up to twice the threshold, right?
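This simplified C model puts numbers on the "up to twice the threshold" estimate. It assumes the autovacuum rule documented for 8.3: vacuum once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, with defaults of 50 and 0.2. As Alvaro describes, it treats a table with no stats entry as skipped. The function names are hypothetical, and this is not the autovacuum.c logic.

```c
#include <assert.h>

/* Assumed 8.3 defaults for autovacuum_vacuum_threshold and
 * autovacuum_vacuum_scale_factor. */
#define VAC_BASE_THRESH   50
#define VAC_SCALE_FACTOR  0.2

/* Dead tuples a table must exceed before it qualifies for VACUUM. */
static double
vacuum_threshold(double reltuples)
{
    assert(reltuples >= 0);
    return VAC_BASE_THRESH + VAC_SCALE_FACTOR * reltuples;
}

/* Simplified decision: no stats entry means the table is skipped;
 * otherwise compare the dead-tuple count against the threshold. */
static int
needs_vacuum(int has_stats_entry, double n_dead, double reltuples)
{
    if (!has_stats_entry)
        return 0;
    return n_dead > vacuum_threshold(reltuples);
}

/* Worst case with a single reset: dead_before_reset tuples had already
 * accumulated (just under the threshold) when the counters were zeroed,
 * and a full threshold's worth must pile up again afterwards. Returns
 * the total number of dead tuples produced before the table qualifies. */
static double
dead_tuples_before_vacuum(double reltuples, double dead_before_reset)
{
    return dead_before_reset + vacuum_threshold(reltuples);
}
```

Take a table of 1,000,000 rows. The threshold is 50 + 0.2 * 1,000,000 = 200,050 dead tuples. If the stats are lost when 200,049 have already accumulated, about 400,099 dead tuples build up before the table qualifies. That is just under twice the threshold.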

//Magnus


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 19:48:56
Message-ID: Pine.GSO.4.64.0807011539050.8907@westnet.com

On Tue, 1 Jul 2008, Tom Lane wrote:

> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> Hmm ... that would almost certainly result in the stats being lost over
>>> a system shutdown. How much do we care?
>
>> Only for those who put it on a ramdrive. The default, unless you
>> move/sync it off, would still be the same as it is today. While not
>> perfect, the performance difference of going to a ramdrive might easily
>> be enough to offset that in some cases, I think.
>
> Well, what I was wondering about is whether it'd be worth adding logic
> to copy the file to/from a "safer" location at startup/shutdown.

Anyone who needs fast stats storage badly enough to symlink it to RAM
should be perfectly capable of scripting server startup/shutdown
to shuffle that to/from a more permanent location. Compared to the admin
chores you're likely to encounter before reaching that scale it's a pretty
easy job, and it's not like losing that data file is a giant loss in any
case. The only thing I could see putting into the server code to help
support this situation is rejecting an old stats file, and starting from
scratch instead, if an admin restored a previous copy after a crash that
didn't save an updated one.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-01 20:02:26
Message-ID: 18884.1214942546@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Alvaro Herrera wrote:
>> Well, it doesn't :-) No database or table will be processed until stat
>> entries are created, and then I think it will first wait until enough
>> activity gathers to take any actions at all.

> That's not actually "not affected", but it does seem like it wouldn't be
> a very big issue. If one table was just about to be vacuumed or
> analyzed, this would just push it up to twice the threshold, right?

Except you could lather, rinse, repeat indefinitely.
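The concern here is that the delay compounds if the stats are lost repeatedly. This toy C model shows the failure mode. Its rate, threshold and reset interval are made-up parameters, and the function name is hypothetical. If each reset arrives before a full threshold's worth of dead tuples has accumulated, the table never qualifies at all.

```c
#include <assert.h>

/*
 * Toy model: dead tuples accrue at a constant rate per hour, and the
 * stats counters are zeroed every reset_every hours (e.g. by restarts
 * that lose the stats file). Returns the first hour in which the
 * dead-tuple count exceeds the threshold, or -1 if that never happens
 * within horizon hours.
 */
static long
hours_until_vacuum(double threshold, double dead_per_hour,
                   long reset_every, long horizon)
{
    double      dead = 0.0;
    long        hour;

    assert(reset_every > 0 && horizon >= 0);
    for (hour = 1; hour <= horizon; hour++)
    {
        dead += dead_per_hour;
        if (dead > threshold)
            return hour;
        if (hour % reset_every == 0)
            dead = 0.0;         /* counters lost: start over */
    }
    return -1;
}
```

Suppose the threshold is 1,000 and dead tuples accrue at 100 per hour. With no resets, the table qualifies in hour 11. If the counters are reset every 8 hours, it never qualifies.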

The stats system started out with the idea that the stats were
disposable, but I don't really think that's an acceptable behavior
today. We don't even have stats_reset_on_server_start anymore.

It doesn't seem to me that it'd be hard to support two locations for the
stats file --- it'd just take another parameter to the read and write
routines. pgstat.c already knows the difference between a normal write
and a shutdown write ...
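"Another parameter to the read and write routines" could look roughly like this minimal C sketch. The directory and file names, and the functions stats_path, prepare_stats_dirs, write_stats and read_stats, are all hypothetical. The sketch assumes POSIX rename() replacing the target atomically, and it is not the pgstat.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical locations: the permanent copy stays in global/, while the
 * frequently rewritten copy lives in a directory that may be symlinked
 * to a ramdrive. */
#define STATS_PERMANENT_DIR   "global"
#define STATS_TEMP_DIR        "pgstat_tmp"
#define STATS_PERMANENT_FILE  STATS_PERMANENT_DIR "/pgstat.stat"
#define STATS_TEMP_FILE       STATS_TEMP_DIR "/pgstat.stat"

/* The one extra parameter: which of the two locations to use. */
static const char *
stats_path(bool permanent)
{
    return permanent ? STATS_PERMANENT_FILE : STATS_TEMP_FILE;
}

/* Create both directories if they are missing; returns 0 on success. */
static int
prepare_stats_dirs(void)
{
    if (mkdir(STATS_PERMANENT_DIR, 0700) != 0 && errno != EEXIST)
        return -1;
    if (mkdir(STATS_TEMP_DIR, 0700) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/* Write to "<path>.new", then rename it over the target so readers never
 * open a half-written file. Returns 0 on success, -1 on failure. */
static int
write_stats(bool permanent, const char *blob)
{
    const char *path = stats_path(permanent);
    char        tmp[256];
    FILE       *f;

    snprintf(tmp, sizeof(tmp), "%s.new", path);
    if ((f = fopen(tmp, "w")) == NULL)
        return -1;
    if (fputs(blob, f) == EOF)
    {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0 || rename(tmp, path) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Read up to len - 1 bytes from the chosen location into buf (always
 * NUL-terminated). Returns bytes read, or -1 if the file can't be opened. */
static long
read_stats(bool permanent, char *buf, size_t len)
{
    FILE       *f;
    size_t      n;

    assert(buf != NULL && len > 0);
    if ((f = fopen(stats_path(permanent), "r")) == NULL)
        return -1;
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long) n;
}
```

Callers would pass true only for the shutdown write and the startup read. Every other write and read would pass false.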

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-02 14:10:10
Message-ID: 486B8C42.7090009@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-) No database or table will be processed until stat
>>> entries are created, and then I think it will first wait until enough
>>> activity gathers to take any actions at all.
>
>> That's not actually "not affected", but it does seem like it wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
>
> Except you could lather, rinse, repeat indefinitely.

Yeah, but if you do that, you certainly have other problems as well...

> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today. We don't even have stats_reset_on_server_start anymore.

Good point.

> It doesn't seem to me that it'd be hard to support two locations for the
> stats file --- it'd just take another parameter to the read and write
> routines. pgstat.c already knows the difference between a normal write
> and a shutdown write ...

Right. Should it be removed from the permanent location when the server
starts? Otherwise, if it crashes, we'll pick up the old, stale version
of the file, since the newer data didn't get a chance to be saved away. Better to
start from an empty file, or to start from one that has old data in it?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-02 14:28:01
Message-ID: 10918.1215008881@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> It doesn't seem to me that it'd be hard to support two locations for the
>> stats file --- it'd just take another parameter to the read and write
>> routines. pgstat.c already knows the difference between a normal write
>> and a shutdown write ...

> Right. Should it be removed from the permanent location when the server
> starts?

Yes, I would say so. There are two possible exit paths: normal shutdown
(where we'd write a new file) and crash. In a crash we'd wish to delete
the file anyway for fear that it's corrupted.

Startup: read permanent file, then delete it.

Post-crash: remove any permanent file (same as now)

Shutdown: write permanent file.

Normal stats collector write: write temp file.

Backend stats fetch: read temp file.
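The five rules above can be sketched in C as one small function per event. The StatsLocations struct and every function name are hypothetical, and the stats are reduced to a text blob. The write-then-rename step a real implementation would want is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pair of locations for the stats file. */
typedef struct
{
    const char *permanent;      /* written only at clean shutdown */
    const char *temp;           /* rewritten often; may live on a ramdrive */
} StatsLocations;

static int
write_blob(const char *path, const char *blob)
{
    FILE       *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fputs(blob, f) == EOF)
    {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Returns bytes read into buf (NUL-terminated), or -1 if the file cannot
 * be opened, in which case buf is left as the empty string. */
static long
read_blob(const char *path, char *buf, size_t len)
{
    FILE       *f;
    size_t      n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if ((f = fopen(path, "r")) == NULL)
        return -1;
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long) n;
}

/* Startup: read the permanent file, then delete it, so a later crash
 * cannot leave this soon-to-be-stale copy around to be loaded again. */
static long
stats_startup(const StatsLocations *loc, char *buf, size_t len)
{
    long        n = read_blob(loc->permanent, buf, len);

    remove(loc->permanent);
    return n;                   /* -1 means: start from empty stats */
}

/* Post-crash: remove any permanent file. It is normally already gone,
 * but it is removed anyway in case it is corrupt. */
static void
stats_after_crash(const StatsLocations *loc)
{
    remove(loc->permanent);
}

/* Shutdown: write the permanent file. */
static int
stats_shutdown(const StatsLocations *loc, const char *blob)
{
    return write_blob(loc->permanent, blob);
}

/* Normal stats collector write: touch only the temp file. */
static int
stats_collector_write(const StatsLocations *loc, const char *blob)
{
    return write_blob(loc->temp, blob);
}

/* Backend stats fetch: read only the temp file. */
static long
stats_backend_fetch(const StatsLocations *loc, char *buf, size_t len)
{
    return read_blob(loc->temp, buf, len);
}
```

Deleting the permanent copy right after reading it is what makes a crash fall back to empty stats rather than an older snapshot.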

regards, tom lane


From: Decibel! <decibel(at)decibel(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-02 23:47:41
Message-ID: 4779C757-2605-4315-B93E-1610CFC5466F@decibel.org

On Jul 1, 2008, at 3:02 PM, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Alvaro Herrera wrote:
>>> Well, it doesn't :-) No database or table will be processed until stat
>>> entries are created, and then I think it will first wait until enough
>>> activity gathers to take any actions at all.
>
>> That's not actually "not affected", but it does seem like it wouldn't be
>> a very big issue. If one table was just about to be vacuumed or
>> analyzed, this would just push it up to twice the threshold, right?
>
> Except you could lather, rinse, repeat indefinitely.
>
> The stats system started out with the idea that the stats were
> disposable, but I don't really think that's an acceptable behavior
> today. We don't even have stats_reset_on_server_start anymore.
>
> It doesn't seem to me that it'd be hard to support two locations for the
> stats file --- it'd just take another parameter to the read and write
> routines. pgstat.c already knows the difference between a normal write
> and a shutdown write ...

Leaving the realm of "an easy change", what about periodically (once
a minute?) writing stats to a real table? That means we should never
have to suffer corrupted or lost stats on a crash. Along the same
lines, perhaps we can just keep updates in shared memory instead of
in a file, since that's proven to cause issues for some people.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-07-03 00:51:59
Message-ID: 19314.1215046319@sss.pgh.pa.us

Decibel! <decibel(at)decibel(dot)org> writes:
> Leaving the realm of "an easy change", what about periodically (once
> a minute?) writing stats to a real table?

The ensuing vacuum overhead seems a sufficient reason why not.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-08-04 10:40:34
Message-ID: 4896DCA2.4050106@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> It doesn't seem to me that it'd be hard to support two locations for the
>>> stats file --- it'd just take another parameter to the read and write
>>> routines. pgstat.c already knows the difference between a normal write
>>> and a shutdown write ...
>
>> Right. Should it be removed from the permanent location when the server
>> starts?
>
> Yes, I would say so. There are two possible exit paths: normal shutdown
> (where we'd write a new file) and crash. In a crash we'd wish to delete
> the file anyway for fear that it's corrupted.
>
> Startup: read permanent file, then delete it.
>
> Post-crash: remove any permanent file (same as now)
>
> Shutdown: write permanent file.
>
> Normal stats collector write: write temp file.
>
> Backend stats fetch: read temp file.

Attached is a patch that implements this. I went with the option of just
storing it in a temporary directory that can be symlinked, and not
bothering with a GUC for it. Comments? (documentation updates are also
needed, but I'll hold off on those until I hear comments on the patch :-P)

//Magnus

Attachment: pgstat_location.diff (text/x-diff, 8.6 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-08-04 15:37:36
Message-ID: 5238.1217864256@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Attached is a patch that implements this. I went with the option of just
> storing it in a temporary directory that can be symlinked, and not
> bothering with a GUC for it. Comments? (documentation updates are also
> needed, but I'll hold off on those until I hear comments on the patch :-P)

Looks alright in a fast once-over (I didn't test it). Two comments:
Treating the directory as something to create in initdb means you'll
need to bump catversion when you apply it. I'm not sure where you are
planning to document, but there should at least be a mention in the
"database physical layout" chapter, since that's supposed to enumerate
all the subdirectories of $PGDATA.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Location for pgstat.stat
Date: 2008-08-05 11:45:01
Message-ID: 48983D3D.3060502@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Attached is a patch that implements this. I went with the option of just
>> storing it in a temporary directory that can be symlinked, and not
>> bothering with a GUC for it. Comments? (documentation updates are also
>> needed, but I'll hold off on those until I hear comments on the patch :-P)
>
> Looks alright in a fast once-over (I didn't test it).

That's what I was after. I tested it myself, obviously :-) Not promising
zero bugs, but I was looking for comments on the approach. So thanks!

> Two comments:
> Treating the directory as something to create in initdb means you'll
> need to bump catversion when you apply it.

Yeah, I meant to do that as part of the commit. But thanks for the
reminder anyway!

> I'm not sure where you are
> planning to document, but there should at least be a mention in the
> "database physical layout" chapter, since that's supposed to enumerate
> all the subdirectories of $PGDATA.

I'm putting it under "configuring the statistics collector". And I'll
also add the directory to the physical layout chapter; I had missed that.

//Magnus