reducing statistics write overhead

From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: reducing statistics write overhead
Date: 2008-09-05 19:00:36
Message-ID: 48C181D4.8030807@gmail.com

Howdy,

The statistics collector currently dumps the stats file every 500ms. This
is a major problem if the file becomes large -- occasionally we've been forced
to disable stats collection to cope with it. Another issue is that while the
file is frequently written, it is seldom read -- typically once a minute, for
autovacuum plus possible user-initiated stats queries.

So, as a simple optimization I am proposing that the file should be only
written when some backend requests statistics. This would significantly reduce
the undesired write traffic at the cost of slightly slower stats access.

Attached is a WIP patch, which basically implements this:

Disable periodic writing of the stats file. Introduce a new stats message type --
PGSTAT_MTYPE_INQUIRY. Backends send this to notify the collector that stats are
needed; the PID of the requestor is provided in the message. The backend then
installs an alarm handler and starts a timer. The collector processes the messages
and compiles a list of PIDs to be notified. If there are any, the stats file is
written and SIGALRM is sent to the requestors. The backend then proceeds to read
the stats file as usual.
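
For illustration, the inquiry message could look roughly like this
(PgStat_MsgHdr is the existing stats message header; the exact layout shown
here is only a sketch of the idea, not necessarily what the patch does):

typedef struct PgStat_MsgInquiry
{
    PgStat_MsgHdr m_hdr;    /* standard stats message header */
    int           m_pid;    /* PID of the backend requesting stats */
} PgStat_MsgInquiry;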

Thoughts, comments?

regards,
Martin

Attachment: stat-write.patch (text/x-diff, 10.7 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-05 19:23:18
Message-ID: 146.1220642598@sss.pgh.pa.us

Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> So, as a simple optimization I am proposing that the file should be
> only written when some backend requests statistics. This would
> significantly reduce the undesired write traffic at the cost of
> slightly slower stats access.

How necessary is this given the recent fixes to allow the stats file to
be kept on a ramdisk?

> Attached is a WIP patch, which basically implements this:

This patch breaks deadlock checking and statement_timeout, because
backends already use SIGALRM. You can't just take over that signal.
It's possible that you could get things to work by treating this as an
additional reason for SIGALRM, but that code is unreasonably complex
already. I'd suggest finding some other way.

regards, tom lane


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-05 19:27:08
Message-ID: 20080905122708.462a0f44@jd-laptop

On Fri, 05 Sep 2008 15:23:18 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> > So, as a simple optimization I am proposing that the file should be
> > only written when some backend requests statistics. This would
> > significantly reduce the undesired write traffic at the cost of
> > slightly slower stats access.
>
> How necessary is this given the recent fixes to allow the stats file
> to be kept on a ramdisk?

From a usability and integration perspective this patch is a nice
touch. On-demand is a nice feature when used correctly.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-05 19:51:36
Message-ID: 48C18DC8.90608@gmail.com

Tom Lane wrote:
> Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
>> So, as a simple optimization I am proposing that the file should be
>> only written when some backend requests statistics. This would
>> significantly reduce the undesired write traffic at the cost of
>> slightly slower stats access.
>
> How necessary is this given the recent fixes to allow the stats file to
> be kept on a ramdisk?
>

A ramdisk helps, but requires additional effort to set up. Also, the stats
file has a tendency to creep up on you -- as the database evolves, the file
size gradually increases, and suddenly the DBA is left wondering what
happened to performance.

>> Attached is a WIP patch, which basically implements this:
>
> This patch breaks deadlock checking and statement_timeout, because
> backends already use SIGALRM. You can't just take over that signal.
> It's possible that you could get things to work by treating this as an
> additional reason for SIGALRM, but that code is unreasonably complex
> already. I'd suggest finding some other way.
>

I suspected that, but somehow managed to overlook it :( I guess it was
too tempting to use it. I'll start looking for alternatives.

regards,
Martin


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-05 23:29:50
Message-ID: 48C1C0EE.9050702@timbira.com

Martin Pihlak wrote:
> I suspected that, but somehow managed to overlook it :( I guess it was
> too tempting to use it. I'll start looking for alternatives.
>
If you can't afford a 500 msec pgstat interval, then you need to make it
tunable. Other ideas are (i) turning pgstat on/off per table or database,
and (ii) making the pgstat interval tunable per table or database. You can
use the reloptions column to store this info. These workarounds are much
simpler than what you proposed, and they're almost for free.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 00:24:37
Message-ID: 7424.1220660677@sss.pgh.pa.us

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> If you can't afford a 500 msec pgstat interval, then you need to make it
> tunable. Other ideas are (i) turning pgstat on/off per table or database,
> and (ii) making the pgstat interval tunable per table or database. You can
> use the reloptions column to store this info. These workarounds are much
> simpler than what you proposed, and they're almost for free.

For normal usage on-demand dumping would be a really good thing; it'd
cut the overhead of having stats on tremendously, especially for people
who don't really use 'em. The particular signaling proposed here is
bogus, but if Martin can make it work in a cleaner fashion I think it's
likely a good idea.

regards, tom lane


From: "Asko Oja" <ascoja(at)gmail(dot)com>
To: "Euler Taveira de Oliveira" <euler(at)timbira(dot)com>
Cc: "Martin Pihlak" <martin(dot)pihlak(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 07:22:10
Message-ID: ecd779860809060022ucfdef9dxe173339ec4b84d83@mail.gmail.com

On Sat, Sep 6, 2008 at 2:29 AM, Euler Taveira de Oliveira <euler(at)timbira(dot)com> wrote:

> Martin Pihlak wrote:
> > I suspected that, but somehow managed to overlook it :( I guess it was
> > too tempting to use it. I'll start looking for alternatives.
> >
> If you can't afford a 500 msec pgstat interval, then you need to make it
> tunable.

Additional parameter in config file. Not good.

> Other ideas are (i) turning pgstat on/off per table or database,
> and (ii) making the pgstat interval tunable per table or database. You can
> use the reloptions column to store this info. These workarounds are much
> simpler than what you proposed, and they're almost for free.
>
Does not seem simple to me. Why would DBAs want extra management work? We
want all the stats to be there, as we don't know when we will need to look at them.

>
>
> --
> Euler Taveira de Oliveira
> http://www.timbira.com/
>


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 15:49:24
Message-ID: 1220716164.4371.1328.camel@ebony.2ndQuadrant


On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:

> How necessary is this given the recent fixes to allow the stats file to
> be kept on a ramdisk?

I would prefer this approach and back out the other change.

On-demand is cheaper and easier to use.

> > Attached is a WIP patch, which basically implements this:
>
> This patch breaks deadlock checking and statement_timeout, because
> backends already use SIGALRM. You can't just take over that signal.
> It's possible that you could get things to work by treating this as an
> additional reason for SIGALRM, but that code is unreasonably complex
> already. I'd suggest finding some other way.

There are other ways already in use in the backend, so just use those.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 17:52:43
Message-ID: 20555.1220723563@sss.pgh.pa.us

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Fri, 2008-09-05 at 15:23 -0400, Tom Lane wrote:
>> How necessary is this given the recent fixes to allow the stats file to
>> be kept on a ramdisk?

> I would prefer this approach and back-out the other change.

Even if we get on-demand done, I wouldn't see it as a reason to back out
the statfile relocation work. In an environment where the stats are
demanded frequently, you could still need that for performance.

(In fact, maybe this patch ought to include some sort of maximum update
rate tunable? The worst case behavior could actually be WORSE than now.)

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 18:02:17
Message-ID: 20080906180217.GA4051@alvh.no-ip.org

Tom Lane wrote:

> (In fact, maybe this patch ought to include some sort of maximum update
> rate tunable? The worst case behavior could actually be WORSE than now.)

Some sort of "if stats were requested in the last 500 ms, just tell the
requester to read the existing file".

Things that come to mind:

- autovacuum could use a more frequent stats update in certain cases

- Maybe we oughta have separate files, one for each database? That way
we'd reduce unnecessary I/O traffic for both the reader and the writer.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 18:07:02
Message-ID: 20796.1220724422@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Some sort of "if stats were requested in the last 500 ms, just tell the
> requester to read the existing file".

Hmm, I was thinking of delaying both the write and the reply signal
until 500ms had elapsed. But the above behavior would certainly be
easier to implement, and would probably be good enough (TM).

> - Maybe we oughta have separate files, one for each database? That way
> we'd reduce unnecessary I/O traffic for both the reader and the writer.

The signaling would become way too complex, I think. Also what do you
do about shared tables?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 18:12:32
Message-ID: 20888.1220724752@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Some sort of "if stats were requested in the last 500 ms, just tell the
> requester to read the existing file".

> Things that come to mind:

> - autovacuum could use a more frequent stats update in certain cases

BTW, we could implement that by, instead of having a global tunable,
including a field in the request message saying how stale an existing
file is acceptable for this requestor. 500ms might be the standard
value but autovac could use a smaller number.
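
A rough sketch of what that could look like (the min_ts field and the
function body are illustrative; pgstat_setheader/pgstat_send are the
existing message helpers):

typedef struct PgStat_MsgInquiry
{
    PgStat_MsgHdr m_hdr;    /* standard stats message header */
    TimestampTz   min_ts;   /* oldest acceptable stats file timestamp */
} PgStat_MsgInquiry;

static void
pgstat_send_inquiry(TimestampTz slop_usec)
{
    PgStat_MsgInquiry msg;

    pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_INQUIRY);
    /* accept a file written up to slop_usec ago; autovac could pass less */
    msg.min_ts = GetCurrentTimestamp() - slop_usec;
    pgstat_send(&msg, sizeof(msg));
}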

regards, tom lane


From: "Asko Oja" <ascoja(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Martin Pihlak" <martin(dot)pihlak(at)gmail(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 19:45:13
Message-ID: ecd779860809061245m1d1d69c5of5104b890d466dda@mail.gmail.com

Protection against too-frequent reads is already handled in the patch, but
these comments might lead it in new directions. The current implementation had
this same limit: the file was written no more than once per 500 ms.

On Sat, Sep 6, 2008 at 9:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Some sort of "if stats were requested in the last 500 ms, just tell the
> > requester to read the existing file".
>
> > Things that come to mind:
>
> > - autovacuum could use a more frequent stats update in certain cases
>
> BTW, we could implement that by, instead of having a global tunable,
> including a field in the request message saying how stale an existing
> file is acceptable for this requestor. 500ms might be the standard
> value but autovac could use a smaller number.
>
> regards, tom lane
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 22:30:37
Message-ID: 20080906223037.GC4051@alvh.no-ip.org

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:

> > - Maybe we oughta have separate files, one for each database? That way
> > we'd reduce unnecessary I/O traffic for both the reader and the writer.
>
> The signaling would become way too complex, I think. Also what do you
> do about shared tables?

They are already stored in a "separate database" (denoted with
InvalidOid dbid), and autovacuum grabs it separately. I admit I don't
know what regular backends do about it.

As for signalling, maybe we could implement something like we do for the
postmaster signal stuff: the requestor stores a dbid in shared memory
and sends a SIGUSR2 to pgstat or some such. We'd have enough shmem
space for a reasonable number of requests, and pgstat consumes them from
there into local memory (similar to what Andrew proposes for
LISTEN/NOTIFY); it stores the dbid and PID of the requestor. As soon as
the request has been fulfilled, pgstat responds by <fill in magical
mechanism that Martin is about to propose> to that particular backend.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 22:38:45
Message-ID: 26321.1220740725@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> As for signalling, maybe we could implement something like we do for the
> postmaster signal stuff: the requestor stores a dbid in shared memory
> and sends a SIGUSR2 to pgstat or some such.

No, no, no. Martin already had a perfectly sane design for that
direction of signalling: send a special stats message to the collector.
That can carry whatever baggage it needs to. It's the reverse direction
of "the data you requested is available now, sir" that is tricky.
And I fear that having to keep track of multiple stats-collector output
files would make it very significantly trickier --- both for the stats
collector's own bookkeeping and for the signaling mechanism itself.
I don't believe it's gonna be worth that.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-06 22:45:33
Message-ID: 26440.1220741133@sss.pgh.pa.us

I wrote:
> No, no, no. Martin already had a perfectly sane design for that
> direction of signalling: send a special stats message to the collector.

Actually ... given that the stats message mechanism is designed to be
lossy under high load, maybe that isn't so sane. At the very least
there would have to be timeout-and-resend logic on the backend side.

I dislike the alternative of communicating through shared memory,
though. Right now the stats collector isn't even connected to shared
memory.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-07 13:09:25
Message-ID: 48C3D285.1040304@hagander.net

Martin Pihlak wrote:
>>> Attached is a WIP patch, which basically implements this:
>> This patch breaks deadlock checking and statement_timeout, because
>> backends already use SIGALRM. You can't just take over that signal.
>> It's possible that you could get things to work by treating this as an
>> additional reason for SIGALRM, but that code is unreasonably complex
>> already. I'd suggest finding some other way.
>>
>
> I suspected that, but somehow managed to overlook it :( I guess it was
> too tempting to use it. I'll start looking for alternatives.

I wrote a patch for this some time back that was actually applied.
Turns out it didn't work, and I ran out of time to fix it, so it was
backed out again. And then I forgot about it :-) If you look through the
CVS history of pgstat you should be able to find it - maybe it can give
you some further ideas.

//Magnus


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-07 13:11:24
Message-ID: 48C3D2FC.1040303@hagander.net

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> As for signalling, maybe we could implement something like we do for the
>> postmaster signal stuff: the requestor stores a dbid in shared memory
>> and sends a SIGUSR2 to pgstat or some such.
>
> No, no, no. Martin already had a perfectly sane design for that
> direction of signalling: send a special stats message to the collector.
> That can carry whatever baggage it needs to. It's the reverse direction
> of "the data you requested is available now, sir" that is tricky.

IIRC, my previous patch looked at the inode of the stats file, then sent
off the "gimme a new file" signal, and then read the file once the inode
changed.

But also IIRC, that's the area where there was a problem - sometimes it
didn't properly pick up changes...

//Magnus


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-07 18:02:51
Message-ID: 6B5B6064-2C5E-46F9-A24B-20A965BA9A59@hi-media.com

Hi,

On Sep 7, 2008, at 00:45, Tom Lane wrote:
> I dislike the alternative of communicating through shared memory,
> though. Right now the stats collector isn't even connected to shared
> memory.

Maybe Markus Wanner's work on Postgres-R internal messaging, now that it
has been reworked to follow your advice, could be of some use here?
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01114.php
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01420.php

Regards,
--
dim

--
Dimitri Fontaine
PostgreSQL DBA, Architect


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-07 19:47:44
Message-ID: 48C42FE0.3050902@gmail.com

Magnus Hagander wrote:
> I wrote a patch for this some time back, that was actually applied.
> Turns out it didn't work, and I ran out of time to fix it, so it was
> backed out again. And then I forgot about it :-) If you look through the
> cvs history of pgstat you should be able to find it - maybe it can give
> you some further ideas.

Got it - this was 1.126. Looks very familiar indeed :)

I had also previously experimented with stat()-based polling but ran into
the same issues - no portable high-resolution timestamp on files. I guess
stat() is unusable unless we can live with a 1 second update interval for the
stats (e.g. the backend reads the file if it is within 1 second of the request).

One alternative is to include a timestamp in the stats file header - the
backend can then wait on that -- check the timestamp, sleep, resend the
request, loop. Not particularly elegant, but easy to implement. Would this
be acceptable?
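
Roughly like this (statfile_header_timestamp and the constants are
indicative only):

TimestampTz min_ts = GetCurrentTimestamp() -
    PGSTAT_STAT_INTERVAL * 1000;    /* interval is in ms, TimestampTz in us */

for (;;)
{
    if (statfile_header_timestamp() >= min_ts)
        break;                      /* file is fresh enough, go read it */

    pgstat_send_inquiry(min_ts);    /* resend in case the message was lost */
    pg_usleep(5000L);               /* sleep 5ms and check again */
}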

regards,
Martin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-07 19:52:47
Message-ID: 28445.1220817167@sss.pgh.pa.us

Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> I had also previously experimented with stat() based polling but ran into
> the same issues - no portable high resolution timestamp on files. I guess
> stat() is unusable unless we can live with 1 second update interval for the
> stats (eg. backend reads the file if it is within 1 second of the request).

> One alternative is to include a timestamp in the stats file header - the
> backend can then wait on that -- check the timestamp, sleep, resend the
> request, loop. Not particularly elegant, but easy to implement. Would this
> be acceptable?

Timestamp within the file is certainly better than trying to rely on
filesystem timestamps. I doubt 1 sec resolution is good enough, and
I'd also be worried about issues like clock skew between the
postmaster's time and the filesystem's time.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 07:50:58
Message-ID: 48C4D962.40205@hagander.net

Martin Pihlak wrote:
> Magnus Hagander wrote:
>> I wrote a patch for this some time back, that was actually applied.
>> Turns out it didn't work, and I ran out of time to fix it, so it was
>> backed out again. And then I forgot about it :-) If you look through the
>> cvs history of pgstat you should be able to find it - maybe it can give
>> you some further ideas.
>
> Got it - this was 1.126. Looks very familiar indeed :)

:-)

> I had also previously experimented with stat() based polling but ran into
> the same issues - no portable high resolution timestamp on files. I guess
> stat() is unusable unless we can live with 1 second update interval for the
> stats (eg. backend reads the file if it is within 1 second of the request).

If the filesystem has inodes, noticing that the inode changes should
usually be enough, no? Since we always create a new file and rename it
into place, there is no way that inode can be reused right away. But it
does require that the stat info is not cached somewhere in between...
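
Something along these lines, with plain stat() (illustrative only):

#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* true if the stats file was replaced since we last recorded its inode */
static bool
statfile_replaced(const char *path, ino_t last_inode)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;               /* treat errors as "no change yet" */

    /* rename()-ing the new file into place gives it a fresh inode */
    return st.st_ino != last_inode;
}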

//Magnus


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 07:52:00
Message-ID: 48C4D9A0.50207@hagander.net

Tom Lane wrote:
> Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
>> I had also previously experimented with stat() based polling but ran into
>> the same issues - no portable high resolution timestamp on files. I guess
>> stat() is unusable unless we can live with 1 second update interval for the
>> stats (eg. backend reads the file if it is within 1 second of the request).
>
>> One alternative is to include a timestamp in the stats file header - the
>> backend can then wait on that -- check the timestamp, sleep, resend the
>> request, loop. Not particularly elegant, but easy to implement. Would this
>> be acceptable?
>
> Timestamp within the file is certainly better than trying to rely on
> filesystem timestamps. I doubt 1 sec resolution is good enough, and

We'd need half-second resolution just to keep up with the level we
have *now*, wouldn't we?

> I'd also be worried about issues like clock skew between the
> postmaster's time and the filesystem's time.

Can that even happen on a local filesystem? I guess you could put the
file on NFS though, but that seems to be.. eh. sub-optimal.. in more
than one way..

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 12:15:03
Message-ID: 12403.1220876103@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> I'd also be worried about issues like clock skew between the
>> postmaster's time and the filesystem's time.

> Can that even happen on a local filesystem? I guess you could put the
> file on NFS though, but that seems to be.. eh. sub-optimal.. in more
> than one way..

Lots of people use NAS/SAN type setups.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 13:05:06
Message-ID: 48C52302.5020101@hagander.net

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> I'd also be worried about issues like clock skew between the
>>> postmaster's time and the filesystem's time.
>
>> Can that even happen on a local filesystem? I guess you could put the
>> file on NFS though, but that seems to be.. eh. sub-optimal.. in more
>> than one way..
>
> Lots of people use NAS/SAN type setups.

For NAS it could happen, but certainly not for SAN, no? SANs have all
the filesystem processing in the kernel of the server...

I still agree that it's not a good thing to rely on, though :-)

//Magnus


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 13:25:20
Message-ID: 48C527C0.2070707@gmail.com

Tom Lane wrote:
> Timestamp within the file is certainly better than trying to rely on
> filesystem timestamps. I doubt 1 sec resolution is good enough, and
> I'd also be worried about issues like clock skew between the
> postmaster's time and the filesystem's time.
>

Attached is a patch which adds a timestamp to the pgstat.stat file header;
backend_read_statsfile uses this to determine whether the file is fresh.
During the wait loop, the stats request message is retransmitted to
compensate for possible loss of messages.

The collector still writes the file at most at PGSTAT_STAT_INTERVAL frequency;
currently no extra custom timeouts can be passed in the message. This can
of course be added if the need arises.

regards,
Martin

Attachment: stat-write.patch (text/x-diff, 11.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 14:05:50
Message-ID: 14204.1220882750@sss.pgh.pa.us

Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> Attached is a patch which adds a timestamp to pgstat.stat file header,
> backend_read_statsfile uses this to determine if the file is fresh.
> During the wait loop, the stats request message is retransmitted to
> compensate for possible loss of message(s).

> The collector only writes the file mostly at PGSTAT_STAT_INTERVAL frequency,
> currently no extra custom timeouts can be passed in the message. This can
> of course be added if need arises.

Hmm. With the timestamp in the file, ISTM that we could put all the
intelligence on the reader side. Reader checks file, sends message if
it's too stale. The collector just writes when it's told to, no
filtering. In this design, rate limiting relies on the readers to not
be unreasonable in how old a file they'll accept; and there's no problem
with different readers having different requirements.

A possible problem with this idea is that two readers might send request
messages at about the same time, resulting in two writes where there
need have been only one. However I think that could be fixed if we add
a little more information to the request messages and have the collector
remember the file timestamp it last wrote out. There are various ways
you could design it but what comes to mind here is for readers to send
a timestamp defined as minimum acceptable file timestamp (ie, their
current clock time less whatever slop they want to allow). Then,
when the collector gets a request with that timestamp <= its last
write timestamp, it knows it need not do anything.
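
In collector-side pseudocode (the names are illustrative):

static TimestampTz last_statwrite = 0;  /* timestamp of last file written */

static void
pgstat_recv_inquiry(PgStat_MsgInquiry *msg)
{
    /*
     * If the last file we wrote is at least as new as the oldest file
     * this requestor will accept, there is nothing to do.
     */
    if (msg->min_ts <= last_statwrite)
        return;

    pgstat_write_statsfile();
    last_statwrite = GetCurrentTimestamp();
}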

With signaling like that, there are no excess writes *and* no delay in
responding to a new live write request. It's sort of annoying that
the readers have to sleep for an arbitrary interval, though. If we could
get rid of that part...

regards, tom lane


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 21:10:48
Message-ID: 48C594D8.6050504@gmail.com

Tom Lane wrote:
> Hmm. With the timestamp in the file, ISTM that we could put all the
> intelligence on the reader side. Reader checks file, sends message if
... snip ...
> remember the file timestamp it last wrote out. There are various ways
> you could design it but what comes to mind here is for readers to send
> a timestamp defined as minimum acceptable file timestamp (ie, their
> current clock time less whatever slop they want to allow). Then,
> when the collector gets a request with that timestamp <= its last
> write timestamp, it knows it need not do anything.

Attached is a patch that implements the described signalling. Additionally,
the following unrelated changes have been made:
1. fopen/fclose replaced with AllocateFile/FreeFile
2. pgstat_report_stat() now also checks if there are functions to report
before returning. Previous behavior was to return immediately if no table
stats were available for reporting.

> responding to a new live write request. It's sort of annoying that
> the readers have to sleep for an arbitrary interval though. If we could
> get rid of that part...
>
It is, but I guess it's unavoidable if we have to wait for the file to be
written. I'll try to do some load testing tomorrow, to see if something
nasty comes out.

regards,
Martin

Attachment: stat-write.patch (text/x-diff, 13.2 KB)

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-08 23:36:15
Message-ID: 20080908233615.GJ4411@alvh.no-ip.org

Martin Pihlak wrote:
> Tom Lane wrote:
> > Hmm. With the timestamp in the file, ISTM that we could put all the
> > intelligence on the reader side. Reader checks file, sends message if

> Attached is a patch that implements the described signalling.

Are we keeping the idea that the reader sends a stat message when it
needs to read stats? What about the lossiness of the transport?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-09-09 13:11:58
Message-ID: 48C6761E.60900@gmail.com

Alvaro Herrera wrote:
>> Attached is a patch that implements the described signalling.
>
> Are we keeping the idea that the reader sends a stat message when it
> needs to read stats? What about the lossiness of the transport?
>

As the message is resent in the wait loop, the collector should receive
it sooner or later. And initial testing shows that it's not really easy to
make the collector lose messages.

I used a modified version of the patch to run a simple load test on a 4-core
amd64 Linux box:

- patch modified so that pgstat_send_inquiry() is sent only once, before the
wait loop, so it times out if the message is lost.
- stats file bloated to ~ 5MB by creating 40k tables.
- 4 pgbench instances running: -c 500 -t 500
- 2 clients constantly pulling stats
- all cores near 100% busy, tx traffic over loopback ~ 200kB/sec.

Most of the stats requests required 1 or 2 file wait iterations (5ms sleep
each); occasionally 3, but no timeouts (i.e. no lost messages). Maybe other
platforms are more sensitive, but I have none available for testing at the moment.

regards,
Martin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2008-11-03 01:18:10
Message-ID: 4696.1225675090@sss.pgh.pa.us

Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> Attached is a patch that implements the described signalling. Additionally,
> the following unrelated changes have been made:
> 1. fopen/fclose replaced with AllocateFile/FreeFile
> 2. pgstat_report_stat() now also checks if there are functions to report
> before returning. Previous behavior was to return immediately if no table
> stats were available for reporting.

Applied with minor corrections.

regards, tom lane


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-07 10:36:50
Message-ID: 496485C2.1010307@gmail.com

Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> (In fact, maybe this patch ought to include some sort of maximum update
>> rate tunable? The worst case behavior could actually be WORSE than now.)
>
> Some sort of "if stats were requested in the last 500 ms, just tell the
> requester to read the existing file".
>
> Things that come to mind:
>
> - autovacuum could use a more frequent stats update in certain cases
>

[ digging up an old thread ... ]

I recently tested the on-demand stats write on an installation with over 100
databases in a single cluster (ca. 5MB stats file). As I saw no visible reduction
in the stats file updates, I started investigating. It turned out that autovacuum
was configured with a 1-minute naptime, and was constantly walking the database
list and launching workers. Each worker makes several stats file requests, and
often it turns out that the file is older than the allowed 10ms. Increasing the
naptime somewhat alleviates the problem, but still introduces peaks.

As I understand it, the autovacuum workers need up-to-date stats to minimize the
risk of re-vacuuming a table (in case it was already vacuumed by someone else).
However, with visibility-map-based vacuums the cost of re-vacuuming should
be reasonably low. So I'd propose increasing the max allowed stats age for
autovacuum workers. Perhaps even high enough to allow reuse of the file requested
by the launcher. Or am I missing something?

regards,
Martin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-07 14:44:09
Message-ID: 13344.1231339449@sss.pgh.pa.us

Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com> writes:
> As I understand it, the autovacuum workers need up-to-date stats to minimize the
> risk of re-vacuuming a table (in case it was already vacuumed by someone else).

I never understood why autovacuum should need a particularly short fuse
on the stats file age to start with. If the launcher is launching
multiple workers into the same database with only a few milliseconds
between them, isn't the launcher pretty broken anyhow? ISTM that stats
files even several seconds old ought to be plenty good enough.

regards, tom lane


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-12 09:38:51
Message-ID: 496B0FAB.7070803@gmail.com

Tom Lane wrote:
> I never understood why autovacuum should need a particularly short fuse
> on the stats file age to start with. If the launcher is launching
> multiple workers into the same database with only a few milliseconds
> between them, isn't the launcher pretty broken anyhow? ISTM that stats
> files even several seconds old ought to be plenty good enough.
>

I was thinking that the launcher should only request fresh stats at wakeup;
the workers could then reuse that file. This could be implemented by calling
pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
to autovacuum_naptime for the workers.

This would effectively disable the table vacuum recheck though. If this is
OK, I'll start working on it.

regards,
Martin


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 10:53:22
Message-ID: 4976FEA2.3090203@gmail.com

I wrote:
> I was thinking that the launcher should only request fresh stats at wakeup;
> the workers could then reuse that file. This could be implemented by calling
> pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
> to autovacuum_naptime for the workers.
>

Attached is a patch that increases the autovacuum stats age tolerance to
autovacuum_naptime. This is handled in autovac_refresh_stats() by not clearing
the stats snapshot unless the nap time has elapsed or a refresh is explicitly
forced by an error or SIGHUP.
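
In outline, the refresh logic is roughly the following (simplified, not the
literal patch text):

static TimestampTz last_stats_refresh = 0;

static void
autovac_refresh_stats(void)
{
    TimestampTz now = GetCurrentTimestamp();

    /* reuse the existing snapshot until a full naptime has elapsed */
    if (!TimestampDifferenceExceeds(last_stats_refresh, now,
                                    autovacuum_naptime * 1000))
        return;

    last_stats_refresh = now;
    pgstat_clear_snapshot();        /* next access fetches fresh stats */
}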

For the time being, I left the table vacuum recheck in place. Removing the
table_recheck_autovac function requires some further work. I have started on
this, but decided to defer until it is clear whether the whole approach is
acceptable or not.

regards,
Martin

Attachment: autovac-stats.patch (text/x-diff, 8.3 KB)

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 12:52:09
Message-ID: 20090121125209.GA4038@alvh.no-ip.org

Martin Pihlak wrote:
> I wrote:
> > I was thinking that the launcher should only request fresh stats at wakeup;
> > the workers could then reuse that file. This could be implemented by calling
> > pgstat_clear_snapshot only at launcher wakeup and setting the max stats age
> > to autovacuum_naptime for the workers.
>
> Attached is a patch that increases the autovacuum stats age tolerance to
> autovacuum_naptime. This is handled by autovac_refresh_stats() by not clearing
> the stats snapshot unless nap time elapsed or explicitly forced by an error
> or SIGHUP.

You missed putting back the BUG comment that used to be there about
this.

In other words I think this is a bad idea, because there is a very wide
window for a table to be vacuumed twice. Since naptime can be
arbitrarily large, this is an arbitrarily large bug. I'm sure there are
other ways to fix this, but please propose those before this patch.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 16:15:29
Message-ID: 49774A21.1050005@gmail.com

Alvaro Herrera wrote:
> You missed putting back the BUG comment that used to be there about
> this.
>

This was deliberate; I did mention the condition in the comment at
the beginning of the file. This actually makes it a feature :)

Seriously though, do you think that this is still a problem, given
the rare occurrence of the revacuum and the fact that it is made
cheap by the visibility map? In my initial testing, I couldn't reproduce
the revacuum, but I'll keep at it.

> In other words I think this is a bad idea, because there is a very wide
> window for a table to be vacuumed twice. Since naptime can be
> arbitrarily large, this is an arbitrarily large bug. I'm sure there are
> other ways to fix this, but please propose those before this patch.
>

I was wondering whether the stats subsystem should be used for
vacuum tracking at all. It may be convenient to use, but it has several
deficiencies (big file, lossy, no crash safety, etc.). Could we move
vacuum tracking to pg_class instead?

regards,
Martin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 16:27:51
Message-ID: 20090121162750.GH4038@alvh.no-ip.org

Martin Pihlak wrote:
> Alvaro Herrera wrote:
> > You missed putting back the BUG comment that used to be there about
> > this.
>
> This was deliberate, I did mention the condition in the comment at
> the beginning of the file. This actually makes it a feature :)
>
> Seriously though, do you think that this is still a problem? Given
> the rare occurrence of the revacuum and the fact that it is made
> cheap by visibility map?

Hmm, maybe it's no longer an issue with the visibility map, yes.

> I was wondering whether the stats subsystem should be used for
> vacuum tracking at all. It may be convenient to use, but it has several
> deficiencies (big file, lossy, no crash safety, etc.). Could we move
> vacuum tracking to pg_class instead?

I agree that pgstats is not ideal (we've said this from the very
beginning), but I doubt that updating pg_class is the answer; you'd be
generating thousands of dead tuples there.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 18:16:36
Message-ID: 49776684.3000900@enterprisedb.com

Alvaro Herrera wrote:
> Martin Pihlak wrote:
>> Alvaro Herrera wrote:
>>> You missed putting back the BUG comment that used to be there about
>>> this.
>> This was deliberate, I did mention the condition in the comment at
>> the beginning of the file. This actually makes it a feature :)
>>
>> Seriously though, do you think that this is still a problem? Given
>> the rare occurrence of the revacuum and the fact that it is made
>> cheap by visibility map?
>
> Hmm, maybe it's no longer an issue with the visibility map, yes.

You still have to scan all indexes, so it's still not free by any means.

(I haven't been paying attention to what kind of a risk we're talking
about..)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 19:45:08
Message-ID: 49777B44.10105@gmail.com

Alvaro Herrera wrote:
> I agree that pgstats is not ideal (we've said this from the very
> beginning), but I doubt that updating pg_class is the answer; you'd be
> generating thousands of dead tuples there.
>

But we already do update pg_class after vacuum -- in vac_update_relstats().
Hmm, that performs a heap_inplace_update() ... I assume that this is cheap,
but I have no idea whether it is suitable for the purpose.

regards,
Martin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-21 20:17:02
Message-ID: 20090121201702.GO4038@alvh.no-ip.org

Martin Pihlak wrote:
> Alvaro Herrera wrote:
> > I agree that pgstats is not ideal (we've said this from the very
> > beginning), but I doubt that updating pg_class is the answer; you'd be
> > generating thousands of dead tuples there.
>
> But we already do update pg_class after vacuum -- in vac_update_relstats().
> Hmm, that performs a heap_inplace_update() ... I assume that this is cheap,
> but have no idea as if it is suitable for the purpouse.

Oh, sorry, I thought you were suggesting to use pg_class to store number
of tuples dead/alive/etc.

I had a patch to introduce a new type of table, which would only be used
for non-transactional updates. That would allow what you're proposing.
I think we discussed something similar to what you propose and rejected
it for some reason I can't recall offhand. Search the archives for
pg_class_nt and pg_ntclass, that might give you some ideas.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-22 21:53:19
Message-ID: 18522.1232661199@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Martin Pihlak wrote:
>> [ patch to fool with stats refresh logic in autovac ]

(1) I still don't understand why we don't just make the launcher request
a new stats file once per naptime cycle, and then allow the workers to
work from that.

(2) The current code in autovacuum.c seems to be redundant with the
logic that now exists in the stats mechanism itself to decide whether a
stats file is too old.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-22 22:00:37
Message-ID: 20090122220037.GM4296@alvh.no-ip.org

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Martin Pihlak wrote:
> >> [ patch to fool with stats refresh logic in autovac ]
>
> (1) I still don't understand why we don't just make the launcher request
> a new stats file once per naptime cycle, and then allow the workers to
> work from that.

The problem is workers that spend too much time on a single database.
If the stats at the time they both start say that a given table must be
vacuumed (consider for example that the first one spent too much time
vacuuming some other big table), they could end up both vacuuming that
table. The second vacuum would be a waste.

This could be solved if the workers kept the whole history of tables
that they have vacuumed. Currently we keep only a single table (the one
being vacuumed right now). I proposed writing these history files back
when workers were first implemented, but the idea was shot down before
flying very far because it was way too complex (the rest of the patch
was more than complex enough.) Maybe we can implement this now.

> (2) The current code in autovacuum.c seems to be redundant with the
> logic that now exists in the stats mechanism itself to decide whether a
> stats file is too old.

Hmm, yeah, possibly.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-22 22:58:53
Message-ID: 4978FA2D.3000706@timbira.com

Alvaro Herrera wrote:
> This could be solved if the workers kept the whole history of tables
> that they have vacuumed. Currently we keep only a single table (the one
> being vacuumed right now). I proposed writing these history files back
> when workers were first implemented, but the idea was shot down before
> flying very far because it was way too complex (the rest of the patch
> was more than complex enough.) Maybe we can implement this now.
>
[I don't remember your proposal...] Isn't it just adding a circular linked list
to AutoVacuumShmemStruct? Of course some lock mechanism needs to exist to
guarantee that we don't write at the same time. The size of this linked list
would be scaled by a startup-time GUC or a reasonable fixed value.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-22 23:14:37
Message-ID: 20090122231437.GO4296@alvh.no-ip.org

Euler Taveira de Oliveira wrote:
> Alvaro Herrera wrote:
> > This could be solved if the workers kept the whole history of tables
> > that they have vacuumed. Currently we keep only a single table (the one
> > being vacuumed right now). I proposed writing these history files back
> > when workers were first implemented, but the idea was shot down before
> > flying very far because it was way too complex (the rest of the patch
> > was more than complex enough.) Maybe we can implement this now.
> >
> [I don't remember your proposal...] Isn't it just adding a circular linked list
> to AutoVacuumShmemStruct? Of course some lock mechanism needs to exist to
> guarantee that we don't write at the same time. The size of this linked list
> would be scaled by a startup-time GUC or a reasonable fixed value.

Well, the problem is precisely how to size the list. I don't like the
idea of keeping an arbitrary number in memory; it adds another
mostly-useless tunable that we'll need to answer questions about for all
eternity.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-23 04:28:03
Message-ID: 49794753.90902@timbira.com

Alvaro Herrera wrote:
> Euler Taveira de Oliveira wrote:
>> Alvaro Herrera wrote:
>>> This could be solved if the workers kept the whole history of tables
>>> that they have vacuumed. Currently we keep only a single table (the one
>>> being vacuumed right now). I proposed writing these history files back
>>> when workers were first implemented, but the idea was shot down before
>>> flying very far because it was way too complex (the rest of the patch
>>> was more than complex enough.) Maybe we can implement this now.
>>>
>> [I don't remember your proposal...] Isn't it just adding a circular linked list
>> to AutoVacuumShmemStruct? Of course some lock mechanism needs to exist to
>> guarantee that we don't write at the same time. The size of this linked list
>> would be scaled by a startup-time GUC or a reasonable fixed value.
>
> Well, the problem is precisely how to size the list. I don't like the
> idea of keeping an arbitrary number in memory; it adds another
> mostly-useless tunable that we'll need to answer questions about for all
> eternity.
>
[Poking the code a little...] You're right. We could do that, but it isn't an
elegant solution. What about tracking that information in table_oids?

struct table_oids {
    bool    skipit;     /* initially false */
    Oid     relid;
};

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Martin Pihlak <martin(dot)pihlak(at)gmail(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing statistics write overhead
Date: 2009-01-23 04:58:41
Message-ID: 26106.1232686721@sss.pgh.pa.us

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> Alvaro Herrera escreveu:
>> Well, the problem is precisely how to size the list. I don't like the
>> idea of keeping an arbitrary number in memory; it adds another
>> mostly-useless tunable that we'll need to answer questions about for all
>> eternity.

Is it so hard? In particular, rather than making it a tunable, what say
we freeze the list size at exactly two, i.e. each AV worker advertises its
current and most recent target table in shared memory. Other workers
avoid re-vacuuming those. Then the most work you can "waste" by extra
vacuuming is less than the maximum allowed stats file age. I'd have no
problem whatsoever in letting that run into multiple seconds, as long
as it doesn't get into minutes or hours.
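
A sketch of the idea (struct and function names are purely illustrative,
not the actual autovacuum.c ones):

typedef struct WorkerAdvert
{
    Oid av_dboid;           /* database the worker is attached to */
    Oid av_tableoid;        /* table currently being vacuumed */
    Oid av_lasttableoid;    /* table most recently finished */
} WorkerAdvert;

/* a worker skips a table if any other worker advertises it */
static bool
table_claimed(WorkerAdvert *workers, int nworkers, Oid dboid, Oid relid)
{
    int i;

    for (i = 0; i < nworkers; i++)
    {
        if (workers[i].av_dboid != dboid)
            continue;
        if (workers[i].av_tableoid == relid ||
            workers[i].av_lasttableoid == relid)
            return true;
    }
    return false;
}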

regards, tom lane