Re: autovacuum causing numerous regression-test failures

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: autovacuum causing numerous regression-test failures
Date: 2006-08-28 18:33:22
Message-ID: 15424.1156790002@sss.pgh.pa.us

I think we shall have to reconsider that patch to turn it on by default.
So far I've seen two categories of failure:

* manual ANALYZE issued by regression tests fails because autovac is
analyzing the same table concurrently.

* contrib tests fail in their repeated drop/create database operations
because autovac is connected to that database. (pl tests presumably
have the same issue.)

There are probably more symptoms we have not seen yet.

In the long run it would be good to figure out fixes to make these
problems not happen, but I'm not putting that on the must-fix-for-8.2
list.

BTW, it would sure be nice to know what happened here:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01

LOG: autovacuum process (PID 26315) was terminated by signal 11
LOG: terminating any other active server processes

but even if there was a core file, it got wiped out immediately by
the next "DROP DATABASE" command :-(. This one does look like a
must-fix, if we can find out what happened.

regards, tom lane


From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Unnecessary rescan for non scrollable holdable cursors
Date: 2006-08-28 18:50:35
Message-ID: C118B33B.139C0%agoldshuv@greenplum.com
Lists: pgsql-hackers

Hi,

When persisting a holdable cursor at COMMIT time we currently choose to
rewind the executor and re-scan the whole result set into the tuplestore in
order to be able to scroll backwards later on. And then, we reposition the
cursor to the position we were in. However, unless I am missing something,
this seems to be done always, even if the cursor is not scrollable. I
suppose adding a simple conditional or two in PersistHoldablePortal() in
portalcmds.c could avoid both the rescan and filling up the tuplestore with
tuples that will never be looked at, in the case where we never want to scroll back.

Anyway, definitely not critical, but should save some time and space in
those specific situations.
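
A toy model of the proposal, for illustration only: `Portal`, `persist_holdable_portal` and `fetch_next` are hypothetical Python stand-ins, not PostgreSQL's C structures, and a plain list plays the part of the tuplestore. The scrollable branch follows the behaviour described above (rewind, rescan everything, reposition); the non-scrollable branch sketches the suggested shortcut of storing only the rows not yet fetched.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    """Hypothetical stand-in for a holdable cursor; not PostgreSQL's Portal struct."""
    rows: list                 # full result set the executor would produce
    scrollable: bool           # was the cursor declared SCROLL?
    position: int = 0          # rows already fetched before COMMIT
    store: list = field(default_factory=list)  # plays the role of the tuplestore
    rows_scanned: int = 0      # rows the executor produced while persisting


def persist_holdable_portal(portal: Portal) -> None:
    """Materialize the cursor's rows at COMMIT time (toy model)."""
    if portal.scrollable:
        # Current behaviour: rewind, rescan the whole result set, then
        # reposition so that position still points at the next unfetched row.
        portal.store = list(portal.rows)
        portal.rows_scanned = len(portal.rows)
    else:
        # Suggested shortcut: no rewind; store only the rows not yet fetched,
        # since a non-scrollable cursor can never move back to earlier ones.
        portal.store = portal.rows[portal.position:]
        portal.rows_scanned = len(portal.store)
        portal.position = 0


def fetch_next(portal: Portal):
    """Return the next row from the materialized store, or None at the end."""
    if portal.position >= len(portal.store):
        return None
    row = portal.store[portal.position]
    portal.position += 1
    return row


rows = list(range(1, 11))
scroll = Portal(rows, scrollable=True, position=7)
noscroll = Portal(rows, scrollable=False, position=7)
persist_holdable_portal(scroll)
persist_holdable_portal(noscroll)
print(fetch_next(scroll), scroll.rows_scanned)      # 8 10
print(fetch_next(noscroll), noscroll.rows_scanned)  # 8 3
```

Both cursors hand back the same next row; only the scrollable one pays for rescanning and storing the rows that were already fetched.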

Regards,
Alon.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 18:51:17
Message-ID: 200608282051.18137.peter_e@gmx.net
Lists: pgsql-hackers

Tom Lane wrote:
> I think we shall have to reconsider that patch to turn it on by
> default. So far I've seen two categories of failure:

So we turn autovacuum off for the regression test instance.

> * manual ANALYZE issued by regression tests fails because autovac is
> analyzing the same table concurrently.

Or we put manual exceptions for the affected tables into pg_autovacuum.

> * contrib tests fail in their repeated drop/create database
> operations because autovac is connected to that database. (pl tests
> presumably have the same issue.)

I opine that when a database is to be dropped, the connections should be
cut.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 19:21:22
Message-ID: 15978.1156792882@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane wrote:
>> I think we shall have to reconsider that patch to turn it on by
>> default. So far I've seen two categories of failure:

> So we turn autovacuum off for the regression test instance.

Not a solution for "make installcheck", unless you are proposing adding
the ability to suppress autovac per-database. Which would be a good
new feature ... for 8.3.

>> * manual ANALYZE issued by regression tests fails because autovac is
>> analyzing the same table concurrently.

> Or we put manual exceptions for the affected tables into pg_autovacuum.

New feature? Or does that capability exist already?

>> * contrib tests fail in their repeated drop/create database
>> operations because autovac is connected to that database. (pl tests
>> presumably have the same issue.)

> I opine that when a database is to be dropped, the connections should be
> cut.

Sure, but that's another thing that we're not going to start designing
and implementing four weeks after feature freeze.

I didn't complain about your proposing two weeks after feature freeze
that we turn autovac on by default, because I assumed (same as you no
doubt) that it would be a trivial one-liner change. It is becoming
clear that that is not the case, and I don't think it makes any sense
from a project-management standpoint to try to flush the problems out
at this time in the release cycle. We have more than enough problems
to fix for 8.2 already. Let's try to do this early in the 8.3 cycle
instead.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 19:39:00
Message-ID: 200608282139.00561.peter_e@gmx.net
Lists: pgsql-hackers

Tom Lane wrote:
> > So we turn autovacuum off for the regression test instance.
>
> Not a solution for "make installcheck",

Well, for "make installcheck" we don't have any control over whether
autovacuum has been turned on or off manually anyway. If you are
concerned about build farm reliability, the build farm scripts can
surely be made to initialize or start the instance in a particular way.

Another option might be to turn off stats_row_level on the fly.

> > Or we put manual exceptions for the affected tables into
> > pg_autovacuum.
>
> New feature? Or does that capability exist already?

I haven't ever used the pg_autovacuum table but the documentation
certainly makes one believe that this is possible.

> > I opine that when a database is to be dropped, the connections
> > should be cut.
>
> Sure, but that's another thing that we're not going to start
> designing and implementing four weeks after feature freeze.

Right.

> clear that that is not the case, and I don't think it makes any sense
> from a project-management standpoint to try to flush the problems out
> at this time in the release cycle. We have more than enough problems
> to fix for 8.2 already. Let's try to do this early in the 8.3 cycle
> instead.

Let's just consider some of the options a bit more closely, and if they
don't work, we'll revert it.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 19:42:15
Message-ID: 16845.1156794135@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> BTW, it would sure be nice to know what happened here:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=wasp&dt=2006-08-28%2017:05:01
> LOG: autovacuum process (PID 26315) was terminated by signal 11

I was able to cause autovac to crash by repeating contrib/intarray
regression test enough times in a row. The cause is not specific
to autovac, it's a generic bug created by my recent patch to add
"waiting" status to pg_stat_activity. If we block on a lock during
InitPostgres then the stats stuff isn't ready yet ... oops.
Patch committed.
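
A minimal sketch of the failure pattern, assuming it comes down to a missing "not initialized yet" check; `Backend`, `status_entry` and `report_waiting` are hypothetical names, and this is not the committed patch. In C the unguarded access would be a NULL dereference, which is consistent with the signal 11 above; in the toy model the guard simply turns an early report into a no-op.

```python
class BackendStatus:
    """Hypothetical per-backend entry, like one row of pg_stat_activity."""

    def __init__(self):
        self.waiting = False


class Backend:
    """Toy model of a backend process; all names here are hypothetical."""

    def __init__(self):
        # The status entry does not exist until stats setup has run.
        self.status_entry = None

    def init_stats(self):
        self.status_entry = BackendStatus()

    def report_waiting(self, waiting: bool) -> None:
        # A lock wait can happen during startup (InitPostgres in the report),
        # before init_stats(); without this check the next line would fail,
        # just as an unguarded pointer dereference would in C.
        if self.status_entry is None:
            return
        self.status_entry.waiting = waiting


b = Backend()
b.report_waiting(True)   # blocked during startup: silently ignored
b.init_stats()
b.report_waiting(True)   # normal case: recorded
print(b.status_entry.waiting)  # True
```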

The other issues remain problems however.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 20:05:35
Message-ID: 17333.1156795535@sss.pgh.pa.us
Lists: pgsql-hackers

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=osprey&dt=2006-08-28%2016:00:17
shows another autovac-induced failure mode:

! psql: FATAL: sorry, too many clients already

initdb is choosing max_connections = 20 on this machine, which is
sufficient to run the parallel regression tests by themselves,
but not regression tests plus autovac.

IIRC initdb will go down to 10 or so connections before deciding
it's hopeless. I don't really want to change that behavior because
it might make it impossible to initdb at all on a small machine.
But probably there needs to be a way for pg_regress to set a floor
on the acceptable max_connections setting while initializing the
test instance for "make check".
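
A sketch of what such a floor could look like, assuming initdb probes a descending list of candidate values and keeps the first one the machine accepts. `TRIAL_CONNS`, `machine_limit` and `floor` are illustrative only, not initdb's or pg_regress's actual code or options.

```python
from typing import Optional

# Descending candidates; illustrative, initdb's real list may differ.
TRIAL_CONNS = [100, 50, 40, 30, 20, 10]


def choose_max_connections(machine_limit: int, floor: Optional[int] = None) -> int:
    """Pick the largest candidate the machine accepts, honouring an optional floor.

    machine_limit: hypothetical stand-in for whatever makes a trial server
    start fail (shared memory, semaphores, ...).
    floor: hypothetical minimum a test driver could demand.
    """
    for n in TRIAL_CONNS:
        if n <= machine_limit:
            if floor is not None and n < floor:
                # Fail loudly instead of building an instance too small to test.
                raise RuntimeError(
                    f"only {n} connections available, need at least {floor}")
            return n
    raise RuntimeError("no trial max_connections value worked")


print(choose_max_connections(25))            # 20
print(choose_max_connections(60, floor=21))  # 50
```

With `machine_limit=25` the probe settles on 20, the value osprey ended up with; asking for a floor of 21 (an illustrative figure for "the tests plus an autovac slot") turns that silent fallback into an explicit error rather than a test run that fails with "too many clients".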

This also ties into the recent discussions about whether autovac needs
its own reserved backend slots. Which, again, sounds to me like a fine
idea for 8.3 work.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 22:07:21
Message-ID: 1156802841.8404.2.camel@localhost
Lists: pgsql-hackers

On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> We have more than enough problems to fix for 8.2 already. Let's
> try to do this early in the 8.3 cycle instead.

I agree -- I think this is exactly the sort of change that is best made
at the beginning of a development cycle, so that there's a whole cycle's
worth of testing to ensure it plays nicely with the rest of the system.

-Neil


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 22:13:35
Message-ID: 20060828221335.GC13899@alvh.no-ip.org
Lists: pgsql-hackers

Neil Conway wrote:
> On Mon, 2006-08-28 at 15:21 -0400, Tom Lane wrote:
> > We have more than enough problems to fix for 8.2 already. Let's
> > try to do this early in the 8.3 cycle instead.
>
> I agree -- I think this is exactly the sort of change that is best made
> at the beginning of a development cycle, so that there's a whole cycle's
> worth of testing to ensure it plays nicely with the rest of the system.

On the other hand, the bug Tom found on DROP OWNED a couple of weeks ago
was introduced right at the start of this development cycle, which tells
us that our testing of the development branch is not very exhaustive.
But I agree anyway.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Matthew T. O'Connor" <matthew(at)zeut(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 22:39:35
Message-ID: 44F370A7.708@zeut.net
Lists: pgsql-hackers

Peter Eisentraut wrote:
> Tom Lane wrote:
>> Not a solution for "make installcheck",
>
> Well, for "make installcheck" we don't have any control over whether
> autovacuum has been turned on or off manually anyway. If you are
> concerned about build farm reliability, the build farm scripts can
> surely be made to initialize or start the instance in a particular way.
>
> Another option might be to turn off stats_row_level on the fly.

I'm sure I'm missing some of the subtleties of make installcheck issues,
but autovacuum can be enabled / disabled on the fly just as easily as
stats_row_level, so I don't see the difference?

>>> Or we put manual exceptions for the affected tables into
>>> pg_autovacuum.
>> New feature? Or does that capability exist already?
>
> I haven't ever used the pg_autovacuum table but the documentation
> certainly makes one believe that this is possible.

Right, if it doesn't work, that would certainly be a bug. This feature
was included in the original integration into the backend during the
8.0 dev cycle.

> Let's just consider some of the options a bit more closely, and if they
> don't work, we'll revert it.

Agreed.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Matthew T. O'Connor" <matthew(at)zeut(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-28 23:18:37
Message-ID: 19821.1156807117@sss.pgh.pa.us
Lists: pgsql-hackers

"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
>> Tom Lane wrote:
>>> Not a solution for "make installcheck",

> I'm sure I'm missing some of the subtleties of make installcheck issues,
> but autovacuum can be enabled / disabled on the fly just as easily as
> stats_row_level, so I don't see the difference?

Well, "just as easily" means "edit postgresql.conf and SIGHUP", which is
not an option available to "make installcheck", even if we thought that
an invasive change of the server configuration would be acceptable for
it to do. It's conceivable that we could invent a per-database
autovac-off variable controlled by, say, ALTER DATABASE SET ... but we
haven't got one today.

My objection here is basically that this proposal passed on the
assumption that it would be very nearly zero effort to make it happen.
We are now finding out that we have a fair amount of work to do if we
want autovac to not mess up the regression tests, and I think that has
to mean that the proposal goes back on the shelf until 8.3 development
starts. We are already overcommitted in terms of the stuff that was
submitted *before* feature freeze.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-29 09:14:44
Message-ID: 44F40584.1090905@pse-consulting.de
Lists: pgsql-hackers

Tom Lane wrote:
>
> My objection here is basically that this proposal passed on the
> assumption that it would be very nearly zero effort to make it happen.
> We are now finding out that we have a fair amount of work to do if we
> want autovac to not mess up the regression tests, and I think that has
> to mean that the proposal goes back on the shelf until 8.3 development
> starts. We are already overcommitted in terms of the stuff that was
> submitted *before* feature freeze.
>

Kicking out autovacuum as default is a disaster, it took far too long to
get in the backend already (wasn't it planned for 8.0?).
You discuss this on the basis of the regression tests, which obviously
run on installations that do _not_ represent standard recommended
installations. It has been required for ages to have vacuum running
regularly, using cron or similar. The regression tests have to deal with that
default situation one way or another (which might well mean "these
tables don't need vacuum" or "this instance doesn't need vacuum"). IMHO
blaming autovacuum for the test failures reverses cause and effect.

Missing vacuum was probably a reason for poor performance of many newbie
pgsql installations (and I must admit that I missed installing the cron
job myself from time to time, though I _knew_ it was needed). As Magnus
already pointed out, all win32 installations have it on by default, to
keep them on the safe side. Disabling it because of modules a "retail" user
will never launch seems like an overreaction.

I can confirm that disabling autovacuum for a table with a
pg_autovacuum row does work; I'm using it in production.

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-29 12:39:47
Message-ID: 28614.1156855187@sss.pgh.pa.us
Lists: pgsql-hackers

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> Tom Lane wrote:
>> My objection here is basically that this proposal passed on the
>> assumption that it would be very nearly zero effort to make it happen.

> Kicking out autovacuum as default is a disaster, it took far too long to
> get in the backend already (wasn't it planned for 8.0?).

If it's so "disastrous" to not have it, why wasn't it even proposed
until two weeks after feature freeze? Sorry, I'm not buying this
argument.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-29 13:42:14
Message-ID: 200608291542.15665.peter_e@gmx.net
Lists: pgsql-hackers

On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
> already pointed out, all win32 installations have it on by default, to
> keep them on the safe side. Disabling it because of modules a "retail" user
> will never launch seems like an overreaction.

Well, the really big problem is that autovacuum may be connected to a database
when you want to drop it. (There may be related problems like vacuuming a
template database at the wrong time. I'm not sure how that is handled.) I
think this is not only a problem specific to regression testing but also a
potential problem in deployment. I have opined earlier how I think
that should behave properly, but we're not going to change that in 8.2.

The other problems that were mentioned are pretty easy to work around by
setting stats_row_level to off on the fly, but that doesn't stop autovacuum
from connecting.

The good thing is that we have collected plenty of interesting data in the
last 24 hours which will make for plenty of development work next time
around. :)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-29 15:55:25
Message-ID: 44F4636D.5000905@pse-consulting.de
Lists: pgsql-hackers

Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
>
>> Tom Lane wrote:
>>
>>> My objection here is basically that this proposal passed on the
>>> assumption that it would be very nearly zero effort to make it happen.
>>>
>
>
>> Kicking out autovacuum as default is a disaster, it took far too long to
>> get in the backend already (wasn't it planned for 8.0?).
>>
>
> If it's so "disastrous" to not have it, why wasn't it even proposed
> until two weeks after feature freeze?

To me, this proposal was just too obvious, for reasons already discussed
earlier.

Regards,
Andreas


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-29 15:58:29
Message-ID: 44F46425.8010107@pse-consulting.de
Lists: pgsql-hackers

Peter Eisentraut wrote:
> On Tuesday, 29 August 2006 11:14, Andreas Pflug wrote:
>
>> already pointed out, all win32 installations have it on by default, to
>> keep them on the safe side. Disabling it because of modules a "retail" user
>> will never launch seems like an overreaction.
>>
>
> Well, the really big problem is that autovacuum may be connected to a database
> when you want to drop it. (There may be related problems like vacuuming a
> template database at the wrong time. I'm not sure how that is handled.) I
> think this is not only a problem specific to regression testing but also a
> potential problem in deployment. I have opined earlier how I think
> that should behave properly, but we're not going to change that in 8.2.
>

Don't these issues hit a cron-scheduled vacuum as well?

Regards,
Andreas


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Matthew T. O'Connor" <matthew(at)zeut(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: autovacuum causing numerous regression-test failures
Date: 2006-08-30 04:31:42
Message-ID: 200608292131.42234.josh@agliodbs.com
Lists: pgsql-hackers

Folks,

My vote is with Peter and Tom on not putting it in. We needed to discuss/test
this well before feature freeze if we really wanted to do it.

Here's what needs to be resolved:
a) make autovacuum play nice with the regression tests
b) come up with default threshold/multiplier values which are backed by test
data
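
For reference, the per-table test that these threshold/multiplier values feed into is documented as vacuum threshold = base threshold + scale factor × number of tuples (pg_class.reltuples), and a table is vacuumed once its obsolete tuples exceed that threshold. A minimal sketch, with illustrative parameter values rather than the shipped defaults:

```python
def vacuum_threshold(base_thresh: float, scale_factor: float, reltuples: float) -> float:
    """Documented rule: base threshold + scale factor * number of tuples."""
    return base_thresh + scale_factor * reltuples


def needs_vacuum(dead_tuples: int, base_thresh: float, scale_factor: float,
                 reltuples: float) -> bool:
    """A table qualifies once its obsolete tuples exceed the threshold."""
    return dead_tuples > vacuum_threshold(base_thresh, scale_factor, reltuples)


# Illustrative values only, not the shipped defaults.
print(vacuum_threshold(500, 0.2, 10_000))
print(needs_vacuum(2_000, 500, 0.2, 10_000))     # False
print(needs_vacuum(3_000, 500, 0.2, 10_000))     # True
```

Choosing defaults backed by test data amounts to picking the base threshold and scale factor so that this test fires neither too eagerly on small, busy tables nor too late on large ones.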

--
Josh Berkus
PostgreSQL @ Sun
San Francisco