Re: more autovacuum fixes

Lists: pgsql-patches
From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: more autovacuum fixes
Date: 2007-06-19 17:49:21
Message-ID: 20070619174921.GZ4265@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

I attach a patch to fix some issues in the autovac process handling
code. Mainly, instead of rechecking periodically in the launcher
whether a worker was able to start or not, what we do is signal it from
the postmaster when it fails.

In order to do this I had to make the postmaster set a flag in shared
memory, much like the pmsignal.c code does. According to previous
discussions this is an acceptable thing for postmaster to do.

I must admit I did not try to fill shared memory with garbage and see if
it caused a postmaster crash. What I did test was creating a database,
making sure that the launcher was picking it up, and then delete the
database's directory. This correctly causes a worker to log errors when
trying to connect, and it is handled well -- i.e. other databases are
not starved. (One remaining problem I see is that a database in this
state, or similar, that fails to be vacuumed, *will* cause starvation if
in danger of Xid wraparound).

Another thing I noticed is that I can change the "autovacuum" parameter
in postgresql.conf and send SIGHUP to postmaster, which correctly
starts the launcher if it was previously disabled. To make the shutdown
case act accordingly, I made the launcher check the "start daemon" flag
after rereading the config file, and quit if it's off. I tried multiple
combinations of having it set and unset, changing
stats_collect_tuple_level and it works fine.

One problem with the patch is this (new code):

bn = (Backend *) malloc(sizeof(Backend));
! if (bn)
{
! bn->pid = StartAutoVacWorker();
! bn->is_autovacuum = true;
! /* we don't need a cancel key */

! if (bn->pid > 0)
! {
! /* FIXME -- unchecked memory allocation here */
! DLAddHead(BackendList, DLNewElem(bn));

If the palloc() inside DLNewElem fails, we will fail to report a "fork
failure" to the launcher. I am not sure how serious this is. One idea
that came to mind was using a PG_TRY block, sending the signal in the
CATCH block, and then rethrowing the exception. Is this acceptable?

All in all, this patch increases the reliability of autovacuum, so I
intend to apply it shortly unless there are objections. Further
improvements might come later as I become aware of possible fixes.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

Attachment Content-Type Size
autovacuum-procs.patch text/x-diff 18.5 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: more autovacuum fixes
Date: 2007-06-19 17:58:14
Message-ID: 20070619175814.GC21268@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:

> One problem with the patch is this (new code):
>
> bn = (Backend *) malloc(sizeof(Backend));
> ! if (bn)
> {
> ! bn->pid = StartAutoVacWorker();
> ! bn->is_autovacuum = true;
> ! /* we don't need a cancel key */
>
> ! if (bn->pid > 0)
> ! {
> ! /* FIXME -- unchecked memory allocation here */
> ! DLAddHead(BackendList, DLNewElem(bn));
>
>
> If the palloc() inside DLNewElem fails, we will fail to report a "fork
> failure" to the launcher. I am not sure how serious this is. One idea
> that came to mind was using a PG_TRY block, sending the signal in the
> CATCH block, and then rethrowing the exception. Is this acceptable?

I noticed another problem: the worker may fail during BaseInit() or
InitProcess(). This is not where most problems will be (that would be
later, in InitPostgres(), which is when the worker connects to a DB) but
still could cause a starvation problem, I think. Maybe the PG_TRY block
is called for in there, as well as the postmaster code.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"The ability to monopolize a planet is insignificant
next to the power of the source"


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: more autovacuum fixes
Date: 2007-06-19 20:02:20
Message-ID: 20070619200220.GH21268@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:

> One problem with the patch is this (new code):
>
> bn = (Backend *) malloc(sizeof(Backend));
> ! if (bn)
> {
> ! bn->pid = StartAutoVacWorker();
> ! bn->is_autovacuum = true;
> ! /* we don't need a cancel key */
>
> ! if (bn->pid > 0)
> ! {
> ! /* FIXME -- unchecked memory allocation here */
> ! DLAddHead(BackendList, DLNewElem(bn));
>
>
> If the palloc() inside DLNewElem fails, we will fail to report a "fork
> failure" to the launcher. I am not sure how serious this is.

Turns out that this problem is not serious at all, because if that
palloc() fails, the whole postmaster will exit with a FATAL out of
memory message.

The problems in the worker code after fork are still an issue though.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"MySQL is a toy compared to PostgreSQL." (Randal L. Schwartz)
(http://archives.postgresql.org/pgsql-general/2005-07/msg00517.php)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: more autovacuum fixes
Date: 2007-06-20 13:47:00
Message-ID: 20070620134659.GD30369@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:

> Turns out that this problem is not serious at all, because if that
> palloc() fails, the whole postmaster will exit with a FATAL out of
> memory message.
>
> The problems in the worker code after fork are still an issue though.

It _is_ still an issue -- and a very serious issue at that. If a worker
fails before getting its entry from the startingWorker pointer, then
(patched) autovacuum will no longer run any task.

Since it doesn't seem like it is possible to signal the launcher until
the worker has set up shared memory, I think we should leave the current
shenanigans in place. They are really ugly code and I'd love to get rid
of them, but currently I don't see any better mechanism to deal with
this failure scenario.

Thanks for your attention ;-)

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree. (Don Knuth)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: more autovacuum fixes
Date: 2007-06-20 17:30:54
Message-ID: 20070620173054.GM30369@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
> > Turns out that this problem is not serious at all, because if that
> > palloc() fails, the whole postmaster will exit with a FATAL out of
> > memory message.
> >
> > The problems in the worker code after fork are still an issue though.
>
> It _is_ still an issue -- and a very serious issue at that. If a worker
> fails before getting its entry from the startingWorker pointer, then
> (patched) autovacuum will no longer run any task.

I figured that I could keep the old check there for when the worker
failed, but still add the new signalling mechanism so that a fork()
failure (which I would think is the most likely of all) is taken care of
in a more timely manner.

I've also improved the rest of the code and comments a bit, and the new
code does seem better now. So I'll go ahead and commit it later today.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

Attachment Content-Type Size
autovacuum-procs-2.patch text/x-diff 19.2 KB