Cancel autovacuum conflicting with DROP TABLE

Lists: pgsql-patches
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Cancel autovacuum conflicting with DROP TABLE
Date: 2007-06-20 02:40:28
Message-ID: 20070620111526.6956.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Here is a patch that cancels autovacuum workers conflicting with
DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

Before backends drop a table, they search autovacuum workers that
are running on the table. If found, they send SIGINT signal to
the autovacuum to avoid waiting for the worthless vacuum.

When an autovacuum worker receives SIGINT, it skips only the vacuuming
table and continues the remaining jobs. Now SIGINT and SIGTERM have
different meanings and their logs are changed.

SIGINT -- Cancel the current job.
DEBUG: autovacuum on relation <schema>.<table> is canceled
SIGTERM -- Cancel all jobs.
FATAL: terminating autovacuum due to administrator command

We can see the second SIGTERM log on shutdown and DROP DATABASE.
In such case, we send SIGTERM to autovacuums instead of SIGINT.
I'd like to bring down the error level to WARNING or less actually,
but I have no idea to do it because canceling queries and error levels
are tightly coupled in the present implementation.
Instead, I added the special FATAL message for autovacuum workers
so that we can distinguish the lines are ignorable.

Comments welcome.

---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
cancel_autovac_on_drop.patch application/octet-stream 9.6 KB

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Cancel autovacuum conflicting with DROP TABLE
Date: 2007-06-22 02:09:24
Message-ID: 20070622105344.6118.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:

> Here is a patch that cancels autovacuum workers conflicting with
> DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
> http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

I made an adjustment for the latest 'more autovacuum fixes' patch.
http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php

Now, autovacuum workers handles ProcDie signals using ERROR
instead of FATAL. The exception is caught by the worker and
converted to the following logs.

SIGINT -- Cancel the current job.
LOG: autovacuum on <db>.<schema>.<table> is canceled
SIGTERM -- Cancel all jobs.
LOG: autovacuum on <db> is canceled

We are planning to ship 8.3 with autovacuum=on, so users will be
more likely to see conflicts between autovacuum and their commands.
This makes autovacuum more gentle.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
cancel_autovac_on_drop-2.patch application/octet-stream 10.2 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Cancel autovacuum conflicting with DROP TABLE
Date: 2007-06-27 04:14:18
Message-ID: 20070627041418.GP11609@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> > Here is a patch that cancels autovacuum workers conflicting with
> > DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
> > http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php
>
> I made an adjustment for the latest 'more autovacuum fixes' patch.
> http://archives.postgresql.org/pgsql-patches/2007-06/msg00217.php
>
> Now, autovacuum workers handles ProcDie signals using ERROR
> instead of FATAL. The exception is caught by the worker and
> converted to the following logs.
>
> SIGINT -- Cancel the current job.
> LOG: autovacuum on <db>.<schema>.<table> is canceled
> SIGTERM -- Cancel all jobs.
> LOG: autovacuum on <db> is canceled

Thanks for the patch. I think there are actually three patches in here:

1. changing SIGINT so that it cancels the current table instead of
shutting down the entire worker.

2. changing DROP TABLE and TRUNCATE so that they cancel an autovac
worker by sending SIGINT.

3. change the interrupt code so that autovac workers are terminated with
ERROR instead of FATAL, knowing that the final outcome is the same. If
I'm not mistaken the only point of the change is to be able to change
the error message, is that correct?

I don't feel very much comfortable with the patch (3). I would prefer
that we keep errcode FATAL and select a different error message in
ProcessInterrupts instead. I don't see the point in complicating the
sigjmp target without need.

I agree with the (2) change.

The change in (1) I don't feel comfortable with commenting. It seems
strange to me, and although it seems like it would be safe, I cannot
help having a strange feeling about it. I'll try to digest it a bit
more.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Cancel autovacuum conflicting with DROP TABLE
Date: 2007-06-27 07:22:22
Message-ID: 20070627153150.6418.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> 1. changing SIGINT so that it cancels the current table instead of
> shutting down the entire worker.
>
> 2. changing DROP TABLE and TRUNCATE so that they cancel an autovac
> worker by sending SIGINT.

Quite so.

> 3. change the interrupt code so that autovac workers are terminated with
> ERROR instead of FATAL, knowing that the final outcome is the same. If
> I'm not mistaken the only point of the change is to be able to change
> the error message, is that correct?

Yes, I used ERROR instead of FATAL just for changing the message.
In addition, the actual message level will be LOG, instead of FATAL.

> I don't feel very much comfortable with the patch (3). I would prefer
> that we keep errcode FATAL and select a different error message in
> ProcessInterrupts instead. I don't see the point in complicating the
> sigjmp target without need.

My first patch did so and I changed it in the second patch because
I feel the FATAL entry in syslog is too obtrusive; Internal termination
of autovacuum workers in this way is ignorable for normal users.
However, I'm sure that my complaint is not so important. Please don't
apply thie part of the patch if you don't like it.

> I agree with the (2) change.
>
> The change in (1) I don't feel comfortable with commenting. It seems
> strange to me, and although it seems like it would be safe, I cannot
> help having a strange feeling about it. I'll try to digest it a bit
> more.

Thanks. I forgot to adjust comments in the code, sorry.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Cancel autovacuum conflicting with DROP TABLE
Date: 2007-06-29 17:19:29
Message-ID: 20070629171929.GF10563@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro wrote:
>
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
> > 1. changing SIGINT so that it cancels the current table instead of
> > shutting down the entire worker.

I applied this part of the patch, thanks.

In passing I noticed that apparently we are leaking memory, because the
vacuum memory context is created with a parent of PortalContext, which
ISTM to be NULL in autovacuum. So when we cancel the vacuuming work, we
never delete or reset that context.

I think what we should be doing is creating a context to act as
PortalContext, and reset it after each vacuuming operation.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")