Re: [bug fix] Suppress "autovacuum: found orphan temp table" message

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Andres Freund" <andres(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] Suppress "autovacuum: found orphan temp table" message
Date: 2014-07-22 13:18:03
Message-ID: 1C2948EA6273403C901A8C4EF4E3488B@maumau
Lists: pgsql-hackers

From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> On 2014-07-22 19:13:56 +0900, MauMau wrote:
>> But this is true if restart_after_crash = on in postgresql.conf, because the
>> crash restart only occurs in that case. However, in HA cluster, whether it
>> is shared-disk or replication, restart_after_crash is set to off, isn't it?
>
> In almost all setups I've seen it's set to on, even in HA scenarios.

I'm afraid that's because people aren't aware of this parameter or its
purpose. The 9.1 release notes say:

Add restart_after_crash setting which disables automatic server restart
after a backend crash (Robert Haas)
This allows external cluster management software to control whether the
database server restarts or not.

Reading this, I take it that the parameter was introduced for, and should be
used in, HA environments controlled by clusterware. Restarting the database
server on the same machine may fail, or the restarted server may fail again,
because of broken hardware components, so I guess it was considered better
to let the clusterware decide what to do.

>> Moreover, as the comment says, the behavior of keeping leftover temp files
>> is for debugging by developers. It's not helpful for users, is it? I
>> thought DEBUG-level messages were more appropriate, because the behavior is
>> for debugging purposes.
>
> GRR. That doesn't change the fact that there'll be files left over after
> a crash restart.

Yes... that's a source of headaches. But please understand that there's a
problem: leaving temp relations behind just for debugging causes a flood of
messages, and that flood is what the customer is actually concerned about.

> I think you're making lots of noise over a trivial log message.

Maybe so, and I hope so. I may be too nervous about what the customer will
ask and/or request next. If they request something similar to what I
proposed here, let me consult you again.

>> Could you please reconsider this?
>
> No. Just removing a warning isn't the way to solve this. If you want to
> improve things you'll actually need to improve things not just stick
> your head into the sand.

I have a few ideas below, but none of them seems better than the original
proposal. What do you think?

1. The startup process deletes the catalog entries and data files of
leftover temp relations at the end of recovery.
This is probably difficult, impossible, or undesirable, because the startup
process cannot access the system catalogs. Even if it were possible, it would
go against the developers' desire to leave temp relation files for debugging.

2. The autovacuum launcher deletes the catalog entries and data files of
leftover temp relations during its initialization.
This may be possible, but it also goes against the developers' desire to
leave temp relation files for debugging.

3. Emit the "orphan temp relation" message only when the associated data
file actually exists.
Autovacuum workers use stat() to check whether the temp relation's data file
was left over. If it wasn't, they silently delete the catalog entry from
pg_class.
This sounds reasonable, because the purpose of the message is to warn users
of a potential disk space shortage. In the streaming replication case, no
such data files should exist on the promoted new primary, so no messages
would be emitted.
However, in the shared-disk HA cluster case, the temp relation files are
left over on the shared disk, so this fix doesn't improve anything there.

4. Emit the "orphan temp relation" message at LOG level only when
restart_after_crash is on, and at DEBUG1 otherwise, i.e.:

    ereport(restart_after_crash ? LOG : DEBUG1, ...
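A standalone sketch of how idea 4 picks the message level. SketchLevel,
orphan_temp_table_elevel(), format_orphan_message(), and the file-local
restart_after_crash variable are hypothetical stand-ins for the server's
elevels, ereport(), and the real GUC; the message text approximates the
existing "autovacuum: found orphan temp table" message. With the default
log_min_messages = warning, a DEBUG1 message does not reach the server log,
while a LOG message does, so turning restart_after_crash off would silence
the message by default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the server's DEBUG1 and LOG message levels. */
typedef enum
{
    SKETCH_DEBUG1,
    SKETCH_LOG
} SketchLevel;

/* Hypothetical stand-in for the restart_after_crash setting (default on). */
static bool restart_after_crash = true;

/* Idea 4: report at LOG only when automatic crash restart is enabled. */
static SketchLevel
orphan_temp_table_elevel(void)
{
    return restart_after_crash ? SKETCH_LOG : SKETCH_DEBUG1;
}

static const char *
level_name(SketchLevel level)
{
    return level == SKETCH_LOG ? "LOG" : "DEBUG1";
}

/*
 * Format the message with the chosen level as a prefix, roughly as it would
 * appear in the server log. Returns the snprintf() result: the untruncated
 * length, or a negative value on an encoding error.
 */
static int
format_orphan_message(char *buf, size_t buflen, const char *nspname,
                      const char *relname, const char *dbname)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "%s:  autovacuum: found orphan temp table \"%s\".\"%s\" in database \"%s\"",
                    level_name(orphan_temp_table_elevel()),
                    nspname, relname, dbname);
}
```

The level is computed at report time rather than cached, so a change of the
setting (for example after a configuration reload) takes effect on the next
message.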

Regards
MauMau
