Re: Proposal: Log inability to lock pages during vacuum

From: Greg Stark <stark(at)mit(dot)edu>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Log inability to lock pages during vacuum
Date: 2014-10-20 15:29:58
Message-ID: CAM-w4HNpoj_qfPY+7juVrcFhR=Gbk3tpFcPc_5q8R-tdmbsinQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
> lock on, with no retry. Presumably this should be a rare occurrence, but I
> think it's bad that we just assume that and won't warn the user if something
> bad is going on.
>
> My thought is that if we skip any pages, elog(LOG) how many we skipped. If we
> skip more than 1% of the pages we visited (not relpages) then elog(WARNING)
> instead.
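The proposed policy can be modelled in a few lines. This is a minimal Python sketch, not PostgreSQL's C vacuum code: `lazy_vacuum_pass`, `try_cleanup_lock`, `choose_log_level` and `WARN_PERCENT` are hypothetical names. The pass only counts pages whose cleanup lock could not be taken (skipped with no retry, as a non-freeze vacuum does). The level is then chosen relative to pages visited, not relpages.

```python
from typing import Callable, Iterable, Optional, Tuple

# Threshold taken from the proposal: warn when more than 1% of the pages the
# pass actually visited (not relpages) had to be skipped.
WARN_PERCENT = 1


def lazy_vacuum_pass(pages: Iterable[int],
                     try_cleanup_lock: Callable[[int], bool]) -> Tuple[int, int]:
    """Visit each page once and return (visited, skipped).

    Models a non-freeze vacuum: if the cleanup lock is unavailable the page
    is skipped immediately, with no wait and no retry.
    """
    visited = skipped = 0
    for blkno in pages:
        visited += 1
        if not try_cleanup_lock(blkno):
            skipped += 1  # punt on this page
            continue
        # pruning/vacuuming of the locked page would happen here (omitted)
    return visited, skipped


def choose_log_level(visited: int, skipped: int) -> Optional[str]:
    """Return None (stay quiet), "LOG" or "WARNING" for a finished pass."""
    if skipped == 0:
        return None
    # Integer form of skipped / visited > WARN_PERCENT / 100 (no float rounding)
    if skipped * 100 > visited * WARN_PERCENT:
        return "WARNING"
    return "LOG"


pinned = {7, 42}  # hypothetical pages another backend keeps pinned
visited, skipped = lazy_vacuum_pass(range(100), lambda blkno: blkno not in pinned)
print(visited, skipped, choose_log_level(visited, skipped))  # 100 2 WARNING
```

Skipping exactly 1% of visited pages stays at LOG, because the proposal only escalates for "more than 1%".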

Is there some specific failure you've run into where a page was stuck
in a pinned state and never got vacuumed?

I would like to see a more systematic way of going about this. What
LSN or timestamp is associated with the oldest unvacuumed page? How
many times have we tried to visit it? What do those numbers look like
overall -- i.e. what's the median number of times it takes to vacuum a
page, and what does the distribution of unvacuumed ages look like?
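One way to collect the numbers asked about above is sketched below. This is a hypothetical Python model, not an existing PostgreSQL facility. `SkipTracker`, `Pending` and `record_visit` are illustrative names, and a plain integer stands in for the LSN or timestamp. For each page not yet vacuumed it remembers when it was first skipped and how many visits failed. It also records how many visits each successful vacuum needed, so the oldest unvacuumed page, the median attempts, and the age distribution can all be read off.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Pending:
    first_skipped_at: int  # hypothetical stand-in for an LSN or timestamp
    attempts: int = 0      # failed vacuum visits since the page was last vacuumed


class SkipTracker:
    def __init__(self) -> None:
        self.pending: Dict[int, Pending] = {}    # blkno -> still-unvacuumed page
        self.attempts_to_vacuum: List[int] = []  # visits each success needed

    def record_visit(self, blkno: int, now: int, got_lock: bool) -> None:
        if got_lock:
            p = self.pending.pop(blkno, None)
            # Visits needed = earlier failed attempts + this successful one.
            self.attempts_to_vacuum.append((p.attempts if p else 0) + 1)
        else:
            p = self.pending.setdefault(blkno, Pending(first_skipped_at=now))
            p.attempts += 1

    def oldest_unvacuumed(self) -> Optional[Tuple[int, int, int]]:
        """(blkno, first_skipped_at, attempts) for the longest-waiting page."""
        if not self.pending:
            return None
        blkno, p = min(self.pending.items(), key=lambda kv: kv[1].first_skipped_at)
        return blkno, p.first_skipped_at, p.attempts

    def median_attempts(self) -> Optional[float]:
        if not self.attempts_to_vacuum:
            return None
        return statistics.median(self.attempts_to_vacuum)

    def unvacuumed_ages(self, now: int) -> List[int]:
        """Sorted ages of all still-unvacuumed pages, ready for a histogram."""
        return sorted(now - p.first_skipped_at for p in self.pending.values())


t = SkipTracker()
t.record_visit(1, now=100, got_lock=True)
t.record_visit(2, now=100, got_lock=False)
t.record_visit(3, now=150, got_lock=False)
t.record_visit(2, now=200, got_lock=False)
t.record_visit(3, now=200, got_lock=True)
print(t.oldest_unvacuumed(), t.median_attempts(), t.unvacuumed_ages(300))
# (2, 100, 2) 1.5 [200]
```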

With that data it should be possible to determine if the behaviour is
actually working well and where to draw the line to determine outliers
that might represent bugs.
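As an illustration of drawing that line from the data rather than from a fixed constant, the sketch below flags pages whose unvacuumed age exceeds a multiple of the median age. The rule, the name `flag_outliers` and the default multiplier of 10 are all hypothetical choices, not something proposed in the thread.

```python
import statistics
from typing import Dict, List


def flag_outliers(ages: Dict[int, int], multiplier: float = 10.0) -> List[int]:
    """Return block numbers whose unvacuumed age exceeds multiplier * median.

    The rule and the default multiplier are hypothetical: the cutoff is derived
    from the observed distribution of ages instead of being a fixed constant.
    """
    if not ages:
        return []
    cutoff = multiplier * statistics.median(list(ages.values()))
    return sorted(blkno for blkno, age in ages.items() if age > cutoff)


# Ages (e.g. in seconds) of still-unvacuumed pages, keyed by block number.
print(flag_outliers({1: 10, 2: 12, 3: 11, 4: 500}))  # [4]
```

The median is used because a handful of stuck pages barely moves it, so the outliers do not raise their own cutoff.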

--
greg
