Re: 8.3.0 Core with concurrent vacuum fulls

From: "Gavin M(dot) Roy" <gmr(at)myyearbook(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.0 Core with concurrent vacuum fulls
Date: 2008-03-05 15:27:32
Message-ID: af1bce590803050727y13991498i445cd8a9c75f8be1@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 5, 2008 at 10:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > In particular, if that's the problem, why has this not been seen before?
> > The fact that it's going through heap_page_prune doesn't seem very
> > relevant --- VACUUM FULL has certainly always had to invoke
> > CacheInvalidateHeapTuple someplace or other. So I still want to see
> > the deadlock report ... we at least need to know which tables are
> > involved in the deadlock.
>
> Actually, maybe it *has* been seen before. Gavin, are you in the habit
> of running concurrent VACUUM FULLs on system catalogs, and if so have
> you noted that they occasionally get deadlock failures?

Generally no, I've never noticed deadlocks before, but I'll go back and
look at some of the other machines.

> > A separate line of thought is whether it's a good idea that
> > heap_page_prune calls the inval code inside a critical section.
> > That's what's turning an ordinary deadlock failure into a PANIC.
> > Even without the possibility of having to do cache initialization,
> > that's a lot of code to be invoking, and it has obvious failure
> > modes (eg, out of memory for the new inval list item).
>
> The more I think about this the more I don't like it. The critical
> section in heap_page_prune is *way* too big. Aside from the inval
> call, there are HeapTupleSatisfiesVacuum() calls, which could have
> failures during attempted clog references.
>
> The reason the critical section is so large is that we're manipulating
> the contents of a shared buffer, and we don't want a failure to leave a
> partially-modified page in the buffer. We could fix that if we were to
> memcpy the page into local storage and do all the pruning work there.
> Then the critical section would only surround copying the page back to
> the buffer and writing the WAL record. Copying the page is a tad
> annoying but heap_page_prune is an expensive operation anyway, and
> I think we really are at too much risk of PANIC the way it's being done
> now. Has anyone got a better idea?
>
> regards, tom lane
>
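
To make the failure mode concrete: the backend promotes any elog(ERROR) raised while the critical-section counter is nonzero into a PANIC, because the shared page being modified may be half-updated. The standalone toy program below is not backend code, just a minimal model of that escalation rule under assumed names (crit_section_count, report_error, and queue_invalidation are all made up for illustration):

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the backend's critical-section machinery. */
static int crit_section_count = 0;

#define START_CRIT_SECTION() (crit_section_count++)
#define END_CRIT_SECTION()   (crit_section_count--)

/*
 * Model of elog(ERROR): a recoverable error normally just aborts the
 * transaction, but inside a critical section it must be escalated,
 * because the shared state being modified may be half-written.
 */
static void report_error(const char *msg)
{
    if (crit_section_count > 0)
    {
        fprintf(stderr, "PANIC: %s (raised inside a critical section)\n", msg);
        abort();                /* in the real backend: cluster-wide restart */
    }
    fprintf(stderr, "ERROR: %s (transaction aborted, server keeps running)\n", msg);
    /* the real backend would longjmp to its error handler here */
}

/* Hypothetical step that can fail, e.g. no memory for an inval list item. */
static void queue_invalidation(int out_of_memory)
{
    if (out_of_memory)
        report_error("out of memory for new inval list item");
}

int main(void)
{
    /* Outside a critical section the same failure is survivable... */
    queue_invalidation(1);

    /* ...but inside one it takes the whole cluster down. */
    START_CRIT_SECTION();
    queue_invalidation(1);
    END_CRIT_SECTION();

    return 0;
}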

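And a rough, self-contained sketch of the restructuring being proposed for heap_page_prune: do all the fallible work (visibility checks that may touch clog, invalidation bookkeeping) on a private copy of the page, so the critical section shrinks to the copy-back and the WAL insert. None of these names come from an actual patch; prune_page_copy, emit_wal_record, and PAGE_SIZE are placeholders:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

static int crit_section_count = 0;
#define START_CRIT_SECTION() (crit_section_count++)
#define END_CRIT_SECTION()   (crit_section_count--)

/*
 * Placeholder for the pruning logic: visibility checks (which may read
 * clog and error out) and invalidation queueing happen here, on the
 * private copy, where an ERROR is still recoverable.  Returns nonzero
 * if the page actually changed.
 */
static int prune_page_copy(char *page)
{
    page[0] ^= 1;               /* pretend we reclaimed some dead tuples */
    return 1;
}

/* Placeholder for XLogInsert(); assumed not to raise a recoverable error. */
static void emit_wal_record(const char *page)
{
    (void) page;
}

static void prune_shared_page(char *shared_buf)
{
    char        local[PAGE_SIZE];

    /* 1. All fallible work happens on a private copy of the page. */
    memcpy(local, shared_buf, PAGE_SIZE);
    if (!prune_page_copy(local))
        return;                 /* nothing to prune, no critical section at all */

    /*
     * 2. Only the steps that must not be interrupted halfway are inside
     *    the critical section: publishing the new page image and logging it.
     */
    START_CRIT_SECTION();
    memcpy(shared_buf, local, PAGE_SIZE);
    emit_wal_record(shared_buf);
    END_CRIT_SECTION();
}

int main(void)
{
    static char shared_page[PAGE_SIZE];     /* stand-in for a shared buffer */

    prune_shared_page(shared_page);
    printf("pruned; critical section covered only the copy-back and WAL\n");
    return 0;
}

The cost of this shape is an extra page-sized memcpy per pruned page, which is what the "a tad annoying but heap_page_prune is an expensive operation anyway" remark above weighs against the reduced PANIC exposure.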