Re: cost-based vacuum

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ian Westmacott <ianw(at)intellivid(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: cost-based vacuum
Date: 2005-07-13 20:39:48
Message-ID: 1121287188.3970.296.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2005-07-13 at 14:58 -0400, Tom Lane wrote:
> Ian Westmacott <ianw(at)intellivid(dot)com> writes:
> > On Wed, 2005-07-13 at 11:55, Simon Riggs wrote:
> >> On Tue, 2005-07-12 at 13:50 -0400, Ian Westmacott wrote:
> >>> It appears not to matter whether it is one of the tables
> >>> being written to that is ANALYZEd. I can ANALYZE an old,
> >>> quiescent table, or a system table and see this effect.
> >>
> >> Can you confirm that this effect is still seen even when the ANALYZE
> >> doesn't touch *any* of the tables being accessed?
>
> > Yes.
>
> This really isn't making any sense at all.

Agreed. I think all of this indicates that some wierdness (technical
term) is happening at a different level in the computing stack. I think
all of this points fairly strongly to it *not* being a PostgreSQL
algorithm problem, i.e. if the code was executed by an idealised Knuth-
like CPU then we would not get this problem. Plus, I have faith that if
it was a problem in that "plane" then you or another would have
uncovered it by now.

> However, these certainly do not explain Ian's problem, because (a) these
> only apply to VACUUM, not ANALYZE; (b) they would only lock the table
> being VACUUMed, not other ones; (c) if these locks were to block the
> reader or writer thread, it'd manifest as blocking on a semaphore, not
> as a surge in LWLock thrashing.

I've seen enough circumstantial evidence to connect the time spent
inside LWLockAcquire/Release as being connected to the Semaphore ops
within them, not the other aspects of the code.

Months ago we discussed the problem of false sharing on closely packed
arrays of shared variables because of the large cache line size of the
Xeon MP. When last we touched on that thought, I focused on the thought
that the LWLock array was too tightly packed for the predefined locks.
What we didn't discuss (because I was too focused on the other array)
was the PGPROC shared array is equally tightly packed, which could give
problems on the semaphores in LWLock.

Intel says fairly clearly that this would be an issue.

> >> Is that Xeon MP then?
>
> > Yes.
>
> The LWLock activity is certainly suggestive of prior reports of
> excessive buffer manager lock contention, but it makes *no* sense that
> that would be higher with vacuum cost delay than without. I'd have
> expected the other way around.
>
> I'd really like to see a test case for this...

My feeling is that a "micro-architecture" test would be more likely to
reveal some interesting information.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vivek Khera 2005-07-13 20:49:31 Re: Quad Opteron stuck in the mud
Previous Message Dan Harris 2005-07-13 20:20:24 Re: Quad Opteron stuck in the mud