Re: 8.3.0 Core with concurrent vacuum fulls

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gavin M(dot) Roy" <gmr(at)myyearbook(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.3.0 Core with concurrent vacuum fulls
Date: 2008-03-05 10:11:53
Message-ID: 2e78013d0803050211t73fae3f2j1da5760e55c0173b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 5, 2008 at 8:26 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Gavin M. Roy" <gmr(at)myyearbook(dot)com> writes:
> > (gdb) where
> > #0 0x0000003fe362e21d in raise () from /lib64/tls/libc.so.6
> > #1 0x0000003fe362fa1e in abort () from /lib64/tls/libc.so.6
> > #2 0x000000000063a2e3 in errfinish ()
> > #3 0x00000000005974c4 in DeadLockReport ()
> > #4 0x000000000059381f in LockAcquire ()
> > #5 0x0000000000592357 in LockRelationOid ()
> > #6 0x0000000000457255 in relation_open ()
> > #7 0x00000000004574c3 in heap_open ()
> > #8 0x000000000062cf41 in CatalogCacheInitializeCache ()
> > #9 0x000000000062dfad in PrepareToInvalidateCacheTuple ()
> > #10 0x000000000062e8c5 in CacheInvalidateHeapTuple ()
> > #11 0x000000000045c803 in heap_page_prune ()
> > #12 0x00000000005086cd in vacuum_rel ()
> > #13 0x00000000005096bb in vacuum ()
> > #14 0x00000000005a163b in PortalRunUtility ()
> > #15 0x00000000005a1714 in PortalRunMulti ()
> > #16 0x00000000005a1d30 in PortalRun ()
> > #17 0x000000000059f4b6 in PostgresMain ()
> > #18 0x00000000005760c0 in ServerLoop ()
> > #19 0x0000000000577770 in PostmasterMain ()
> > #20 0x000000000052fd3e in main ()
>
> So what did DeadLockReport put in the server log before panic'ing?
>
> I'm wondering exactly why CatalogCacheInitializeCache is being called
> here --- seems like that should have been done long before we got to
> VACUUM. But it would be useful to know just what deadlock it saw.
>

Seems like its possible that the second phase of catalog cache initialization
is not done when VACUUM FULL is called, especially if VACUUM FULL is
the first command/query in the backend.

InitCatalogCachePhase2() is called only if new cache file needs to be
written.

/*
* Lastly, write out a new relcache cache file if one is needed.
*/
if (needNewCacheFile)
{
/*
* Force all the catcaches to finish initializing and thereby open the
* catalogs and indexes they use. This will preload the relcache with
* entries for all the most important system catalogs and indexes, so
* that the init file will be most useful for future backends.
*/
InitCatalogCachePhase2();

/* now write the file */
write_relcache_init_file();
}

We haven't yet seen the deadlock message, but here is my theory of a possible
deadlock scenario:

Two backends try to vacuum full two different catalog tables. Each acquires an
exclusive lock on the respective catalog relation. Then each try to
initialize its
own catalog cache. But to do that they need AccessShareLock on each other's
table leading to a deadlock.

Why not just unconditionally finish the phase 2 as part of
InitPostgres ? I understand
that we may end up initializing caches that we don't need in that
session, but there
might be other places where this deadlock is waiting to happen.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2008-03-05 10:46:07 Re: 8.3.0 Core with concurrent vacuum fulls
Previous Message Heikki Linnakangas 2008-03-05 10:02:48 Re: "could not open relation 1663/16384/16584: No such file or directory" in a specific combination of transactions with temp tables