Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-30 18:39:44
Message-ID: 2880.1154284784@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I bet Alvaro's spotted the problem. ALTER INDEX RENAME doesn't seem to
> take any lock on the index's parent table, only on the index itself.
> That means that a query on "onek" could be trying to read the pg_class
> entries for onek's indexes concurrently with someone trying to commit
> a pg_class update to rename an index. If the query manages to visit
> the new and old versions of the row in that order, and the commit
> happens between, *neither* of the versions would look valid. MVCC
> doesn't save us because this is all SnapshotNow.

On further reflection, it seems this is a special case of the general
problem that we ought to lock a relation *before* we attempt to load
the relcache entry, not after. We've seen previous complaints about
this, for instance a process erroring out with a message about
pg_trigger having a different number of entries than it expected because
someone had committed an ADD/DROP TRIGGER between it reading pg_class
and reading pg_trigger.

A long time ago (30 Oct 2002) I wrote:

> Thinking further, it's really kinda bogus that LockRelation() works on
> an already-opened Relation; if possible we should acquire the lock
> before attempting to create a relcache entry. (We only need to know the
> OID and the relisshared status before we can make a locktag, so it'd be
> possible to acquire the lock using only the contents of the pg_class row.)
> Not sure how much code restructuring might be involved to make this
> happen, but it'd be worth thinking about for 7.4.

but that never got off the back burner. The current example shows that
"read the pg_class row and then lock" doesn't work anyway, because we
might fail to find any version of the pg_class row that passes
SnapshotNow.

One thing that has improved since 7.3 is that we only do relcache loads
by OID, never by name. This means the only thing stopping us from
taking lock before we invoke relcache is lack of knowledge about the
rel's relisshared status. Given that the set of shared relations is
pretty small, and fixed in any given backend version, it wouldn't be
unreasonable to handle this by keeping a hardwired list of shared
relation OIDs somewhere in the backend.

Anyway I really think we need to move to a coding rule that says thou
shalt not access a relcache entry without *already* having some kind
of lock on the relation.

Comments?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-30 21:19:04
Message-ID: 3887.1154294344@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> ... This means the only thing stopping us from
> taking lock before we invoke relcache is lack of knowledge about the
> rel's relisshared status. Given that the set of shared relations is
> pretty small, and fixed in any given backend version, it wouldn't be
> unreasonable to handle this by keeping a hardwired list of shared
> relation OIDs somewhere in the backend.

On further investigation, there is one small stumbling block in that
plan. We currently have hard-wired OIDs for shared catalogs and their
indexes ... but not for their toast tables (nor the indexes thereon).

I think the best solution for this might be to put the responsibility
for creating system catalogs' toast tables into the bootstrap phase
instead of making initdb do it afterwards. This would be a Good Thing
anyway since currently we are incapable of dealing with bootstrap-time
insertions of values large enough to need toasting. I'm imagining
adding macros to the include/catalog/*.h files along the lines of

TOAST_CATALOG(pg_class,4242,4243)

where the numbers are hand-assigned OIDs for the toast table and index.
The existing mechanisms for creating the .bki bootstrap script would
turn these into commands in the .bki script.

We could then get rid of ALTER TABLE ... CREATE TOAST TABLE as a SQL
command altogether, which seems like a good thing to me. Anyone want to
argue for keeping it? There really shouldn't be any case where a user
needs to invoke it.

Thoughts, objections?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-31 14:09:57
Message-ID: 14291.1154354997@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
>> ... This means the only thing stopping us from
>> taking lock before we invoke relcache is lack of knowledge about the
>> rel's relisshared status.

While digging through all the places that open relcache entries,
I've realized that there's another problem, specifically the way that
we lock indexes. The current theory is that index_open() takes no lock,
and then we establish a lock just for the duration of an index scan.
The comments for index_open explain:

* Note: we acquire no lock on the index. A lock is not needed when
* simply examining the index reldesc; the index's schema information
* is considered to be protected by the lock that the caller had better
* be holding on the parent relation. Some type of lock should be
* obtained on the index before physically accessing it, however.
* This is handled automatically for most uses by index_beginscan
* and index_endscan for scan cases, or by ExecOpenIndices and
* ExecCloseIndices for update cases. Other callers will need to
* obtain their own locks.

However, the lionfish failure makes the folly of this approach evident
(it was in fact index_open that failed in that example, unless my theory
about it is all wrong). If we're going to move to lock-before-open then
we've got to fix index_open too. I envision making index_open just
about like heap_open, ie add a lockmode parameter, and then get rid of
the separate lock step in index_beginscan.

There is one small problem with doing that, which is this code in
ExecOpenIndices:

indexDesc = index_open(indexOid);

if (indexDesc->rd_am->amconcurrent)
LockRelation(indexDesc, RowExclusiveLock);
else
LockRelation(indexDesc, AccessExclusiveLock);

IOW you need to already have the index open to find out what sort of
lock to take on it.

I have a modest proposal for fixing that: let's get rid of amconcurrent,
so that RowExclusiveLock is always the right lock to take here. All of
the currently supported AMs have amconcurrent = true, and so does the
proposed bitmap index patch, and given the project's current focus on
high concurrent performance I cannot imagine that we'd accept a future
patch to add a nonconcurrent index type. So let's just legislate that
all AMs have to support concurrent updates (or at least that they can't
rely on the main system to protect them from the case).

Any objections?

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-31 14:18:47
Message-ID: 20060731141846.GZ20016@kenobi.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I think the best solution for this might be to put the responsibility
> for creating system catalogs' toast tables into the bootstrap phase
> instead of making initdb do it afterwards. This would be a Good Thing
> anyway since currently we are incapable of dealing with bootstrap-time
> insertions of values large enough to need toasting. I'm imagining
> adding macros to the include/catalog/*.h files along the lines of

Would this make it much more difficult to support user-defined indexes
on system catalogs? It looks like we don't support that at the moment
but as we see larger Postgres installations it seems likely we'll need
to. I don't really consider myself a very heavy Postgres user but I've
got databases w/ > 30k entries in pg_class and near 300k in
pg_attribute... Those aren't shared but it sounds like you were talking
about all of them above anyway.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-31 14:35:15
Message-ID: 14516.1154356515@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> I think the best solution for this might be to put the responsibility
>> for creating system catalogs' toast tables into the bootstrap phase
>> instead of making initdb do it afterwards.

> Would this make it much more difficult to support user-defined indexes
> on system catalogs?

AFAICS the problems with that are orthogonal to this. You'll never have
user-defined (as in "added after initdb") indexes on shared catalogs,
because there is no way to update their pg_class descriptions in all
databases at once. For non-shared catalogs there's nothing except
access permissions stopping you from adding ordinary indexes now.
We don't support partial or functional indexes on system catalogs,
but the implementation reasons for that are unrelated to what I'm doing.

> It looks like we don't support that at the moment
> but as we see larger Postgres installations it seems likely we'll need
> to. I don't really consider myself a very heavy Postgres user but I've
> got databases w/ > 30k entries in pg_class and near 300k in
> pg_attribute...

And are you seeing any performance issues related to lack of indexes?
For the system catalogs we understand the access patterns pretty well
(I think), and I thought we pretty much had the right indexes on them
already.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Relation locking and relcache load (was Re: Going for "all green" buildfarm results)
Date: 2006-07-31 17:50:23
Message-ID: 20060731175023.GA20016@kenobi.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> >> I think the best solution for this might be to put the responsibility
> >> for creating system catalogs' toast tables into the bootstrap phase
> >> instead of making initdb do it afterwards.
>
> > Would this make it much more difficult to support user-defined indexes
> > on system catalogs?
>
> AFAICS the problems with that are orthogonal to this. You'll never have
> user-defined (as in "added after initdb") indexes on shared catalogs,
> because there is no way to update their pg_class descriptions in all
> databases at once.

Ok.

> For non-shared catalogs there's nothing except
> access permissions stopping you from adding ordinary indexes now.

I had thought this might be the case since I had some recollection of
indexes on catalogs either being speculated about or suggested on
-perform. The error-message isn't entirely clear about this fact
though:

src/backend/catalog/index.c:495 (or so)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user-defined indexes on system catalog tables are not supported")));

> And are you seeing any performance issues related to lack of indexes?

Depends on the eye of the beholder to some extent I suppose.

> For the system catalogs we understand the access patterns pretty well
> (I think), and I thought we pretty much had the right indexes on them
> already.

The case that I was specifically thinking about was the relowner in
pg_class not being indexed.

tsf=> explain analyze select cl.relname from pg_authid a join pg_class
cl on (a.oid = cl.relowner) where a.rolname = 'postgres';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2.54..1970.25 rows=383 width=64) (actual
time=0.113..77.950 rows=223 loops=1)
Hash Cond: ("outer".relowner = "inner".oid)
-> Seq Scan on pg_class cl (cost=0.00..1881.59 rows=16459 width=68)
(actual time=0.036..46.607 rows=17436 loops=1)
-> Hash (cost=2.54..2.54 rows=1 width=4) (actual time=0.057..0.057
rows=1 loops=1)
-> Seq Scan on pg_authid a (cost=0.00..2.54 rows=1 width=4)
(actual time=0.047..0.050 rows=1 loops=1)
Filter: (rolname = 'postgres'::name)
Total runtime: 78.358 ms
(7 rows)

It's not exactly *slow* but an index might speed it up. I was trying to
create one and couldn't figure out the right incantation to make it
happen. 'allow_system_table_mods = true' wasn't working in
postgresql.conf (it wouldn't start) for some reason...

Other system-catalog queries that I've been a little unhappy about the
performance of (though I don't know if indexes would help, so this is
really just me complaining) are: initial table list in ODBC w/ Access
(takes *forever* when you have alot of tables...); schema/table lists in
phppgadmin when there are alot of schemas/tables; information_schema
queries (try looking at information_schema.columns for a given table
when you've got alot of tables... over 10x slower than looking at
pg_class/pg_attribute directly, 3 seconds vs. 200ms, or so).

Thanks,

Stephen