Re: [COMMITTERS] pgsql: Create a "relation mapping" infrastructure to support changing

Lists: pgsql-committerspgsql-hackers
From: tgl(at)postgresql(dot)org (Tom Lane)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-07 20:48:13
Message-ID: 20100207204813.D88FB7541B9@cvs.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Log Message:
-----------
Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs. This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch. As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.

Modified Files:
--------------
pgsql/contrib/oid2name:
oid2name.c (r1.36 -> r1.37)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/contrib/oid2name/oid2name.c?r1=1.36&r2=1.37)
pgsql/doc/src/sgml:
catalogs.sgml (r2.220 -> r2.221)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/catalogs.sgml?r1=2.220&r2=2.221)
diskusage.sgml (r1.19 -> r1.20)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/diskusage.sgml?r1=1.19&r2=1.20)
func.sgml (r1.500 -> r1.501)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/func.sgml?r1=1.500&r2=1.501)
pgbuffercache.sgml (r2.5 -> r2.6)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/pgbuffercache.sgml?r1=2.5&r2=2.6)
storage.sgml (r1.30 -> r1.31)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/storage.sgml?r1=1.30&r2=1.31)
pgsql/doc/src/sgml/ref:
cluster.sgml (r1.47 -> r1.48)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/cluster.sgml?r1=1.47&r2=1.48)
reindex.sgml (r1.38 -> r1.39)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/reindex.sgml?r1=1.38&r2=1.39)
pgsql/src/backend/access/index:
genam.c (r1.79 -> r1.80)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/index/genam.c?r1=1.79&r2=1.80)
pgsql/src/backend/access/transam:
rmgr.c (r1.28 -> r1.29)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/rmgr.c?r1=1.28&r2=1.29)
xact.c (r1.282 -> r1.283)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xact.c?r1=1.282&r2=1.283)
xlog.c (r1.366 -> r1.367)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.366&r2=1.367)
pgsql/src/backend/bootstrap:
bootparse.y (r1.104 -> r1.105)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/bootstrap/bootparse.y?r1=1.104&r2=1.105)
bootstrap.c (r1.258 -> r1.259)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/bootstrap/bootstrap.c?r1=1.258&r2=1.259)
pgsql/src/backend/catalog:
catalog.c (r1.87 -> r1.88)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/catalog.c?r1=1.87&r2=1.88)
heap.c (r1.369 -> r1.370)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/heap.c?r1=1.369&r2=1.370)
index.c (r1.332 -> r1.333)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/index.c?r1=1.332&r2=1.333)
storage.c (r1.7 -> r1.8)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/storage.c?r1=1.7&r2=1.8)
toasting.c (r1.29 -> r1.30)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/toasting.c?r1=1.29&r2=1.30)
pgsql/src/backend/commands:
cluster.c (r1.197 -> r1.198)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/cluster.c?r1=1.197&r2=1.198)
indexcmds.c (r1.190 -> r1.191)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/indexcmds.c?r1=1.190&r2=1.191)
tablecmds.c (r1.324 -> r1.325)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/tablecmds.c?r1=1.324&r2=1.325)
vacuum.c (r1.403 -> r1.404)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/vacuum.c?r1=1.403&r2=1.404)
pgsql/src/backend/executor:
execMain.c (r1.344 -> r1.345)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/executor/execMain.c?r1=1.344&r2=1.345)
pgsql/src/backend/parser:
parse_clause.c (r1.195 -> r1.196)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/parse_clause.c?r1=1.195&r2=1.196)
pgsql/src/backend/utils/adt:
dbsize.c (r1.28 -> r1.29)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/dbsize.c?r1=1.28&r2=1.29)
pgsql/src/backend/utils/cache:
Makefile (r1.25 -> r1.26)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/Makefile?r1=1.25&r2=1.26)
catcache.c (r1.148 -> r1.149)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/catcache.c?r1=1.148&r2=1.149)
inval.c (r1.93 -> r1.94)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/inval.c?r1=1.93&r2=1.94)
relcache.c (r1.302 -> r1.303)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/relcache.c?r1=1.302&r2=1.303)
pgsql/src/backend/utils/init:
miscinit.c (r1.180 -> r1.181)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/init/miscinit.c?r1=1.180&r2=1.181)
pgsql/src/bin/pg_dump:
pg_dump.c (r1.569 -> r1.570)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_dump/pg_dump.c?r1=1.569&r2=1.570)
pgsql/src/include/access:
rmgr.h (r1.20 -> r1.21)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/access/rmgr.h?r1=1.20&r2=1.21)
pgsql/src/include/catalog:
catalog.h (r1.47 -> r1.48)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catalog.h?r1=1.47&r2=1.48)
catversion.h (r1.582 -> r1.583)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h?r1=1.582&r2=1.583)
heap.h (r1.96 -> r1.97)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/heap.h?r1=1.96&r2=1.97)
index.h (r1.81 -> r1.82)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/index.h?r1=1.81&r2=1.82)
pg_class.h (r1.120 -> r1.121)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_class.h?r1=1.120&r2=1.121)
pg_proc.h (r1.567 -> r1.568)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_proc.h?r1=1.567&r2=1.568)
storage.h (r1.4 -> r1.5)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/storage.h?r1=1.4&r2=1.5)
pgsql/src/include/commands:
cluster.h (r1.39 -> r1.40)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/commands/cluster.h?r1=1.39&r2=1.40)
pgsql/src/include:
miscadmin.h (r1.217 -> r1.218)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/miscadmin.h?r1=1.217&r2=1.218)
pgsql/src/include/storage:
lwlock.h (r1.43 -> r1.44)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/lwlock.h?r1=1.43&r2=1.44)
relfilenode.h (r1.24 -> r1.25)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/relfilenode.h?r1=1.24&r2=1.25)
sinval.h (r1.56 -> r1.57)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/storage/sinval.h?r1=1.56&r2=1.57)
pgsql/src/include/utils:
builtins.h (r1.346 -> r1.347)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/builtins.h?r1=1.346&r2=1.347)
catcache.h (r1.69 -> r1.70)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/catcache.h?r1=1.69&r2=1.70)
inval.h (r1.47 -> r1.48)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/inval.h?r1=1.47&r2=1.48)
rel.h (r1.121 -> r1.122)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/rel.h?r1=1.121&r2=1.122)
relcache.h (r1.67 -> r1.68)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/relcache.h?r1=1.67&r2=1.68)
pgsql/src/test/regress/expected:
vacuum.out (r1.3 -> r1.4)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/vacuum.out?r1=1.3&r2=1.4)
pgsql/src/test/regress:
parallel_schedule (r1.58 -> r1.59)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/parallel_schedule?r1=1.58&r2=1.59)

Added Files:
-----------
pgsql/src/backend/utils/cache:
relmapper.c (r1.1)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/relmapper.c?rev=1.1&content-type=text/x-cvsweb-markup)
pgsql/src/include/utils:
relmapper.h (r1.1)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/relmapper.h?rev=1.1&content-type=text/x-cvsweb-markup)


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)postgresql(dot)org>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-07 22:29:17
Message-ID: 1265581757.27207.202.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, 2010-02-07 at 20:48 +0000, Tom Lane wrote:
> Log Message:
> -----------
> Create a "relation mapping" infrastructure to support changing the relfilenodes
> of shared or nailed system catalogs. This has two key benefits:
>
> * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
>
> * We no longer have to use an unsafe reindex-in-place approach for reindexing
> shared catalogs.
>
> CLUSTER on nailed catalogs now works too, although I left it disabled on
> shared catalogs because the resulting pg_index.indisclustered update would
> only be visible in one database.
>
> Since reindexing shared system catalogs is now fully transactional and
> crash-safe, the former special cases in REINDEX behavior have been removed;
> shared catalogs are treated the same as non-shared.
>
> This commit does not do anything about the recently-discussed problem of
> deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
> concurrent queries; will address that in a separate patch. As a stopgap,
> parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
> such failures during the regression tests.

Awesome!

Few questions:

We XLogFlush() in normal running, but not in recovery. Would it be
possible for the map changes to hit disk after the data changes in that
case?

What we're saying is if we process a relmap update WAL record then it
doesn't really matter whether or not we commit that transaction?

> func.sgml (r1.500 -> r1.501)
> (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/func.sgml?r1=1.500&r2=1.501)

It would be useful to mention that the return value is volatile and can
change across transactions or even within a transaction, just like ctid.

> catalogs.sgml (r2.220 -> r2.221)
> (http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/catalogs.sgml?r1=2.220&r2=2.221)

No mention of pg_relation_filepath/filenode in there.

--
Simon Riggs www.2ndQuadrant.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-07 22:52:02
Message-ID: 8688.1265583122@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Few questions:

> We XLogFlush() in normal running, but not in recovery. Would it be
> possible for the map changes to hit disk after the data changes in that
> case?

No, because the map change is flushed to disk on sight of the WAL record
anyway.

> What we're saying is if we process a relmap update WAL record then it
> doesn't really matter whether or not we commit that transaction?

Right. relmap update is committed independently of the surrounding
transaction.

> It would be useful to mention that the return value is volatile and can
> change across transactions or even within a transaction, just like ctid.

Hm. I thought it was more reasonable to treat it as "stable", which is
the normal case. You're right that there are some corner cases where it
could change without the caller's snapshot changing, but I'm not sure
it's worth worrying about.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-07 23:06:41
Message-ID: 1265584001.27207.237.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, 2010-02-07 at 17:52 -0500, Tom Lane wrote:
>
> > It would be useful to mention that the return value is volatile and
> can change across transactions or even within a transaction, just like
> ctid.
>
> Hm. I thought it was more reasonable to treat it as "stable", which
> is the normal case. You're right that there are some corner cases
> where it could change without the caller's snapshot changing, but I'm
> not sure it's worth worrying about.

OK, even if its stable I think we need to mention it because it's not
obvious that the file we keep catalog data in could change name. It's a
also a change from earlier releases where that would never happen.

--
Simon Riggs www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-08 13:47:02
Message-ID: 20100208134702.GB4113@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


I just noticed that this patch

> Create a "relation mapping" infrastructure to support changing the relfilenodes
> of shared or nailed system catalogs. This has two key benefits:

creates a new function pg_relation_filenode() that only uses the
syscache to fetch the relation's filenode, without locking it. I wonder
if we could do the same in the pg_relation_size() function and friends,
to avoid having to grab a lock on the relation.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Create a "relation mapping" infrastructure to support changing
Date: 2010-02-08 15:15:47
Message-ID: 10292.1265642147@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> I just noticed that this patch

>> Create a "relation mapping" infrastructure to support changing the relfilenodes
>> of shared or nailed system catalogs. This has two key benefits:

> creates a new function pg_relation_filenode() that only uses the
> syscache to fetch the relation's filenode, without locking it. I wonder
> if we could do the same in the pg_relation_size() function and friends,
> to avoid having to grab a lock on the relation.

I don't think it's a good idea to try to do physical access to the
relation without any lock. The filenode function is a bit special
because it doesn't need anything except the pg_class row itself.
(Except in the case of a mapped relationn, but the underlying
mapping entry is unlikely to disappear, too.)

regards, tom lane