Visibility Groups

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Visibility Groups
Date: 2008-08-07 10:38:38
Message-ID: 1218105518.4549.365.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Currently, we calculate a single OldestXmin across all snapshots on the
assumption that any transaction might access any table.

I propose creating "Visibility Groups" that *explicitly* limit the
ability of a transaction to access data outside its visibility group(s).
By default, visibility_groups would be NULL, implying potential access
to all tables.

Once set, any attempt to lock an object outside of a transactions
defined visibility_groups will result in an error:
ERROR attempt to lock table outside of visibility group(s): foo
HINT you need to set a different value for visibility_groups
A transaction can only ever reduce or restrict its visibility_groups, it
cannot reset or add visibility groups.

This would give us the ability to explicitly prevent long running
statements from interfering with updates of critical tables, when those
tables are not accessed.

The visibility_groups list would be held on each PGPROC, so that a
snapshot will be able to calculate an xmin for each visibility group.
When checking visibility of rows using HeapTupleSatisfiesVacuum() we
would use the oldest xmin for the visibility group of the table, rather
than the single global value.

If no visibility groups are in use then everything happens just as it
does now, with only a single "if" test in GetSnapshotData() and
HeapTupleSatisfiesVacuum().

Patch would require providing info through to HeapTupleSatisfiesVacuum()
to allow it to resolve the appropriate xmin to use for visibility
checks, since it will have more than one to choose from. Various ways of
doing that.

We might like the visibility group to be set automatically but that
seems like a harder problem. I do not propose to solve that here. This
general idea has been proposed before, but we always get hung up on our
inability to automatically determine the visibility group. Let's just do
this explicitly, so we can get benefit in the cases where we know very
clearly which tables we'll access and more importantly, which we won't.

How do we specify visibility groups? Well various ways, various syntax,
so that is up for discussion and debate. This might be the place the
concept falls down, maybe where it starts having wings.

* SET visibility_groups = table, schema.*
* define visibility groups using a function:
create_visibility_group('name', 'table/schema list')
* specify them via ALTER TABLE etc

This idea is connected somewhat to replication, so floating it now to
see how viable a concept people think this is. I'm floating the idea in
a fairly neutral way in the hope that it leads others to even
better/more workable proposals, possibly now, possibly over the next few
years.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jochem van Dieten 2008-08-07 10:55:56 Re: Visibility Groups
Previous Message Magnus Hagander 2008-08-07 09:20:22 Re: Parsing of pg_hba.conf and authentication inconsistencies