Re: CLUSTER and MVCC

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CLUSTER and MVCC
Date: 2007-03-09 15:47:40
Message-ID: 45F1819C.9050500@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Csaba Nagy wrote:
>> Hmm. You could use something along these lines instead:
>>
>> 0. LOCK TABLE queue_table
>> 1. SELECT * INTO queue_table_new FROM queue_table
>> 2. DROP TABLE queue_table
>> 3. ALTER TABLE queue_table_new RENAME queue_table
>>
>> After all, it's not that you care about the clustering of the table, you
>> just want to remove old tuples.
>
> ... and then restart the app so all my pooled connections drop their
> cached plans ;-)

Yeah, though Tom's working on plan invalidation for 8.3, so that
wouldn't be an issue.

> Seriously, that won't work. If a session tries to insert a new row after
> I lock the table to clean it up, I still want it to be able to insert
> after the cleanup is finished... if I drop the table it tries to insert
> to, it will fail.

Hmm. How about:

1. LOCK TABLE queue_table
2. SELECT * INTO temp_table FROM queue_table
3. TRUNCATE queue_table
4. INSERT INTO queue_table SELECT * FROM temp_table

That way you're copying the rows twice, but if there isn't many live
tuples it shouldn't matter too much.

>> As a long term solution, it would be nice if we had more fine-grained
>> bookkeeping of snapshots that are in use in the system. In your case,
>> there's a lot of tuples that are not visible to pg_dump because xmin is
>> too new, and also not visible to any other transaction because xmax is
>> too old. If we had a way to recognize situations like that, and vacuum
>> those tuples, much of the problem with long-running transactions would
>> go away.
>
> In the general case that won't work either in a strict MVCC sense... if
> you have an old transaction, you should never clean up a dead tuple
> which could be still visible to it.

We wouldn't clean up tuples that are visible to a transaction, but if
you have one long-running transaction like pg_dump in a database with
otherwise short transaction, you'll have a lot of tuples that are not
vacuumable because of the long-running process, but are not in fact
visible to any transaction. That's transactions that were inserted too
late to be seen by the old transaction, and deleted too long time ago to
be seen by any other transaction. Let me illustrate this with a timeline:

xmin1 xmax1
| |
-----+--X-X+X-+ooooooooooooooXoooooXoXoXXo+------>now
| |
xmin2 xmax2

xmin1 and xmax1 are the xmin and xmax of an old, long-running
serializable transaction, like pg_dump. The Xs between them are xids of
transactions that the old transaction sees as in-progress, IOW the
SnapshotData.xip-array.

xmin2 and xmax2 are the xmin and xmax of a newer transaction. Because of
the old-running transaction, xmin2 is far behind xmax2, but there's a
wide gap between that and the next transaction that the newer
transaction sees as in-progress.

The current rule to determine if a tuple is dead or not is to check that
tuple's xmax < oldestxmin. Oldestxmin is in this case xmin1. But in
addition to that, any tuple with an xmin > xmax1 and xmax that's not in
the xip-array of any snapshot in use (marked with o above), isn't
visible to any current or future transaction and can therefore be safely
vacuumed.

The implementation problem is that we don't have a global view of all
snapshots in the system. If we solve that, we can be more aggressive
with vacuuming in presence of long-running transactions. It's not an
easy problem, we don't want to add a lot of accounting overhead, but
maybe we could have some kind of an approximation of the global state
with little overhead, that would give most of the benefit.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2007-03-09 15:48:55 Re: CLUSTER and MVCC
Previous Message Gaetano Mendola 2007-03-09 15:41:42 Re: Calculated view fields (8.1 != 8.2)