Re: Make CLUSTER MVCC-safe

Lists: pgsql-patches
From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Make CLUSTER MVCC-safe
Date: 2007-03-20 18:20:45
Message-ID: 460025FD.6080208@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

This patch makes CLUSTER MVCC-safe. Visibility information and update
chains are preserved like in VACUUM FULL.

I created a new generic rewriteheap-facility to handle rewriting tables
in a visibility-preserving manner. All the update chain tracking is done
in rewriteheap.c, the caller is responsible for supplying the stream of
tuples.

CLUSTER is currently the only user of the facility, but I'm envisioning
we might have other users in the future. For example, a version of
VACUUM FULL that rewrites the whole table. We could also use it to make
ALTER TABLE MVCC-safe, but there's some issues with that. For example,
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.

One complication in the implementation was the fact that heap_insert
overwrites the visibility information, and it doesn't write the full
tuple header to WAL. I ended up implementing a special-purpose
raw_heap_insert function instead, which is optimized for bulk inserting
a lot of tuples, knowing that we have exclusive access to the heap.
raw_heap_insert keeps the current buffer locked over calls, until it
gets full, and inserts the whole page to WAL as a single record using
the existing XLOG_HEAP_NEWPAGE record type.

This makes CLUSTER a more viable alternative to VACUUM FULL. One
motivation for making CLUSTER MVCC-safe is that if some poor soul runs
pg_dump to make a backup concurrently with CLUSTER, the clustered tables
will appear to be empty in the dump file.

The documentation doesn't anything about CLUSTER not being MVCC-safe, so
I suppose there's no need to touch the docs. I sent a doc patch earlier
to add a note about it, that doc patch should still be applied to older
release branches, IMO.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
cluster-mvcc-8.patch text/x-diff 33.7 KB

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-21 19:44:40
Message-ID: 1174506280.6069.69.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Tue, 2007-03-20 at 18:20 +0000, Heikki Linnakangas wrote:
> This patch makes CLUSTER MVCC-safe. Visibility information and update
> chains are preserved like in VACUUM FULL.

Sounds good.

> CLUSTER is currently the only user of the facility, but I'm envisioning
> we might have other users in the future. For example, a version of
> VACUUM FULL that rewrites the whole table.

I would be very much in favour of that, as discussed on -hackers.

There might be some requirement for the older VACUUM FULL behaviour, so
I'd like to suggest the syntax:

VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]];

where VACUUM FULL foo REPLACE STORAGE would be the new default, using
your new functions, and PRESERVE STORAGE would implement the old method.

> One complication in the implementation was the fact that heap_insert
> overwrites the visibility information, and it doesn't write the full
> tuple header to WAL. I ended up implementing a special-purpose
> raw_heap_insert function instead, which is optimized for bulk inserting
> a lot of tuples, knowing that we have exclusive access to the heap.
> raw_heap_insert keeps the current buffer locked over calls, until it
> gets full, and inserts the whole page to WAL as a single record using
> the existing XLOG_HEAP_NEWPAGE record type.

I submitted Fast CLUSTER patch earlier which avoided writing WAL in the
same way that has been done for COPY, CREATE INDEX and CTAS. Would you
like to update your patch to do this also, or would you like me to
re-write the patch to fit with yours?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-21 21:59:44
Message-ID: 200703212159.l2LLxiL09836@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> This patch makes CLUSTER MVCC-safe. Visibility information and update
> chains are preserved like in VACUUM FULL.
>
> I created a new generic rewriteheap-facility to handle rewriting tables
> in a visibility-preserving manner. All the update chain tracking is done
> in rewriteheap.c, the caller is responsible for supplying the stream of
> tuples.
>
> CLUSTER is currently the only user of the facility, but I'm envisioning
> we might have other users in the future. For example, a version of
> VACUUM FULL that rewrites the whole table. We could also use it to make
> ALTER TABLE MVCC-safe, but there's some issues with that. For example,
> what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
>
> One complication in the implementation was the fact that heap_insert
> overwrites the visibility information, and it doesn't write the full
> tuple header to WAL. I ended up implementing a special-purpose
> raw_heap_insert function instead, which is optimized for bulk inserting
> a lot of tuples, knowing that we have exclusive access to the heap.
> raw_heap_insert keeps the current buffer locked over calls, until it
> gets full, and inserts the whole page to WAL as a single record using
> the existing XLOG_HEAP_NEWPAGE record type.
>
> This makes CLUSTER a more viable alternative to VACUUM FULL. One
> motivation for making CLUSTER MVCC-safe is that if some poor soul runs
> pg_dump to make a backup concurrently with CLUSTER, the clustered tables
> will appear to be empty in the dump file.
>
> The documentation doesn't anything about CLUSTER not being MVCC-safe, so
> I suppose there's no need to touch the docs. I sent a doc patch earlier
> to add a note about it, that doc patch should still be applied to older
> release branches, IMO.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-29 08:40:01
Message-ID: 460B7B61.5090902@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Here's an update, fixing conflict by Tom's recent commit of Simon's
patch to skip WAL-inserts when archiving is not enabled.

Heikki Linnakangas wrote:
> This patch makes CLUSTER MVCC-safe. Visibility information and update
> chains are preserved like in VACUUM FULL.
>
> I created a new generic rewriteheap-facility to handle rewriting tables
> in a visibility-preserving manner. All the update chain tracking is done
> in rewriteheap.c, the caller is responsible for supplying the stream of
> tuples.
>
> CLUSTER is currently the only user of the facility, but I'm envisioning
> we might have other users in the future. For example, a version of
> VACUUM FULL that rewrites the whole table. We could also use it to make
> ALTER TABLE MVCC-safe, but there's some issues with that. For example,
> what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
>
> One complication in the implementation was the fact that heap_insert
> overwrites the visibility information, and it doesn't write the full
> tuple header to WAL. I ended up implementing a special-purpose
> raw_heap_insert function instead, which is optimized for bulk inserting
> a lot of tuples, knowing that we have exclusive access to the heap.
> raw_heap_insert keeps the current buffer locked over calls, until it
> gets full, and inserts the whole page to WAL as a single record using
> the existing XLOG_HEAP_NEWPAGE record type.
>
> This makes CLUSTER a more viable alternative to VACUUM FULL. One
> motivation for making CLUSTER MVCC-safe is that if some poor soul runs
> pg_dump to make a backup concurrently with CLUSTER, the clustered tables
> will appear to be empty in the dump file.
>
> The documentation doesn't anything about CLUSTER not being MVCC-safe, so
> I suppose there's no need to touch the docs. I sent a doc patch earlier
> to add a note about it, that doc patch should still be applied to older
> release branches, IMO.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
cluster-mvcc-9.patch text/x-diff 34.2 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-29 16:08:34
Message-ID: 200703291608.l2TG8Y818015@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Here's an update, fixing conflict by Tom's recent commit of Simon's
> patch to skip WAL-inserts when archiving is not enabled.
>
> Heikki Linnakangas wrote:
> > This patch makes CLUSTER MVCC-safe. Visibility information and update
> > chains are preserved like in VACUUM FULL.
> >
> > I created a new generic rewriteheap-facility to handle rewriting tables
> > in a visibility-preserving manner. All the update chain tracking is done
> > in rewriteheap.c, the caller is responsible for supplying the stream of
> > tuples.
> >
> > CLUSTER is currently the only user of the facility, but I'm envisioning
> > we might have other users in the future. For example, a version of
> > VACUUM FULL that rewrites the whole table. We could also use it to make
> > ALTER TABLE MVCC-safe, but there's some issues with that. For example,
> > what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
> >
> > One complication in the implementation was the fact that heap_insert
> > overwrites the visibility information, and it doesn't write the full
> > tuple header to WAL. I ended up implementing a special-purpose
> > raw_heap_insert function instead, which is optimized for bulk inserting
> > a lot of tuples, knowing that we have exclusive access to the heap.
> > raw_heap_insert keeps the current buffer locked over calls, until it
> > gets full, and inserts the whole page to WAL as a single record using
> > the existing XLOG_HEAP_NEWPAGE record type.
> >
> > This makes CLUSTER a more viable alternative to VACUUM FULL. One
> > motivation for making CLUSTER MVCC-safe is that if some poor soul runs
> > pg_dump to make a backup concurrently with CLUSTER, the clustered tables
> > will appear to be empty in the dump file.
> >
> > The documentation doesn't anything about CLUSTER not being MVCC-safe, so
> > I suppose there's no need to touch the docs. I sent a doc patch earlier
> > to add a note about it, that doc patch should still be applied to older
> > release branches, IMO.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-30 01:13:28
Message-ID: 20070330011328.GA8205@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Heikki Linnakangas wrote:
> Here's an update, fixing conflict by Tom's recent commit of Simon's
> patch to skip WAL-inserts when archiving is not enabled.

Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: alvherre(at)commandprompt(dot)com
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-03-30 11:28:32
Message-ID: 460CF460.6030607@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Here's an update, fixing conflict by Tom's recent commit of Simon's
>> patch to skip WAL-inserts when archiving is not enabled.
>
> Hmm, wouldn't it be better if the rewriteheap.c file was in
> access/heap/ instead of commands/?

Yeah, maybe. I thought of it as a subsystem of cluster, and possibly
other commands like alter table. But it really is pretty low-level.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: alvherre(at)commandprompt(dot)com, Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-04-03 01:09:15
Message-ID: 200704030109.l3319F109560@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> > Heikki Linnakangas wrote:
> >> Here's an update, fixing conflict by Tom's recent commit of Simon's
> >> patch to skip WAL-inserts when archiving is not enabled.
> >
> > Hmm, wouldn't it be better if the rewriteheap.c file was in
> > access/heap/ instead of commands/?
>
> Yeah, maybe. I thought of it as a subsystem of cluster, and possibly
> other commands like alter table. But it really is pretty low-level.

So, does the patch need adjustment? Should we move it before
application?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Make CLUSTER MVCC-safe
Date: 2007-04-08 01:31:18
Message-ID: 28941.1175995878@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> This patch makes CLUSTER MVCC-safe. Visibility information and update
>> chains are preserved like in VACUUM FULL.

> Here's an update, fixing conflict by Tom's recent commit of Simon's
> patch to skip WAL-inserts when archiving is not enabled.

Applied with revisions. There were some bugs in it: you need to check
both xmin and tid when determining if one tuple chains to another,
and you can't separate MarkBufferDirty from the critical section that
writes xlog. (I got around that by not keeping the working page in
buffers at all, the same way btree index build works; should be a bit
faster as well as more correct.) It had some memory leakage too.

regards, tom lane