pg_reorg in core?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_reorg in core?
Date: 2012-09-21 02:05:46
Message-ID: CAB7nPqTGmNUFi+W6F1iwmf7J-o6sY+xxo6Yb=mkUVYT-CG-B5A@mail.gmail.com
Lists: pgsql-hackers

Hi all,

During the last PGCon, I heard that some community members would be
interested in having pg_reorg directly in core.
As a reminder, pg_reorg is a tool developed by NTT that allows a table to
be reorganized without taking locks on it.
To reorganize the table, it creates a temporary copy of it with a
CREATE TABLE AS statement whose definition depends on whether the table is
reorganized in VACUUM FULL or CLUSTER mode.
Then it follows this mechanism:
- Create triggers that log all the DML that occurs on the table to
an intermediate log table
- Create indexes on the temporary table based on what the user wishes
- Apply the logs registered during the index creation
- Swap the names of the freshly created table and the old table
- Drop the useless objects
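
To make the flow above concrete, here is a toy in-memory sketch in Python.
It is not pg_reorg's code: tables are plain dicts keyed by primary key, and
the names reorganize, make_logging_trigger and replay are hypothetical,
chosen only for this illustration.

```python
# Toy model of the steps listed above; NOT pg_reorg's actual implementation.
# A "table" is a dict mapping a primary key to a row value.

def make_logging_trigger(log):
    """Return a function that applies a DML to a table and records it in log."""
    def apply_dml(table, op, key, value=None):
        if op == "DELETE":
            table.pop(key, None)
        else:  # INSERT or UPDATE
            table[key] = value
        log.append((op, key, value))
    return apply_dml

def replay(log, table):
    """Apply the logged DMLs, in order, to the temporary copy."""
    for op, key, value in log:
        if op == "DELETE":
            table.pop(key, None)
        else:
            table[key] = value

def reorganize(catalog, name, concurrent_dml):
    log = []
    dml = make_logging_trigger(log)      # triggers + intermediate log table
    # Temporary copy of the table, sorted by key to mimic a CLUSTER rewrite.
    tmp = dict(sorted(catalog[name].items()))
    # Index creation on the copy would happen here; meanwhile concurrent
    # writers keep modifying the original table, and every change is logged.
    for op, key, value in concurrent_dml:
        dml(catalog[name], op, key, value)
    replay(log, tmp)                     # apply the logs to the copy
    catalog[name] = tmp                  # swap the new table in for the old
    log.clear()                          # drop the now-useless log
    return catalog[name]

catalog = {"t": {3: "c", 1: "a", 2: "b"}}
writes = [("INSERT", 4, "d"), ("UPDATE", 1, "A"), ("DELETE", 2, None)]
result = reorganize(catalog, "t", writes)
```

The point of the sketch is only that changes made while the copy is being
built are not lost: they are captured in the log and replayed before the
swap.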

The code is hosted on pgFoundry here: http://pgfoundry.org/projects/reorg/.
I also maintain a fork on GitHub, kept in sync with pgFoundry, here:
https://github.com/michaelpq/pg_reorg.

So, do you guys think it is worth adding functionality like pg_reorg to
core or not?

If yes, well I think the code of pg_reorg is going to need some
modifications to make it more compatible with contrib modules using only
EXTENSION.
For the time being pg_reorg is divided into 2 parts, binary and library.
The library part is the SQL portion of pg_reorg, containing a set of C
functions that are called by the binary part. This has been extended to
support CREATE EXTENSION recently.
The binary part provides a pg_reorg command in charge of calling the set of
functions created by the library part; it is just a wrapper around the
library that controls the creation and deletion of the objects.
It is also in charge of deleting the temporary objects by callback if an
error occurs.

By using the binary command, it is possible to reorganize a single table or
a whole database; in the latter case, the command simply loops over each
table of the database.

My idea is to remove the binary part and to rely only on the library part
to make pg_reorg a single extension with only system functions like other
contrib modules.
In order to do that, what is missing is a function that could be used as an
entry point for table reorganization, with signatures such as
pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
All the functionality of pg_reorg could then be reproduced:
- pg_reorg_table(tableoid) for a VACUUM FULL reorganization
- pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has
a CLUSTER key
- pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based
on a wanted column.
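
As a rough illustration of how these three call forms could be told apart,
here is a small Python sketch. Only the name pg_reorg_table comes from the
proposal above; the sentinel, the return values and the example OID are
assumptions made for the example, not an existing API.

```python
# Hypothetical dispatch for the proposed entry point; illustration only.
_OMITTED = object()  # tells "no second argument" apart from an explicit NULL

def pg_reorg_table(tableoid, column=_OMITTED):
    """Return (mode, ordering) describing which reorganization would run."""
    if column is _OMITTED:
        # pg_reorg_table(tableoid): VACUUM FULL style reorganization
        return ("VACUUM FULL", None)
    if column is None:
        # pg_reorg_table(tableoid, NULL): CLUSTER on the table's cluster key
        return ("CLUSTER", "existing cluster key")
    # pg_reorg_table(tableoid, columnname): CLUSTER on the wanted column
    return ("CLUSTER", column)

full = pg_reorg_table(16384)
by_key = pg_reorg_table(16384, None)
by_col = pg_reorg_table(16384, "created_at")
```

If this were written as SQL-callable functions, the two-argument form would
presumably have to be declared without STRICT, since a STRICT function is
not called at all when one of its arguments is NULL.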

Is it worth a shot?

Regards,
--
Michael Paquier
http://michael.otacoo.com
