Re: pg_reorg in core?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_reorg in core?
Date: 2012-09-21 03:33:03
Message-ID: CAB7nPqTFhL8_eHTG=XT5Tfju_+7bewASXY5ivdVJGXQ4yBJxjA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy(at)gmail(dot)com> wrote:

> On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Hi all,
> >
> > During the last PGCon, I heard that some community members would be
> > interested in having pg_reorg directly in core.
>
> I'm actually not crazy about this idea, at least not given the current
> state of pg_reorg. Right now, there are quite a few fixes and
> features which remain to be merged in to cvs head, but at least we can
> develop pg_reorg on a schedule independent of Postgres itself, i.e. we
> can release new features more often than once a year. Perhaps when
> pg_reorg is more stable, and the known bugs and missing features have
> been ironed out, we could think about integrating into core.
>

It would also be great to move the project to github to ease its
maintenance and development.
My own copy is based on, and kept in sync with, what is on pgfoundry, as I
do not have admin access there (and honestly I do not think I can get it),
even though I am from NTT. Is there anyone here with admin rights?

> Granted, a nice thing about integrating with core is we'd probably
> have more of an early warning when reshuffling of PG breaks pg_reorg
> (e.g. the recent splitting of the htup headers), but such changes have
> been quick and easy to fix so far.

Yes, that is also why I am proposing to integrate it into core. Its
maintenance would be faster and easier than it is now on pgfoundry.
However, if hackers do not think it is worth adding to core, well,
separate development as done now would be fine, just slower.
Also, looking at the extension modules in contrib, I haven't seen one
that ships both a library and a binary the way pg_reorg does.

> - creation of indexes on the temporary table based on what the user wishes
> > - Apply the logs registered during the index creation
> > - Swap the names of freshly created table and old table
> > - Drop the useless objects
> >
> > The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
> > I am also maintaining a fork in github in sync with pgfoundry here:
> > https://github.com/michaelpq/pg_reorg.
> >
> > Just, do you guys think it is worth adding a functionality like pg_reorg in
> > core or not?
> >
> > If yes, well I think the code of pg_reorg is going to need some
> > modifications to make it more compatible with contrib modules using only
> > EXTENSION.
> > For the time being pg_reorg is divided into 2 parts, binary and library.
> > The library part is the SQL portion of pg_reorg, containing a set of C
> > functions that are called by the binary part. This has been extended to
> > support CREATE EXTENSION recently.
> > The binary part creates a command pg_reorg in charge of calling the set of
> > functions created by the lib part, being just a wrapper of the library part
> > to control the creation and deletion of the objects.
> > It is also in charge of deleting the temporary objects by callback if an
> > error occurs.
> >
> > By using the binary command, it is possible to reorganize a single table or
> > a database, in this case reorganizing a database launches only a loop on
> > each table of this database.
> >
> > My idea is to remove the binary part and to rely only on the library part to
> > make pg_reorg a single extension with only system functions like other
> > contrib modules.
>
> > In order to do that what is missing is a function that could be used as an
> > entry point for table reorganization, a function of the type
> > pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
> > All the functionalities of pg_reorg could be reproducible:
> > - pg_reorg_table(tableoid) for a VACUUM FULL reorganization
> > - pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
> > CLUSTER key
> > - pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
> > a wanted column.
> >
> > Is it worth the shot?
>
> I haven't seen this documented as such, but AFAICT the reason that
> pg_reorg is split into a binary and set of backend functions which are
> called by the binary is that pg_reorg needs to be able to control its
> steps in several transactions so as to avoid holding locks
> excessively. The reorg_one_table() function uses four or five
> transactions per table, in fact. If all the logic currently in the
> pg_reorg binary were moved into backend functions, calling
> pg_reorg_table() would have to be a single transaction, and there
> would be no advantage to using such a function vs. CLUSTER or VACUUM
> FULL.
>
Of course, but features like CREATE INDEX CONCURRENTLY use multiple
transactions. Wouldn't it be possible to use something similar so that the
modifications become visible to other backends?
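
To make the multi-transaction point concrete, here is a minimal Python
sketch of a client driving each phase in its own transaction, which is
roughly what the binary does today as Josh describes it. This is not
pg_reorg's actual code: run_phases, the execute callback and the phase
strings are hypothetical placeholders, and the phase list only
paraphrases the steps quoted earlier in this mail.

```python
from typing import Callable, List, Sequence


def run_phases(execute: Callable[[str], None],
               phases: Sequence[Sequence[str]]) -> None:
    """Run each phase in its own transaction.

    `execute` is a stand-in for whatever sends one SQL statement to the
    server on a single connection (hypothetical, not a pg_reorg API).
    """
    for statements in phases:
        execute("BEGIN")
        try:
            for statement in statements:
                execute(statement)
        except Exception:
            # Undo only the failing phase; earlier phases are already committed.
            execute("ROLLBACK")
            raise
        # Committing here releases the locks taken by this phase.
        execute("COMMIT")


# Placeholder phases, written as SQL comments rather than real statements.
PHASES: List[List[str]] = [
    ["-- create indexes on the temporary table"],
    ["-- apply the logs registered during index creation"],
    ["-- swap the names of the new and old tables"],
    ["-- drop the useless objects"],
]

sent: List[str] = []
run_phases(sent.append, PHASES)
```

Each COMMIT is the point where the work of one phase becomes visible to
other backends and its locks are released, which is the behaviour a
single-transaction function call would lose.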

>
> Also, having a separate binary we should be able to perform some neat
> tricks such as parallel index builds using multiple connections (I'm
> messing around with this idea now). AFAIK this would also not be
> possible if pg_reorg were contained solely in the library functions.
>
Interesting idea, this could speed up the whole process. I am just
wondering about possible consistency issues, such as the logs being
replayed before the swap.
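
As a rough illustration of the ordering I have in mind, here is a small
Python sketch where the index builds run in parallel and the log replay
and swap only start once every build has finished. Everything in it is
an assumption for the sake of discussion: reorg_with_parallel_indexes
and its callbacks are hypothetical names, threads stand in for separate
connections, and the order (indexes, then logs, then swap) simply
follows the step list quoted earlier in this mail.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def reorg_with_parallel_indexes(
    index_names: Sequence[str],
    build_index: Callable[[str], None],
    apply_logs: Callable[[], None],
    swap_tables: Callable[[], None],
    workers: int = 4,
) -> None:
    """Build indexes concurrently, then replay logs, then swap."""
    # Each worker thread stands in for one extra connection.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(build_index, name) for name in index_names]
        # result() re-raises any build failure, so the log replay and the
        # swap are never reached with an index missing.
        for future in futures:
            future.result()
    # Serial tail: nothing below runs while an index build is in flight.
    apply_logs()
    swap_tables()


events: List[str] = []
reorg_with_parallel_indexes(
    ["idx_a", "idx_b", "idx_c"],
    build_index=lambda name: events.append(f"index:{name}"),
    apply_logs=lambda: events.append("logs"),
    swap_tables=lambda: events.append("swap"),
)
```

The index builds may finish in any order, but the barrier before
apply_logs() is what would keep the replay and the swap consistent.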
--
Michael Paquier
http://michael.otacoo.com
