Re: Schema version management

From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Joel Jacobson <joel(at)trustly(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Schema version management
Date: 2012-05-21 07:57:06
Message-ID: CADbMkNMzxZnoBm=4c9mnLNX9qTXzoyp2rkq4ud9iMu+iXoZOXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 21, 2012 at 5:03 AM, Joel Jacobson <joel(at)trustly(dot)com> wrote:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02301.php
> The initial feedback was on the usage of OIDs as file names.
> This was indeed a bad idea and was changed, see
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02318.php
> Gurjeet Singh pointed out the problem with functions sharing the same
> name but having different arguments.
> As of now, it's not certain they will always be dumped into the same files.
> This is a valid point, and needs to be solved in an elegant way.
> The arguments needs to be made part of the path somehow.

This is interesting at Jane Street we actually have a small tool
that "parses" the output of pg_dump. (Well applies a set of regular
expressions plus a little bit guesswork). We use this to do three things
all of which I would love to see supported by postgres tool chain proper:

1) split the output into one file per thing (basically as per this
thread) each file named <type>_<name>_<running-integer> for use
with a VCS. So if we have an overloaded function foo we end up with
several function_foo_1.sql function_foo_2.sql ... The order of the
enumeration is just the order the functions occurred in the pg_dump
which seems to be stable and therefore good enough.

2) extract a patch. You give the tool the name of one or more roots
(e.g. a table or set of tables you want to modify). It finds all
things that depend on it (well sort of just turn the body of each
definition into a list of words and a depends on b if the name of b
occurrs in a). Do a topological sort (if there are cycles because
of the hack dependency check break them but continue and produce a
warning). Output a file that first drops the definitions in inverse
dependency order and then recreates them (in dependency order).
The file starts with a begin but does NOT end with a commit so you
are forced to enter it yourself.

This tool is fantastic if you have a big set of plpgsql functions as
it is otherwise hard to make sure that you have modified all places
when refactoring, changing a column, etc...

3) Find all leaves. E.g. do the topsort on the whole pg_dump and list
the names of all things nothing depends on. This is mostly useful if
you want to make sure that you are not accumulating cruft that isn't
used by anything. Of course you separately need a list or knowledge
about the entry points of your application(s).

Cheers,

Bene

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-05-21 15:47:14 release note item
Previous Message Daniel Farina 2012-05-21 05:42:31 Re: External Open Standards