PostgreSQL extensions packaging

Lists: pgsql-hackers
From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: PostgreSQL extensions packaging
Date: 2008-07-23 21:08:19
Message-ID: 3F9B5817-0246-4D69-83C4-DDD7DA06B77E@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I promised to have an in-depth look at the archives before to spend
time on my ideas for $subject, but failed to do so. Here are the ideas
(user level design if you will) :)

As a PostgreSQL extension developer (see http://pgfoundry.org/projects/prefix)
I'd like to benefit from a dump and restore facility. We don't offer
any easy way for the DBA to restore a dump which happen to depends on
external modules. This proposal tries to solve this by providing a
notion of package.

A module is currently, as far as I understand it, a .so file installed
into some superuser (postgres) owned filesystem place. I'm unsure if
the related .sql file (needed to expose the module functions) is a
part of the module notion, but I don't think so.

A package is a namespace much comparable to a schema, to be found at
the same level (in a database can live many packages), and allowed to
have any SQL object under it. A package can also host variables, which
visibility are package global: any SQL into the package can refer
directly to package variables.
And a package can host modules dependancies (not the .so code itself).

Let's try with an example of an imaginary package declaration:

create or replace package prefix_range
with (version = 0.3, parameter = value, ...)
as $pkg$
declare
prefix_range_global_var text := 'init value';
prefix_range_syntax_error exception;
module
prefix.so;
begin
create schema prefix;

create or replace function prefix_range_in(cstring) ...;
create or replace function prefix_range_out(prefix_range) ...;
create type prefix_range;
create function ...

create operator ...
create operator class ...

-- private hidden things?
create role prefix nologin;
create schema prefix_private owner to prefix;
create table prefix_private.relname ...;
revoke all privileges on prefix_private to public;

-- private stuff ...
-- create table, create index, etc ...
--
-- need some though as to how to allow this from SQL objects
-- declared into the package *only*
end;
$pkg$;

The parameters in the with clause are visible inside the package body
and allow package installer to tune the installation: we could use
this for tablespace creation needs, e.g., and version at least should
be displayed from \dP associated command (is this one free?).

This package creation SQL command would fail if any contained SQL is
wrong, of course, but also if one of the declared modules were not
registered/known/found by the server.

We would certainly want to add a package scope construct to existing
CREATE commands, in order to be able to add a function to a package
without re-running the entire create package command, but this could
come at a later date:
CREATE FUNCTION ... PACKAGE prefix ...

Given this infrastructure, pg_dump would (have to) be able to dump the
SQL and pg_restore to complain when the module dependancies are not
met, error'ing out a list of modules to install.

Now, what would be really good to have would be this pg_pkg command I
was dreaming about in another -hacker mail:
pg_pkg add-mirror http://packages.postgresql.org/
pg-pkg list [remote | available]
pg_pkg add plproxy prefix citext
pg_pkg install plproxy mydatabase
pg_pkg uninstall [--force] plproxy mydatabase
pg_pkg remove <package> ...
...

First, we could have a packages.postgresql.org infrastructure where to
provide source code packages depending on PostgreSQL major version.
Those packages would be known to have received code review and -core
acceptance, so would be as trustworthy as PostgreSQL itself is.
And ideally, any developer could prepare his own PostgreSQL packaging
facility where to propose his own packages, this time out of -core
acceptance, but still integrated into the extension system.

pg_pkg add <package> ... would fetch a source code archive (last
version available, or maybe given version with pg_pkg add prefix=0.3
if we really want this feature) and compile and install it with PGXS.
So you would need to have installed server development support to
benefit from package distribution...

Then pg_pkg install would install given package into given database,
running its CREATE OR REPLACE PACKAGE sql script, responsible of
package object creation and variable, tables, etc initialisation.

The uninstall command would get rid of the package, only to produce
errors if some object existing in the target database had some
dependancy to the package, the -f would force a DROP PACKAGE pkgname
CASCADE;

The remove would get rid of the installed files (modules and .sql),
only when the package is no more in use in any database of the cluster.

With this command set and pg_restore giving a list of missing modules
for each package of a given dump file, it would become easy to restore
a database containing extensions.
$ pg_restore ...
ERROR: failed to create package prefix, missing module prefix.so
$ pg_pkg add prefix
$ pg_restore ...
$ psql -U myuser mydb && enjoy :)

Of course, in case of a remote pg_restore call, the pg_pkg command
line has to be done locally on the target server. Maybe this is a
problem for share hosting facilities, but I don't see pg_restore going
to compile and install stuff on the filesystem by itself.

Anyone willing to share some comments on this dream?
- --
Dimitri Fontaine

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkiHncQACgkQlBXRlnbh1bnreQCcCYMfln8BqDAcGzs1pTBe9RVa
aN8AnjZ3viA9xbVg4Ka2lS0eIrbOJFpV
=FGYQ
-----END PGP SIGNATURE-----


From: "Tom Dunstan" <tomdcc(at)gmail(dot)com>
To: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL extensions packaging
Date: 2008-07-23 23:40:30
Message-ID: ca33c0a30807231640n6fb4035dod8121a18aa1fa29c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On Wed, Jul 23, 2008 at 5:08 PM, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> I promised to have an in-depth look at the archives before to spend time on
> my ideas for $subject, but failed to do so.

I guess that means you missed both the original discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and
my initial patch in that direction and subsequent discussion at
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then
:(.

There were two core components to my idea of modules/packages:
- Separation of installation at an OS level (RPM/yum, deb/dpkg, MSI
installer etc) and installation into a database. The intention was a)
to standardize package installation generally so that users didn't
have to read n different sets of installation instructions for n
different packages, and b) so that a db owner could install into their
own database any module that had been installed on the system, even if
that might include e.g. C functions that they otherwise would not be
able to install without being a superuser.

- Have dependency tracking so that pg_dump could emit e.g. "LOAD
MODULE foo;" rather than all the different instructions to recreate
the module.

So the proposed installation procedure would be more along the lines of:

yum install postgresql-module-postgis
echo "load module postgis" | psql mydb

My intention was to use whatever native package manager was
appropriate for your distro rather than trying to recreate CPAN,
although some people in the original discussion wanted to go down that
route.

The patch that I provided didn't do any of the dependency stuff yet -
I had been investigating various ways to do it automagically, although
I haven't worked on it for a little while. It may be that the straight
forward explicit declaration that you have here is a better way to do
it.

I didn't have versioning and interdependencies between modules yet,
although it's an obvious extension to the idea.

> A package can also host variables, which visibility are
> package global: any SQL into the package can refer directly to package
> variables.

That was way out of scope for my more modest suggestion - I certainly
wasn't going to change pl/pgsql semantics. For example, how do those
variables behave upon a transaction rollback?

> Now, what would be really good to have would be this pg_pkg command I was
> dreaming about in another -hacker mail:

This turns into recreating CPAN. I like the idea of a "blessed" set of
packages, but would rather not require all postgresql users to have a
full build environment (especially on windows) and have to replace
their native packaging solution. It seems that you agree that
fetching/installing should be separate from loading/installing into
the database. Good. Some posters on the original thread were
suggesting that the fetch/install step should somehow do the database
installation as well, which sounded like a huge can of worms.

I think that we can come up with a package/module format that allows
installation at the OS level without demanding a whole set of download
/ build machinery. If someone then wants to build that and have it
install packages, then fine, but we definitely should not require it
to be able to install stuff.

Look forward to your comments

Cheers

Tom


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Tom Dunstan" <tomdcc(at)gmail(dot)com>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL extensions packaging
Date: 2008-07-24 07:51:39
Message-ID: 200807240951.41918.dfontaine@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Le jeudi 24 juillet 2008, Tom Dunstan a écrit :
> I guess that means you missed both the original discussion at
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php and
> my initial patch in that direction and subsequent discussion at
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00132.php then

Thanks for the links, I've read a little down there now :)

> There were two core components to my idea of modules/packages:

First reading makes me think your proposal is all about having a user-visible
management of modules, which in my proposal are a part of packages, and a
much needed one.
So it seems to me both proposals are complementary, in that I didn't go in any
detail about how to manage this module part of a package declaration, and it
looks like your work is all about this.

Where we're trying to solve the same issue(s) is on the OS level packaging.

> - Separation of installation at an OS level (RPM/yum, deb/dpkg, MSI
> installer etc) and installation into a database. The intention was a)
> to standardize package installation generally so that users didn't
> have to read n different sets of installation instructions for n
> different packages, and b) so that a db owner could install into their
> own database any module that had been installed on the system, even if
> that might include e.g. C functions that they otherwise would not be
> able to install without being a superuser.

I'm proposing that PostgreSQL includes a source level package management
toolset, and the OS distributions take advantage of it to release binary
packages easy to install, as it's done now with make && make install (using
PGXS) at PG level.

As you're saying, OS install means the same thing as PGXS make install, that
is having the .so and .sql files at the right place and in the right format.
So even if PostgreSQL was to propose a source level integration with pg_pkg
add <package>, distributions would still be left with the binary packaging
work.

As for the database level installation, I think this is best done by
PostgreSQL itself this time, I'd much prefer the distributions not to bother
with pg_pkg install <package> <database>.
Of course, debian wrapper scripts would certainly repackage this in order for
the user to choose which cluster to target here.

> - Have dependency tracking so that pg_dump could emit e.g. "LOAD
> MODULE foo;" rather than all the different instructions to recreate
> the module.

That could be what the module section of create package means internally. I
don't foresee a need for separating module only management stuff out of
package, but I'm all ears :)

> So the proposed installation procedure would be more along the lines of:
>
> yum install postgresql-module-postgis
> echo "load module postgis" | psql mydb

Agreed, but with those little differences:
- PostgreSQL provides pg_pkg add to distribution to ease binary packaging
- apt-get install postgresql-module-8.3-prefix
- and either
$ pg_pkg install prefix mydb
or
$ psql -c "INSTALL PACKAGE prefix" mydb

> My intention was to use whatever native package manager was
> appropriate for your distro rather than trying to recreate CPAN,
> although some people in the original discussion wanted to go down that
> route.

I know nothing about CPAN, but I hope offering tools for packagers to ease
their work is a good idea. Plus this allows for the PostgreSQL project
approved extensions, -core level quality, reviewed code at an easy to grasp
place.
And it allows advanced user, who compile their PostgreSQL theirself, to still
benefit from a level of OS integration for packages.

> The patch that I provided didn't do any of the dependency stuff yet -
> I had been investigating various ways to do it automagically, although
> I haven't worked on it for a little while. It may be that the straight
> forward explicit declaration that you have here is a better way to do
> it.

It seems to me that your patch would certainly be a step towards implementing
my idea of a package.

> I didn't have versioning and interdependencies between modules yet,
> although it's an obvious extension to the idea.

And a much necessary one. As soon as we have a SQL level object for modules,
with oids in the catalog and all, we surely are able to add entries in
pg_depend about this?

> > A package can also host variables, which visibility are
> > package global: any SQL into the package can refer directly to package
> > variables.
>
> That was way out of scope for my more modest suggestion - I certainly
> wasn't going to change pl/pgsql semantics. For example, how do those
> variables behave upon a transaction rollback?

No idea yet, I just saw that Oracle packages host package level global
variables, and I guess it would work the same as a SET [LOCAL] GUC, except
you could only see the variable from objects within the package.

> This turns into recreating CPAN. I like the idea of a "blessed" set of
> packages, but would rather not require all postgresql users to have a
> full build environment (especially on windows) and have to replace
> their native packaging solution.

As said before, I'm thinking about providing this pg_pkg add as a packager
facility, not replacing binary distributions, but still available for
advanced user.

> I think that we can come up with a package/module format that allows
> installation at the OS level without demanding a whole set of download
> / build machinery.

I think this part is up to distributions.

Oh, and... well, I don't think I'm in a position to write the code to make
this packaging idea a reality. I'm willing to contribute as a non-hacker here
by helping to define a user-level specification which would need to find a
developer.

Hope this helps, regards,
--
dim