Review: Extensions Patch

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject: Review: Extensions Patch
Date: 2010-12-04 01:09:24
Message-ID: 7D51D3B5-F89D-46ED-9D0E-859B2CABC9AD@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Extensions Patch v15 Review
===========================

Submission review
-----------------

* Is the patch in context diff format?

Yes.

* Does it apply cleanly to the current git master?

Yes.

* Does it include reasonable tests, necessary doc patches, etc?

`make check` passes.
`make installcheck` fails (regression.diffs attached).
`make -C contrib installcheck` passes

I see tests for pg_execute_sql_string() and replace(), but absolutely nothing else. Such an important core feature really should have a pretty comprehensive suite of tests exercising all the functionality. Relying on the contrib extension tests won't do it, as they exercise only a subset of the functionality (e.g., no custom_variable_classes), and then only as a side-effect of their actual purpose.

Details on docs below.

Usability review
----------------

Read what the patch is supposed to do, and consider:

* Does the patch actually implement that?

Yes.

* Do we want that?

OH YES!

* Do we already have it?

Nope.

* Does it follow SQL spec, or the community-agreed behavior?

It's an implementation of community-agreed behavior, as hashed out on the mail list over the years.

* Does it include pg_dump support (if applicable)?

Yes, it does.

* Are there dangers?

Yes. Much of the execution is superuser-only, so we need to make sure that such is actually the case (unit tests would help with that).

* Have all the bases been covered?

Mostly, yes. The lack of tests is the single biggest drawback to this patch.

Feature test
------------

Apply the patch, compile it and test:

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I had some trouble getting a third-party extension to work. See the end of this document for details.

* Are there any assertion failures or crashes?

No.

Performance review
------------------

* Does the patch slow down simple tests?

Not that I've noticed.

* If it claims to improve performance, does it?

N/A.

* Does it slow down other things?

Not that I've noticed.

Coding review
-------------

Read the changes to the code in detail and consider:

* Does it follow the project [http://developer.postgresql.org/pgdocs/postgres/source.html coding guidelines]?

I'm not a C expert, but it looks like it matches quite closely with all the other code. It's very neat and well-commented.

* Are there portability issues?
* Will it work on Windows/BSD etc?

I can't comment on these.

* Are the comments sufficient and accurate?

Yes.

* Does it do what it says, correctly?

I can't comment on this by looking at the code; see detailed review from a user's perspective below.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other features/modules?
* Are there interdependencies that can cause problems?

Can't really comment on this.

Notes on the documentation
--------------------------

The pg_extension catalog docs are a little thin, but probably sufficient. They appear to be duplicated, though, with the entries for catalog-pg-extension and view-pg-extensions looking identical. One is apparently a list of installed extensions, and the other an SRF that lists installed and available extensions. But I find the duplication a bit confusing. (Might be easer if I wasn't reading SGML though.)

The extend-extension docs

I don't understand this paragraph at all:

<para>
The way to put together a coherent set of custom <acronym>SQL</> objects
is to create an <literal>extension</literal>. There are two sides of the
extension, the Operating System side contains several files and
the <acronym>SQL</> one uses them.
</para>

This paragraph isn't very clear, either:

<para>
The control file can also contain
a <literal>custom_variable_classes</literal> key followed by any
number of custom settings,
named <literal><replaceable class="parameter">class</replaceable>.varname</literal>.
The <literal>custom_variable_classes</literal> value takes effect
at <command>CREATE EXTENSION</command> time, and is persistent. The
custom settings are set automatically too, but are not persistent.

Examples might help.

Control file keys:

* comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
* version -- If it's free-form, how will you do dependency-checking? Or will you?
* script
* encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
* custom_variable_classes

I don't understand this paragraph:

<para>
The admin
function <link linkend="functions-extension">pg_extension_flag_dump</link>
can be used to revert the default <literal>pg_dump</literal> policy
about objects that belong to an extension and force the flagged objects
to be part of the backups.
</para>

What's a pg_dump policy?

* There appears to be an irrelevant change to the docs for replace().
* For the pg_read_binary_file() docs, should there not be some sort of note about permissions to the file to be read in?

With pg_execute_sql_string() and friends, what if there aren't the same number of parameters as there are placeholders? Should be a note about that.

Also, why are the placholders different than everywhere else? I see

+ SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 'utils');

and

+ SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar');

Why not $1 like is used in PREPARE and user-defined functions? I think the second example is particularly weird, as there's nothing to indicate that "s" is a placeholder. I don't mind "@schema@", but would like to see uses of such things be consistent throughout PostgreSQL (and I expect that one cannot add such as placeholders to PREPARE).

In the CREATE EXTENSION docs, I don't understand this section:

<varlistentry>
<term>NO USER DATA</term>
<listitem>
<para>
This option is used by <xref linkend="APP-PGDUMP">. As it's possible
to flag extension's objects so that they get dumped
(see <xref linkend="functions-extension">), this option allow
extension authors to prepare installation scripts that will avoid
creating user data when restoring.
</para>
</listitem>
</varlistentry>

Does it mean that no user data will be dumped when the extension is dumped?

In the DROP EXTENSION docs, I think there's a pasto:

<refname>DROP EXTENSION</refname>
<refpurpose>remove a tablespace</refpurpose>

Another one here:

<varlistentry>
<term><literal>CASCADE</literal></term>
<listitem>
<para>
Automatically drop objects that depend on the index.
</para>
</listitem>
</varlistentry>

s/index/extension/. Same in the RESTRICT section.

In xfunc.sgml, I see:

<term><varname>EXTENSION</varname></term>
<listitem>
<para>
extension control files to install
into <literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>,
derived from
the <literal><replaceable>extension</replaceable>.control.in</literal>
file to add in your sources. <literal>EXTENSION</literal> can also
be a list, you then have to provide a <literal>.control.in</literal>
file per extension in this list.

Why would multiple control files be necessary? And how would it map to EXTVERSION? Would all the extensions get the one version in EXTVERSION?

In func.sgml:

<literal><function>pg_extension_flag_dump(<parameter>objid</> <type>oid</>)</function></literal>
</entry>
<entry><type>bool</type></entry>
<entry>Flag an extension's <literal>objectid</literal> as worthy
of <literal>pg_dump</literal>.</entry>

Why is this necessary? Aren't all database objects "worthy" of being dumped? And this:

<entry>
<literal><function>pg_extension_with_user_data()</function></literal>
</entry>
<entry><type>bool</type></entry>
<entry>Return <literal>true</literal> only when the extension's
script should care for user data.</entry>

What constitutes "user data" when it comes to extensions? In contrast, the explanation below of "pg_extension_flag_dump" is very clear, making it obvious what it's for and why it might be useful. I don't see how it relates to the summary above, however. OTOH, it also says:

<para>
This function can only be called when loading a <acronym>SQL</> script
using the command <command>CREATE
EXTENSION</command>. See <xref linkend="sql-createextension">
and <xref linkend="extend-extension">.
</para>

And I have no idea how I'd call that function within another function call. Does this mean that only the extension author can call it? Seems like a weird scoping without precedent.

Hrm. Looking at extension.c, I see:

+ if (!create_extension)
+ ereport(ERROR,
+ (errmsg("This function can only be used from CREATE EXTENSION.")));

I think that's a confusing error message, frankly. Perhaps better would be something like "This function can only be called from SQL files executed by CREATE EXTENSION". The docs should make that clearer, too. It was only when I saw this error message that I realized how it's scoped.

Now, some of the information in the [wiki page](http://wiki.postgresql.org/wiki/Extensions) actually seems more informative. For example, the description and examples of \dx, \dx+, and pg_extension_objects() are *much* better. Any way to work those into the docs? Maybe they don't go into the function reference pages, but perhaps into the "Putting it all together" doc? In fact, I think the wiki doc should be copied into the "putting it all together" doc almost verbatim (modulo the patch excerpts). That will be a big help to prospective extension authors.

Speaking of pg_extension_objects(), it'd be nice if its output identified the extension to which each object belongs. Oh, I see, you can ask for only the objects in a single extension. That's okay then. But I don't think you need to document all the OUT paramters, just the required single argument specifying the extension name.

Reviewing the wiki page some more, you have this example (also in the docs, IIRC):

+ DO $$
+ BEGIN
+ IF pg_extension_with_user_data() THEN
+ create schema foo;
+ create table foo.bar(id serial primary key);
+ perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
+ perform pg_extension_flag_dump('foo.bar::regclass);
+ END IF;
+ END;
+ $$;

That's good, but what if someone has restored a database that has those objects already? Will the extension be re-created with this code before the tables are loaded? Or vice versa? Either way, you're going to have a conflict when you restore from a dump.

Code Review
-----------

I find the use of variables in the control files confusing. It looks like this:

version = 'EXTVERSION'

Why not use make syntax?

version = $(EXTVERSION)

That makes it much clearer that it's a variable and provides the same syntax a the extension developer will already be using in the Makefile.

I've already mentioned that the @extschema@ placeholder is different from any other SQL placeholder:

SET search_path = @extschema@;

I'm not sure of a better option, but I do think it should be made consistent, if at all possible, since this is executed in the database, not at `make` time.

Speaking of which, what happens if an extension has no search_path? Or an old explicit one?

SET search_path = public;

Hrm. I'm wondering if it might make more sense to ignore such statements when sometning is executed by `pg_execute_sql_file()`? IOW, if I do

CREATE EXTENSION foo WITH SCHEMA bar;

That the schema would be installed in bar no matter what search_path is set in the file loaded from the file system. I think that this would make CREATE EXTENSION behave more as expected, *and* allow existing extension distributions to just work (modulo the control file). No need for placeholders in the .sql file anymore, either. Maybe there's some optoin to CREATE EXTENSION that would *not* disable `SET search_path`.

Speaking of the contol file, I think that its inclusion in extension distributions should be completely optional. Of the keys supported:

* comment
* version
* script
* encoding
* custom_variable_classes

All except the last one could be written into the Makefile, and then `make` would generate the control file for installation. Only if the extension author wants to support custom_variable_classes would you need to create an explicit control file. And even that could probably be finessed with some sort of Make variable.

I'd like to see the control file be something that the extension author doesn't have to deal with, as much as possible. It just feels superfluous.

BTW, you might want to put the changes to the existing contrib modules in a separate patch, just to simplify things a bit. Not a big deal at this point, but it might help the committer to have further separation. But reviewing that bit, I was wondering, how are the uninstall scripts handled? Are they necessary anymore?

### Questions ###

* I assume that if an extension is installed that includes a DSO that the server will need to be HUPed before you can CREATE EXTENSION, yes?

* I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION are transactional, yes?

* Is there a regtype for extensions?

* Why can only super users get a list of extensions? It might be useful for me to be able to check from within a udf to see if a given extension is installed, for example. I can see why you wouldn't want to allow listing of extensions that are installed on the system but not in the current database.

* Is there no way to keep a list of available extensions somewhere in the cluster? Seems kind of silly to parse all the control files from within pg_extensions(). That can make a call to pg_extensions() pretty expensive.

* What's with `replace_text_variadic(PG_FUNCTION_ARGS)`? Related?

* \dx+ is not documented in \? in psql. I think you just need to add "[+]".

* `doc/src/sgml/ref/psql-ref.sgml` needs updating too.

* I like that I can see a list of installed and available extensions, but I'm not sure that \dx+ is the right command for the latter. Usually + means "more columns to provide more information for the same objects as you saw without the +". Is there any precedent for showing different objects with + than without?

Exercising the Feature
----------------------

* I built my cluster with all the contrib extensions. \dx+ does a nice job of listing what's available. So I ran `CREATE EXTENSION hstore`. It seemed to work great, but was *very* verbose. Every SQL statement was emitted to the terminal. I thought \echo was turned off by CREATE EXTENSION. I was able to use hstore after that; worked great. Yay!

* Wanted to try the schema support, so next did

CREATE SCHEMA ex;
CREATE EXTENSION citext WITH SCHEMA ex;

This was not verbose the way hstore was. And now it's nicely installed:

postgres=# \dx
List of extensions
Schema │ Name │ Version │ Description
────────┼────────┼──────────┼────────────────────────────────────────
ex │ citext │ 9.1devel │ case-insensitive character string type
public │ hstore │ 9.1devel │ storing sets of key/value pairs
(2 rows)

Very nice. And I was able to use it with `create table users (nickname ex.citext primary key);` Awesome.

* Next I tried dropping it. `DROP EXTENSION ex.citext;` did not work. What if I have citext installed in two schemas and just want to drop one of them? Ah, I see, one cannot have an extension with the same name in different schemas. Currently the schema-qualification is a syntax error, but maybe it should be a little more informative?

* Without the schema-qualification I get an error because it's in use. But `CASCADE` works. Great.

* The `pg_extension` catalog and `pg_extensions` views are both useful, but too close in name. I found it confusing how similar they are. It seems to me like the `pg_extension` class is well-named and like the other catalog table names, but `pg_extenions` should perhaps be `pg_available_extensions` or something.

Creating a Third Party Extension
--------------------------------

I have a very simple extension I wrote more or less as an example extension for PGXN. It's a key/value pair data type called [pair](https://github.com/theory/kv-pair). As an experiment, to see how it would feel to use extensions as an extension developer, I created [a branch](https://github.com/theory/kv-pair/tree/9.1-extension) to port it. With that change, I gave it a try:

export PATH=/usr/local/pgsql-devel/bin:$PATH
make
sudo make install
make installcheck PGUSER=postgres

The test failed with this error:

! CREATE EXTENSION pair;
! NOTICE: Installing extension 'pair' from '/usr/local/pgsql-devel/share/contrib/pair.sql
! ERROR: could not execute sql file: '/usr/local/pgsql-devel/share/contrib/pair.sql'

The file is there and looks fine. I also just tried to install it myself:

> /usr/local/pgsql-devel/bin/psql -U postgres
psql (9.1devel)
Type "help" for help.

postgres=# create extension pair;
NOTICE: Installing extension 'pair' from '/usr/local/pgsql-devel/share/contrib/pair.sql', in schema public, with user data
ERROR: could not execute sql file: '/usr/local/pgsql-devel/share/contrib/pair.sql'

It would be nice if it gave me more information. What failed, exactly? It also fails when I use `WITH SCHEMA public`.

I assume this will be a simple issue. Overall I'm happy with how easy it will be to update existing extensions to use the new functionality, but as I've mentioned above, I do think that it would be better if:

* One could specify stuff *only* in the `Makefile` and let `make` create a control file.

* I could omit the @extschema@ business altogether and just let CREATE EXTENSION set the path for the duration of its execution.

With these two changes, it would be *even easier* for existing third-party extensions to be adapted.

Conclusion
----------

Overall I'm very happy with this patch. There are some nits I've brought up here that need to be addressed, and I do think there are some design changes that would be worth considering. So I'm marking it as returned. But I also think that someone more familiar with the core code should do a proper code review during this commitfest (I'm mostly about functionality and interface).

But I think it's close, and I can't wait to have this stuff solidly in core.

Best,

David

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2010-12-04 02:18:20 Streaming replication document
Previous Message Itagaki Takahiro 2010-12-04 00:44:25 Re: pg_execute_from_file review