Re: Review: Extensions Patch

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Subject: Re: Review: Extensions Patch
Date: 2010-12-04 14:14:11
Message-ID: m2ipz9tyks.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the review, that's quite one! :)

I'm not sure to follow you all along, it seems like the reading is "try
it first then understand and comment again", so sometimes I'm not sure
if you're saying that docs are missing the point or that the behaviour
ain't right.

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> `make installcheck` fails (regression.diffs attached).

You forgot the attachment. I'd add mine if I didn't get "All 124 tests
passed." result for the command :)

(Note that I've been having problems too, but can't reproduce them
easily, and they might have been related to forgotten maintainer-clean
steps).

> I see tests for pg_execute_sql_string() and replace(), but absolutely
> nothing else. Such an important core feature really should have a
> pretty comprehensive suite of tests exercising all the
> functionality. Relying on the contrib extension tests won't do it, as
> they exercise only a subset of the functionality (e.g., no
> custom_variable_classes), and then only as a side-effect of their
> actual purpose.

Well, the exposed functionality aren't much (2 SQL commands and some
support functions) and you need to have a contrib at hand to exercise
them. So I'm not sure about how to add in regression tests for contrib
infrastructure work and have them depend only on -core. I'll take ideas.

The other thing is that the very important feature here is dump and
restore of a database containing extensions. You're not reporting about
it, though. And I'm not sure about how to check for pg_dump content from
pg_regress oriented tests.

> Yes. Much of the execution is superuser-only, so we need to make sure
> that such is actually the case (unit tests would help with that).

How do you mean? Trying to subvert the features by being able to run
them as a non-superuser? How do you install tests around that?

> Notes on the documentation
> --------------------------
>
> The pg_extension catalog docs are a little thin, but probably
> sufficient. They appear to be duplicated, though, with the entries for
> catalog-pg-extension and view-pg-extensions looking identical. One is
> apparently a list of installed extensions, and the other an SRF that
> lists installed and available extensions. But I find the duplication a
> bit confusing. (Might be easer if I wasn't reading SGML though.)

Well yes the pg_extension catalog is filled at CREATE EXTENSION time so
that we have an OID to register dependencies on. Then I added a SRF to
also list available extensions so that it's easy to integrate the
feature into pgadmin and the like.

Now I'm open to suggestions as far as the names are involved, or maybe
we should keep the SRF but remove the view. psql could easily enough use
the function directly I think.

> I don't understand this paragraph at all:
> This paragraph isn't very clear, either:
> Examples might help.

Well, I think native English skills are required here. And I'm not too
sure about the example.

> Control file keys:
>
> * comment -- Should mention that it's used for CREATE COMMENT ON EXTENSION, no?
> * version -- If it's free-form, how will you do dependency-checking? Or will you?
> * script
> * encoding -- Seems unnecessary; one can explicitly set it in the .sql file, no?
> * custom_variable_classes

Doc says:
http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html

This control file allows for registering the extension and should
provide values for the keys described below.

Then there's the detail about fields. I think the context is enough to
understand, but I'd be happy to integrate some revisions of the current
content.

For the version, it has been agreed before (Developer Meeting) that v1
would not care about inter-extension dependencies nor be smart about it,
or I would have proposed a patch to have the following into core:

http://packages.debian.org/sid/postgresql-9.0-debversion

> I don't understand this paragraph:
>
> <para>
> The admin
> function <link linkend="functions-extension">pg_extension_flag_dump</link>
> can be used to revert the default <literal>pg_dump</literal> policy
> about objects that belong to an extension and force the flagged objects
> to be part of the backups.
> </para>
>
> What's a pg_dump policy?

Here again not being a native speaker shows up it seems. What I mean is
that the only goal of all this patch is for pg_dump to stop including
extensions install scripts, but just mark the dependency.

So obviously, if your extension provides for some user editable content,
you want to reverse the default behavior and have those content be parts
of the dump, right?

Now, to phrase that in a manual fashion...

> * There appears to be an irrelevant change to the docs for replace().

As detailed in another thread, the patch you're reviewing depends on
another one that has been reviewed separately, marked as ready, but is
not yet in. So the extension patch is including the pg_execute_sql_file
patch, to ease your reviewer work (you don't have to deal with patch
dependencies). So you're seeing unrelated-at-first-glance changes.

> Also, why are the placholders different than everywhere else? I see
>
> + SELECT pg_execute_sql_string('CREATE SCHEMA @schema@;', '@schema@', 'utils');
>
> and
>
> + SELECT pg_execute_sql_string('CREATE SCHEMA s;', 's', 'bar');
>
> Why not $1 like is used in PREPARE and user-defined functions? I think
> the second example is particularly weird, as there's nothing to
> indicate that "s" is a placeholder. I don't mind "@schema@", but would
> like to see uses of such things be consistent throughout PostgreSQL
> (and I expect that one cannot add such as placeholders to PREPARE).

The @extschema@ syntax has been proposed by Heikki, and I just adopted
it. If another is to emerge, I'll adapt to it. While preparing this, I
didn't want to mix mechanism and policy, so there's nothing imposed on
the extension authors or the function users wrt to what a placeholder
name should look like.

> In the CREATE EXTENSION docs, I don't understand this section:
> <varlistentry>
> <term>NO USER DATA</term>
> Does it mean that no user data will be dumped when the extension is
> dumped?

Consider you have an extension with user data. At create extension time,
you're creating a table to hold the data, and maybe you insert default
values into it. At pg_restore time, CREATE EXTENSION will be called
again. Do you want the extension's provided default user data to be
restored, or the one from the backup?

So pg_dump will issue CREATE EXTENSION foo WITH NO USER DATA variant,
and will also issue dump commands for objects marked as being user data,
so that at pg_restore time you get the values from the backup when it
makes sense.

Updates on the manual style paragraph to explain that are welcome. It
could be that what's needed is a better overview of the whole thing
somewhere, but I though what's written would be enough. It seems to be
only enough when you already know what it's all about --- and that's
typical of Unix style man pages. Do we want to include an extension
author tutorial?

> In the DROP EXTENSION docs, I think there's a pasto:
> s/index/extension/. Same in the RESTRICT section.

Thanks, fixed here.

> Why would multiple control files be necessary? And how would it map to
> EXTVERSION? Would all the extensions get the one version in
> EXTVERSION?

That part has been extensively re-hashed by Alvaro (and Heikki I think)
already. The basis for multiple control files per extension is to be
able to cope with contrib/spi, that contains 5 extensions:

MODULES = autoinc insert_username moddatetime refint timetravel
EXTENSION = autoinc auto_username moddatetime refint timetravel
EXTVERSION = $(VERSION)

Now, would you make the same layout choice and maintain different
version numbering schemes, you still can do that, but you won't benefit
from the EXTVERSION facility.

> <literal><function>pg_extension_flag_dump(<parameter>objid</> <type>oid</>)</function></literal>
> Why is this necessary? Aren't all database objects "worthy" of being
> dumped? And this:

Remember that the only goal of all that patch is *NOT* to dump the
extension's install script?

> What constitutes "user data" when it comes to extensions? In contrast,
> the explanation below of "pg_extension_flag_dump" is very clear,
> making it obvious what it's for and why it might be useful. I don't
> see how it relates to the summary above, however. OTOH, it also says:

User data is anything that the extension's install script flagged so.

> <para>
> This function can only be called when loading a <acronym>SQL</> script
> using the command <command>CREATE
> EXTENSION</command>. See <xref linkend="sql-createextension">
> and <xref linkend="extend-extension">.
> </para>
>
> And I have no idea how I'd call that function within another function
> call. Does this mean that only the extension author can call it? Seems
> like a weird scoping without precedent.

Indeed. Those two functions only make sense when used while running the
CREATE EXTENSION sql command, so are only meaningful in the extension's
install script. And yes that looks like a precedent.

> + (errmsg("This function can only be used from CREATE EXTENSION.")));
>
> I think that's a confusing error message, frankly. Perhaps better
> would be something like "This function can only be called from SQL
> files executed by CREATE EXTENSION". The docs should make that
> clearer, too. It was only when I saw this error message that I
> realized how it's scoped.

Any proposal here?

> Now, some of the information in the [wiki
> page](http://wiki.postgresql.org/wiki/Extensions) actually seems more
> informative. For example, the description and examples of \dx, \dx+,
> and pg_extension_objects() are *much* better. Any way to work those
> into the docs? Maybe they don't go into the function reference pages,
> but perhaps into the "Putting it all together" doc? In fact, I think
> the wiki doc should be copied into the "putting it all together" doc
> almost verbatim (modulo the patch excerpts). That will be a big help
> to prospective extension authors.

Well maybe it's just me but I feel like wiki content is "free style"
while the manual tone has to be sharp and concise, and the examples in
there must be carefully chosen so as not to deprecate early. But if the
consensus is to add the wiki content into the documentation chapter,
that's what I'll do for next patch version.

http://pgsql.tapoueh.org/extensions/doc/html/extend-extension.html

> Reviewing the wiki page some more, you have this example (also in the docs, IIRC):
>
> + DO $$
> + BEGIN
> + IF pg_extension_with_user_data() THEN
> + create schema foo;
> + create table foo.bar(id serial primary key);
> + perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
> + perform pg_extension_flag_dump('foo.bar::regclass);
> + END IF;
> + END;
> + $$;
>
> That's good, but what if someone has restored a database that has
> those objects already? Will the extension be re-created with this code
> before the tables are loaded? Or vice versa? Either way, you're going
> to have a conflict when you restore from a dump.

The example is in the docs, yes. The whole goal of the two functions
pg_extension_flag_dump() and pg_extension_with_user_data() is for
pg_restore to instruct the extension's install script if it should
create the user data objects or bypass the step because they will be in
the dump already.

I think we have to find a way to state that clearly in the manual.

> Code Review
> -----------
>
> I find the use of variables in the control files confusing. It looks like this:
>
> version = 'EXTVERSION'
>
> Why not use make syntax?
>
> version = $(EXTVERSION)
>
> That makes it much clearer that it's a variable and provides the same
> syntax a the extension developer will already be using in the
> Makefile.

For starters, you should be comparing this facility with the .sql.in one
where you can use the following notation:

CREATE OR REPLACE FUNCTION citext_cmp(citext, citext)
RETURNS int4
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT IMMUTABLE;

Then, if you insist on some strange cleverer input syntax, I'll welcome
the C-coded patch to make that happen. You might want to know that the
current extension patch is re-using the same parsing code that is used
for postgresql.conf and recovery.conf, though.

> I've already mentioned that the @extschema@ placeholder is different from any other SQL placeholder:
>
> SET search_path = @extschema@;
>
> I'm not sure of a better option, but I do think it should be made
> consistent, if at all possible, since this is executed in the
> database, not at `make` time.

Again, there's no technical restriction in there.

=# select replace('Are placeholders $2 and $1 better than @name(at)?', '$1', 'foo', '$2', 'bar', '@name@', '@');
replace
---------------------------------------------
Are placeholders bar and foo better than @?
(1 row)

It looks like bikeshedding time. We managed to reduce it up until now,
but well, it has to kick-in after a while :)

> Speaking of which, what happens if an extension has no search_path? Or
> an old explicit one?

No. Please do not try to open this can of worms. It took one entire year
for us to agree that extension should not concern themselves with schema
and search_path issues, and 2 developer meetings.

So what's in the patch is a very simple way for users to instruct the
extension about a schema preference choice, which as been asked by
Heikki and Robert this time. I'll try to come up with the archives link
later, now ain't a good time for me to crawl the lists.

The only other acceptable option would be to do nothing with schema at
install time and to ask users to use the next patch in the series.

CREATE EXTENSION foo;
ALTER EXTENSION foo SET SCHEMA bar;

Now, this would have to fail if more than exactly one schema is
depending on the extension's foo, and that's it. Considering that most
(if not all) extensions are using only one schema, it was though nice to
include the WITH SCHEMA bar; facility at CREATE EXTENSION time, and the
simple implementation is what you're commenting on.

> All except the last one could be written into the Makefile, and then
> `make` would generate the control file for installation. Only if the
> extension author wants to support custom_variable_classes would you
> need to create an explicit control file. And even that could probably
> be finessed with some sort of Make variable.

We've been trying to get there, and Tom and Alvaro both said that it's
not worth the Makefile effort and dance, then Itagaki proposed the
current mechanism, which is based on extension.control.in files and a
simple implicit Make rule in pgxs.mk.

> BTW, you might want to put the changes to the existing contrib modules
> in a separate patch, just to simplify things a bit. Not a big deal at
> this point, but it might help the committer to have further
> separation. But reviewing that bit, I was wondering, how are the
> uninstall scripts handled? Are they necessary anymore?

Separating the only use case of the patch away does not sound like
simplifying anything for me, but maybe that's only because I'm already
hard at work maintaining 7 git branches (+ master) to be able to issue
reviewable patches.

As far as the uninstall scripts goes, yes, I think they are not
necessary anymore and I now realize I didn't remove them. Will wait for
some more input before to add their removal in the v16 patch.

> ### Questions ###
>
> * I assume that if an extension is installed that includes a DSO that
> the server will need to be HUPed before you can CREATE EXTENSION, yes?

If by DSO you mean dynamic shared object (.so, .dylib or .dll) then no.

> * I trust that CREATE EXTENSION and DROP EXTENSION and ALTER EXTENSION
> are transactional, yes?

Sure. But the patch you're reviewing is not featuring ALTER EXTENSION,
that's for the next one :)

> * Is there a regtype for extensions?

No. Do we need one?

> * Why can only super users get a list of extensions? It might be
> useful for me to be able to check from within a udf to see if a given
> extension is installed, for example. I can see why you wouldn't want
> to allow listing of extensions that are installed on the system but
> not in the current database.

Anyone can SELECT FROM pg_extension, the catalog, and so list the
installed extensions. Same for listing an extension's object.

The superuser only function is the one that has to parse the control
files. I'm not good about security concerns, but it well seems to me
that parsing files in $SHAREDIR should be superuser only.

So I think what you want to happen is what's in the code. Right?

> * Is there no way to keep a list of available extensions somewhere in
> the cluster? Seems kind of silly to parse all the control files from
> within pg_extensions(). That can make a call to pg_extensions() pretty
> expensive.

Where would you put that cache, and when do you invalidate its entries
or add new ones? I don't think that listing available extensions is such
a frequent operation that it warrant for its own cache, even if you're
able to come up with a good enough management rule set.

Oh, and I just did select * from pg_extensions(); 3 times here, to be
sure we're on the same page. Time: 166,868 ms then 7,823 then 3,584.

> * What's with `replace_text_variadic(PG_FUNCTION_ARGS)`? Related?

Again, it's all to do with giving you a single all-included patch. This
function is a byproduct of the pg_execute_from_file() patch review.

> * \dx+ is not documented in \? in psql. I think you just need to add
> "[+]".

Well spotted! Fixed here.

> * `doc/src/sgml/ref/psql-ref.sgml` needs updating too.

Done, thanks.

> * I like that I can see a list of installed and available extensions,
> but I'm not sure that \dx+ is the right command for the
> latter. Usually + means "more columns to provide more information for
> the same objects as you saw without the +". Is there any precedent for
> showing different objects with + than without?

Well, I'm not sure about what the best user interface for it is in psql,
but the feature is there already:

=# select name, comment, installed from pg_extensions limit 2;
name | comment | installed
---------------+--------------------------------------------+-----------
adminpack | Administrative functions for PostgreSQL | t
auto_username | functions for tracking who changed a table | f
(2 rows)

> Exercising the Feature
> ----------------------
>
> * I built my cluster with all the contrib extensions. \dx+ does a nice
> job of listing what's available. So I ran `CREATE EXTENSION
> hstore`. It seemed to work great, but was *very* verbose. Every SQL
> statement was emitted to the terminal. I thought \echo was turned off
> by CREATE EXTENSION. I was able to use hstore after that; worked
> great. Yay!

The chatter has been discussed extensively with Alvaro and Tom in the
pg_execute_from_file() branch already --- sorry about that, but really,
I though that handing you a single patch would make it easier.

So this chatter is a result of 2 things:

- we're using SPI in CREATE EXTENSION

- hstore issues a WARNING on the => operator creation, and SPI reaction
to WARNING is to dump the query string, which is the install script
here.

> * Next I tried dropping it. `DROP EXTENSION ex.citext;` did not
> work. What if I have citext installed in two schemas and just want to
> drop one of them? Ah, I see, one cannot have an extension with the
> same name in different schemas. Currently the schema-qualification is
> a syntax error, but maybe it should be a little more informative?

Well this point is not decided upon yet. I'm arguing that an extension
is a database global object, and there's a point to be made for them to
live in a schema.

Currently, they depend on the schema you've been using at CREATE
EXTENSION time, and the schema created by the script are depending on
the extension, and as a result all is working fine and following the
usual conventions (try drop schema cascade when its owner ain't a
superuser but an extension has been installed there, e.g.).

It could be that the easiest way to fix the current mess is to remove
WITH SCHEMA support at CREATE EXTENSION time and have the facility back
in core with the ALTER EXTENSION SET SCHEMA patch.

> * The `pg_extension` catalog and `pg_extensions` views are both
> useful, but too close in name. I found it confusing how similar they
> are. It seems to me like the `pg_extension` class is well-named and
> like the other catalog table names, but `pg_extenions` should perhaps
> be `pg_available_extensions` or something.

Give me the name and I'll update the patch. Well, or this could be
considered a commiter decision.

> Creating a Third Party Extension
> --------------------------------
>
> ! CREATE EXTENSION pair;
> ! NOTICE: Installing extension 'pair' from '/usr/local/pgsql-devel/share/contrib/pair.sql
> ! ERROR: could not execute sql file: '/usr/local/pgsql-devel/share/contrib/pair.sql'
>
> It would be nice if it gave me more information. What failed, exactly?
> It also fails when I use `WITH SCHEMA public`.

Yeah, that's a byproduct of using SPI_execute() in pg_execute_sql_file.
Your script contains BEGIN; and COMMIT; commands. It's not supposed to.
You won't find a single contrib module containing those either, with or
without my patch.

> * One could specify stuff *only* in the `Makefile` and let `make`
> create a control file.

Been trying, failed. Sorry.

> * I could omit the @extschema@ business altogether and just let CREATE
> EXTENSION set the path for the duration of its execution.

That won't fly. The only workable solution is a 2-steps recipe:
CREATE EXTENSION pair;
ALTER EXTENSION pair SET SCHEMA utils;

And the second command will raise an error if 'pair' has registered
dependencies for more than exactly one schema. Some work needs to be
done to see that happen, though.

> Conclusion
> ----------
>
> Overall I'm very happy with this patch. There are some nits I've
> brought up here that need to be addressed, and I do think there are
> some design changes that would be worth considering. So I'm marking it
> as returned. But I also think that someone more familiar with the core
> code should do a proper code review during this commitfest (I'm mostly
> about functionality and interface).

I'm pleased to see that you mean 'Waiting on Author', because I intend
to answer timely enough to still hope for the patch to make it in this
commitfest, baring showstoppers.

If you follow the git repository you will notice that the doc changes
asked for in this review are pushed. I'll wait some more for us to
decide on remaining points before to produce yet another patch version,
though.

http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2010-12-04 14:15:34 Re: pg_execute_from_file review
Previous Message Boxuan Zhai 2010-12-04 13:27:52 new patch of MERGE (merge_204) & a question about duplicated ctid