Re: pg_dump does not honor namespaces when functions are used in index

Lists: pgsql-hackers
From: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 11:31:53
Message-ID: AANLkTinLbpmO1YnK2i5k3lYS9BZWASl8Uiv81wH_P12A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dear hackers,

I have a pretty nasty problem to submit to your careful scrutiny.

Please consider the following piece of SQL code:

CREATE SCHEMA bar;
SET search_path = bar;

CREATE FUNCTION bar() RETURNS text AS $$
BEGIN
RETURN 'foobar';
END
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE SCHEMA foo;
SET search_path = foo;

CREATE FUNCTION foo() RETURNS text AS $$
BEGIN
RETURN bar();
END
$$ LANGUAGE plpgsql IMMUTABLE;

SET search_path = public;

CREATE TABLE foobar (d text);
insert into foobar (d) values ('foobar');

set search_path = public, foo, bar;
CREATE INDEX foobar_d on foobar using btree(foo());

Run this on a newly created database, and dump it with pg_dump. You'll
notice that the dump is unusable. Creating a new database from this
dump will trigger the following error:

ERROR: function bar() does not exist
LINE 1: SELECT bar()
^
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.
QUERY: SELECT bar()
CONTEXT: PL/pgSQL function "foo" line 2 at RETURN

How can we fix this?
--
Jean-Baptiste Quenot


From: Thom Brown <thombrown(at)gmail(dot)com>
To: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 12:13:05
Message-ID: AANLkTilY3Vsdc1Fip966Ha7wmA9pceErpiOxrGJ4T9hC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 June 2010 12:31, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com> wrote:

> Dear hackers,
>
> I have a pretty nasty problem to submit to your careful scrutiny.
>
> Please consider the following piece of SQL code:
>
>
> CREATE SCHEMA bar;
> SET search_path = bar;
>
> CREATE FUNCTION bar() RETURNS text AS $$
> BEGIN
> RETURN 'foobar';
> END
> $$ LANGUAGE plpgsql IMMUTABLE;
>
> CREATE SCHEMA foo;
> SET search_path = foo;
>
> CREATE FUNCTION foo() RETURNS text AS $$
> BEGIN
> RETURN bar();
> END
> $$ LANGUAGE plpgsql IMMUTABLE;
>
> SET search_path = public;
>
> CREATE TABLE foobar (d text);
> insert into foobar (d) values ('foobar');
>
> set search_path = public, foo, bar;
> CREATE INDEX foobar_d on foobar using btree(foo());
>
>
> Run this on a newly created database, and dump it with pg_dump. You'll
> notice that the dump is unusable. Creating a new database from this
> dump will trigger the following error:
>
> ERROR: function bar() does not exist
> LINE 1: SELECT bar()
> ^
> HINT: No function matches the given name and argument types. You
> might need to add explicit type casts.
> QUERY: SELECT bar()
> CONTEXT: PL/pgSQL function "foo" line 2 at RETURN
>
> How can we fix this?
> --
> Jean-Baptiste Quenot
>
> --
>

I think Postgres doesn't check to see whether bar() exists in the current
search path when you create the foo() function, and since it isn't in the
foo() function's search path value, it fails to find the function when you
try to use it. It can probably be fixed (this specific case, not generally)
with:

ALTER FUNCTION foo.foo() SET search_path=foo, bar;

Thom


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 13:20:29
Message-ID: AANLkTikc_wo_Uvp1eIDgkDuaBZrQs_VfD6zucb3GnrHX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 17, 2010 at 8:13 AM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
> On 17 June 2010 12:31, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com> wrote:
>>
>> Dear hackers,
>>
>> I have a pretty nasty problem to submit to your careful scrutiny.
>>
>> Please consider the following piece of SQL code:
>>
>>
>> CREATE SCHEMA bar;
>> SET search_path = bar;
>>
>> CREATE FUNCTION bar() RETURNS text AS $$
>> BEGIN
>>    RETURN 'foobar';
>> END
>> $$ LANGUAGE plpgsql IMMUTABLE;
>>
>> CREATE SCHEMA foo;
>> SET search_path = foo;
>>
>> CREATE FUNCTION foo() RETURNS text AS $$
>> BEGIN
>>    RETURN bar();
>> END
>> $$ LANGUAGE plpgsql IMMUTABLE;
>>
>> SET search_path = public;
>>
>> CREATE TABLE foobar (d text);
>> insert into foobar (d) values ('foobar');
>>
>> set search_path = public, foo, bar;
>> CREATE INDEX foobar_d on foobar using btree(foo());
>>
>>
>> Run this on a newly created database, and dump it with pg_dump. You'll
>> notice that the dump is unusable.  Creating a new database from this
>> dump will trigger the following error:
>>
>> ERROR:  function bar() does not exist
>> LINE 1: SELECT bar()
>>               ^
>> HINT:  No function matches the given name and argument types. You
>> might need to add explicit type casts.
>> QUERY:  SELECT bar()
>> CONTEXT:  PL/pgSQL function "foo" line 2 at RETURN
>>
>> How can we fix this?
>> --
>> Jean-Baptiste Quenot
>>
>> --
>
> I think Postgres doesn't check to see whether bar() exists in the current
> search path when you create the foo() function, and since it isn't in the
> foo() function's search path value, it fails to find the function when you
> try to use it.  It can probably be fixed (this specific case, not generally)
> with:
>
> ALTER FUNCTION foo.foo() SET search_path=foo, bar;

I suppose that the root of the problem here is that foo() is not
really immutable - it gives different results depending on the search
path. It seems like that could bite you in a number of different
ways.

I actually wonder if we shouldn't automatically tag plpgsql functions
with the search_path in effect at the time of their creation (as if
the user had done ALTER FUNCTION ... SET search_path=...whatever the
current search path is...). I suppose the current behavior could
sometimes be useful but on the whole it seems more like a giant
foot-gun which the user oughtn't to get unless they explicitly ask for
it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Thom Brown <thombrown(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 13:27:42
Message-ID: AANLkTikQoC0s7-YVa-ys25CH8EwwEM2j6JeD2o6vxB1W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 June 2010 14:20, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jun 17, 2010 at 8:13 AM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
> > On 17 June 2010 12:31, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com> wrote:
> >>
> >> Dear hackers,
> >>
> >> I have a pretty nasty problem to submit to your careful scrutiny.
> >>
> >> Please consider the following piece of SQL code:
> >>
> >>
> >> CREATE SCHEMA bar;
> >> SET search_path = bar;
> >>
> >> CREATE FUNCTION bar() RETURNS text AS $$
> >> BEGIN
> >> RETURN 'foobar';
> >> END
> >> $$ LANGUAGE plpgsql IMMUTABLE;
> >>
> >> CREATE SCHEMA foo;
> >> SET search_path = foo;
> >>
> >> CREATE FUNCTION foo() RETURNS text AS $$
> >> BEGIN
> >> RETURN bar();
> >> END
> >> $$ LANGUAGE plpgsql IMMUTABLE;
> >>
> >> SET search_path = public;
> >>
> >> CREATE TABLE foobar (d text);
> >> insert into foobar (d) values ('foobar');
> >>
> >> set search_path = public, foo, bar;
> >> CREATE INDEX foobar_d on foobar using btree(foo());
> >>
> >>
> >> Run this on a newly created database, and dump it with pg_dump. You'll
> >> notice that the dump is unusable. Creating a new database from this
> >> dump will trigger the following error:
> >>
> >> ERROR: function bar() does not exist
> >> LINE 1: SELECT bar()
> >> ^
> >> HINT: No function matches the given name and argument types. You
> >> might need to add explicit type casts.
> >> QUERY: SELECT bar()
> >> CONTEXT: PL/pgSQL function "foo" line 2 at RETURN
> >>
> >> How can we fix this?
> >> --
> >> Jean-Baptiste Quenot
> >>
> >> --
> >
> > I think Postgres doesn't check to see whether bar() exists in the current
> > search path when you create the foo() function, and since it isn't in the
> > foo() function's search path value, it fails to find the function when
> you
> > try to use it. It can probably be fixed (this specific case, not
> generally)
> > with:
> >
> > ALTER FUNCTION foo.foo() SET search_path=foo, bar;
>
> I suppose that the root of the problem here is that foo() is not
> really immutable - it gives different results depending on the search
> path. It seems like that could bite you in a number of different
> ways.
>
> I actually wonder if we shouldn't automatically tag plpgsql functions
> with the search_path in effect at the time of their creation (as if
> the user had done ALTER FUNCTION ... SET search_path=...whatever the
> current search path is...). I suppose the current behavior could
> sometimes be useful but on the whole it seems more like a giant
> foot-gun which the user oughtn't to get unless they explicitly ask for
> it.
>
>
>
That wouldn't solve the problem in the above case since the search path at
the time of declaring the function was incorrect anyway as it didn't cover
the bar schema. It would fix cases where search paths are correctly set
before functions are created though. Unless there's a language-specific
parser to validate the content of functions, typos in function names will
cause the restoration of backups to fail.

Thom


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 15:08:20
Message-ID: 3959.1276787300@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I suppose that the root of the problem here is that foo() is not
> really immutable - it gives different results depending on the search
> path.

Yeah. The declaration of the function is broken --- it's not pg_dump's
fault that the function misbehaves.

> I actually wonder if we shouldn't automatically tag plpgsql functions
> with the search_path in effect at the time of their creation (as if
> the user had done ALTER FUNCTION ... SET search_path=...whatever the
> current search path is...).

That would be extremely expensive and not very backwards-compatible.
In the case at hand, just writing "RETURN bar.bar();" would be the
best-performing solution.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thombrown(at)gmail(dot)com>, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-17 15:30:39
Message-ID: AANLkTimUXm9vrI4Ebi6M-ndeJZlle1VXJKUFdQmqTgaW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 17, 2010 at 4:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I actually wonder if we shouldn't automatically tag plpgsql functions
>> with the search_path in effect at the time of their creation (as if
>> the user had done ALTER FUNCTION ... SET search_path=...whatever the
>> current search path is...).
>
> That would be extremely expensive and not very backwards-compatible.
> In the case at hand, just writing "RETURN bar.bar();" would be the
> best-performing solution.
>

I wonder if we should have a mode for plpgsql functions where all name
lookups are done at definition time So the bar() function would be
resolved to bar.bar() and stored that way permanently so that pg_dump
dumped the definition as bar.bar().

That would be probably just as good as setting the search path on the
function for most users and better for some. It would have the same
problem with dynamic sql that a lot of things have though.

--
greg


From: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thombrown(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-18 08:31:29
Message-ID: AANLkTimoElJn5Ci9foVPmD07NbYpcoYbAJRAUP8vseAM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/6/17 Greg Stark <gsstark(at)mit(dot)edu>:
> On Thu, Jun 17, 2010 at 4:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I actually wonder if we shouldn't automatically tag plpgsql functions
>>> with the search_path in effect at the time of their creation (as if
>>> the user had done ALTER FUNCTION ... SET search_path=...whatever the
>>> current search path is...).
>>
>> That would be extremely expensive and not very backwards-compatible.
>> In the case at hand, just writing "RETURN bar.bar();" would be the
>> best-performing solution.
>>
>
> I wonder if we should have a mode for plpgsql functions where all name
> lookups are done at definition time So the bar() function would be
> resolved to bar.bar() and stored that way permanently so that pg_dump
> dumped the definition as bar.bar().
>
> That would be probably just as good as setting the search path on the
> function for most users and better for some. It would have the same
> problem with dynamic sql that a lot of things have though.

+1 IMHO PG should dump the bar() function call as bar.bar() to be
safe. Using fully qualified function name is what I did in my source
code, to work around this problem.
--
Jean-Baptiste Quenot


From: Thom Brown <thombrown(at)gmail(dot)com>
To: Jean-Baptiste Quenot <jbq(at)caraldi(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump does not honor namespaces when functions are used in index
Date: 2010-06-18 09:58:46
Message-ID: AANLkTikR5cfhhKp211s0YoxoRPJAnJu9m2IgKVECxSdD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 June 2010 09:31, Jean-Baptiste Quenot <jbq(at)caraldi(dot)com> wrote:

> 2010/6/17 Greg Stark <gsstark(at)mit(dot)edu>:
> > On Thu, Jun 17, 2010 at 4:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> I actually wonder if we shouldn't automatically tag plpgsql functions
> >>> with the search_path in effect at the time of their creation (as if
> >>> the user had done ALTER FUNCTION ... SET search_path=...whatever the
> >>> current search path is...).
> >>
> >> That would be extremely expensive and not very backwards-compatible.
> >> In the case at hand, just writing "RETURN bar.bar();" would be the
> >> best-performing solution.
> >>
> >
> > I wonder if we should have a mode for plpgsql functions where all name
> > lookups are done at definition time So the bar() function would be
> > resolved to bar.bar() and stored that way permanently so that pg_dump
> > dumped the definition as bar.bar().
> >
> > That would be probably just as good as setting the search path on the
> > function for most users and better for some. It would have the same
> > problem with dynamic sql that a lot of things have though.
>
> +1 IMHO PG should dump the bar() function call as bar.bar() to be
> safe. Using fully qualified function name is what I did in my source
> code, to work around this problem.
> --
> Jean-Baptiste Quenot
>

While this problem doesn't result in a unrestorable dump, it does mean
manual intervention would be required when perform a restore. I would
expect all backups to restore, so anything like this which causes issue at
restore time should really have been caught before it got into the database
itself.

What is causing the error is the creation of the index which actually goes
to use the function. In the OP's example, this is deemed okay since it's
found in the search path, but in theory, we could end up with the following
cases:

CREATE SCHEMA bar;
SET search_path = bar;

/* bar.bar() function */

CREATE FUNCTION bar() RETURNS text AS $$
BEGIN
RETURN 'foobar';
END
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE SCHEMA foo;
SET search_path = foo;

CREATE FUNCTION foo() RETURNS text AS $$
BEGIN
RETURN bar();
END
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE SCHEMA moo;
SET search_path = moo;

/* moo.bar() function */

CREATE FUNCTION bar() RETURNS text AS $$
BEGIN
RETURN 'moobar';
END
$$ LANGUAGE plpgsql IMMUTABLE;

/* CASE 1 */

CREATE TABLE foobar (d text);
insert into foobar (d) values ('foobar');

/* This index will use foo.foo() which in turn will use moo.bar() */
set search_path = foo, moo, bar;
CREATE INDEX foobar_d on foobar using btree(foo.foo());

/* CASE 2 */

CREATE TABLE foobar2 (d text);
insert into foobar2 (d) values ('foobar2');

/* This index will use foo.foo() which in turn will use bar.bar() */
set search_path = foo, bar, moo;
CREATE INDEX foobar2_d on foobar2 using btree(foo.foo());

While both tables are using the same function, they won't have the same
functionality due to the difference in the search path at the time of the
index creation. If someone inadvertently ended up with this configuration,
it could be very difficult to debug why they have different results.

If functions are parsed and resovled to prefix all contained function calls
with the relevant schema, then it would also have to prefix it regardless of
whether it's also in the current schema or not as the call could otherwise
be misdirected by a search path change prior to the container function's
usage.

The alternative, which isn't really an alternative, is to prevent the user
from creating a function with references to other functions without a
schemaname... which obviously wouldn't be backwards compatible.

Thom