Expression indexes and dependecies

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Expression indexes and dependecies
Date: 2013-07-22 18:29:29
Message-ID: CABOikdNLxn3UJ3zL1MAk-LYaybohiY0_dKxCxxGmvaJRg2kvOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

While doing some tests, I observed that expression indexes can malfunction
if the underlying expression changes. For example, say I define a function
foo() as:

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
RETURN $1 + 1;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I then create a table, an expression index on the table and insert a few
rows:

CREATE TABLE test (a int, b char(20));
CREATE UNIQUE INDEX testindx ON test(foo(a));
INSERT INTO test VALUES (generate_series(1,10000), 'bar');

A query such as following would return result using the expression index:

SET enable_seqscan TO off;
SELECT * FROM test WHERE foo(a) = 100;

It will return row with a = 99 since foo() is defined to return (a + 1)

If I now REPLACE the function definition with something else, say to return
(a + 2):

CREATE OR REPLACE FUNCTION foo(a integer) RETURNS integer AS $$
BEGIN
RETURN $1 + 2;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

I get no error/warnings, but the index and the new function definition are
now out of sync. So above query will still return the same result, though
the row with (a = 99) no longer satisfies the current definition of
function foo().

Perhaps this is a known behaviour/limitation, but I could not find that in
the documentation. But I wonder if it makes sense to check for dependencies
during function alteration and complain. Or there are other reasons why we
can't do that and its a much larger problem than what I'm imagining ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-22 21:04:06
Message-ID: 20130722210405.GF3795@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee escribió:
> Hello,
>
> While doing some tests, I observed that expression indexes can malfunction
> if the underlying expression changes.

[...]

> Perhaps this is a known behaviour/limitation, but I could not find that in
> the documentation. But I wonder if it makes sense to check for dependencies
> during function alteration and complain. Or there are other reasons why we
> can't do that and its a much larger problem than what I'm imagining ?

This is a tough problem. The dependency mechanism has no way to keep
track of this kind of dependency; all it does is prevent the function
from being dropped altogether, but preventing it from acquiring a
conflicting definition is outside its charter.

One way to attack this would be registering dependencies of a new kind
on functions used by index expressions. Then CREATE OR REPLACE function
could reject alteration for such functions. I don't know if we care
enough about this case.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-22 21:12:10
Message-ID: CAGTBQpYCGUyOHx1D6HHDadOaX5BMLrrTiMFVT9sudFWeEJW94A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 22, 2013 at 6:04 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Pavan Deolasee escribió:
>> Hello,
>>
>> While doing some tests, I observed that expression indexes can malfunction
>> if the underlying expression changes.
>
> [...]
>
>> Perhaps this is a known behaviour/limitation, but I could not find that in
>> the documentation. But I wonder if it makes sense to check for dependencies
>> during function alteration and complain. Or there are other reasons why we
>> can't do that and its a much larger problem than what I'm imagining ?
>
> This is a tough problem. The dependency mechanism has no way to keep
> track of this kind of dependency; all it does is prevent the function
> from being dropped altogether, but preventing it from acquiring a
> conflicting definition is outside its charter.
>
> One way to attack this would be registering dependencies of a new kind
> on functions used by index expressions. Then CREATE OR REPLACE function
> could reject alteration for such functions. I don't know if we care
> enough about this case.

What about a warning and leave it to the dba to reindex?


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-22 21:14:40
Message-ID: 20130722211440.GE752@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote:
> Pavan Deolasee escribió:
> > Hello,
> >
> > While doing some tests, I observed that expression indexes can malfunction
> > if the underlying expression changes.
>
> [...]
>
> > Perhaps this is a known behaviour/limitation, but I could not find that in
> > the documentation. But I wonder if it makes sense to check for dependencies
> > during function alteration and complain. Or there are other reasons why we
> > can't do that and its a much larger problem than what I'm imagining ?
>
> This is a tough problem. The dependency mechanism has no way to keep
> track of this kind of dependency; all it does is prevent the function
> from being dropped altogether, but preventing it from acquiring a
> conflicting definition is outside its charter.
>
> One way to attack this would be registering dependencies of a new kind
> on functions used by index expressions. Then CREATE OR REPLACE function
> could reject alteration for such functions. I don't know if we care
> enough about this case.

I think changing the results of a immutable function violates the
contract enough to make this the user's fault. Also the other solutions
seem hard to achieve ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-22 23:21:35
Message-ID: 28815.1374535295@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote:
>> One way to attack this would be registering dependencies of a new kind
>> on functions used by index expressions. Then CREATE OR REPLACE function
>> could reject alteration for such functions. I don't know if we care
>> enough about this case.

> I think changing the results of a immutable function violates the
> contract enough to make this the user's fault. Also the other solutions
> seem hard to achieve ;)

Yeah. Prohibiting any change at all would be a cure worse than the
disease, likely, but we don't have the tools to analyze more finely than
that. And what if the index uses function A which calls function B,
and you change function B?

I'd be in favor of adding a note to the CREATE INDEX man page pointing
out that if you change the behavior of an immutable function, any bad
consequences for indexes are on your own head, and a REINDEX would be
advisable.

regards, tom lane


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-25 08:46:10
Message-ID: CABOikdPAkf54rO-NqwmZqWXc9C7r-mpdp5JFNnoSBe8v3_LdJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 23, 2013 at 4:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-07-22 17:04:06 -0400, Alvaro Herrera wrote:
> >> One way to attack this would be registering dependencies of a new kind
> >> on functions used by index expressions. Then CREATE OR REPLACE function
> >> could reject alteration for such functions. I don't know if we care
> >> enough about this case.
>
> > I think changing the results of a immutable function violates the
> > contract enough to make this the user's fault. Also the other solutions
> > seem hard to achieve ;)
>
> Yeah. Prohibiting any change at all would be a cure worse than the
> disease, likely, but we don't have the tools to analyze more finely than
> that. And what if the index uses function A which calls function B,
> and you change function B?
>

Right. I was gonna suggest that if can mark the index invalid if a
dependent immutable function is being changed, but that clearly does not
solve the case of nested function calls and we don't have any mechanism to
track that either.

>
> I'd be in favor of adding a note to the CREATE INDEX man page pointing
> out that if you change the behavior of an immutable function, any bad
> consequences for indexes are on your own head, and a REINDEX would be
> advisable.
>
>
Ok. I will write up something and submit a patch. Constraints probably also
suffer from the same issue. Whats surprising is we don't mandate that the
functions used in CHECK constraint are immutable (like we do for indexes).
What that means is, even if a row was satisfying a constraint while
insertion, it may not once its there. Is that intentional ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-25 13:13:10
Message-ID: 7513.1374757990@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> Ok. I will write up something and submit a patch. Constraints probably also
> suffer from the same issue. Whats surprising is we don't mandate that the
> functions used in CHECK constraint are immutable (like we do for indexes).
> What that means is, even if a row was satisfying a constraint while
> insertion, it may not once its there. Is that intentional ?

Well, it's probably somewhat historical, but I doubt we'd want to
tighten it up now. Here's an example of a sensible CHECK that's
only stable:

create ... last_update timestamptz check (last_update <= now()) ...

More generally, I think the argument was that the behavior of a
non-immutable CHECK would at least be easy to understand, assuming you
know that the check will only be applied at row insertion or update.
Non-immutable indexes could misbehave in much less obvious ways, for
instance causing the results of a query to differ depending on whether
the planner chose to use that index.

regards, tom lane


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-26 02:55:55
Message-ID: CABOikdOkRUir+1=62Uhd9i9oS9S5Ju+tPg4cJOk32PrVkZ8cvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 25, 2013 at 6:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
>
> Well, it's probably somewhat historical, but I doubt we'd want to
> tighten it up now. Here's an example of a sensible CHECK that's
> only stable:
>
> create ... last_update timestamptz check (last_update <= now()) ...
>

Agree. That looks like a very sensible use case and something not possible
without support for mutable functions.

>
> More generally, I think the argument was that the behavior of a
> non-immutable CHECK would at least be easy to understand, assuming you
> know that the check will only be applied at row insertion or update.
>

But they are also prone to unexpected behaviour, no ? For example, a slight
variation of the above example is:

create ... last_update timestamptz check (last_update <= now() and
last_update >= now() - '1 week'::interval) ...

This constraint would most likely fail if someone was to restore the table
from a dump.

Given that we haven't seen any complaints may mean I am imagining a problem
that does not exist in practice, though I thought the example looks quite
sensible too.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression indexes and dependecies
Date: 2013-07-26 03:12:45
Message-ID: 2915.1374808365@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> On Thu, Jul 25, 2013 at 6:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> More generally, I think the argument was that the behavior of a
>> non-immutable CHECK would at least be easy to understand, assuming you
>> know that the check will only be applied at row insertion or update.

> But they are also prone to unexpected behaviour, no ? For example, a slight
> variation of the above example is:
> create ... last_update timestamptz check (last_update <= now() and
> last_update >= now() - '1 week'::interval) ...
> This constraint would most likely fail if someone was to restore the table
> from a dump.

Sure, but the reason for the failure would be entirely obvious. It
might be annoying, but it'd still be obvious --- and not too hard to
fix, either. The prohibition on mutable index functions is because you
might waste a great deal of time on diagnosing the reason for a problem.

Now, I grant that that argument could also be used to justify trying
harder than we do now to detect not-really-immutable index functions,
or for trying harder than we do now to prevent you from changing an
index function's behavior. I'm not opposed in principle to tightening
those checks more; I'm just doubtful that we can easily make things
better there.

regards, tom lane