Re: Remove lossy-operator RECHECK flag?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 17:41:16
Message-ID: 13936.1207935676@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ Changing subject to draw people's attention ... ]

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> RECHECK flag could be removed.

Hmm, that's slightly more radical than I was considering, but it would
simplify matters wouldn't it? The only strong argument against it that
I can think of is that it'd break user-defined opclasses ... but it's
not like we don't change the API for GIST/GIN support functions from
time to time anyway. Does anyone have any idea how many people are
building custom opclasses containing lossy operators? Offhand I suspect
only the PostGIS project would be affected.

If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
syntax altogether, or leave it in but treat it as a no-op (probably
with a warning)? The latter would make it a shade easier to load
existing dumps, but I'm inclined to think if we're going to break
something it'd be better to break it obviously than subtly.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 17:55:32
Message-ID: 47FFA614.5080803@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> RECHECK flag could be removed.
>
> Hmm, that's slightly more radical than I was considering, but it would
> simplify matters wouldn't it? The only strong argument against it that
> I can think of is that it'd break user-defined opclasses ... but it's
> not like we don't change the API for GIST/GIN support functions from
> time to time anyway.

Don't we need to change the GiST/GIN support function interface anyway,
to be able to return the "recheck" flag?

> If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
> syntax altogether, or leave it in but treat it as a no-op (probably
> with a warning)? The latter would make it a shade easier to load
> existing dumps, but I'm inclined to think if we're going to break
> something it'd be better to break it obviously than subtly.

I agree with rather breaking it obviously than subtly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 17:58:53
Message-ID: 47FFA6DD.6020408@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I can think of is that it'd break user-defined opclasses ... but it's
> not like we don't change the API for GIST/GIN support functions from
> time to time anyway. Does anyone have any idea how many people are
Hmm. The biggest breakage of interface to indexes was a removing
pg_am.amconcurrent flag - there is one or two implementation of indexes which
was depending of this flag. All our modules related to GIN or GiST are in
contrib already, new wildspeed will not work with <=8.3 version anyway.

> building custom opclasses containing lossy operators? Offhand I suspect
> only the PostGIS project would be affected.
Yes, I think so.

> If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
> syntax altogether, or leave it in but treat it as a no-op (probably
> with a warning)?
I think, it should be a error, but not a syntax error - hint should point to use
new version of module. Loading dump from previous versions with opclass
definitions is not good action anyway.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 18:04:51
Message-ID: 47FFA843.1000307@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Don't we need to change the GiST/GIN support function interface anyway,
> to be able to return the "recheck" flag?
Now we choose - save compatibility or not.

We can save flag RECHECK and introduce optional needRecheck argument for
consistent function and new opclass can use new interface, old ones will work
with RECHECK. Or we remove RECHECK and force opclasses to use new interface.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 18:42:42
Message-ID: 1d4e0c10804111142w12bd3860qcd140b11755dff2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 11, 2008 at 7:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Offhand I suspect
> only the PostGIS project would be affected.

Just wanted to point out that I personnally use the capability to
remove the RECHECK of PostGIS opclass (I define a similar opclass
without the recheck) when I enforce the SRID in the application: the
overhead of rechecking the rows returned by the index scan is really
noticeable (for nothing if your SRID is the same in all your
application and data - that's what I read some time ago, hope it's
still true).

I don't know how you think PostGIS should fix their opclass after
RECHECK removal. But I'd like to have a way to keep the ability to
remove the RECHECK on my own if it's possible so I hope it won't be
hardcoded in C code somewhere.

--
Guillaume


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 18:47:26
Message-ID: 14808.1207939646@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Now we choose - save compatibility or not.

> We can save flag RECHECK and introduce optional needRecheck argument for
> consistent function and new opclass can use new interface, old ones will work
> with RECHECK. Or we remove RECHECK and force opclasses to use new interface.

Yeah, that's what it boils down to.

I'm leaning towards removing RECHECK because it'll allow simplification
of the core code, and I doubt there are enough outside opclasses that're
using lossy operators for the compatibility loss to be a big deal.
We've certainly forced bigger changes than that in the past.

I seem to recall that you had some plans for other incompatible changes
in the call conventions for GIST/GIN support functions, too. If
anything like that is going to happen for 8.4, then outside opclasses
are going to need updates anyway, and forcing this one on them too would
hardly be much of a burden.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
Cc: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 19:02:44
Message-ID: 15100.1207940564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> writes:
> Just wanted to point out that I personnally use the capability to
> remove the RECHECK of PostGIS opclass (I define a similar opclass
> without the recheck) when I enforce the SRID in the application:

To be blunt, that seems like a really bad idea, and I have not the
slightest hesitation about breaking your ability to do it. How
do you know that the recheck-need corresponds to what you are testing
on the application side?

If there's actually some safe, consistent use for such behavior
then I think you need to lobby the PostGIS project to provide
access to it.

regards, tom lane


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-11 19:33:26
Message-ID: 1d4e0c10804111233t4bf23b5bw5f777edeb74722eb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 11, 2008 at 9:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> To be blunt, that seems like a really bad idea, and I have not the
> slightest hesitation about breaking your ability to do it. How
> do you know that the recheck-need corresponds to what you are testing
> on the application side?

From what I read when I did that a few months ago, the recheck was
added to provide SRID checking.
If you don't have it, you don't have the SRID mismatch error when
SRIDs don't match.

> If there's actually some safe, consistent use for such behavior
> then I think you need to lobby the PostGIS project to provide
> access to it.

In the general case, there isn't. If you enforce the SRID (using a
wrapper/constraints/whatever), there is.

After some googling, I finally found the post of Mark Cave-Ayland on
postgis-users which made me take this decision:
http://www.mail-archive.com/postgis-users(at)postgis(dot)refractions(dot)net/msg01206.html

Don't know if there is a better way to do it in PostGIS itself but the
ability to take this decision for a specific database (or even column)
is really convenient.

--
Guillaume


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-13 21:35:32
Message-ID: 19212.1208122532@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> If we do this, should we remove RECHECK from the CREATE OPERATOR CLASS
>> syntax altogether, or leave it in but treat it as a no-op (probably
>> with a warning)?

> I think, it should be a error, but not a syntax error - hint should point to use
> new version of module. Loading dump from previous versions with opclass
> definitions is not good action anyway.

Here's a related issue: what should 8.4 pg_dump do when dumping from an
older server version? Some possibilities include

1. Dump the CREATE OPERATOR CLASS command with a RECHECK phrase, same as
before. Then the dump would still work with 8.3 ... at least until we
make some other incompatible change ... while giving an error if loaded
into 8.4.

2. Silently ignore amopreqcheck in older servers. Then the dump would
not load correctly into the older server (but then again, it might
not've anyway). It *would* load into 8.4, but whether it would work
would of course depend on the opclass support functions having been
updated.

3. Throw an error and refuse to dump if it finds amopreqcheck true.

#3 seems just about useless behavior, though. For the moment I have
it doing #1, but it strikes me that that is only useful if 8.4 gets
to release without having made any backwards-incompatible changes in
pg_dump output, which is probably not better than a fifty-fifty bet.
Maybe we should do #2? Thoughts?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 17:27:55
Message-ID: 10447.1208194075@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've committed the runtime-recheck changes. Oleg had mentioned that
GIST text search could be improved by using runtime rechecking, but
I'll leave any refinements of that sort up to you.

One thing I was wondering about is that GIN and GIST are set up to
preinitialize the recheck flag to TRUE; this means that if someone
uses an old consistent() function that doesn't know it should set
the flag, a recheck will be forced. But it seems to me that there's
an argument for preinitializing to FALSE instead. There are four
possibilities for what will happen with an un-updated consistent()
function:

1. If we set the flag TRUE, and that's correct, everything is fine.

2. If we set the flag TRUE, and that's wrong (ie, the query is really
exact) then a useless recheck occurs when we arrive at the heap.
Nothing visibly goes wrong, but the query is slower than it should be.

3. If we set the flag FALSE, and that's correct, everything is fine.

4. If we set the flag FALSE, and that's wrong (ie, the query is really
inexact), then rows that don't match the query may get returned.

By the argument that it's better to break things obviously than to
break them subtly, risking case 4 seems more attractive than risking
case 2.

This also ties into my previous question about what 8.4 pg_dump should
do when seeing amopreqcheck = TRUE while dumping from an old server.
I'm now convinced that the committed behavior (print RECHECK anyway)
is the best choice, for a couple of reasons:
* It avoids silent breakage if the dump is reloaded into an old server.
* You'll have to deal with the issue anyhow if you made your dump with
the older version's pg_dump.

What this means is that, if we make the preinitialization value FALSE,
then an existing GIST/GIN opclass that doesn't use RECHECK will load
just fine into 8.4 and everything will work as expected, even without
touching the C code. An opclass that does use RECHECK will fail to
load from the dump, and if you're stubborn and edit the dump instead
of getting a newer version of the module, you'll start getting wrong
query answers. This means that all the pain is concentrated on the
RECHECK-using case. And you can hardly maintain that you weren't
warned about compatibility problems, if the dump didn't load ...

On the other hand, if we make the preinitialization value TRUE,
there's some pain for users whether they used RECHECK or not,
and there won't be any obvious notification of the problem when
they didn't.

So I'm thinking it might be better to switch to the other
preinitialization setting. Comments?

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 18:23:27
Message-ID: 4803A11F.9020905@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> 2. If we set the flag TRUE, and that's wrong (ie, the query is really
> exact) then a useless recheck occurs when we arrive at the heap.
> Nothing visibly goes wrong, but the query is slower than it should be.
> 4. If we set the flag FALSE, and that's wrong (ie, the query is really
> inexact), then rows that don't match the query may get returned.
> By the argument that it's better to break things obviously than to
> break them subtly, risking case 4 seems more attractive than risking
> case 2.

The single thought is: usually, it's very hard to see that query returns more
results that it should be. It doesn't matter for fulltext search (and it has
very good chance to stay unnoticed forever because wrong rows will be sorted
down by ranking function, although performance will decrease. But text search is
now built-in :-) ), but for other modules it may be critical, especially when
content of db depends on result of such query.

It seems to me, there was the bug in btree at one time - it doesn't keep
uniqueness and some values was duplicated. User noticed that only during
restoring of db.

> What this means is that, if we make the preinitialization value FALSE,
> then an existing GIST/GIN opclass that doesn't use RECHECK will load
> just fine into 8.4 and everything will work as expected, even without
> touching the C code.
Yes.

> An opclass that does use RECHECK will fail to
> load from the dump, and if you're stubborn and edit the dump instead
> of getting a newer version of the module, you'll start getting wrong
> query answers. This means that all the pain is concentrated on the
> RECHECK-using case. And you can hardly maintain that you weren't

I don't think that restoration of dump from old versions with modules is good
practice anyway and saved RECHECK will signal about problem earlier.
If user edits dump to remove RECHECK flag then he is an enemy to himself.

> So I'm thinking it might be better to switch to the other
> preinitialization setting. Comments?

Agreed.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 18:28:51
Message-ID: 19670.1208197731@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> By the argument that it's better to break things obviously than to
>> break them subtly, risking case 4 seems more attractive than risking
>> case 2.

> The single thought is: usually, it's very hard to see that query returns more
> results that it should be. It doesn't matter for fulltext search (and it has
> very good chance to stay unnoticed forever because wrong rows will be sorted
> down by ranking function, although performance will decrease.

Hmm ... that's a good point. And the performance loss that I'm
complaining about is probably not large, unless you've got a *really*
expensive operator. Maybe we should leave it as-is.

Anybody else have an opinion?

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 18:50:36
Message-ID: 4803A77C.90506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>>> By the argument that it's better to break things obviously than to
>>> break them subtly, risking case 4 seems more attractive than risking
>>> case 2.
>
>> The single thought is: usually, it's very hard to see that query returns more
>> results that it should be. It doesn't matter for fulltext search (and it has
>> very good chance to stay unnoticed forever because wrong rows will be sorted
>> down by ranking function, although performance will decrease.
>
> Hmm ... that's a good point. And the performance loss that I'm
> complaining about is probably not large, unless you've got a *really*
> expensive operator. Maybe we should leave it as-is.
>
> Anybody else have an opinion?

Better slow than wrong in this case.

The "better to break obviously than subtly" argument doesn't hold here,
because "slow" isn't the same as broken, and returning extra incorrect
rows isn't "obviously" :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 20:16:45
Message-ID: 87ve2kfaz6.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

>> Anybody else have an opinion?
>
> Better slow than wrong in this case.
>
> The "better to break obviously than subtly" argument doesn't hold here, because
> "slow" isn't the same as broken, and returning extra incorrect rows isn't
> "obviously" :-).

We're talking about code which is recompiled for a new version of Postgres but
not altered to return the recheck flag for every tuple? Can we rig the code so
it effectively returns recheck=true all the time in that case? If so then it
would be safe to ignore the recheck flag on the opclass.

There's no danger of picking up code which was actually compiled with older
header files after all, the magic numbers wouldn't match if it's V1 and in any
case I would expect it to crash long before any mistaken tuples were returned.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remove lossy-operator RECHECK flag?
Date: 2008-04-14 20:31:27
Message-ID: 22258.1208205087@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> We're talking about code which is recompiled for a new version of
> Postgres but not altered to return the recheck flag for every tuple?
> Can we rig the code so it effectively returns recheck=true all the
> time in that case?

That's what we've got now. I just thought the choice could do with
a bit harder look.

regards, tom lane