Re: Patch: improve selectivity estimation for IN/NOT IN

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: improve selectivity estimation for IN/NOT IN
Date: 2012-03-04 02:38:57
Message-ID: 27162.1330828737@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> writes:
> the attached patch improves the array selectivity estimation for = ANY
> and <> ALL, hence for the IN/NOT IN operators, to avoid the
> shortcoming described in
> <http://archives.postgresql.org/pgsql-performance/2012-03/msg00006.php>.

In connection with Alexander Korotkov's array-estimation patch,
I just committed some code into scalararraysel() that checks whether the
operator is equality or inequality of the array element type. It does
that by consulting the default btree or hash opclass for the element
type. I did that with the thought that it could be used to attack this
issue too, but I see that you've done it another way, ie check to see
whether the operator uses eqsel() or neqsel() as selectivity estimator.

I'm not sure offhand which way is better. It could be argued that yours
is more appropriate because if the operator isn't btree equality, but acts
enough like it to use eqsel() as estimator, then it's still appropriate
for scalararraysel() to treat it as equality. On the other side of the
coin, an operator might be equality but have reason to use some
operator-specific estimator rather than eqsel(). We have real examples
of the former (such as the approximate-equality geometric operators)
but I think the latter case is just hypothetical. Another thing that
has to be thought about is that there are numerous cross-type operators
that use eqsel, such as date-vs-timestamp, and it's far from clear
whether it's appropriate for scalararraysel() to use the modified stats
calculation when dealing with one of these. The btree-based logic is
impervious to that since it won't match any cross-type operator.

Thoughts?

(BTW, in any case I don't trust function pointer comparison to be
portable. It'd be a lot safer to look at the function OID.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniele Varrazzo 2012-03-04 03:20:08 Re: Patch: improve selectivity estimation for IN/NOT IN
Previous Message Marti Raudsepp 2012-03-04 02:28:34 Re: RFC: Making TRUNCATE more "MVCC-safe"