Lists: | pgsql-hackers |
---|
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Selectivity estimation for equality and range queries |
Date: | 2007-12-28 10:55:12 |
Message-ID: | 200712281155.12674.peter_e@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I have been observing a case where the row count estimation for LIKE 'foo' is
(much) higher than for LIKE 'foo%', the rest of the query being the same.
This is a special case of the estimation for equality being higher than for a
range query that includes the value used in the equality.
I haven't been able to get a copy of the data from the client yet, but
considering the nature of the data and the description of the selectivity
estimation algorithms
(http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html),
this behavior appears to be mathematically plausible. I have been wondering
whether in general the eqsel should try to compare its result with the
estimation of (x >= 'foo' AND x <= 'foo') and use that as a ceiling or
something.
Has anyone else observed something similar?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Selectivity estimation for equality and range queries |
Date: | 2007-12-28 17:29:26 |
Message-ID: | 4454.1198862966@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I have been observing a case where the row count estimation for LIKE 'foo' is
> (much) higher than for LIKE 'foo%', the rest of the query being the same.
> This is a special case of the estimation for equality being higher than for a
> range query that includes the value used in the equality.
Not really --- LIKE estimation is only weakly related to range
estimation.
Relevant questions here include exactly which PG version is in use and
what's the database encoding/locale. If it's not C locale, the fixes
I made in selfuncs.c during November might be relevant.
regards, tom lane
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Selectivity estimation for equality and range queries |
Date: | 2008-01-03 15:40:13 |
Message-ID: | 200801031640.13576.peter_e@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Am Freitag, 28. Dezember 2007 schrieb Tom Lane:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > I have been observing a case where the row count estimation for LIKE
> > 'foo' is (much) higher than for LIKE 'foo%', the rest of the query being
> > the same. This is a special case of the estimation for equality being
> > higher than for a range query that includes the value used in the
> > equality.
>
> Not really --- LIKE estimation is only weakly related to range
> estimation.
Here is a narrowed down example.
* Pattern search
EXPLAIN ANALYZE
SELECT id FROM person
WHERE lower(person.name) LIKE 'foo%'
AND person.follow_nr=0
AND person.person_type='P' AND person.batch_nr=0;
Index Scan using person_idx_3 on person (cost=0.01..6.03 rows=1 width=8) (actual time=0.276..4.917 rows=188 loops=1)
Index Cond: ((lower((name)::text) ~>=~ 'foo'::text) AND (lower((name)::text) ~<~ 'fop'::text) AND (person_type = 'P'::bpchar) AND (batch_nr = 0) AND (follow_nr = 0))
Filter: (lower((name)::text) ~~ 'foo%'::text)
* Equality search
EXPLAIN ANALYZE
SELECT id FROM person
WHERE lower(person.name) LIKE 'foo'
AND person.follow_nr=0
AND person.person_type='P' AND person.batch_nr=0;
Index Scan using person_idx_3 on person (cost=0.00..2527.84 rows=627 width=8) (actual time=0.043..0.072 rows=7 loops=1)
Index Cond: ((lower((name)::text) ~=~ 'foo'::text) AND (person_type = 'P'::bpchar) AND (batch_nr = 0) AND (follow_nr = 0))
Filter: (lower((name)::text) ~~ 'foo'::text)
So it expects 1 row for the pattern search and 627 rows for the equality
search, which doesn't make mathematical sense.
What I had meant earlier with range and equality estimation is that this is
(presumably) about the same as guessing 1 row for (x >= 5 AND x < 6), but 627
rows for (x = 5). Somehow, these two estimation methods should be "talking"
to each other.
PostgreSQL is version 8.1.9.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Selectivity estimation for equality and range queries |
Date: | 2008-01-03 15:54:16 |
Message-ID: | 5939.1199375656@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Here is a narrowed down example.
In what locale/encoding? Can we see the pg_stats row for person_idx_3?
> PostgreSQL is version 8.1.9.
So it hasn't got the LIKE estimation fixes I put in two months ago ...
regards, tom lane