Re: empty string causes planner to avoid index. Makes me sad.

Lists: pgsql-general
From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 20:35:50
Message-ID: 67743.28926.qm@web65516.mail.ac4.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

CREATE TABLE items
(
field1 character(9) NOT NULL,
field2 character varying(17) NOT NULL
};

CREATE INDEX "field1-field2"
ON items
USING btree
(field1, field2);

About 15 million rows in the items table.

explain select count(*) from items where field1 = '102100400' and field2 = '';

Aggregate (cost=231884.57..231884.57 rows=1 width=0)
-> Bitmap Heap Scan on items (cost=4286.53..231841.95 rows=170468 width=0)
Recheck Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))
-> Bitmap Index Scan on "field1-field2-check" (cost=0.00..4282.27 rows=170468 width=0)
Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ''::text))

explain select count(*) from items where field1 = '102100400' and field2 = ' '; /*17 spaces*/

Aggregate (cost=34.83..34.83 rows=1 width=0)
-> Index Scan using "field1-field2" on items (cost=0.00..34.82 rows=18 width=0)
Index Cond: ((field1 = '102100400'::bpchar) AND ((field2)::text = ' '::text))

If I have any value in field2 other than an empty string '' (like '1' or 'space'), it will use the index.
It appears that somehow the empty string is causing the planner to abandon the index.

Can I get any insights into this?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 21:14:39
Message-ID: 19988.1259356479@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> It appears that somehow the empty string is causing the planner to abandon the index.

You didn't actually show us such a case...

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 21:41:36
Message-ID: 341956.65581.qm@web65507.mail.ac4.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in)

--- On Fri, 11/27/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Subject: Re: [GENERAL] empty string causes planner to avoid index. Makes me sad.
> To: "Jeff Amiel" <becauseimjeff(at)yahoo(dot)com>
> Cc: pgsql-general(at)postgresql(dot)org
> Date: Friday, November 27, 2009, 3:14 PM
> Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
> writes:
> > It appears that somehow the empty string is causing
> the planner to abandon the index.
>
> You didn't actually show us such a case...
>
>            
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 22:00:26
Message-ID: 20827.1259359226@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> hmm...ok...planner is not using the index effectively (as effectively as when a non-empty value is passed in)

You didn't show us any evidence of that, either. Both of your test
cases are using the index.

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 22:32:14
Message-ID: 987026.87261.qm@web65503.mail.ac4.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

--- On Fri, 11/27/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> You didn't show us any evidence of that, either.  Both
> of your test
> cases are using the index.

Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not. Wouldn't seem that the planner is using the index effectively.

Aggregate (cost=231884.57..231884.57 rows=1 width=0)

versus

Aggregate (cost=34.83..34.83 rows=1 width=0)

By


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: pgsql-general(at)postgresql(dot)org
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 22:50:54
Message-ID: 200911272350.54263.guillaume@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le vendredi 27 novembre 2009 à 23:32:14, Jeff Amiel a écrit :
> --- On Fri, 11/27/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > You didn't show us any evidence of that, either. Both
> > of your test
> > cases are using the index.
>
> Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY
> higher than when not. Wouldn't seem that the planner is using the index
> effectively.
>
> Aggregate (cost=231884.57..231884.57 rows=1 width=0)
>
> versus
>
> Aggregate (cost=34.83..34.83 rows=1 width=0)
>

But in the first example (the empty string one), it fetched 170468 rows from
the index, and in the second one (the 17-spaces string), it fetched only 18
rows. It seems quite normal that the first one is costier then the second one.

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 23:07:43
Message-ID: 20091127145258.K69561@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 27 Nov 2009, Jeff Amiel wrote:

> --- On Fri, 11/27/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > You didn't show us any evidence of that, either. Both
> > of your test
> > cases are using the index.
>
> Ok...third try. The cost when passing in an empty string is
> SIGNIFICANTLY higher than when not. Wouldn't seem that the planner is
> using the index effectively.

But it's also estimating that it's aggregating over around 10000 times as
many rows presumably because it thinks empty string is alot more common.
That might not be the case in the actual data, but the estimated
difference is the likely cause of the plan differences. What are the
actual runtimes and rowcounts for the queries with different values you're
trying? Explain analyze output would be useful for that.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-27 23:10:31
Message-ID: 21905.1259363431@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not.

That just reflects the fact that it's expecting a lot more rows matching
that query. I suppose this is because the statistics show you've got a
lot more rows containing the empty string than other values.

regards, tom lane


From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: empty string causes planner to avoid index. Makes me sad.
Date: 2009-11-30 20:24:22
Message-ID: 4B1429F6.2060903@pinpointresearch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
>
>> Ok...third try. The cost when passing in an empty string is SIGNIFICANTLY higher than when not.
>>
>
> That just reflects the fact that it's expecting a lot more rows matching
> that query. I suppose this is because the statistics show you've got a
> lot more rows containing the empty string than other values.
If you believe the statistics the planner is using are not a useful
approximation of your data, you can try raising the
default_statistics_target. IIRC, it was 10 in that version of PG but has
been raised to 100 in the latest version as the improvement due to
additional data available to the planner seems to typically outweigh the
overhead of collecting/storing/processing the additional stats.

Also, are you sure that the table is being analyzed either by autovacuum
or manually (if you analyze your table, does the explain change
significantly?).

Cheers,
Steve