Re: slow seqscan

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Edoardo Ceccarelli <eddy(at)axa(dot)it>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Nick Barr <nicky(at)chuckie(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: slow seqscan
Date: 2004-04-21 14:31:27
Message-ID: 20040421072156.K55057@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

>
> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
> >
> I wasn't able to make this 2 field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR: parser: parse error at or near "(" at character 71

That's a 7.4 feature I think (and I think the version with two columns
may need extra parens around the lower()). I think the only way to do
something equivalent in 7.3 is to make a function that concatenates the
two in some fashion after having applied the lower to the one part and
then using that in the queries as well. Plus, if you're not in "C"
locale, I'm not sure that it'd help in 7.3 anyway.

> >> But the strangest thing ever is that if I change the filter with
> >> another one that represent a smaller amount of data it uses the
> >> index scan!!!
> >
> >
> > What's strange about that? The less data is going to be retrieved,
> > the more likely postgres is to use the index.
> >
> can't understand this policy:
>
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
> count
> -------
> 6753
> (1 row)
>
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
> count
> -------
> 2165
> (1 row)
>
> so it's using the index on 2000 rows and not for 6000? it's not that
> big difference, isn't it?

It's a question of how many pages it thinks it's going to have to retrieve
in order to handle the request. If it say needs (or think it needs) to
retrieve 50% of the pages, then given a random_page_cost of 4, it's going
to expect the index scan to be about twice the cost.

Generally speaking one good way to compare is to try the query with
explain analyze and then change parameters like enable_seqscan and try the
query with explain analyze again and compare the estimated rows and costs.
That'll give an idea of how it expects the two versions of the query to
compare speed wise.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Cramer 2004-04-21 15:05:31 Re: Wierd context-switching issue on Xeon
Previous Message Paul Thomas 2004-04-21 14:08:09 Re: MySQL vs PG TPC-H benchmarks