Re: Bad query plan with high-cardinality column

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alexander Staubo <alex(at)bengler(dot)no>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad query plan with high-cardinality column
Date: 2013-02-22 20:47:56
Message-ID: 1361566076.10158.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Alexander Staubo <alex(at)bengler(dot)no> wrote:

> This is my schema:
>
>   create table comments (
>     id serial primary key,
>     conversation_id integer,
>     created_at timestamp
>   );
>   create index comments_conversation_id_index on comments (conversation_id);
>   create index comments_created_at_index on comments (created_at);

I suspect you would be better off without those two indexes, and
instead having an index on (conversation_id, created_at).  Not just
for the query you show, but in general.

>   select comments.id from comments where
>     conversation_id = 3975979 order by created_at limit 13
>
> This filters about 5000 rows and returns the oldest 13 rows. But
> the query is consistently planned wrong:

> [planner thinks it will be cheaper to read index in ORDER BY
> sequence and filter rows until it has 13 than to read 5471 rows
> and sort them to pick the top 13 after the sort.]

In my experience these problems come largely from the planner not
knowing the cost of dealing with each tuple.  I see a lot less of
this if I raise cpu_tuple_cost to something in the 0.03 to 0.05
range.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alexander Staubo 2013-02-22 21:31:48 Re: Bad query plan with high-cardinality column
Previous Message Tom Lane 2013-02-22 20:33:36 Re: Bad query plan with high-cardinality column