From: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
To: pgsql-performance(at)postgresql(dot)org
Subject: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 05:20:44
Message-ID: 152FFD9B-A288-4329-9C96-0FF0C6A2D418@skiline.cc
Lists: pgsql-performance

Hi everybody,

I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem there, and it might
be interesting for other Postgres users.

There are only 2 simple tables:

CREATE TABLE newsfeed
(
id varchar(32) PRIMARY KEY,
version int4 NOT NULL,
newsfeed_type varchar(20) NOT NULL,
new_item_count int4 NOT NULL
);
CREATE INDEX IDX_NEWSFEED_TYPE ON newsfeed (newsfeed_type);

CREATE TABLE newsfeed_item
(
id varchar(32) NOT NULL,
item_type varchar(35) NOT NULL,
version int4 NOT NULL,
category varchar(25) NULL,
data1 bytea NULL,
data2 bytea NULL,
date_time timestamp NOT NULL,
guid1 varchar(32) NULL,
guid2 varchar(32) NULL,
guid3 varchar(32) NULL,
id1 int8 NULL,
id2 int8 NULL,
long_value1 int8 NULL,
long_value2 int8 NULL,
long_value3 int8 NULL,
string_value1 varchar(4000) NULL,
string_value2 varchar(500) NULL,
string_value3 varchar(500) NULL,
string_value4 varchar(500) NULL,
string_value5 varchar(500) NULL,
string_value6 varchar(500) NULL,
newsfeed varchar(32) NOT NULL
);
CREATE UNIQUE INDEX newsfeed_item_pkey ON newsfeed_item (id);
CREATE INDEX idx_nfi_guid1 ON newsfeed_item (guid1);
CREATE INDEX idx_nfi_guid2 ON newsfeed_item (guid2);
CREATE INDEX idx_nfi_guid3 ON newsfeed_item (guid3);
CREATE INDEX idx_nfi_id1 ON newsfeed_item (id1);
CREATE INDEX idx_nfi_id2 ON newsfeed_item (id2);
CREATE INDEX idx_nfi_newsfeed ON newsfeed_item (newsfeed);
CREATE INDEX idx_nfi_type ON newsfeed_item (item_type);
CREATE INDEX idx_nfi_datetime ON newsfeed_item (date_time);

newsfeed contains 457036 rows
newsfeed_item contains 5169727 rows

PostgreSQL version: 9.0.2
OS: CentOS release 5.5 (Final)

The following query took 4.2 seconds:

-------------------------
select *
from newsfeed_item
where newsfeed in
(
'173ee4dcec0d11de9f4f12313c0018c1','10dabde0f70211df816612313b02054e',
'17841c9af70211df874b12313b02054e','1783fce2f70211df814412313b02054e','1783fdd2f70211df8c1d12313b02054e','178405a2f70211df829212313b02054e',
'178440c6f70211df97c812313b02054e','178416e6f70211dfac3412313b02054e','1783e4aaf70211df9acd12313b02054e','178437e8f70211df8b8512313b02054e',
'1783f54ef70211df81e012313b02054e','178415c4f70211df8f8112313b02054e'
)
order by date_time desc
limit 25
-------------------------

If the LIMIT is removed, the query takes 60 milliseconds! And if the sort order is changed to ASC, the query takes 44 ms, even with the LIMIT.

I then tried creating the index on date_time in DESC order (because the result is sorted in descending order), but that did not change anything.
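
Roughly, the statement looked something like this (the index name here is just for illustration):

CREATE INDEX idx_nfi_datetime_desc ON newsfeed_item (date_time DESC);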

Then I removed the index on date_time with the following results:

query with the limit: 40 ms
query without the limit: 60 ms

=> the optimizer seems to pick the wrong index (I did perform an ANALYZE on newsfeed_item and a REINDEX before the test). Since I don't currently need
the index on date_time (though I will in the near future), dropping it is OK for now.
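
For reference, the maintenance statements involved were roughly the following (the table-level REINDEX is an assumption; a REINDEX INDEX on the individual indexes would work as well):

ANALYZE newsfeed_item;
REINDEX TABLE newsfeed_item;
DROP INDEX idx_nfi_datetime;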

------------------------

Here are the EXPLAIN ANALYZE results:

1) the query in descending order with the limit and index on date_time (the slow one):

Limit (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
-> Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item (cost=0.00..409365.16 rows=10442 width=963) (actual time=48.581..4060.542 rows=25 loops=1)
Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 4060.959 ms

2) the query in descending order without the limit (which is much faster):

Sort (cost=39575.23..39601.33 rows=10442 width=963) (actual time=15.014..17.038 rows=477 loops=1)
Sort Key: date_time
Sort Method: quicksort Memory: 287kB
-> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601 rows=477 loops=1)
Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
-> Bitmap Index Scan on idx_nfi_newsfeed (cost=0.00..418.80 rows=10442 width=0) (actual time=0.555..0.555 rows=477 loops=1)
Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 19.065 ms

3) the query in ascending order with the limit (which is fast):

Limit (cost=0.00..980.09 rows=25 width=963) (actual time=0.261..3.704 rows=25 loops=1)
-> Index Scan using "IDX_NFI_DATETIME" on newsfeed_item (cost=0.00..409365.16 rows=10442 width=963) (actual time=0.250..3.495 rows=25 loops=1)
Filter: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 3.854 ms

4) the query after removing the index on date_time, in descending order with the LIMIT (which is also fast):

Limit (cost=34745.39..34745.45 rows=25 width=963) (actual time=12.855..13.143 rows=25 loops=1)
-> Sort (cost=34745.39..34771.49 rows=10442 width=963) (actual time=12.846..12.946 rows=25 loops=1)
Sort Key: date_time
Sort Method: top-N heapsort Memory: 40kB
-> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.622..9.936 rows=477 loops=1)
Recheck Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
-> Bitmap Index Scan on idx_nfi_newsfeed (cost=0.00..418.80 rows=10442 width=0) (actual time=0.543..0.543 rows=477 loops=1)
Index Cond: ((newsfeed)::text = ANY ('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))

Total runtime: 13.318 ms

Is there anything I can do so that I can add the index on date_time back without running into this performance problem?

regards
Dieter


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 07:42:27
Message-ID: BANLkTik6U2MU=s9mxr3=pK3Fp+S4bAJK3Q@mail.gmail.com
Lists: pgsql-performance

On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein
<dieter(dot)rehbein(at)skiline(dot)cc> wrote:
> Hi everybody,
>
> I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem there, and it might
> be interesting for other Postgres users.

Did you perform an ANALYZE or VACUUM ANALYZE?
Did you try increasing the statistic targets?

AFAIK, it looks a lot like the planner is missing stats, since it
estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
instead of 25.
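
For example, the first of those is simply (a sketch, against the table from the original post):

VACUUM ANALYZE newsfeed_item;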


From: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 08:59:22
Message-ID: CF07BFEA-8AFB-4EF8-BBB1-342F6EF23CC2@skiline.cc
Lists: pgsql-performance

What I did was an ANALYZE, which did not change anything.

I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

thanks
Dieter

On 12.04.2011 at 09:42 Claudio Freire wrote:

On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein
<dieter(dot)rehbein(at)skiline(dot)cc> wrote:
> Hi everybody,
>
> I have a performance problem with a query that uses a LIMIT. There are other threads regarding performance issues with LIMIT, but I didn't find useful hints for our problem there, and it might
> be interesting for other Postgres users.

Did you perform an ANALYZE or VACUUM ANALYZE?
Did you try increasing the statistic targets?

AFAIK, it looks a lot like the planner is missing stats, since it
estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
instead of 25.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 09:07:27
Message-ID: BANLkTikvbf6zfOZg1KB=1prAyMNPkA=ubQ@mail.gmail.com
Lists: pgsql-performance

On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
<dieter(dot)rehbein(at)skiline(dot)cc> wrote:
> I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

That probably means you need more statistics - try increasing the
newsfeed's statistics target count.

ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

Try different <n> numbers, you can crank it up to 4000 or perhaps more
in 9.0, but you should start lower I guess.


From: tv(at)fuzzy(dot)cz
To: "Claudio Freire" <klaussfreire(at)gmail(dot)com>
Cc: "Dieter Rehbein" <dieter(dot)rehbein(at)skiline(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 09:33:31
Message-ID: 85b1e68e7ad292e4cae8fcfd8c8cb61a.squirrel@sq.gransy.com
Lists: pgsql-performance

> On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
> <dieter(dot)rehbein(at)skiline(dot)cc> wrote:
>> I just executed a VACUUM ANALYZE and now everything performs well. hm,
>> strange.
>
> That probably means you need more statistics - try increasing the
> newsfeed's statistics target count.
>
> ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;
>
> Try different <n> numbers, you can crank it up to 4000 or perhaps more
> in 9.0, but you should start lower I guess.

AFAIK the max value is 10000 and the default is 100. Higher numbers mean
higher overhead, so do not jump to 10000 directly. Set it to 1000 and see
if that helps, etc.
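
A minimal sketch of that, using the column filtered on in the original query (re-running ANALYZE afterwards so the new target actually takes effect):

ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS 1000;
ANALYZE newsfeed_item;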

regards
Tomas


From: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 09:36:23
Message-ID: 93F9FA07-1BFF-4052-98FA-56FFF9D5E39D@skiline.cc
Lists: pgsql-performance

Thanks a lot guys, I will try that out.

regards
Dieter

On 12.04.2011 at 11:07 Claudio Freire wrote:

On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
<dieter(dot)rehbein(at)skiline(dot)cc> wrote:
> I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

That probably means you need more statistics - try increasing the
newsfeed's statistics target count.

ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

Try different <n> numbers, you can crank it up to 4000 or perhaps more
in 9.0, but you should start lower I guess.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Dieter Rehbein <dieter(dot)rehbein(at)skiline(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?
Date: 2011-04-12 14:38:27
Message-ID: 9713.1302619107@sss.pgh.pa.us
Lists: pgsql-performance

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> Did you try increasing the statistic targets?

> AFAIK, it looks a lot like the planner is missing stats, since it
> estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
> instead of 25.

BTW, this is the right suggestion, but for the wrong reason. You seem
to be looking at

Limit (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
-> Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item (cost=0.00..409365.16 rows=10442 width=963) (actual time=48.581..4060.542 rows=25 loops=1)

Here, the actual row count is constrained to 25 because the LIMIT node
stops calling the indexscan node once it's got 25. So this case proves
little about whether the planner's estimates are any good. You need to
check the estimates in the unconstrained plan:

-> Bitmap Heap Scan on newsfeed_item (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601 rows=477 loops=1)

Here we can see that there really are only 477 rows in the table that
satisfy the WHERE clause, versus an estimate of 10K. So sure enough,
the statistics are bad, and an increase in stats target might help.
But you can't conclude that from an explain that involves LIMIT.
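
As a quick cross-check, a hypothetical query along these lines shows what the planner currently knows about that column, which can be compared against the 477 rows actually found above:

SELECT n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'newsfeed_item' AND attname = 'newsfeed';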

regards, tom lane