Quick Links

Query with order by and limit is very slow - wrong index used

From:	Nowak Michał <michal(dot)nowak(at)me(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Query with order by and limit is very slow - wrong index used
Date:	2011-10-03 09:44:21
Message-ID:	6B8FADC8-9692-439F-8AA4-0CAA287FCEB0@me.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Since it's my first on this list, I'd like to say "Hi guys" :)

Here is definition of my table:
a9-dev=> \d records;
Table "public.records"
Column | Type | Modifiers
--------------------------------------+-----------------------------+-----------
id | bigint | not null
checksum | character varying(32) | not null
data | text | not null
delete_date | timestamp without time zone |
last_processing_date | timestamp without time zone |
object_id | character varying(255) | not null
processing_path | character varying(255) | not null
schema_id | character varying(255) | not null
source_id | character varying(255) | not null
source_object_last_modification_date | timestamp without time zone | not null
Indexes:
"records_pkey" PRIMARY KEY, btree (id)
"unq_records_0" UNIQUE, btree (object_id, schema_id, source_id, processing_path)
"length_processing_path_id_idx" btree (length(processing_path::text), id)
"length_processing_path_idx" btree (length(processing_path::text))
"object_id_id_idx" btree (object_id, id)
"schema_id_id_idx" btree (schema_id, id)
"schema_id_idx" btree (schema_id)
"source_id_id_idx" btree (source_id, id)
"source_id_idx" btree (source_id)
"source_object_last_modification_date_id_idx" btree (source_object_last_modification_date, id)
"source_object_last_modification_date_idx" btree (source_object_last_modification_date)

Average length of value of "data" column = 2991.7947061626100466

When I perform query such as this: "select * from records where source_id = 'XXX' order by id limit 200;" I expect DB to use index source_id_id_idx with XXX as filter. It is true for all but one values of XXX - when I ask for records with most common source_id, records_pkey index is used instead and performance is terrible! Explain analyze results below.

a9-dev=> explain analyze select * from records where source_id ='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..755.61 rows=200 width=1127) (actual time=75.292..684.582 rows=200 loops=1)
-> Index Scan using source_id_id_idx on records (cost=0.00..1563542.89 rows=413849 width=1127) (actual time=75.289..684.495 rows=200 loops=1)
Index Cond: ((source_id)::text = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 690.358 ms
(4 rows)

a9-dev=> explain analyze select * from records where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..466.22 rows=200 width=1127) (actual time=124093.485..124095.540 rows=200 loops=1)
-> Index Scan using records_pkey on records (cost=0.00..2333280.84 rows=1000937 width=1127) (actual time=124093.484..124095.501 rows=200 loops=1)
Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 124130.247 ms
(4 rows)

Some info about data distrubution:

a9-dev=> select min(id) from records;
min
--------
190830
(1 row)

a9-dev=> select min(id), max(id) from records where source_id='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml';
min | max
---------+---------
1105217 | 3811326
(1 row)
a9-dev=> select min(id), max(id) from records where source_id='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml';
min | max
---------+---------
1544991 | 3811413
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id limit 200) as a;
min | max
---------+---------
1105217 | 1105416
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200) as a;
min | max
---------+---------
1544991 | 1545190
(1 row)

a9-dev=> select source_id, count(*) from records where source_id = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' or source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' group by source_id;
source_id | count
--------------------------------------------------------+--------
http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml | 427254
http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
(2 rows)

a9-dev=> select count(*) from records;
count
---------
3620311
(1 row)

DB settings:

a9-dev=> SELECT
a9-dev-> 'version'::text AS "name",
a9-dev-> version() AS "current_setting"
a9-dev-> UNION ALL
a9-dev-> SELECT
a9-dev-> name,current_setting(name)
a9-dev-> FROM pg_settings
a9-dev-> WHERE NOT source='default' AND NOT name IN
a9-dev-> ('config_file','data_directory','hba_file','ident_file',
a9-dev(> 'log_timezone','DateStyle','lc_messages','lc_monetary',
a9-dev(> 'lc_numeric','lc_time','timezone_abbreviations',
a9-dev(> 'default_text_search_config','application_name',
a9-dev(> 'transaction_deferrable','transaction_isolation',
a9-dev(> 'transaction_read_only');
name | current_setting
--------------------------+-----------------------------------------------------------------------------------------------------------------
version | PostgreSQL 9.0.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.6.0 20110530 (Red Hat 4.6.0-9), 64-bit
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_rotation_age | 1d
log_rotation_size | 0
log_truncate_on_rotation | on
logging_collector | on
max_connections | 100
max_stack_depth | 2MB
port | 5432
server_encoding | UTF8
shared_buffers | 24MB
TimeZone | Poland
(14 rows)

This query was always slow. Autovacuum is on, and I ran VACUUM ANALYZE manually few minutes before writing this email.

Please help me with my problem. I'll be happy to provide any additional information if needed.
Michal Nowak

Responses

Re: Query with order by and limit is very slow - wrong index used at 2011-10-03 10:02:26 from Gregg Jaskiewicz
Re: Query with order by and limit is very slow - wrong index used at 2011-10-03 15:12:31 from Tom Lane
Re: Query with order by and limit is very slow - wrong index used at 2011-10-03 22:45:47 from Marcin Mańk

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Gregg Jaskiewicz	2011-10-03 10:02:26	Re: Query with order by and limit is very slow - wrong index used
Previous Message	David Boreham	2011-10-03 05:57:42	Re: Suggestions for Intel 710 SSD test