Planner choosing NestedLoop, although it is slower...

From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 18:11:46
Message-ID: 4E1C8E62.2090800@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi, all.

I have a query, looking like this:
SELECT
pub_date
FROM
tubesite_object
INNER JOIN tubesite_image
ON tubesite_image.object_ptr_id = tubesite_object.id
WHERE
tubesite_object.site_id = 8
AND tubesite_object.pub_date < E'2011-07-12 13:25:00'
ORDER BY
tubesite_object.pub_date ASC
LIMIT 21;

That query takes cca 10-15 seconds to run. Here is query plan:

Limit (cost=0.00..415.91 rows=21 width=8) (actual
time=11263.089..11263.089 rows=0 loops=1)
-> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
time=11263.087..11263.087 rows=0 loops=1)
-> Index Scan using tubesite_object_pub_date_idx on
tubesite_object (cost=0.00..183007.09 rows=9404 width=12) (actual
time=0.024..11059.487 rows=9374 loops=1)
Index Cond: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
Filter: (site_id = 8)
-> Index Scan using tubesite_image_pkey on tubesite_image
(cost=0.00..0.33 rows=1 width=4) (actual time=0.021..0.021 rows=0
loops=9374)
Index Cond: (tubesite_image.object_ptr_id =
tubesite_object.id)
Total runtime: 11263.141 ms

This query runs quickly (around second or two) when there is only few
connections to the database. Once I have 50-80 connections (200 is the
limit, although I never have more than 120-150 connections), that query
takes around 10-15 seconds.

But, if I disable nestedloops, here is the query plan:

Limit (cost=22683.45..22683.51 rows=21 width=8) (actual
time=136.009..136.009 rows=0 loops=1)
-> Sort (cost=22683.45..22706.96 rows=9404 width=8) (actual
time=136.007..136.007 rows=0 loops=1)
Sort Key: tubesite_object.pub_date
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=946.51..22429.91 rows=9404 width=8)
(actual time=135.934..135.934 rows=0 loops=1)
Hash Cond: (tubesite_object.id =
tubesite_image.object_ptr_id)
-> Bitmap Heap Scan on tubesite_object
(cost=545.40..21828.97 rows=9404 width=12) (actual time=20.874..104.075
rows=9374 loops=1)
Recheck Cond: (site_id = 8)
Filter: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
-> Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..543.05 rows=9404 width=0) (actual time=18.789..18.789
rows=9374 loops=1)
Index Cond: (site_id = 8)
-> Hash (cost=215.49..215.49 rows=14849 width=4)
(actual time=21.068..21.068 rows=14849 loops=1)
-> Seq Scan on tubesite_image (cost=0.00..215.49
rows=14849 width=4) (actual time=0.029..9.073 rows=14849 loops=1)
Total runtime: 136.287 ms

Now, if I disable nested loops in postgres.conf, then my load average on
the server goes skyhigh (i presume because a lot of other queries are
now being planned incorrectly).

I have set up default_statistics_target to 2000, and have vacumed and
analyzed the database.

Here are the other options I have set up in postgresql.conf (that differ
from the default settings):

version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2, 64-bit
checkpoint_segments | 64
default_statistics_target | 2000
effective_cache_size | 20GB
external_pid_file | /var/run/postgresql/8.4-main.pid
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_autovacuum_min_duration | 0
log_checkpoints | on
log_line_prefix | %t [%p]: [%l-1]
log_min_duration_statement | 1s
maintenance_work_mem | 256MB
max_connections | 200
max_stack_depth | 3MB
port | 5432
server_encoding | UTF8
shared_buffers | 2GB
statement_timeout | 30min
temp_buffers | 4096
TimeZone | localtime
track_activity_query_size | 2048
unix_socket_directory | /var/run/postgresql
wal_buffers | 128MB
work_mem | 64MB

Why is planner using NestedLoops, that is, what can I do to make him NOT
to use NestedLoops (other than issuing SET enable_nestloop TO false;
before each query) ?

Mario

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message lars 2011-07-12 19:01:25 Re: UPDATEDs slowing SELECTs in a fully cached database
Previous Message Merlin Moncure 2011-07-12 16:23:36 Re: UPDATEDs slowing SELECTs in a fully cached database