Planner choosing NestedLoop, although it is slower...

Lists: pgsql-performance
From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 18:11:46
Message-ID: 4E1C8E62.2090800@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi, all.

I have a query, looking like this:
SELECT
pub_date
FROM
tubesite_object
INNER JOIN tubesite_image
ON tubesite_image.object_ptr_id = tubesite_object.id
WHERE
tubesite_object.site_id = 8
AND tubesite_object.pub_date < E'2011-07-12 13:25:00'
ORDER BY
tubesite_object.pub_date ASC
LIMIT 21;

That query takes cca 10-15 seconds to run. Here is query plan:

Limit (cost=0.00..415.91 rows=21 width=8) (actual
time=11263.089..11263.089 rows=0 loops=1)
-> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
time=11263.087..11263.087 rows=0 loops=1)
-> Index Scan using tubesite_object_pub_date_idx on
tubesite_object (cost=0.00..183007.09 rows=9404 width=12) (actual
time=0.024..11059.487 rows=9374 loops=1)
Index Cond: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
Filter: (site_id = 8)
-> Index Scan using tubesite_image_pkey on tubesite_image
(cost=0.00..0.33 rows=1 width=4) (actual time=0.021..0.021 rows=0
loops=9374)
Index Cond: (tubesite_image.object_ptr_id =
tubesite_object.id)
Total runtime: 11263.141 ms

This query runs quickly (around second or two) when there is only few
connections to the database. Once I have 50-80 connections (200 is the
limit, although I never have more than 120-150 connections), that query
takes around 10-15 seconds.

But, if I disable nestedloops, here is the query plan:

Limit (cost=22683.45..22683.51 rows=21 width=8) (actual
time=136.009..136.009 rows=0 loops=1)
-> Sort (cost=22683.45..22706.96 rows=9404 width=8) (actual
time=136.007..136.007 rows=0 loops=1)
Sort Key: tubesite_object.pub_date
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=946.51..22429.91 rows=9404 width=8)
(actual time=135.934..135.934 rows=0 loops=1)
Hash Cond: (tubesite_object.id =
tubesite_image.object_ptr_id)
-> Bitmap Heap Scan on tubesite_object
(cost=545.40..21828.97 rows=9404 width=12) (actual time=20.874..104.075
rows=9374 loops=1)
Recheck Cond: (site_id = 8)
Filter: (pub_date < '2011-07-12
13:25:00-05'::timestamp with time zone)
-> Bitmap Index Scan on tubesite_object_site_id
(cost=0.00..543.05 rows=9404 width=0) (actual time=18.789..18.789
rows=9374 loops=1)
Index Cond: (site_id = 8)
-> Hash (cost=215.49..215.49 rows=14849 width=4)
(actual time=21.068..21.068 rows=14849 loops=1)
-> Seq Scan on tubesite_image (cost=0.00..215.49
rows=14849 width=4) (actual time=0.029..9.073 rows=14849 loops=1)
Total runtime: 136.287 ms

Now, if I disable nested loops in postgres.conf, then my load average on
the server goes skyhigh (i presume because a lot of other queries are
now being planned incorrectly).

I have set up default_statistics_target to 2000, and have vacumed and
analyzed the database.

Here are the other options I have set up in postgresql.conf (that differ
from the default settings):

version | PostgreSQL 8.4.8 on x86_64-pc-linux-gnu,
compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2, 64-bit
checkpoint_segments | 64
default_statistics_target | 2000
effective_cache_size | 20GB
external_pid_file | /var/run/postgresql/8.4-main.pid
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_autovacuum_min_duration | 0
log_checkpoints | on
log_line_prefix | %t [%p]: [%l-1]
log_min_duration_statement | 1s
maintenance_work_mem | 256MB
max_connections | 200
max_stack_depth | 3MB
port | 5432
server_encoding | UTF8
shared_buffers | 2GB
statement_timeout | 30min
temp_buffers | 4096
TimeZone | localtime
track_activity_query_size | 2048
unix_socket_directory | /var/run/postgresql
wal_buffers | 128MB
work_mem | 64MB

Why is planner using NestedLoops, that is, what can I do to make him NOT
to use NestedLoops (other than issuing SET enable_nestloop TO false;
before each query) ?

Mario


From: Clem Dickey <dickeycl(at)us(dot)ibm(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 19:55:14
Message-ID: ivi8r5$16se$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/12/2011 11:11 AM, Mario Splivalo wrote:
> Hi, all.
>
> I have a query, looking like this:
> SELECT
> pub_date
> FROM
> tubesite_object
> INNER JOIN tubesite_image
> ON tubesite_image.object_ptr_id = tubesite_object.id
> WHERE
> tubesite_object.site_id = 8
> AND tubesite_object.pub_date < E'2011-07-12 13:25:00'
> ORDER BY
> tubesite_object.pub_date ASC
> LIMIT 21;
>

> Why is planner using NestedLoops, that is, what can I do to make him NOT
> to use NestedLoops (other than issuing SET enable_nestloop TO false;
> before each query) ?

The planner is using a nested loops because the startup overhead is
less, and it think that it will only have run a small 0.2% (21/9404) of
the loops before reaching your limit of 21 results. In fact it has to
run all the loops, because there are 0 results. (Is that what you expected?)

Try a using CTE to make the planner think you are going to use all the
rows of the joined table. That may cause the planner to use a merge
join, which has higher startup cost (sort) but less overall cost if it
the join will not finish early.

WITH t AS (
SELECT tubesite_object.site_id AS site_id,
tubesite_object.pub_date as pub_date
FROM tubesite_object
INNER JOIN tubesite_image
ON tubesite_image.object_ptr_id = tubesite_object.id
)
SELECT pub_date
FROM t
WHERE t.site_id = 8 AND t.pub_date < E'2011-07-12 13:25:00'
ORDER BY t.pub_date ASC LIMIT 21;


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 20:04:12
Message-ID: 4626.1310501052@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr> writes:
> Limit (cost=0.00..415.91 rows=21 width=8) (actual
> time=11263.089..11263.089 rows=0 loops=1)
> -> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
> time=11263.087..11263.087 rows=0 loops=1)

> Why is planner using NestedLoops,

Because it thinks the LIMIT will kick in and end the query when the join
is only 21/9404ths (ie, a fraction of a percent) complete. A NestLoop
results in saving a lot of work in that situation, whereas hash-and-sort
has to do the whole join despite the LIMIT.

What you need to look into is why the estimated join size is 9400 rows
when the actual join size is zero. Are both tables ANALYZEd? Are you
intentionally selecting rows that have no join partners?

regards, tom lane


From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 22:06:46
Message-ID: 4E1CC576.2080200@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/12/2011 10:04 PM, Tom Lane wrote:
> Mario Splivalo<mario(dot)splivalo(at)megafon(dot)hr> writes:
>> Limit (cost=0.00..415.91 rows=21 width=8) (actual
>> time=11263.089..11263.089 rows=0 loops=1)
>> -> Nested Loop (cost=0.00..186249.55 rows=9404 width=8) (actual
>> time=11263.087..11263.087 rows=0 loops=1)
>
>> Why is planner using NestedLoops,
>
> Because it thinks the LIMIT will kick in and end the query when the join
> is only 21/9404ths (ie, a fraction of a percent) complete. A NestLoop
> results in saving a lot of work in that situation, whereas hash-and-sort
> has to do the whole join despite the LIMIT.
>
> What you need to look into is why the estimated join size is 9400 rows
> when the actual join size is zero. Are both tables ANALYZEd? Are you
> intentionally selecting rows that have no join partners?

Hi, Tom.

Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
selecting rows taht have no join partners?

Mario


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 22:39:22
Message-ID: 18737.1310510362@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr> writes:
> On 07/12/2011 10:04 PM, Tom Lane wrote:
>> What you need to look into is why the estimated join size is 9400 rows
>> when the actual join size is zero. Are both tables ANALYZEd? Are you
>> intentionally selecting rows that have no join partners?

> Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
> selecting rows taht have no join partners?

I'm wondering why the actual join size is zero. That seems like a
rather unexpected case for a query like this.

regards, tom lane


From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-12 23:57:06
Message-ID: 4E1CDF52.4020208@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/13/2011 12:39 AM, Tom Lane wrote:
> Mario Splivalo<mario(dot)splivalo(at)megafon(dot)hr> writes:
>> On 07/12/2011 10:04 PM, Tom Lane wrote:
>>> What you need to look into is why the estimated join size is 9400 rows
>>> when the actual join size is zero. Are both tables ANALYZEd? Are you
>>> intentionally selecting rows that have no join partners?
>
>> Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
>> selecting rows taht have no join partners?
>
> I'm wondering why the actual join size is zero. That seems like a
> rather unexpected case for a query like this.

It is true that this particular query returns 0 rows. But it's created
by django, and I can't do much to alter it.

Mario


From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-13 00:53:32
Message-ID: 4E1CEC8C.3000508@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/13/2011 12:39 AM, Tom Lane wrote:
> Mario Splivalo<mario(dot)splivalo(at)megafon(dot)hr> writes:
>> On 07/12/2011 10:04 PM, Tom Lane wrote:
>>> What you need to look into is why the estimated join size is 9400 rows
>>> when the actual join size is zero. Are both tables ANALYZEd? Are you
>>> intentionally selecting rows that have no join partners?
>
>> Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
>> selecting rows taht have no join partners?
>
> I'm wondering why the actual join size is zero. That seems like a
> rather unexpected case for a query like this.

Yes, seems that planer gets confused by LIMIT. This query:

select * from tubesite_object join tubesite_image on id=object_ptr_id
where site_id = 8 and pub_date < '2011-07-12 13:25:00' order by pub_date
desc ;

Does not choose Nested Loop, and is done instantly (20 ms), and returns
no rows. However, if I add LIMIT at the end, it chooses NestedLoop and
it takes 500ms if I'm alone on the server, and 10+ seconds if there 50+
connections on the server.

Mario


From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner choosing NestedLoop, although it is slower...
Date: 2011-07-13 02:04:35
Message-ID: 4E1CFD33.7040407@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/13/2011 02:53 AM, Mario Splivalo wrote:
> On 07/13/2011 12:39 AM, Tom Lane wrote:
>> Mario Splivalo<mario(dot)splivalo(at)megafon(dot)hr> writes:
>>> On 07/12/2011 10:04 PM, Tom Lane wrote:
>>>> What you need to look into is why the estimated join size is 9400 rows
>>>> when the actual join size is zero. Are both tables ANALYZEd? Are you
>>>> intentionally selecting rows that have no join partners?
>>
>>> Yes, both tables have been ANALYZEd. What do you mean, intentilnaly
>>> selecting rows taht have no join partners?
>>
>> I'm wondering why the actual join size is zero. That seems like a
>> rather unexpected case for a query like this.
>
> Yes, seems that planer gets confused by LIMIT. This query:
>
> select * from tubesite_object join tubesite_image on id=object_ptr_id
> where site_id = 8 and pub_date < '2011-07-12 13:25:00' order by pub_date
> desc ;
>
> Does not choose Nested Loop, and is done instantly (20 ms), and returns
> no rows. However, if I add LIMIT at the end, it chooses NestedLoop and
> it takes 500ms if I'm alone on the server, and 10+ seconds if there 50+
> connections on the server.

As explained/suggested by RhodiumToad on IRC, adding composite index on
(site_id, pub_date) made nestedloop query finish in around 100 seconds!

Thank you!

Mario