GROUP BY with reasonable timings in PLAN but unreasonable execution time

From: Clem Dickey <dickeycl(at)us(dot)ibm(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: GROUP BY with reasonable timings in PLAN but unreasonable execution time
Date: 2011-07-06 02:26:18
Message-ID: iv0h49$1var$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a query which seems to be taking an extraordinarily long time
(many minutes, at least) when seemingly equivalent queries have
different plans and execute in seconds. naturally, I'd like to know why.

Version is Postgresql 8.4.8. The table, "t", is

Column | Type | Modifiers
--------+---------+-----------
y | integer | not null
x | integer | not null
k | integer | not null
j | integer | not null
z | integer | not null
Indexes:
"t_pkey" PRIMARY KEY, btree (j, k, x, y, z)

The table population, in pseudocode, is this:
for x in 0..9
for y in 0..9999
for z in 0..29
INSERT INTO t VALUES(y,x,0,0,z)

So the table has 300000 entries, with j and k always 0.

The query is:

SELECT *
FROM (
SELECT * FROM t GROUP BY j,k,x,z,y
) AS f
NATURAL JOIN t;

The plan:

Merge Join (cost=44508.90..66677.96 rows=1 width=20)
Merge Cond: ((public.t.j = public.t.j) AND (public.t.k = public.t.k)
AND (public.t.x = public.t.x))
Join Filter: ((public.t.y = public.t.y) AND (public.t.z = public.t.z))
-> Group (cost=44508.90..49008.90 rows=30000 width=20)
-> Sort (cost=44508.90..45258.90 rows=300000 width=20)
Sort Key: public.t.j, public.t.k, public.t.x, public.t.z,
public.t.y
-> Seq Scan on t (cost=0.00..4911.00 rows=300000 width=20)
-> Index Scan using t_pkey on t (cost=0.00..14877.18 rows=300000
width=20)

This query runs at least 20 minutes, with postmaster CPU utilization at
99%, without completing. System is a 3.2GHz Zeon, 3GB memory, and not
much else running.

By contrast, placing an intermediate result in a table "u" provides a
result in about 3 seconds:

CREATE TEMPORARY TABLE u AS SELECT * FROM t GROUP BY j,k,x,z,y;
SELECT * FROM u NATURAL JOIN t;

Changing the order of the GROUP BY clause varies the plan, sometimes
yielding shorter execution times. For example, this ordering executes in
about 1.5 seconds:

SELECT *
FROM (
SELECT * FROM t GROUP BY j,k,x,y,z
) AS f
NATURAL JOIN t;

With 120 permutations, I didn't try them all.

I should note that the plans tend to have similar costs, so the query
planner presumably does not know that some permutations have
significantly greater execution times.

Clem Dickey

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Samuel Gendler 2011-07-06 07:42:33 Re: Query in 9.0.2 not using index in 9.0.0 works fine
Previous Message Jonathan 2011-07-06 00:18:10 Slow query when using ORDER BY *and* LIMIT