Re: partition query using Seq Scan even when index is present

Lists: pgsql-performance
From: "Kenneth Cox" <kenstir(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: partition query using Seq Scan even when index is present
Date: 2009-09-02 20:15:34
Message-ID: op.uznbv8qi5ru9c3@kent60.office.vivox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

With postgresql-8.3.6, I have many partitions inheriting a table. SELECT
min() on the parent performs a Seq Scan, but SELECT min() on a child uses
the index. Is this another case where the planner is not aware enough to
come up with the best plan? I tried creating an index on the parent table
to no avail. Is there a way to formulate the query so that it uses the
index? Here is the general flavor:

create table calls (caller text, ts timestamptz);
create table calls_partition_2009_08 (check (ts >= '2009-08-01' and ts <
'2009-09-01')) inherits (calls);
create index calls_partition_2009_08_ts on calls_partition_2009_08 (ts);
insert into calls_partition_2009_08 (ts)
select to_timestamp(unix_time)
from generate_series(extract(epoch from
'2009-08-01'::timestamptz)::int,
extract(epoch from '2009-08-31
23:59'::timestamptz)::int, 60) as unix_time;
analyze calls_partition_2009_08;
explain select min(ts) from calls;

QUERY PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=780.50..780.51 rows=1 width=8)
-> Append (cost=0.00..666.00 rows=45800 width=8)
-> Seq Scan on calls (cost=0.00..21.60 rows=1160 width=8)
-> Seq Scan on calls_partition_2009_08 calls (cost=0.00..644.40
rows=44640 width=8)
(4 rows)

explain select min(ts) from calls_partition_2009_08;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.03..0.04 rows=1 width=0)
InitPlan
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan using calls_partition_2009_08_ts on
calls_partition_2009_08 (cost=0.00..1366.85 rows=44640 width=8)
Filter: (ts IS NOT NULL)
(5 rows)


From: Greg Jaman <gjaman(at)gmail(dot)com>
To: Kenneth Cox <kenstir(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: partition query using Seq Scan even when index is present
Date: 2009-09-02 20:31:29
Message-ID: b72893ad0909021331q45fe1a9gb74d0950ff03326a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Yep.... I ran into the exact same problem.
My solution was to create a pl/pgsql function to query the child tables: (
http://archives.postgresql.org/pgsql-performance/2008-11/msg00284.php)
If you find a better solution please share.

-Greg Jaman

On Wed, Sep 2, 2009 at 1:15 PM, Kenneth Cox <kenstir(at)gmail(dot)com> wrote:

> With postgresql-8.3.6, I have many partitions inheriting a table. SELECT
> min() on the parent performs a Seq Scan, but SELECT min() on a child uses
> the index. Is this another case where the planner is not aware enough to
> come up with the best plan? I tried creating an index on the parent table
> to no avail. Is there a way to formulate the query so that it uses the
> index? Here is the general flavor:
>
> create table calls (caller text, ts timestamptz);
> create table calls_partition_2009_08 (check (ts >= '2009-08-01' and ts <
> '2009-09-01')) inherits (calls);
> create index calls_partition_2009_08_ts on calls_partition_2009_08 (ts);
> insert into calls_partition_2009_08 (ts)
> select to_timestamp(unix_time)
> from generate_series(extract(epoch from '2009-08-01'::timestamptz)::int,
> extract(epoch from '2009-08-31
> 23:59'::timestamptz)::int, 60) as unix_time;
> analyze calls_partition_2009_08;
> explain select min(ts) from calls;
>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Aggregate (cost=780.50..780.51 rows=1 width=8)
> -> Append (cost=0.00..666.00 rows=45800 width=8)
> -> Seq Scan on calls (cost=0.00..21.60 rows=1160 width=8)
> -> Seq Scan on calls_partition_2009_08 calls (cost=0.00..644.40
> rows=44640 width=8)
> (4 rows)
>
> explain select min(ts) from calls_partition_2009_08;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------
> Result (cost=0.03..0.04 rows=1 width=0)
> InitPlan
> -> Limit (cost=0.00..0.03 rows=1 width=8)
> -> Index Scan using calls_partition_2009_08_ts on
> calls_partition_2009_08 (cost=0.00..1366.85 rows=44640 width=8)
> Filter: (ts IS NOT NULL)
> (5 rows)
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Kenneth Cox <kenstir(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: partition query using Seq Scan even when index is present
Date: 2009-09-03 14:49:36
Message-ID: b42b73150909030749h3ab95d05sa59b5a5545e376fc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Sep 2, 2009 at 4:15 PM, Kenneth Cox<kenstir(at)gmail(dot)com> wrote:
> With postgresql-8.3.6, I have many partitions inheriting a table.  SELECT
> min() on the parent performs a Seq Scan, but SELECT min() on a child uses
> the index.  Is this another case where the planner is not aware enough to
> come up with the best plan?  I tried creating an index on the parent table
> to no avail.  Is there a way to formulate the query so that it uses the
> index?  Here is the general flavor:
>
> create table calls (caller text, ts timestamptz);
> create table calls_partition_2009_08 (check (ts >= '2009-08-01' and ts <
> '2009-09-01')) inherits (calls);
> create index calls_partition_2009_08_ts on calls_partition_2009_08 (ts);
> insert into calls_partition_2009_08 (ts)
>  select to_timestamp(unix_time)
>    from generate_series(extract(epoch from '2009-08-01'::timestamptz)::int,
>                         extract(epoch from '2009-08-31
> 23:59'::timestamptz)::int, 60) as unix_time;
> analyze calls_partition_2009_08;
> explain select min(ts) from calls;

ATM, constraint exclusion mainly only supports queries of the form:
SELECT ... WHERE 'x', with x being an expression in the check
constraint. Table partitioning unfortunately is not a free lunch, you
have to be aware of it at all times when writing queries vs your
partitioned tables.

merlin


From: "Kenneth Cox" <kenstir(at)gmail(dot)com>
To: "Greg Jaman" <gjaman(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: partition query using Seq Scan even when index is present
Date: 2009-09-03 16:13:36
Message-ID: op.uzovcy2n5ru9c3@kent60.office.vivox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thank you, Greg! I tweaked your function to use recursion to search all
inherited tables; my inheritance structure is two levels deep.

This function is for integers only; I will copy/waste to create one for
timestamps. Extra credit for anyone who can rewrite it to be polymorphic.

-- Same as max(_colname) from _relname but much faster for inherited
-- tables with an index on _colname. In postgresql-8.3.6 a naive query
-- on a parent table will not use the indexes on the child tables.
create or replace function partition_max_int(_relname text, _colname text)
returns int AS
$$
declare
childtable RECORD;
childres RECORD;
maxval int;
tmpval int;
sql text;
begin
-- find max in this table (only)
sql := 'select max('||_colname||') from only '||quote_ident(_relname);
execute sql into maxval;

-- recurse to find max in descendants
FOR childtable in
select pc.relname as relname
from pg_class pc
join pg_inherits pi on pc.oid=pi.inhrelid
where inhparent=(select oid from pg_class where relname=_relname)
LOOP
tmpval := partition_max_int(childtable.relname, _colname);
IF tmpval is not NULL and (tmpval > maxval or maxval is null) THEN
maxval := tmpval;
END IF;
END LOOP;

return maxval;
end;
$$
language 'plpgsql' STABLE;