Re: Strange explain on partitioned tables

Lists: pgsql-performance
From: Philippe Rimbault <primbault(at)edd(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Strange explain on partitioned tables
Date: 2010-07-22 07:52:40
Message-ID: 4C47F8C8.10005@edd.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Hi all,

I'm using Postgresql 8.4.4 on Debian.
In postgresql.conf, constraint_exclusion is set to "on"

I have partitioned tables with check constraints.
My primary table :
CREATE TABLE documents
(
id serial NOT NULL,
id_source smallint,
nod integer,
num text,
id_fourniture integer,
dav date NOT NULL,
maj timestamp without time zone NOT NULL DEFAULT now(),
id_location "char",
id_partition smallint,
mark text
);

There is no row in "only" documents :
SQL> select count(*) from only documents;
-> 0
SQL> select count(*) from documents;
-> 160155756

I have one thousand inherited tables like this one (with a different
check constraint on each) :
CREATE TABLE documents_mond
(
CONSTRAINT documents_mond_id_source_check CHECK (id_source = 113)
)
INHERITS (documents);
CREATE INDEX idx_documents_mond_id
ON documents_mond
USING btree
(id);

CREATE INDEX idx_documents_mond_id_partition
ON documents_mond
USING btree
(id_partition);

CREATE INDEX idx_documents_mond_id_source_dav
ON documents_mond
USING btree
(id_source, dav);
ALTER TABLE documents_mond CLUSTER ON idx_documents_mond_id_source_dav;

CREATE INDEX idx_documents_mond_id_source_nod
ON documents_mond
USING btree
(id_source, nod);

CREATE INDEX idx_documents_mond_id_source_num
ON documents_mond
USING btree
(id_source, num);

CREATE INDEX idx_documents_mond_maj
ON documents_mond
USING btree
(maj);

SQL> select count(*) from documents_mond;
-> 1053929

When i perform this query on the primary table :
EXPLAIN ANALYZE
select
documents.id,
documents.num,
sources.name,
l.name
from
documents,
locations l,
sources
where
documents.id_source = 113 and
documents.id_location=l.id and
documents.id_source=sources.id
order by
documents.id desc
limit 5;
"Limit (cost=36209.55..36209.57 rows=5 width=24) (actual
time=2307.181..2307.185 rows=5 loops=1)"
" -> Sort (cost=36209.55..36512.56 rows=121202 width=24) (actual
time=2307.180..2307.180 rows=5 loops=1)"
" Sort Key: public.documents.id"
" Sort Method: top-N heapsort Memory: 17kB"
" -> Nested Loop (cost=1.52..34196.43 rows=121202 width=24)
(actual time=0.076..1878.189 rows=1053929 loops=1)"
" -> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=1)"
" Index Cond: (id = 113)"
" -> Hash Join (cost=1.52..32976.15 rows=121202 width=22)
(actual time=0.059..1468.982 rows=1053929 loops=1)"
" Hash Cond: (public.documents.id_location = l.id)"
" -> Append (cost=0.00..27810.36 rows=1053932
width=14) (actual time=0.031..836.280 rows=1053929 loops=1)"
" -> Seq Scan on documents (cost=0.00..18.25
rows=3 width=39) (actual time=0.001..0.001 rows=0 loops=1)"
" Filter: (id_source = 113)"
" -> Seq Scan on documents_mond documents
(cost=0.00..27792.11 rows=1053929 width=14) (actual time=0.030..503.815
rows=1053929 loops=1)"
" Filter: (id_source = 113)"
" -> Hash (cost=1.23..1.23 rows=23 width=10)
(actual time=0.019..0.019 rows=23 loops=1)"
" -> Seq Scan on locations l (cost=0.00..1.23
rows=23 width=10) (actual time=0.001..0.007 rows=23 loops=1)"
"Total runtime: 2307.498 ms"

And when i perform the same query directly on the inherited table (CHECK
id_source=113) :
EXPLAIN ANALYZE
select
documents.id,
documents.num,
sources.name,
l.name
from
documents_mond documents,
locations l,
sources
where
documents.id_source = 113 and
documents.id_location=l.id and
documents.id_source=sources.id
order by
documents.id desc
limit 5;
"Limit (cost=0.00..43.13 rows=5 width=24) (actual time=0.024..0.050
rows=5 loops=1)"
" -> Nested Loop (cost=0.00..9091234.75 rows=1053929 width=24)
(actual time=0.023..0.049 rows=5 loops=1)"
" -> Nested Loop (cost=0.00..8796038.31 rows=1053929 width=16)
(actual time=0.020..0.035 rows=5 loops=1)"
" -> Index Scan Backward using idx_documents_mond_id on
documents_mond documents (cost=0.00..71930.23 rows=1053929 width=14)
(actual time=0.012..0.015 rows=5 loops=1)"
" Filter: (id_source = 113)"
" -> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=5)"
" Index Cond: (sources.id = 113)"
" -> Index Scan using locations_pkey on locations l
(cost=0.00..0.27 rows=1 width=10) (actual time=0.001..0.002 rows=1 loops=5)"
" Index Cond: (l.id = documents.id_location)"
"Total runtime: 0.086 ms"

OR

EXPLAIN ANALYZE
select
documents.id,
documents.num,
sources.name,
l.name
from
documents_mond documents,
locations l,
sources
where
/* documents.id_source = 113 and */
documents.id_location=l.id and
documents.id_source=sources.id
order by
documents.id desc
limit 5;
"Limit (cost=0.00..3.13 rows=5 width=24) (actual time=0.025..0.052
rows=5 loops=1)"
" -> Nested Loop (cost=0.00..659850.75 rows=1053929 width=24) (actual
time=0.024..0.051 rows=5 loops=1)"
" -> Nested Loop (cost=0.00..364654.31 rows=1053929 width=16)
(actual time=0.021..0.037 rows=5 loops=1)"
" -> Index Scan Backward using idx_documents_mond_id on
documents_mond documents (cost=0.00..69295.41 rows=1053929 width=14)
(actual time=0.011..0.013 rows=5 loops=1)"
" -> Index Scan using pk_sources on sources
(cost=0.00..0.27 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=5)"
" Index Cond: (sources.id = documents.id_source)"
" -> Index Scan using locations_pkey on locations l
(cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.002 rows=1 loops=5)"
" Index Cond: (l.id = documents.id_location)"
"Total runtime: 0.091 ms"

Is it a normal behavior ?
I need to rewrite all my Perl scripts to have query pointing only on
inherited tables (when possible) ?
I was thinking that query pointing on primary table were correctly
dispatched on inherited tables ... I missing something ?

Regards

Philippe

Ps : I'm french, so my english is approximate ... hoping it's understandable


From: Philippe Rimbault <primbault(at)edd(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-22 08:57:58
Message-ID: 4C480816.5030406@edd.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Oups! searching on the mailing list show me that it's a known problem ...

http://archives.postgresql.org/pgsql-performance/2010-07/msg00063.php

sorry !

On 22/07/2010 09:52, Philippe Rimbault wrote:
>
> Hi all,
>
> I'm using Postgresql 8.4.4 on Debian.
> In postgresql.conf, constraint_exclusion is set to "on"
>
> I have partitioned tables with check constraints.
> My primary table :
> CREATE TABLE documents
> (
> id serial NOT NULL,
> id_source smallint,
> nod integer,
> num text,
> id_fourniture integer,
> dav date NOT NULL,
> maj timestamp without time zone NOT NULL DEFAULT now(),
> id_location "char",
> id_partition smallint,
> mark text
> );
>
> There is no row in "only" documents :
> SQL> select count(*) from only documents;
> -> 0
> SQL> select count(*) from documents;
> -> 160155756
>
> I have one thousand inherited tables like this one (with a different
> check constraint on each) :
> CREATE TABLE documents_mond
> (
> CONSTRAINT documents_mond_id_source_check CHECK (id_source = 113)
> )
> INHERITS (documents);
> CREATE INDEX idx_documents_mond_id
> ON documents_mond
> USING btree
> (id);
>
> CREATE INDEX idx_documents_mond_id_partition
> ON documents_mond
> USING btree
> (id_partition);
>
> CREATE INDEX idx_documents_mond_id_source_dav
> ON documents_mond
> USING btree
> (id_source, dav);
> ALTER TABLE documents_mond CLUSTER ON
> idx_documents_mond_id_source_dav;
>
> CREATE INDEX idx_documents_mond_id_source_nod
> ON documents_mond
> USING btree
> (id_source, nod);
>
> CREATE INDEX idx_documents_mond_id_source_num
> ON documents_mond
> USING btree
> (id_source, num);
>
> CREATE INDEX idx_documents_mond_maj
> ON documents_mond
> USING btree
> (maj);
>
> SQL> select count(*) from documents_mond;
> -> 1053929
>
> When i perform this query on the primary table :
> EXPLAIN ANALYZE
> select
> documents.id,
> documents.num,
> sources.name,
> l.name
> from
> documents,
> locations l,
> sources
> where
> documents.id_source = 113 and
> documents.id_location=l.id and
> documents.id_source=sources.id
> order by
> documents.id desc
> limit 5;
> "Limit (cost=36209.55..36209.57 rows=5 width=24) (actual
> time=2307.181..2307.185 rows=5 loops=1)"
> " -> Sort (cost=36209.55..36512.56 rows=121202 width=24) (actual
> time=2307.180..2307.180 rows=5 loops=1)"
> " Sort Key: public.documents.id"
> " Sort Method: top-N heapsort Memory: 17kB"
> " -> Nested Loop (cost=1.52..34196.43 rows=121202 width=24)
> (actual time=0.076..1878.189 rows=1053929 loops=1)"
> " -> Index Scan using pk_sources on sources
> (cost=0.00..8.27 rows=1 width=8) (actual time=0.013..0.015 rows=1
> loops=1)"
> " Index Cond: (id = 113)"
> " -> Hash Join (cost=1.52..32976.15 rows=121202
> width=22) (actual time=0.059..1468.982 rows=1053929 loops=1)"
> " Hash Cond: (public.documents.id_location = l.id)"
> " -> Append (cost=0.00..27810.36 rows=1053932
> width=14) (actual time=0.031..836.280 rows=1053929 loops=1)"
> " -> Seq Scan on documents
> (cost=0.00..18.25 rows=3 width=39) (actual time=0.001..0.001 rows=0
> loops=1)"
> " Filter: (id_source = 113)"
> " -> Seq Scan on documents_mond documents
> (cost=0.00..27792.11 rows=1053929 width=14) (actual
> time=0.030..503.815 rows=1053929 loops=1)"
> " Filter: (id_source = 113)"
> " -> Hash (cost=1.23..1.23 rows=23 width=10)
> (actual time=0.019..0.019 rows=23 loops=1)"
> " -> Seq Scan on locations l
> (cost=0.00..1.23 rows=23 width=10) (actual time=0.001..0.007 rows=23
> loops=1)"
> "Total runtime: 2307.498 ms"
>
> And when i perform the same query directly on the inherited table
> (CHECK id_source=113) :
> EXPLAIN ANALYZE
> select
> documents.id,
> documents.num,
> sources.name,
> l.name
> from
> documents_mond documents,
> locations l,
> sources
> where
> documents.id_source = 113 and
> documents.id_location=l.id and
> documents.id_source=sources.id
> order by
> documents.id desc
> limit 5;
> "Limit (cost=0.00..43.13 rows=5 width=24) (actual time=0.024..0.050
> rows=5 loops=1)"
> " -> Nested Loop (cost=0.00..9091234.75 rows=1053929 width=24)
> (actual time=0.023..0.049 rows=5 loops=1)"
> " -> Nested Loop (cost=0.00..8796038.31 rows=1053929
> width=16) (actual time=0.020..0.035 rows=5 loops=1)"
> " -> Index Scan Backward using idx_documents_mond_id on
> documents_mond documents (cost=0.00..71930.23 rows=1053929 width=14)
> (actual time=0.012..0.015 rows=5 loops=1)"
> " Filter: (id_source = 113)"
> " -> Index Scan using pk_sources on sources
> (cost=0.00..8.27 rows=1 width=8) (actual time=0.003..0.003 rows=1
> loops=5)"
> " Index Cond: (sources.id = 113)"
> " -> Index Scan using locations_pkey on locations l
> (cost=0.00..0.27 rows=1 width=10) (actual time=0.001..0.002 rows=1
> loops=5)"
> " Index Cond: (l.id = documents.id_location)"
> "Total runtime: 0.086 ms"
>
> OR
>
> EXPLAIN ANALYZE
> select
> documents.id,
> documents.num,
> sources.name,
> l.name
> from
> documents_mond documents,
> locations l,
> sources
> where
> /* documents.id_source = 113 and */
> documents.id_location=l.id and
> documents.id_source=sources.id
> order by
> documents.id desc
> limit 5;
> "Limit (cost=0.00..3.13 rows=5 width=24) (actual time=0.025..0.052
> rows=5 loops=1)"
> " -> Nested Loop (cost=0.00..659850.75 rows=1053929 width=24)
> (actual time=0.024..0.051 rows=5 loops=1)"
> " -> Nested Loop (cost=0.00..364654.31 rows=1053929 width=16)
> (actual time=0.021..0.037 rows=5 loops=1)"
> " -> Index Scan Backward using idx_documents_mond_id on
> documents_mond documents (cost=0.00..69295.41 rows=1053929 width=14)
> (actual time=0.011..0.013 rows=5 loops=1)"
> " -> Index Scan using pk_sources on sources
> (cost=0.00..0.27 rows=1 width=8) (actual time=0.003..0.003 rows=1
> loops=5)"
> " Index Cond: (sources.id = documents.id_source)"
> " -> Index Scan using locations_pkey on locations l
> (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.002 rows=1
> loops=5)"
> " Index Cond: (l.id = documents.id_location)"
> "Total runtime: 0.091 ms"
>
> Is it a normal behavior ?
> I need to rewrite all my Perl scripts to have query pointing only on
> inherited tables (when possible) ?
> I was thinking that query pointing on primary table were correctly
> dispatched on inherited tables ... I missing something ?
>
> Regards
>
> Philippe
>
>
> Ps : I'm french, so my english is approximate ... hoping it's
> understandable
>
>


From: Philippe Rimbault <primbault(at)edd(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-22 10:03:12
Message-ID: 4C481760.7080003@edd.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

FYI

I've just installed Postgresql 9 beta 3 (9.0beta3 on i686-pc-linux-gnu,
compiled by GCC gcc (Debian 4.4.4-6) 4.4.4, 32-bit)

After a pg_upgrade + vacuum analyze, i've got the following results :

Query on primary table :
select
documents.id,
documents.num,
sources.name,
l.name
from
documents,
locations l,
sources
where
documents.id_source = 113 and
documents.id_location=l.id and
documents.id_source=sources.id
order by
documents.id desc
limit 5;
"Limit (cost=70356.46..70356.48 rows=5 width=23) (actual
time=2362.268..2362.271 rows=5 loops=1)"
" -> Sort (cost=70356.46..72991.29 rows=1053932 width=23) (actual
time=2362.267..2362.269 rows=5 loops=1)"
" Sort Key: public.documents.id"
" Sort Method: top-N heapsort Memory: 17kB"
" -> Nested Loop (cost=1.52..52851.03 rows=1053932 width=23)
(actual time=0.062..1912.826 rows=1053929 loops=1)"
" -> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.009 rows=1 loops=1)"
" Index Cond: (id = 113)"
" -> Hash Join (cost=1.52..42303.44 rows=1053932
width=21) (actual time=0.052..1490.353 rows=1053929 loops=1)"
" Hash Cond: (public.documents.id_location = l.id)"
" -> Append (cost=0.00..27810.36 rows=1053932
width=13) (actual time=0.027..842.627 rows=1053929 loops=1)"
" -> Seq Scan on documents (cost=0.00..18.25
rows=3 width=39) (actual time=0.000..0.000 rows=0 loops=1)"
" Filter: (id_source = 113)"
" -> Seq Scan on documents_mond documents
(cost=0.00..27792.11 rows=1053929 width=13) (actual time=0.025..497.517
rows=1053929 loops=1)"
" Filter: (id_source = 113)"
" -> Hash (cost=1.23..1.23 rows=23 width=10)
(actual time=0.018..0.018 rows=23 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Seq Scan on locations l (cost=0.00..1.23
rows=23 width=10) (actual time=0.001..0.010 rows=23 loops=1)"
"Total runtime: 2362.369 ms"

On inherted table :
select
documents.id,
documents.num,
sources.name,
l.name
from
documents_mond documents,
locations l,
sources
where
documents.id_source = 113 and
documents.id_location=l.id and
documents.id_source=sources.id
order by
documents.id desc
limit 5;

"Limit (cost=0.00..1.81 rows=5 width=23) (actual time=0.033..0.056
rows=5 loops=1)"
" -> Nested Loop (cost=0.00..381351.92 rows=1053929 width=23) (actual
time=0.032..0.052 rows=5 loops=1)"
" -> Nested Loop (cost=0.00..368169.54 rows=1053929 width=21)
(actual time=0.023..0.037 rows=5 loops=1)"
" -> Index Scan Backward using idx_documents_mond_id on
documents_mond documents (cost=0.00..72973.11 rows=1053929 width=13)
(actual time=0.014..0.017 rows=5 loops=1)"
" Filter: (id_source = 113)"
" -> Index Scan using locations_pkey on locations l
(cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.003 rows=1 loops=5)"
" Index Cond: (l.id = documents.id_location)"
" -> Materialize (cost=0.00..8.27 rows=1 width=8) (actual
time=0.002..0.002 rows=1 loops=5)"
" -> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)"
" Index Cond: (id = 113)"
"Total runtime: 0.095 ms"

Results are better than 8.4 if query is on inherted table but worth if
query is on primary table.

So waiting for 9.0 will not help me so much ! :)

On 22/07/2010 10:57, Philippe Rimbault wrote:
> Oups! searching on the mailing list show me that it's a known problem ...
>
> http://archives.postgresql.org/pgsql-performance/2010-07/msg00063.php
>
> sorry !
>
>
>
> On 22/07/2010 09:52, Philippe Rimbault wrote:
>>
>> Hi all,
>>
>> I'm using Postgresql 8.4.4 on Debian.
>> In postgresql.conf, constraint_exclusion is set to "on"
>>
>> I have partitioned tables with check constraints.
>> My primary table :
>> CREATE TABLE documents
>> (
>> id serial NOT NULL,
>> id_source smallint,
>> nod integer,
>> num text,
>> id_fourniture integer,
>> dav date NOT NULL,
>> maj timestamp without time zone NOT NULL DEFAULT now(),
>> id_location "char",
>> id_partition smallint,
>> mark text
>> );
>>
>> There is no row in "only" documents :
>> SQL> select count(*) from only documents;
>> -> 0
>> SQL> select count(*) from documents;
>> -> 160155756
>>
>> I have one thousand inherited tables like this one (with a different
>> check constraint on each) :
>> CREATE TABLE documents_mond
>> (
>> CONSTRAINT documents_mond_id_source_check CHECK (id_source =
>> 113)
>> )
>> INHERITS (documents);
>> CREATE INDEX idx_documents_mond_id
>> ON documents_mond
>> USING btree
>> (id);
>>
>> CREATE INDEX idx_documents_mond_id_partition
>> ON documents_mond
>> USING btree
>> (id_partition);
>>
>> CREATE INDEX idx_documents_mond_id_source_dav
>> ON documents_mond
>> USING btree
>> (id_source, dav);
>> ALTER TABLE documents_mond CLUSTER ON
>> idx_documents_mond_id_source_dav;
>>
>> CREATE INDEX idx_documents_mond_id_source_nod
>> ON documents_mond
>> USING btree
>> (id_source, nod);
>>
>> CREATE INDEX idx_documents_mond_id_source_num
>> ON documents_mond
>> USING btree
>> (id_source, num);
>>
>> CREATE INDEX idx_documents_mond_maj
>> ON documents_mond
>> USING btree
>> (maj);
>>
>> SQL> select count(*) from documents_mond;
>> -> 1053929
>>
>> When i perform this query on the primary table :
>> EXPLAIN ANALYZE
>> select
>> documents.id,
>> documents.num,
>> sources.name,
>> l.name
>> from
>> documents,
>> locations l,
>> sources
>> where
>> documents.id_source = 113 and
>> documents.id_location=l.id and
>> documents.id_source=sources.id
>> order by
>> documents.id desc
>> limit 5;
>> "Limit (cost=36209.55..36209.57 rows=5 width=24) (actual
>> time=2307.181..2307.185 rows=5 loops=1)"
>> " -> Sort (cost=36209.55..36512.56 rows=121202 width=24) (actual
>> time=2307.180..2307.180 rows=5 loops=1)"
>> " Sort Key: public.documents.id"
>> " Sort Method: top-N heapsort Memory: 17kB"
>> " -> Nested Loop (cost=1.52..34196.43 rows=121202 width=24)
>> (actual time=0.076..1878.189 rows=1053929 loops=1)"
>> " -> Index Scan using pk_sources on sources
>> (cost=0.00..8.27 rows=1 width=8) (actual time=0.013..0.015 rows=1
>> loops=1)"
>> " Index Cond: (id = 113)"
>> " -> Hash Join (cost=1.52..32976.15 rows=121202
>> width=22) (actual time=0.059..1468.982 rows=1053929 loops=1)"
>> " Hash Cond: (public.documents.id_location = l.id)"
>> " -> Append (cost=0.00..27810.36 rows=1053932
>> width=14) (actual time=0.031..836.280 rows=1053929 loops=1)"
>> " -> Seq Scan on documents
>> (cost=0.00..18.25 rows=3 width=39) (actual time=0.001..0.001 rows=0
>> loops=1)"
>> " Filter: (id_source = 113)"
>> " -> Seq Scan on documents_mond documents
>> (cost=0.00..27792.11 rows=1053929 width=14) (actual
>> time=0.030..503.815 rows=1053929 loops=1)"
>> " Filter: (id_source = 113)"
>> " -> Hash (cost=1.23..1.23 rows=23 width=10)
>> (actual time=0.019..0.019 rows=23 loops=1)"
>> " -> Seq Scan on locations l
>> (cost=0.00..1.23 rows=23 width=10) (actual time=0.001..0.007 rows=23
>> loops=1)"
>> "Total runtime: 2307.498 ms"
>>
>> And when i perform the same query directly on the inherited table
>> (CHECK id_source=113) :
>> EXPLAIN ANALYZE
>> select
>> documents.id,
>> documents.num,
>> sources.name,
>> l.name
>> from
>> documents_mond documents,
>> locations l,
>> sources
>> where
>> documents.id_source = 113 and
>> documents.id_location=l.id and
>> documents.id_source=sources.id
>> order by
>> documents.id desc
>> limit 5;
>> "Limit (cost=0.00..43.13 rows=5 width=24) (actual time=0.024..0.050
>> rows=5 loops=1)"
>> " -> Nested Loop (cost=0.00..9091234.75 rows=1053929 width=24)
>> (actual time=0.023..0.049 rows=5 loops=1)"
>> " -> Nested Loop (cost=0.00..8796038.31 rows=1053929
>> width=16) (actual time=0.020..0.035 rows=5 loops=1)"
>> " -> Index Scan Backward using idx_documents_mond_id on
>> documents_mond documents (cost=0.00..71930.23 rows=1053929 width=14)
>> (actual time=0.012..0.015 rows=5 loops=1)"
>> " Filter: (id_source = 113)"
>> " -> Index Scan using pk_sources on sources
>> (cost=0.00..8.27 rows=1 width=8) (actual time=0.003..0.003 rows=1
>> loops=5)"
>> " Index Cond: (sources.id = 113)"
>> " -> Index Scan using locations_pkey on locations l
>> (cost=0.00..0.27 rows=1 width=10) (actual time=0.001..0.002 rows=1
>> loops=5)"
>> " Index Cond: (l.id = documents.id_location)"
>> "Total runtime: 0.086 ms"
>>
>> OR
>>
>> EXPLAIN ANALYZE
>> select
>> documents.id,
>> documents.num,
>> sources.name,
>> l.name
>> from
>> documents_mond documents,
>> locations l,
>> sources
>> where
>> /* documents.id_source = 113 and */
>> documents.id_location=l.id and
>> documents.id_source=sources.id
>> order by
>> documents.id desc
>> limit 5;
>> "Limit (cost=0.00..3.13 rows=5 width=24) (actual time=0.025..0.052
>> rows=5 loops=1)"
>> " -> Nested Loop (cost=0.00..659850.75 rows=1053929 width=24)
>> (actual time=0.024..0.051 rows=5 loops=1)"
>> " -> Nested Loop (cost=0.00..364654.31 rows=1053929
>> width=16) (actual time=0.021..0.037 rows=5 loops=1)"
>> " -> Index Scan Backward using idx_documents_mond_id on
>> documents_mond documents (cost=0.00..69295.41 rows=1053929 width=14)
>> (actual time=0.011..0.013 rows=5 loops=1)"
>> " -> Index Scan using pk_sources on sources
>> (cost=0.00..0.27 rows=1 width=8) (actual time=0.003..0.003 rows=1
>> loops=5)"
>> " Index Cond: (sources.id = documents.id_source)"
>> " -> Index Scan using locations_pkey on locations l
>> (cost=0.00..0.27 rows=1 width=10) (actual time=0.002..0.002 rows=1
>> loops=5)"
>> " Index Cond: (l.id = documents.id_location)"
>> "Total runtime: 0.091 ms"
>>
>> Is it a normal behavior ?
>> I need to rewrite all my Perl scripts to have query pointing only on
>> inherited tables (when possible) ?
>> I was thinking that query pointing on primary table were correctly
>> dispatched on inherited tables ... I missing something ?
>>
>> Regards
>>
>> Philippe
>>
>>
>> Ps : I'm french, so my english is approximate ... hoping it's
>> understandable
>>
>>
>
>


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Philippe Rimbault <primbault(at)edd(dot)fr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-22 13:32:36
Message-ID: 4C484874.3060609@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Philippe Rimbault wrote:
> I have one thousand inherited tables like this one (with a different
> check constraint on each) :

The PostgreSQL partitioning system is aimed to support perhaps a hundred
inherited tables. You can expect to get poor performance on queries if
you create 1000 of them. That's not the cause of your current problem,
just pointing out there's a larger design problem here you'll probably
have to fix one day.

> EXPLAIN ANALYZE
> select
> documents.id,
> documents.num,
> sources.name,
> l.name
> from
> documents,
> locations l,
> sources
> where
> documents.id_source = 113 and
> documents.id_location=l.id and
> documents.id_source=sources.id
> order by
> documents.id desc
> limit 5;

Please don't put your EXPLAIN plans surrounded in " marks; it makes it
harder to copy out of your message to analyze them with tools. I put
this bad one into http://explain.depesz.com/s/XD and it notes that the
"public.documents.id_location = l.id" search is underestimating the
number of rows by a factor of 8.7. You might get a better plan if you
can get better table statistics on that column. Did you run ANALYZE
since the partitioning was done? If not, that could be making this
worse. You might increase the amount of table statistics on this
specific column too, not sure what would help without knowing exactly
what's in there.

Another thing you can try is suggest the optimizer not use a hash join
here and see if it does the right thing instead; be a useful bit of
feedback to see what that plan turns out to be. Just put "set
enable_hashjoin=off;" before the rest of the query, it will only impact
that session.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Philippe Rimbault <primbault(at)edd(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-22 14:10:51
Message-ID: 4C48516B.90103@edd.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greg,

First : thank you for you help.

On 22/07/2010 15:32, Greg Smith wrote:
> Philippe Rimbault wrote:
>> I have one thousand inherited tables like this one (with a different
>> check constraint on each) :
>
> The PostgreSQL partitioning system is aimed to support perhaps a
> hundred inherited tables. You can expect to get poor performance on
> queries if you create 1000 of them. That's not the cause of your
> current problem, just pointing out there's a larger design problem
> here you'll probably have to fix one day.

Right now, there is only 6 inherited tables, but for performance issue,
where are testing solutions on more partionned systeme (all work fine
except for query with order by).

>
>> EXPLAIN ANALYZE
>> select
>> documents.id,
>> documents.num,
>> sources.name,
>> l.name
>> from
>> documents,
>> locations l,
>> sources
>> where
>> documents.id_source = 113 and
>> documents.id_location=l.id and
>> documents.id_source=sources.id
>> order by
>> documents.id desc
>> limit 5;
>
> Please don't put your EXPLAIN plans surrounded in " marks; it makes it
> harder to copy out of your message to analyze them with tools. I put
> this bad one into http://explain.depesz.com/s/XD and it notes that the
> "public.documents.id_location = l.id" search is underestimating the
> number of rows by a factor of 8.7. You might get a better plan if you
> can get better table statistics on that column. Did you run ANALYZE
> since the partitioning was done? If not, that could be making this
> worse. You might increase the amount of table statistics on this
> specific column too, not sure what would help without knowing exactly
> what's in there.
>
> Another thing you can try is suggest the optimizer not use a hash join
> here and see if it does the right thing instead; be a useful bit of
> feedback to see what that plan turns out to be. Just put "set
> enable_hashjoin=off;" before the rest of the query, it will only
> impact that session.
>

Sorry for the output of the EXPLAIN ...

VACUUM ANALYZE have been done just before test of query.

I think that the optimizer overestimates "public.documents.id_location =
l.id" because it plan on the primary table and not the child ...
I've change statistics to 1000 for documents.id_location and result is
the same.

I've tested "set enable_hashjoin=off;" and the result is worst (sorry
i'm still using 9.0b3) :

Limit (cost=197755.49..197755.50 rows=5 width=23) (actual
time=4187.148..4187.150 rows=5 loops=1)
-> Sort (cost=197755.49..200390.32 rows=1053932 width=23) (actual
time=4187.146..4187.147 rows=5 loops=1)
Sort Key: public.documents.id
Sort Method: top-N heapsort Memory: 17kB
-> Nested Loop (cost=151258.55..180250.06 rows=1053932
width=23) (actual time=1862.214..3769.611 rows=1053929 loops=1)
-> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.007..0.013 rows=1 loops=1)
Index Cond: (id = 113)
-> Merge Join (cost=151258.55..169702.47 rows=1053932
width=21) (actual time=1862.204..3360.555 rows=1053929 loops=1)
Merge Cond: (l.id = public.documents.id_location)
-> Sort (cost=1.75..1.81 rows=23 width=10)
(actual time=0.028..0.036 rows=21 loops=1)
Sort Key: l.id
Sort Method: quicksort Memory: 17kB
-> Seq Scan on locations l (cost=0.00..1.23
rows=23 width=10) (actual time=0.002..0.009 rows=23 loops=1)
-> Materialize (cost=151256.80..156526.46
rows=1053932 width=13) (actual time=1862.162..2841.302 rows=1053929 loops=1)
-> Sort (cost=151256.80..153891.63
rows=1053932 width=13) (actual time=1862.154..2290.881 rows=1053929 loops=1)
Sort Key: public.documents.id_location
Sort Method: external merge Disk: 24496kB
-> Append (cost=0.00..27810.36
rows=1053932 width=13) (actual time=0.003..838.644 rows=1053929 loops=1)
-> Seq Scan on documents
(cost=0.00..18.25 rows=3 width=39) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (id_source = 113)
-> Seq Scan on documents_mond
documents (cost=0.00..27792.11 rows=1053929 width=13) (actual
time=0.002..502.345 rows=1053929 loops=1)
Filter: (id_source = 113)
Total runtime: 4198.703 ms


From: Gerald Fontenay <gvfontenay(at)lbl(dot)gov>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-23 22:03:55
Message-ID: 4C4A11CB.6040800@lbl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


> The PostgreSQL partitioning system is aimed to support perhaps a
> hundred inherited tables. You can expect to get poor performance on
> queries if you create 1000 of them.

Hi,

Why is that you would expect poor performance for say 1000 or more? I
have a ~1000 inherited tables and I don't see any significant slowdowns.
I only ever access a single inherited table at a time though in this
situation. I suppose I am using inheritance only for organization in
this case...

Thanks,
Gerald


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: gvfontenay(at)lbl(dot)gov
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-23 22:16:10
Message-ID: 1279923370.9866.37.camel@jd-desktop.unknown.charter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, 2010-07-23 at 15:03 -0700, Gerald Fontenay wrote:
> > The PostgreSQL partitioning system is aimed to support perhaps a
> > hundred inherited tables. You can expect to get poor performance on
> > queries if you create 1000 of them.
>
> Hi,
>
> Why is that you would expect poor performance for say 1000 or more? I
> have a ~1000 inherited tables and I don't see any significant slowdowns.
> I only ever access a single inherited table at a time though in this
> situation. I suppose I am using inheritance only for organization in
> this case...

It is variable based on workload and as I recall has to do with the
planning time. As the number of children increases, so does the planning
time.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: gvfontenay(at)lbl(dot)gov
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-24 05:12:28
Message-ID: 4C4A763C.8090701@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Gerald Fontenay wrote:
>
>> The PostgreSQL partitioning system is aimed to support perhaps a
>> hundred inherited tables. You can expect to get poor performance on
>> queries if you create 1000 of them.
>
> Why is that you would expect poor performance for say 1000 or more?

When the query planner executes, it has to scan through every child
table to run the constraint exclusion algorithm for determining whether
that table needs to be included in the query results or not. The time
that takes is proportional to the number of partitions. If your queries
take a long time to execute relative to how long they take to plan, you
may not have noticed this. But for shorter queries, and ones where
there are lots of joins that require many plans be evaluated, the
planning overhead increase can be significant. The threshold for where
it becomes painful is obviously workload dependent, but the thing to be
careful of is that supporting very large numbers of partitions is not
something that the database query planner has been optimized for yet.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Gerald Fontenay <gvfontenay(at)lbl(dot)gov>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-24 06:29:37
Message-ID: 4C4A8851.3090608@lbl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Joshua D. Drake wrote:
> On Fri, 2010-07-23 at 15:03 -0700, Gerald Fontenay wrote:
>
>>> The PostgreSQL partitioning system is aimed to support perhaps a
>>> hundred inherited tables. You can expect to get poor performance on
>>> queries if you create 1000 of them.
>>>
>> Hi,
>>
>> Why is that you would expect poor performance for say 1000 or more? I
>> have a ~1000 inherited tables and I don't see any significant slowdowns.
>> I only ever access a single inherited table at a time though in this
>> situation. I suppose I am using inheritance only for organization in
>> this case...
>>
>
> It is variable based on workload and as I recall has to do with the
> planning time. As the number of children increases, so does the planning
> time.
>
>

I do not execute queries against the parent and children tables.
It became obvious that this can be quite slow.
I am asking specifically if I can expect problems in *that* situation, again, where I directly query a single child table only and where I may have thousands. Just like a any other single table query, no?

Thank you,
Gerald


From: Gerald Fontenay <gvfontenay(at)lbl(dot)gov>
To: pgsql-performance(at)postgresql(dot)org
Cc: Gerald Fontenay <gvfontenay(at)lbl(dot)gov>
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-26 21:26:45
Message-ID: 4C4DFD95.4000901@lbl.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Joshua D. Drake wrote:
> On Fri, 2010-07-23 at 15:03 -0700, Gerald Fontenay wrote:
>
>>> The PostgreSQL partitioning system is aimed to support perhaps a
>>> hundred inherited tables. You can expect to get poor performance on
>>> queries if you create 1000 of them.
>>>
>> Hi,
>>
>> Why is that you would expect poor performance for say 1000 or more? I
>> have a ~1000 inherited tables and I don't see any significant slowdowns.
>> I only ever access a single inherited table at a time though in this
>> situation. I suppose I am using inheritance only for organization in
>> this case...
>>
>
> It is variable based on workload and as I recall has to do with the
> planning time. As the number of children increases, so does the planning
> time.
>
> Joshua D. Drake
>
>
>
Thank you for your response. So if I query only my target child table,
this should be "just like" any other single table wrt planning right? I
have thousands of these tables. (I suppose that I'm only using
inheritance for the sake of organization in this situation...)

Thanks!
Gerald


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: gvfontenay(at)lbl(dot)gov
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-08-04 13:16:37
Message-ID: AANLkTi=7cPTu2sVDf17e5haF0vpf2Gh33qYqsKSOAkJE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Jul 26, 2010 at 5:26 PM, Gerald Fontenay <gvfontenay(at)lbl(dot)gov> wrote:
> Thank you for your response. So if I query only my target child table, this
> should be "just like" any other single table wrt planning right? I have
> thousands of these tables. (I suppose that I'm only using inheritance for
> the sake of organization in this situation...)

Yeah, I wouldn't expect planning time to be affected by whether a
table has parents; only whether it has children.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company