Partitioned table performance

From: "Stacy White" <harsh(at)computer(dot)org>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Partitioned table performance
Date: 2004-12-05 02:45:44
Message-ID: 000a01c4da74$84e072e0$0200a8c0@grownups
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We're working with a Postgres database that includes a fairly large table
(100M rows, increasing at around 2M per day).

In some cases we've seen some increased performance in tests by splitting
the table into several smaller tables. Both 'UNION ALL' views, and the
superclass/subclass scheme work well at pruning down the set of rows a query
uses, but they seem to introduce a large performance hit to the time to
process each row (~50% for superclass/subclass, and ~150% for union views).

Is this to be expected? Or is this a problem with our test setup?

I've listed details on our tests at the end of this message. The results
are similar with our larger tables; the overhead appears to be per record
returned from the subquery/subclass; it's not a constant overhead per query.
Our production instance is running 7.4.2, but the results are the same on
8.0.

For reference, I tested with this setup (for the superclass/subclass
partitioning scheme):

CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
ANALYZE super_foo ;

CREATE TABLE sub_foo1 () INHERITS ( super_foo );
INSERT INTO sub_foo1 VALUES ( 1, 1 );
-- repeat insert until sub_foo1 has 1,000,000 rows
CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
ANALYZE sub_foo1 ;

CREATE TABLE sub_foo2 () INHERITS ( super_foo );
INSERT INTO sub_foo2 VALUES ( 2, 1 );
-- repeat insert until sub_foo2 has 1,000,000 rows
CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
ANALYZE sub_foo2 ;

and this setup for the union all scheme:

CREATE TABLE union_foo1 ( bar NUMERIC );
INSERT INTO union_foo1 VALUES ( 1 ) ;
-- repeat insert until union_foo1 has 1,000,000 rows
ANALYZE union_foo1 ;

CREATE TABLE union_foo2 ( bar NUMERIC );
INSERT INTO union_foo2 VALUES ( 1 ) ;
-- repeat insert until union_foo2 has 1,000,000 rows
ANALYZE union_foo2 ;

CREATE VIEW union_foo AS
SELECT 1 AS partition, * FROM union_foo1
UNION ALL
SELECT 2 AS partition, * FROM union_foo2 ;

The partition pruning works marvelously:

EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
QUERY PLAN
---------------------------------------------------------------------------
----------------------------------
Aggregate (cost=21899.02..21899.02 rows=1 width=32)
-> Append (cost=0.00..19399.01 rows=1000002 width=32)
-> Seq Scan on super_foo (cost=0.00..0.00 rows=1 width=32)
Filter: (partition = 2::numeric)
-> Index Scan using idx_subfoo1_partition on sub_foo1 super_foo
(cost=0.00..2.01 rows=1 width=10)
Index Cond: (partition = 2::numeric)
-> Seq Scan on sub_foo2 super_foo (cost=0.00..19397.00
rows=1000000 width=10)
Filter: (partition = 2::numeric)

and

EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
QUERY PLAN
---------------------------------------------------------------------------
----------------------
Aggregate (cost=75819.15..75819.15 rows=1 width=32)
-> Subquery Scan union_foo (cost=0.00..70818.60 rows=2000220 width=32)
-> Append (cost=0.00..50816.40 rows=2000220 width=10)
-> Subquery Scan "*SELECT* 1" (cost=0.00..25408.20
rows=1000110 width=10)
-> Result (cost=0.00..15407.10 rows=1000110 width=10)
One-Time Filter: false
-> Seq Scan on union_foo1 (cost=0.00..15407.10
rows=1000110 width=10)
-> Subquery Scan "*SELECT* 2" (cost=0.00..25408.20
rows=1000110 width=10)
-> Seq Scan on union_foo2 (cost=0.00..15407.10
rows=1000110 width=10)

But you can see a fair amount of overhead, espcially in the case of the
union view:

SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ;
Time: 2291.637 ms

SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2
;
Time: 2248.225 ms

SELECT SUM(bar) FROM super_foo ;
Time: 3329.953 ms

SELECT SUM(bar) FROM union_foo ;
Time: 5267.742 ms

SELECT SUM(bar) FROM sub_foo2 ;
Time: 1124.496 ms

SELECT SUM(bar) FROM union_foo2 ;
Time: 1090.616 ms

SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
Time: 2137.767 ms

SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
Time: 2774.887 ms

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rodrigo Carvalhaes 2004-12-05 19:43:36 Re: pg_restore taking 4 hours!
Previous Message Josh Berkus 2004-12-04 19:14:18 Re: Improve BULK insertion