Re: horizontal partition

Lists: pgsql-performance
From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: horizontal partition
Date: 2005-02-02 00:16:17
Message-ID: ctp64h$l4d$1@floppy.pyrenet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi all,
I have a big table with ~ 10 Milion rows, and is a very
pain administer it, so after years I convinced my self
to partition it and replace the table usage ( only for reading )
with a view.

Now my user_logs table is splitted in 4:

user_logs
user_logs_2002
user_logs_2003
user_logs_2004

and the view v_user_logs is builded on top of these tables:

CREATE OR REPLACE VIEW v_user_logs AS
SELECT * FROM user_logs
UNION ALL
SELECT * FROM user_logs_2002
UNION ALL
SELECT * FROM user_logs_2003
UNION ALL
SELECT * FROM user_logs_2004
;

the view is performing really well:

empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan v_user_logs (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
-> Append (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1)
-> Index Scan using idx_user_user_logs on user_logs (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 2" (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1)
-> Index Scan using idx_user_user_logs_2004 on user_logs_2004 (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 3" (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1)
-> Index Scan using idx_user_user_logs_2003 on user_logs_2003 (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 4" (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1)
-> Index Scan using idx_user_user_logs_2002 on user_logs_2002 (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1)
Index Cond: (id_user = 4185)
Total runtime: 2345.917 ms
(15 rows)

the problem is now if this view is used in others views like this:

CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
SELECT
u.login,
ul.*
FROM user_login u,
v_user_logs ul
WHERE
u.id_user = ul.id_user
;

empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Subquery Scan ul (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1)
-> Append (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1)
-> Seq Scan on user_logs (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1)
-> Seq Scan on user_logs_2004 (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1)
-> Subquery Scan "*SELECT* 3" (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1)
-> Seq Scan on user_logs_2003 (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1)
-> Subquery Scan "*SELECT* 4" (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1)
-> Seq Scan on user_logs_2002 (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1)
-> Hash (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
-> Index Scan using user_login_login_key on user_login u (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 116491.056 ms
(16 rows)

as you can see the index scan is not used anymore.
Do you see any problem on this approach ?

Regards
Gaetano Mendola


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-02 20:09:06
Message-ID: 200502021209.06878.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Gaetano,

> I have a big table with ~ 10 Milion rows, and is a very
> pain administer it, so after years I convinced my self
> to partition it and replace the table usage ( only for reading )
> with a view.
>
> Now my user_logs table is splitted in 4:
>
> user_logs
> user_logs_2002
> user_logs_2003
> user_logs_2004

Any reason you didn't use inheritance?

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-03 01:10:15
Message-ID: ctrtln$d0m$1@floppy.pyrenet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus wrote:
> Gaetano,
>
>
>>I have a big table with ~ 10 Milion rows, and is a very
>>pain administer it, so after years I convinced my self
>>to partition it and replace the table usage ( only for reading )
>>with a view.
>>
>>Now my user_logs table is splitted in 4:
>>
>>user_logs
>>user_logs_2002
>>user_logs_2003
>>user_logs_2004
>
>
> Any reason you didn't use inheritance?

I did in that way just to not use postgresql specific feature.
I can give it a try and I let you know, however the question remain,
why the index usage is lost if used in that way ?

Regards
Gaetano Mendola


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-03 03:16:08
Message-ID: 200502021916.08960.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Gaetano,

> I did in that way just to not use postgresql specific feature.
> I can give it a try and I let you know, however the question remain,
> why the index usage is lost if used in that way ?

Because PostgreSQL is materializing the entire UNION data set in the
subselect. What Postgres version are you using? I thought this was fixed
in 7.4, but maybe not ...

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Klint Gore <kg(at)kgb(dot)une(dot)edu(dot)au>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-03 03:31:35
Message-ID: 42019B1735F.1271KG@129.180.47.120
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 03 Feb 2005 02:10:15 +0100, Gaetano Mendola <mendola(at)bigfoot(dot)com> wrote:
> why the index usage is lost if used in that way ?

This is how I interpret it (if anyone wants to set me straight or
improve on it feel free)

Views are implemented as rules.

Rules are pretty much just a macro to the query builder. When it sees
the view, it replaces it with the implementation of the view.

When you join a view to a table, it generates a subselect of the
implementation and joins that to the other table.

So the subselect will generate the entire set of data from the view
before it can use the join to eliminate rows.

I would like a way to make this work better as well. One of my views is
32 joins of the same table (to get tree like data for reports).

klint.

+---------------------------------------+-----------------+
: Klint Gore : "Non rhyming :
: EMail : kg(at)kgb(dot)une(dot)edu(dot)au : slang - the :
: Snail : A.B.R.I. : possibilities :
: Mail University of New England : are useless" :
: Armidale NSW 2351 Australia : L.J.J. :
: Fax : +61 2 6772 5376 : :
+---------------------------------------+-----------------+


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Klint Gore <kg(at)kgb(dot)une(dot)edu(dot)au>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-03 06:41:57
Message-ID: 200502022241.57799.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Klint,

> This is how I interpret it (if anyone wants to set me straight or
> improve on it feel free)
>
> Views are implemented as rules.
>
> Rules are pretty much just a macro to the query builder. When it sees
> the view, it replaces it with the implementation of the view.

Right so far.

>
> When you join a view to a table, it generates a subselect of the
> implementation and joins that to the other table.

More or less. A join set and a subselect are not really different in the
planner.

> So the subselect will generate the entire set of data from the view
> before it can use the join to eliminate rows.

Well, not exactly. That's what's happening in THIS query, but it doesn't
happen in most queries, no matter how many view levels you nest (well, up to
the number FROM_COLLAPSE_LIMIT, anyway).

The issue here is that the planner is capable of "pushing down" the WHERE
criteria into the first view, but not into the second, "nested" view, and so
postgres materializes the UNIONed data set before perfoming the join.

Thing is, I seem to recall that this particular issue was something Tom fixed
a while ago. Which is why I wanted to know what version Gaetano is using.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Klint Gore <kg(at)kgb(dot)une(dot)edu(dot)au>, Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-03 06:55:28
Message-ID: 12896.1107413728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> The issue here is that the planner is capable of "pushing down" the WHERE
> criteria into the first view, but not into the second, "nested" view, and so
> postgres materializes the UNIONed data set before perfoming the join.

> Thing is, I seem to recall that this particular issue was something Tom fixed
> a while ago. Which is why I wanted to know what version Gaetano is using.

It's still true that we can't generate a nestloop-with-inner-indexscan
join plan if the inner side is anything more complex than a single table
scan. Since that's the only plan that gives you any chance of not
scanning the whole partitioned table, it's rather a hindrance :-(

It might be possible to fix this by treating the nestloop's join
conditions as "push down-able" criteria, instead of the present rather
ad hoc method for generating nestloop/indexscan plans. It'd be quite
a deal of work though, and I'm concerned about how slow the planner
might run if we did do it like that.

regards, tom lane


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: horizontal partition
Date: 2005-02-03 10:40:20
Message-ID: 4201FF94.5090900@bigfoot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus wrote:
> Gaetano,
>
>
>>I did in that way just to not use postgresql specific feature.
>>I can give it a try and I let you know, however the question remain,
>>why the index usage is lost if used in that way ?
>
>
> Because PostgreSQL is materializing the entire UNION data set in the
> subselect. What Postgres version are you using? I thought this was fixed
> in 7.4, but maybe not ...
>

Yes, I'm using with 7.4.x, so it was not fixed...

Regards
Gaetano Mendola


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: horizontal partition
Date: 2005-02-06 15:50:08
Message-ID: cu5ehf$q6k$1@floppy.pyrenet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>
>>The issue here is that the planner is capable of "pushing down" the WHERE
>>criteria into the first view, but not into the second, "nested" view, and so
>>postgres materializes the UNIONed data set before perfoming the join.
>
>
>>Thing is, I seem to recall that this particular issue was something Tom fixed
>>a while ago. Which is why I wanted to know what version Gaetano is using.
>
>
> It's still true that we can't generate a nestloop-with-inner-indexscan
> join plan if the inner side is anything more complex than a single table
> scan. Since that's the only plan that gives you any chance of not
> scanning the whole partitioned table, it's rather a hindrance :-(
>
> It might be possible to fix this by treating the nestloop's join
> conditions as "push down-able" criteria, instead of the present rather
> ad hoc method for generating nestloop/indexscan plans. It'd be quite
> a deal of work though, and I'm concerned about how slow the planner
> might run if we did do it like that.
>

I don't know if this will help my attempt to perform an horizontal
partition, if it do I think that it can solve lot of problems out there,
I tried the inheritance technique too:

The table user_logs is the original one, I created two tables extending this one:

CREATE TABLE user_logs_2003_h () inherits (user_logs);
CREATE TABLE user_logs_2002_h () inherits (user_logs);

I defined on this table the index already defined on user_logs.

And this is the result:

empdb=# explain analyze select * from user_logs where id_user = sp_id_user('kalman');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..426.33 rows=335 width=67) (actual time=20.891..129.218 rows=98 loops=1)
-> Append (cost=0.00..426.33 rows=335 width=67) (actual time=20.871..128.643 rows=98 loops=1)
-> Index Scan using idx_user_user_logs on user_logs (cost=0.00..133.11 rows=66 width=67) (actual time=20.864..44.594 rows=3 loops=1)
Index Cond: (id_user = 4185)
-> Index Scan using idx_user_user_logs_2003_h on user_logs_2003_h user_logs (cost=0.00..204.39 rows=189 width=67) (actual time=1.507..83.662 rows=95 loops=1)
Index Cond: (id_user = 4185)
-> Index Scan using idx_user_user_logs_2002_h on user_logs_2002_h user_logs (cost=0.00..88.83 rows=80 width=67) (actual time=0.206..0.206 rows=0 loops=1)
Index Cond: (id_user = 4185)
Total runtime: 129.500 ms
(9 rows)

that is good, but now look what happen in a view like this one:

create view to_delete AS
SELECT v.login,
u.*
from user_login v,
user_logs u
where v.id_user = u.id_user;

empdb=# explain analyze select * from to_delete where login = 'kalman';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.01..65421.05 rows=143 width=79) (actual time=1479.738..37121.511 rows=98 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Append (cost=0.00..50793.17 rows=2924633 width=67) (actual time=21.391..33987.363 rows=2927428 loops=1)
-> Seq Scan on user_logs u (cost=0.00..7195.22 rows=411244 width=67) (actual time=21.385..5641.307 rows=414039 loops=1)
-> Seq Scan on user_logs_2003_h u (cost=0.00..34833.95 rows=2008190 width=67) (actual time=0.024..18031.218 rows=2008190 loops=1)
-> Seq Scan on user_logs_2002_h u (cost=0.00..8764.00 rows=505199 width=67) (actual time=0.005..5733.554 rows=505199 loops=1)
-> Hash (cost=4.00..4.00 rows=2 width=16) (actual time=0.195..0.195 rows=0 loops=1)
-> Index Scan using user_login_login_key on user_login v (cost=0.00..4.00 rows=2 width=16) (actual time=0.155..0.161 rows=1 loops=1)
Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 37122.069 ms
(10 rows)

and how you can see this path is not applicable too :-(

Any other suggestion ?

Regards
Gaetano Mendola