Re: Partitioned tables constraint_exclusion

Lists: pgsql-hackers
From: Weslee Bilodeau <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Partitioned tables constraint_exclusion
Date: 2007-03-26 16:38:29
Message-ID: 4607F705.4010907@hypermediasystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm not sure if this is a bug, missing feature, misunderstanding on my part?

I checked the TODO list and couldn't find anything on it.

I currently have a 750 million row table, indexes are > 10 GB, so trying
to partition it.

The basic -

constraint_exclusion + exact match = OK
constraint_exclusion + ( var + var )::case = Not OK

Weslee

I tried to break it down to a simple case -
(kid_200601 should never show up in the plan)

mytest=# create table master ( var_text text not null, var_ts timestamp
with time zone not null, unique ( var_ts ) );
NOTICE: CREATE TABLE / UNIQUE will create implicit index
"master_var_ts_key" for table "master"
CREATE TABLE
mytest=# create table kid_200601 ( check ( var_ts >= '2006-01-01
00:00:00' AND var_ts < '2006-02-01 00:00:00' ) ) inherits ( master ) ;
CREATE TABLE
mytest=# create table kid_200602 ( check ( var_ts >= '2006-02-01
00:00:00' AND var_ts < '2006-03-01 00:00:00' ) ) inherits ( master ) ;
CREATE TABLE
mytest=# create table kid_200603 ( check ( var_ts >= '2006-03-01
00:00:00' AND var_ts < '2006-04-01 00:00:00' ) ) inherits ( master ) ;
CREATE TABLE
mytest=# explain select count(*) from master where var_ts > '2006-02-22
00:00:00' ;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Aggregate (cost=71.94..71.95 rows=1 width=0)
-> Append (cost=7.09..69.18 rows=1101 width=0)
-> Bitmap Heap Scan on master (cost=7.09..21.68 rows=367 width=0)
Recheck Cond: (var_ts > '2006-02-22
00:00:00+00'::timestamp with time zone)
-> Bitmap Index Scan on master_var_ts_key
(cost=0.00..7.00 rows=367 width=0)
Index Cond: (var_ts > '2006-02-22
00:00:00+00'::timestamp with time zone)
-> Seq Scan on kid_200602 master (cost=0.00..23.75 rows=367
width=0)
Filter: (var_ts > '2006-02-22 00:00:00+00'::timestamp
with time zone)
-> Seq Scan on kid_200603 master (cost=0.00..23.75 rows=367
width=0)
Filter: (var_ts > '2006-02-22 00:00:00+00'::timestamp
with time zone)
(10 rows)

mytest=# select now() ;
now
-------------------------------
2007-03-26 16:02:29.360435+00
(1 row)

mytest=# explain select count(*) from master where var_ts > ( now() - '1
month'::interval )::timestamptz ;
QUERY PLAN
----------------------------------------------------------------------------------------------
Aggregate (cost=114.94..114.95 rows=1 width=0)
-> Append (cost=7.10..111.27 rows=1468 width=0)
-> Bitmap Heap Scan on master (cost=7.10..23.52 rows=367 width=0)
Recheck Cond: (var_ts > (now() - '1 mon'::interval))
-> Bitmap Index Scan on master_var_ts_key
(cost=0.00..7.01 rows=367 width=0)
Index Cond: (var_ts > (now() - '1 mon'::interval))
-> Seq Scan on kid_200601 master (cost=0.00..29.25 rows=367
width=0)
Filter: (var_ts > (now() - '1 mon'::interval))
-> Seq Scan on kid_200602 master (cost=0.00..29.25 rows=367
width=0)
Filter: (var_ts > (now() - '1 mon'::interval))
-> Seq Scan on kid_200603 master (cost=0.00..29.25 rows=367
width=0)
Filter: (var_ts > (now() - '1 mon'::interval))
(12 rows)

mytest=# show constraint_exclusion ;
constraint_exclusion
----------------------
on
(1 row)

mytest=# explain select count(*) from master where var_ts > (
'2007-03-26 16:03:27.370627+00'::timestamptz - '1 month'::interval
)::timestamptz ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=105.77..105.78 rows=1 width=0)
-> Append (cost=7.10..102.10 rows=1468 width=0)
-> Bitmap Heap Scan on master (cost=7.10..22.60 rows=367 width=0)
Recheck Cond: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone - '1 mon'::interval))
-> Bitmap Index Scan on master_var_ts_key
(cost=0.00..7.01 rows=367 width=0)
Index Cond: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone - '1 mon'::interval))
-> Seq Scan on kid_200601 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone - '1 mon'::interval))
-> Seq Scan on kid_200602 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone - '1 mon'::interval))
-> Seq Scan on kid_200603 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone - '1 mon'::interval))
(12 rows)

mytest=#


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Weslee Bilodeau" <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables constraint_exclusion
Date: 2007-03-26 18:59:54
Message-ID: 1174935595.6069.402.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-03-26 at 09:38 -0700, Weslee Bilodeau wrote:

> mytest=# explain select count(*) from master where var_ts > (
> '2007-03-26 16:03:27.370627+00'::timestamptz - '1 month'::interval
> )::timestamptz ;

If you're able to supply a constant value, why not subtract 1 month
before you submit the query?

AFAIK timestamptz arithmetic depends upon the current timezone which is
a STABLE value and so won't currently work with partitioning.

Having partitioning work with STABLE functions should be a TODO item if
it isn't already, but that requires some thought to implement and won't
happen for 8.3.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Weslee Bilodeau <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partitioned tables constraint_exclusion
Date: 2007-03-26 19:58:16
Message-ID: 460825D8.5010808@hypermediasystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2007-03-26 at 09:38 -0700, Weslee Bilodeau wrote:
>
>> mytest=# explain select count(*) from master where var_ts > (
>> '2007-03-26 16:03:27.370627+00'::timestamptz - '1 month'::interval
>> )::timestamptz ;
>
> If you're able to supply a constant value, why not subtract 1 month
> before you submit the query?
>
> AFAIK timestamptz arithmetic depends upon the current timezone which is
> a STABLE value and so won't currently work with partitioning.
>
> Having partitioning work with STABLE functions should be a TODO item if
> it isn't already, but that requires some thought to implement and won't
> happen for 8.3.
>

Mainly its because the value comes from a reporting system that has
minimal brains, it passes values it gets from the user directly into a
query.

IE, they enter '1 month', which I use to populate the interval value,
"ts > ( NOW() - $VALUE )"

But, in the example I did a "timestamp - interval", the exact date, not
NOW() - Still didn't work.

I'm guessing anything that has to think, math, etc is not valid for
constrain_exclusion?

Its not in the docs anywhere, so trying to isolate what can and can't be
done.

Weslee

mytest=# explain select count(*) from master where var_ts > (
'2007-03-26 16:03:27.370627+00'::timestamptz + '1 second'::interval
)::timestamptz ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=105.77..105.78 rows=1 width=0)
-> Append (cost=7.10..102.10 rows=1468 width=0)
-> Bitmap Heap Scan on master (cost=7.10..22.60 rows=367 width=0)
Recheck Cond: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone + '00:00:01'::interval))
-> Bitmap Index Scan on master_var_ts_key
(cost=0.00..7.01 rows=367 width=0)
Index Cond: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone + '00:00:01'::interval))
-> Seq Scan on kid_200601 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone + '00:00:01'::interval))
-> Seq Scan on kid_200602 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone + '00:00:01'::interval))
-> Seq Scan on kid_200603 master (cost=0.00..26.50 rows=367
width=0)
Filter: (var_ts > ('2007-03-26
16:03:27.370627+00'::timestamp with time zone + '00:00:01'::interval))
(12 rows)

mytest=# explain select count(*) from master where var_ts > (
'2007-03-26 16:03:27.370627+00' ) ;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Aggregate (cost=22.60..22.61 rows=1 width=0)
-> Append (cost=7.09..21.68 rows=367 width=0)
-> Bitmap Heap Scan on master (cost=7.09..21.68 rows=367 width=0)
Recheck Cond: (var_ts > '2007-03-26
16:03:27.370627+00'::timestamp with time zone)
-> Bitmap Index Scan on master_var_ts_key
(cost=0.00..7.00 rows=367 width=0)
Index Cond: (var_ts > '2007-03-26
16:03:27.370627+00'::timestamp with time zone)
(6 rows)

mytest=#


From: Weslee Bilodeau <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partitioned tables constraint_exclusion
Date: 2007-03-26 20:48:57
Message-ID: 460831B9.9090503@hypermediasystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Weslee Bilodeau wrote:
> Mainly its because the value comes from a reporting system that has
> minimal brains, it passes values it gets from the user directly into a
> query.
>
> IE, they enter '1 month', which I use to populate the interval value,
> "ts > ( NOW() - $VALUE )"
>
> But, in the example I did a "timestamp - interval", the exact date, not
> NOW() - Still didn't work.
>
> I'm guessing anything that has to think, math, etc is not valid for
> constrain_exclusion?
>
> Its not in the docs anywhere, so trying to isolate what can and can't be
> done.

This works -

CREATE FUNCTION now_interval( interval ) RETURNS timestamptz IMMUTABLE
STRICT AS $$ SELECT NOW() - $1 $$ LANGUAGE SQL ;

SELECT count(*) FROM master WHERE var_ts > now_interval( '1 month' );

This doesn't work -

SELECT count(*) FROM master WHERE var_ts > ( NOW() - '1 month'::interval );

This works for me, as the reporting system I know doesn't change
timezones, and function cache doesn't last longer then the current select?

But, its basically the exact same logic in both cases?

Weslee


From: Jim Nasby <decibel(at)decibel(dot)org>
To: Weslee Bilodeau <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partitioned tables constraint_exclusion
Date: 2007-04-09 20:55:31
Message-ID: 1A23C1EA-AA0A-4BC7-AC2B-6FB5D4B77F98@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

See Simon's reply... timestamptz math is *not* IMMUTABLE, because
sessions are free to change their timezone at any time. I bet you can
get some invalid results using that function with a clever test case.

On Mar 26, 2007, at 3:48 PM, Weslee Bilodeau wrote:

> Weslee Bilodeau wrote:
>> Mainly its because the value comes from a reporting system that has
>> minimal brains, it passes values it gets from the user directly
>> into a
>> query.
>>
>> IE, they enter '1 month', which I use to populate the interval value,
>> "ts > ( NOW() - $VALUE )"
>>
>> But, in the example I did a "timestamp - interval", the exact
>> date, not
>> NOW() - Still didn't work.
>>
>> I'm guessing anything that has to think, math, etc is not valid for
>> constrain_exclusion?
>>
>> Its not in the docs anywhere, so trying to isolate what can and
>> can't be
>> done.
>
> This works -
>
> CREATE FUNCTION now_interval( interval ) RETURNS timestamptz IMMUTABLE
> STRICT AS $$ SELECT NOW() - $1 $$ LANGUAGE SQL ;
>
> SELECT count(*) FROM master WHERE var_ts > now_interval( '1 month' );
>
> This doesn't work -
>
> SELECT count(*) FROM master WHERE var_ts > ( NOW() - '1
> month'::interval );
>
>
> This works for me, as the reporting system I know doesn't change
> timezones, and function cache doesn't last longer then the current
> select?
>
>
> But, its basically the exact same logic in both cases?
>
> Weslee
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
> your
> message can get through to the mailing list cleanly
>

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Weslee Bilodeau <weslee(dot)bilodeau(at)hypermediasystems(dot)com>
To: Jim Nasby <decibel(at)decibel(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partitioned tables constraint_exclusion
Date: 2007-04-09 21:40:28
Message-ID: 461AB2CC.1030901@hypermediasystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby wrote:
> See Simon's reply... timestamptz math is *not* IMMUTABLE, because
> sessions are free to change their timezone at any time. I bet you can
> get some invalid results using that function with a clever test case.
>

I'm pretty sure it could easily be broken.
But to make it easier for me, I know that the reporting system connects,
runs the query, and disconnects.

So I'm so far safe using my current system.

If the system had persistent connections and changed timezones a lot, it
might however cause problems.

Its been the only way that I could get it to be smart enough to not use
the tables outside its range.

With the tables growing 2+ million rows a day, approaching 1 billion
rows, its helps performance a lot.

This works at least until the ongoing discussion of partitioned tables
hopefully improves things in this area.

> On Mar 26, 2007, at 3:48 PM, Weslee Bilodeau wrote:
>> This works -
>>
>> CREATE FUNCTION now_interval( interval ) RETURNS timestamptz IMMUTABLE
>> STRICT AS $$ SELECT NOW() - $1 $$ LANGUAGE SQL ;
>>
>> SELECT count(*) FROM master WHERE var_ts > now_interval( '1 month' );
>>
>> This doesn't work -
>>
>> SELECT count(*) FROM master WHERE var_ts > ( NOW() - '1
>> month'::interval );
>>
>>
>> This works for me, as the reporting system I know doesn't change
>> timezones, and function cache doesn't last longer then the current
>> select?
>>
>>
>> But, its basically the exact same logic in both cases?
>>
>> Weslee
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>> message can get through to the mailing list cleanly
>>
>
> --
> Jim Nasby jim(at)nasby(dot)net
> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>
>
>

Weslee