Quick Links

Re: Partitioning syntax

Lists:	pgsql-hackers

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Partitioning syntax
Date:	2010-01-14 09:13:23
Message-ID:	20100114181323.9A33.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Here is a revised partitioning syntax patch. It implements only syntax and
on-disk structure mentioned below:
Table Partitioning#Syntax
http://wiki.postgresql.org/wiki/Table_partitioning#Syntax
Table Partitioning#On-disk structure
http://wiki.postgresql.org/wiki/Table_partitioning#On-disk_structure

What we can do with the patch is src/test/regress/sql/partition.sql.
Note that the patch does nothing about INSERTs; triggers are still needed.

The main syntax is CREATE TABLE () PARTITION BY {RANGE | LIST} (...).
The reason I use it rather than "PARTITIONED BY" is for compatibility
to other DBMSs; Oracle and MySQL.

Changes from the previous CommitFest are:
- Additinal regression tests:
1000 partitions, error cases and boolean partitions
- Use pg_inherits_parent_index index if available.
- Sort not only range partitions but also list partitions
for stable display order.
- Remove ALTER PARTITION and DROP PARTITION syntax because
they are just synonyms of ALTER TABLE and DROP TABLE.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment	Content-Type	Size
partitioning-syntax_20100114.patch	application/octet-stream	143.5 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-01-16 23:45:53
Message-ID:	603c8f071001161545i725d84e7p96dcff6ed777e080@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 14, 2010 at 4:13 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Here is a revised partitioning syntax patch. It implements only syntax and
> on-disk structure mentioned below:
> Table Partitioning#Syntax
> http://wiki.postgresql.org/wiki/Table_partitioning#Syntax
> Table Partitioning#On-disk structure
> http://wiki.postgresql.org/wiki/Table_partitioning#On-disk_structure
>
> What we can do with the patch is src/test/regress/sql/partition.sql.
> Note that the patch does nothing about INSERTs; triggers are still needed.
>
> The main syntax is CREATE TABLE () PARTITION BY {RANGE | LIST} (...).
> The reason I use it rather than "PARTITIONED BY" is for compatibility
> to other DBMSs; Oracle and MySQL.
>
> Changes from the previous CommitFest are:
> - Additinal regression tests:
> 1000 partitions, error cases and boolean partitions
> - Use pg_inherits_parent_index index if available.
> - Sort not only range partitions but also list partitions
> for stable display order.
> - Remove ALTER PARTITION and DROP PARTITION syntax because
> they are just synonyms of ALTER TABLE and DROP TABLE.

A couple of preliminary comments on this:

1. If we're thinking that this syntax should eventually result in
inserts (and updates?) being redirected to the appropriate partition,
then I think we should have that in the initial version. I don't
think we really want to add the syntax with a plan to change its
behavior incompatibly down the road.

2. The documentation does not explain what partitioning by list or by
range means, or what the difference between the two is. I think some
kind of general introduction to the subject is essential.

3. This patch is large enough (+1951/-63) that we have to consider
whether it makes sense to merge it at this point in the release cycle.
It doesn't change much existing code, which is a point in its favor,
but it's still a big patch. I guess we can wait until we're a little
further along to make that decision.

...Robert

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-01-18 08:55:52
Message-ID:	20100118175551.9B43.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> A couple of preliminary comments on this:

Thanks.
The attached is rebased on HEAD, with additional documentation.

> 1. If we're thinking that this syntax should eventually result in
> inserts (and updates?) being redirected to the appropriate partition,
> then I think we should have that in the initial version. I don't
> think we really want to add the syntax with a plan to change its
> behavior incompatibly down the road.

It's true that we need an alternative method for insert triggers,
but I'd like to submit it as another patch in the next development cycle.
I think the syntax proposed here is carefully chosen so that we will
not frequently modify modify them.

> 2. The documentation does not explain what partitioning by list or by
> range means, or what the difference between the two is. I think some
> kind of general introduction to the subject is essential.

Oops, I've forgotten to update ddl.sgml for the new syntax. I rewrote
ddl-partition section to use the new syntax, with some other changes:

1. Use PARTITION BY RANGE and CREATE PARTITION for the example.

2. Recommend to define indexes for the parent table, and copy the
definition to each partition with CREATE PARTITION or
CREATE TABLE (LIKE INCLUDING ALL).

3. Use an EXECUTE USING rather than a huge IF-THEN-ELSE block
in the insert trigger. The new recommended code is:
EXECUTE 'INSERT INTO ' || to_relname(NEW) || ' VALUES ($1.*)' USING NEW;
We don't need to update the trigger function if partitions are
added or removed with this form.

We could apply changes in 2 and 3 even without the partitioning patch.
I think we can define trigger functions with EXECUTE USING easliy
compared to before. I'm willing to split the doc patch if needed.

> 3. This patch is large enough (+1951/-63) that we have to consider
> whether it makes sense to merge it at this point in the release cycle.
> It doesn't change much existing code, which is a point in its favor,
> but it's still a big patch. I guess we can wait until we're a little
> further along to make that decision.

It depends on reviewers :) But I think the partitioning patch never
go to waste -- I'll continue to improve it if any feedbacks.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment	Content-Type	Size
partitioning-syntax_20100118.patch	application/octet-stream	157.6 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-01-21 19:08:04
Message-ID:	603c8f071001211108y6a5ab46epab1c7144e67510ac@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 18, 2010 at 3:55 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> A couple of preliminary comments on this:
>
> Thanks.
> The attached is rebased on HEAD, with additional documentation.
>
>> 1. If we're thinking that this syntax should eventually result in
>> inserts (and updates?) being redirected to the appropriate partition,
>> then I think we should have that in the initial version. I don't
>> think we really want to add the syntax with a plan to change its
>> behavior incompatibly down the road.
>
> It's true that we need an alternative method for insert triggers,
> but I'd like to submit it as another patch in the next development cycle.
> I think the syntax proposed here is carefully chosen so that we will
> not frequently modify modify them.

But I think users have the right to expect that the behavior of a
feature will be consistent. If, in 9.0, we add this feature and tell
people to start using it, then when 9.1 comes out and includes
automatic routing of inserts to the proper partition, their
applications will break. I don't think there's enough value in this
by itself to think that we're getting something by committing it now -
it's better to let people do it the way they have been for 9.0, and
then have a more complete implementation for 9.1. Maybe if we were at
the very beginning of the release cycle we could think about
committing this first and then making the other changes before
release, but even that is something we usually shy away from, in case
people get bogged down and don't have time to finish the work.

>> 2. The documentation does not explain what partitioning by list or by
>> range means, or what the difference between the two is. I think some
>> kind of general introduction to the subject is essential.
>
> Oops, I've forgotten to update ddl.sgml for the new syntax. I rewrote
> ddl-partition section to use the new syntax, with some other changes:
>
> 1. Use PARTITION BY RANGE and CREATE PARTITION for the example.
>
> 2. Recommend to define indexes for the parent table, and copy the
> definition to each partition with CREATE PARTITION or
> CREATE TABLE (LIKE INCLUDING ALL).
>
> 3. Use an EXECUTE USING rather than a huge IF-THEN-ELSE block
> in the insert trigger. The new recommended code is:
> EXECUTE 'INSERT INTO ' || to_relname(NEW) || ' VALUES ($1.*)' USING NEW;
> We don't need to update the trigger function if partitions are
> added or removed with this form.
>
> We could apply changes in 2 and 3 even without the partitioning patch.
> I think we can define trigger functions with EXECUTE USING easliy
> compared to before. I'm willing to split the doc patch if needed.
>
>> 3. This patch is large enough (+1951/-63) that we have to consider
>> whether it makes sense to merge it at this point in the release cycle.
>> It doesn't change much existing code, which is a point in its favor,
>> but it's still a big patch. I guess we can wait until we're a little
>> further along to make that decision.
>
> It depends on reviewers :) But I think the partitioning patch never
> go to waste -- I'll continue to improve it if any feedbacks.

I agree - it will not go to waste. I'm glad you're planning to
continue working on it. I'll try to take a little bit more look at
the code (or perhaps someone else will weigh in) but I don't think we
should expect it to go in this time around.

...Robert

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-01-22 01:19:26
Message-ID:	20100122101925.E4B2.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> people get bogged down and don't have time to finish the work.

Ok, I moved this patch to the next commit fest for 9.1 alpha 1.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject:	Re: Partitioning syntax
Date:	2010-03-17 07:54:48
Message-ID:	201003171354.48668.fozzy@ac-sw.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Here is a revised partitioning syntax patch. It implements only syntax and
> on-disk structure mentioned below:
> Table Partitioning#Syntax
> http://wiki.postgresql.org/wiki/Table_partitioning#Syntax
> Table Partitioning#On-disk structure
> http://wiki.postgresql.org/wiki/Table_partitioning#On-disk_structure

Will 9.1 partitions allow to reference partitioned tables in foreign keys?

From:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject:	Re: Partitioning syntax
Date:	2010-03-17 07:55:45
Message-ID:	201003171355.45363.fozzy@ac-sw.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Will 9.1 partitions allow to reference partitioned tables in foreign keys?

Regards,
Dmitry

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-03-17 08:13:50
Message-ID:	20100317171349.8F83.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dmitry Fefelov <fozzy(at)ac-sw(dot)com> wrote:

> > Here is a revised partitioning syntax patch. It implements only syntax and
> > on-disk structure mentioned below:
> > Table Partitioning#Syntax
> > http://wiki.postgresql.org/wiki/Table_partitioning#Syntax
> > Table Partitioning#On-disk structure
> > http://wiki.postgresql.org/wiki/Table_partitioning#On-disk_structure
>
> Will 9.1 partitions allow to reference partitioned tables in foreign keys?

Not in my first goals, but it might be possible if we could support row locks
for UNION plans:

=# SELECT * FROM tbl1 UNION ALL SELECT * FROM tbl2 FOR SHARE;
ERROR: SELECT FOR UPDATE/SHARE is not allowed with UNION/INTERSECT/EXCEPT
(in 9.0)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From:	David Fetter <david(at)fetter(dot)org>
To:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject:	Re: Partitioning syntax
Date:	2010-03-17 13:52:15
Message-ID:	20100317135215.GA2577@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 17, 2010 at 01:55:45PM +0600, Dmitry Fefelov wrote:
> > Here is a revised partitioning syntax patch. It implements only syntax and
> > on-disk structure mentioned below:
> > Table Partitioning#Syntax
> > http://wiki.postgresql.org/wiki/Table_partitioning#Syntax
> > Table Partitioning#On-disk structure
> > http://wiki.postgresql.org/wiki/Table_partitioning#On-disk_structure
>
> Will 9.1 partitions allow to reference partitioned tables in foreign keys?

For now, you can do something like this:

http://people.planetpostgresql.org/dfetter/index.php?/archives/51-Partitioning-Is-Such-Sweet-Sorrow.html

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Dmitry Fefelov <fozzy(at)ac-sw(dot)com>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-03-18 04:13:14
Message-ID:	201003181013.14058.fozzy@ac-sw.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > Will 9.1 partitions allow to reference partitioned tables in foreign keys?
>
> For now, you can do something like this:
>
> http://people.planetpostgresql.org/dfetter/index.php?/archives/51-
Partitioning-Is-Such-Sweet-Sorrow.html
>
> Cheers,
> David.
>

Already did ;) But workable plain references will be useful.

Regards,
Dmitry

From:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-06-17 06:58:22
Message-ID:	AANLkTinQnLA5qgqXW_0cKVXw3gN37hwnB2jTS1T7JX-T@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 18, 2010 at 3:55 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> A couple of preliminary comments on this:
>
> Thanks.
> The attached is rebased on HEAD, with additional documentation.
>

This one, doesn't apply to head anymore... please update

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-06-18 01:34:23
Message-ID:	20100618103423.A12B.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:

> This one, doesn't apply to head anymore... please update

Thank you for reviewing my patch!

I attached an updated patch set for partitioning syntax.

The latest codes are available at: http://repo.or.cz/w/pgsql-fdw.git
(I'm recycling FDW repo for the feature.)
* master branch is a copy of postgres' HEAD.
* 'partition' branch contains codes for partitioning.

The details and discussion for partitioning are in the wiki page:
http://wiki.postgresql.org/wiki/Table_partitioning

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment	Content-Type	Size
partition_20100618.tar.gz	application/octet-stream	33.8 KB

From:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-06-23 15:25:19
Message-ID:	AANLkTinmTM_YI1RBdgn_zpu_Upa6g05IjTjzagyQWzSY@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/6/18 Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>
> Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
>
>> This one, doesn't apply to head anymore... please update
>
> Thank you for reviewing my patch!
>
> I attached an updated patch set for partitioning syntax.

Isn't this linked from the RF web app??

Regards,

--
Hitoshi Harada

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>, "Takahiro Itagaki" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	"Jaime Casanova" <jaime(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning syntax
Date:	2010-06-23 15:41:04
Message-ID:	4C21E4C002000025000328F7@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:
> 2010/6/18 Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:

>> I attached an updated patch set for partitioning syntax.
>
> Isn't this linked from the RF web app??

It should have been. Neither the reviewer nor the author updated
the CF web page (as they should have done). I've just made the
entries to bring the patch it up to date in the web app.

I apologize for not picking up on the emails earlier.

-Kevin

From:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-06-23 18:06:59
Message-ID:	AANLkTim-kCuXDzuZJrd6wy64vnJj25rF2256dfdim3SP@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 23, 2010 at 10:41 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> It should have been. Neither the reviewer nor the author updated
> the CF web page (as they should have done). I've just made the
> entries to bring the patch it up to date in the web app.
>

Yeah! sorry i got bussy with other things... i will make a complete
review in next days...

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-07-06 17:49:42
Message-ID:	AANLkTikP-1_8B04eyIK0sDf8uA5KMo64o8sorFBZE_CT@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 17, 2010 at 9:34 PM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
>> This one, doesn't apply to head anymore... please update
>
> Thank you for reviewing my patch!
>
> I attached an updated patch set for partitioning syntax.

I've taken a little bit more of a look at this patch and I guess I'm
not too happy with the design.

1. It seems to me that the proposed design for pg_partition is poorly
thought out. In particular, I don't see how this would work if we
wanted to partition on multiple keys, which is a feature supported by
both Oracle and MySQL. It would also be nice to give at least some
thought to how we might handle partitioning by list with
subpartitioning by range or hash, or range partitioning with
subpartitioning by hash. We certainly don't need to do
subpartitioning in the first version of the patch, but I think we
should have a plan.

2. I am still of the view that the first version of this patch should
correctly handle routing of INSERT and COPY data to the correct
partition. But at a very minimum we need to have a plan for how we're
going to implement that in a follow-on patch. I think the way to do
this is to binary search a sorted array of partition keys (perhaps
upper bounds for range partitioning, and exact values for list
partitioning). When you find the correct key, then you find the index
of that key and look up that same index in a separate array of table
OIDs and insert there. While it's possible to construct such a
structure from the proposed catalog structure, it requires an index
scan. I'm wondering if it might be better to abandon the idea of
storing the partition values in pg_inherits and instead put
preconstructed arrays directly into pg_partition. That way, with a
single row fetch, you can get all the data you need. I'm not sure
this is better, though - other opinions?

3. For a first version of this patch, I would suggest that we only
allow partitioning by base columns, rather than expressions. When
someone goes to do a bulk load of data into the table, and we want to
do automatic tuple routing, we're going to have to evaluate the
partitioning expression(s) for every row. I'm just guessing here, but
I bet it's a lot cheaper to fetch an attribute by attnum than to
evaluate an arbitrary expression. So even if we add partitioning by
expression later, I don't think that the work to make a special case
for base columns will be wasted.

4. The dependency handling needs more thought. For example, if I do this:

create table names (id integer primary key, name varchar not null)
partition by range (id) (partition names_1 values less than 1000,
partition names_2 values less than 2000, partition names_3 values less
than 3000);

and then I drop names_2, the automatically generated constraint on
names_3 still says CHECK (2000 <= id AND id < 3000). And I can drop
those constraints off the individual tables, too, which doesn't seem
like it ought to be allowed. And if I do something like this:

create or replace function f(int) returns int as $$select $1$$
immutable language sql;
create table names (id integer primary key, name varchar not null)
partition by range (f(id)) (partition names_1 values less than 1000,
partition names_2 values less than 2000, partition names_3 values less
than 3000);
alter table names_1 drop constraint names_1_id_check;
alter table names_2 drop constraint names_2_id_check;
alter table names_3 drop constraint names_3_id_check;
drop function f(int);

...then I get:

ERROR: cannot drop function f(integer) because other objects depend on it
DETAIL: range partition depends on function f(integer)

...but there's no identification of which range partition depends on it.

The locking is also broken here. In session #1, start a transaction
and do CREATE PARTITION on an existing partitioned table. Then in
session #2, do DROP FUNCTION <some function> CASCADE in a manner that
leads to the range partition getting dropped. Then commit session #1.
Now you have a pg_inherits catalog with leftovers in inhvalues.

5. The use of the term "partition" is not very consistent. For
example, we use CREATE PARTITION to create a partition, but we use
DROP TABLE to get rid of it (there is no DROP PARTITION). I think
that the right syntax to use here is ALTER TABLE ... ADD/DROP
PARTITION; both Oracle and MySQL do it that way. And meanwhile
OCLASS_PARTITION means "the partitioning information associated with
the parent table", not "a partition of a parent table".

6. There's some kind of magic in here associated with indexes on the
parent table - it seems that matching indexes or primary keys are
automatically created on each child table. But there's no provision
for keeping them in sync. If I create a partitioned table with a
primary key, the key is inherited by all its current children. If I
then drop the primary key, it disappears from the parent but it still
exists on the children. Any new children created afterwards don't
have it, however. I'm not sure whether indices should propagate from
parent to child or not, but propagating whatever exists at the moment
of creation and then forgetting about it doesn't seem right.

7. I'm not convinced that it's a good idea to treat ALTER TABLE parent
ATTACH/DETACH PARTITION child as basically a synonym for ALTER TABLE
child [NO] INHERIT parent, but even if it is the current
implementation seems way too permissive (it also lacks comments and
adequate documentation). You can, for example, use ATTACH PARTITION
to add a new child and then NO INHERIT to detach it again; or you can
use INHERIT to attach a child even when the parent is partitioned. It
does however catch the case of trying to use ATTACH PARTITION to
attach a child to an unpartitioned parent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-07-07 06:14:50
Message-ID:	20100707151450.9805.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I've taken a little bit more of a look at this patch and I guess I'm
> not too happy with the design.

Thanks. I was thinking about only syntax for partitioning in the patch,
but I need more consideration about insert-aware catalog design.

> 5. The use of the term "partition" is not very consistent. For
> example, we use CREATE PARTITION to create a partition, but we use
> DROP TABLE to get rid of it (there is no DROP PARTITION). I think
> that the right syntax to use here is ALTER TABLE ... ADD/DROP
> PARTITION; both Oracle and MySQL do it that way. And meanwhile
> OCLASS_PARTITION means "the partitioning information associated with
> the parent table", not "a partition of a parent table".

"ALTER TABLE ... ADD/DROP PARTITION" was discussed many times,
but I cannot solve syntax confict with "ALTER TABLE ... ADD [COLUMN]".
Since we can omit COLUMN, parser treats "ADD PARTITION" as adding
a column named "PARTITION". We need to add PARTITION into the reserved
keyword list to avoid shift/reduce errors.

Do you have any better idea?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-07-07 12:57:11
Message-ID:	AANLkTikhfx5QCrdbrBCbT07p_lzyBMUVSwQy6qynY515@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 7, 2010 at 2:14 AM, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>> 5. The use of the term "partition" is not very consistent. For
>> example, we use CREATE PARTITION to create a partition, but we use
>> DROP TABLE to get rid of it (there is no DROP PARTITION). I think
>> that the right syntax to use here is ALTER TABLE ... ADD/DROP
>> PARTITION; both Oracle and MySQL do it that way. And meanwhile
>> OCLASS_PARTITION means "the partitioning information associated with
>> the parent table", not "a partition of a parent table".
>
> "ALTER TABLE ... ADD/DROP PARTITION" was discussed many times,
> but I cannot solve syntax confict with "ALTER TABLE ... ADD [COLUMN]".
> Since we can omit COLUMN, parser treats "ADD PARTITION" as adding
> a column named "PARTITION". We need to add PARTITION into the reserved
> keyword list to avoid shift/reduce errors.
>
> Do you have any better idea?

No, I think we're going to need to at least partially reserve that
keyword. However, SQL:2003 and SQL:2008 apparently have it as a
reserved keyword, so I'm hoping we can get away with that. I don't
think it's worth inventing a totally different (and, IMHO, not very
appealing) syntax just to avoid reserving a keyword that is reserved
in the standard.

http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-07-15 12:18:57
Message-ID:	1279196337.1735.9598.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-07-06 at 13:49 -0400, Robert Haas wrote:

> 1. It seems to me that the proposed design for pg_partition is poorly
> thought out. In particular, I don't see how this would work if we
> wanted to partition on multiple keys, which is a feature supported by
> both Oracle and MySQL. It would also be nice to give at least some
> thought to how we might handle partitioning by list with
> subpartitioning by range or hash, or range partitioning with
> subpartitioning by hash. We certainly don't need to do
> subpartitioning in the first version of the patch, but I think we
> should have a plan.

Or at least a way to store that information if/when it exists later.

> 2. I am still of the view that the first version of this patch should
> correctly handle routing of INSERT and COPY data to the correct
> partition. But at a very minimum we need to have a plan for how we're
> going to implement that in a follow-on patch. I think the way to do
> this is to binary search a sorted array of partition keys (perhaps
> upper bounds for range partitioning, and exact values for list
> partitioning). When you find the correct key, then you find the index
> of that key and look up that same index in a separate array of table
> OIDs and insert there. While it's possible to construct such a
> structure from the proposed catalog structure, it requires an index
> scan. I'm wondering if it might be better to abandon the idea of
> storing the partition values in pg_inherits and instead put
> preconstructed arrays directly into pg_partition. That way, with a
> single row fetch, you can get all the data you need. I'm not sure
> this is better, though - other opinions?

Agreed that it is really important. The heart of partitioning is the
metadata that will allow us to do insert routing as well as nested joins
using dynamic routing. We *must* plan for that so that the command
syntax and catalog storage delivers what is required. This patch must
not be just about syntax. The required usage drives the syntax, not the
other way around.

> 3. For a first version of this patch, I would suggest that we only
> allow partitioning by base columns, rather than expressions. When
> someone goes to do a bulk load of data into the table, and we want to
> do automatic tuple routing, we're going to have to evaluate the
> partitioning expression(s) for every row. I'm just guessing here, but
> I bet it's a lot cheaper to fetch an attribute by attnum than to
> evaluate an arbitrary expression. So even if we add partitioning by
> expression later, I don't think that the work to make a special case
> for base columns will be wasted.

Agree that part should come out for now and resubmit as a later patch.
Lets keep it simple in the first version.

Definitely do not want CREATE PARTITION. ALTER TABLE is the best place.

> 6. There's some kind of magic in here associated with indexes on the
> parent table - it seems that matching indexes or primary keys are
> automatically created on each child table. But there's no provision
> for keeping them in sync. If I create a partitioned table with a
> primary key, the key is inherited by all its current children. If I
> then drop the primary key, it disappears from the parent but it still
> exists on the children. Any new children created afterwards don't
> have it, however. I'm not sure whether indices should propagate from
> parent to child or not, but propagating whatever exists at the moment
> of creation and then forgetting about it doesn't seem right.

IMHO it should be optional as to whether all partitions have identical
indexing. It is an important aspect of the design that an historical
table may have different indexes on different parts of the table, since
different users/use cases exist for access to that data. No problem if
some people want that though.

> 7. I'm not convinced that it's a good idea to treat ALTER TABLE parent
> ATTACH/DETACH PARTITION child as basically a synonym for ALTER TABLE
> child [NO] INHERIT parent, but even if it is the current
> implementation seems way too permissive (it also lacks comments and
> adequate documentation). You can, for example, use ATTACH PARTITION
> to add a new child and then NO INHERIT to detach it again; or you can
> use INHERIT to attach a child even when the parent is partitioned. It
> does however catch the case of trying to use ATTACH PARTITION to
> attach a child to an unpartitioned parent.

Agreed. If all we are doing is adding synonyms for existing feature then
its not good enough. We need a new syntax that does not need to be
backwards compatible, allowing various code streamlining and more
targeting to the desired use case. Inheritance != partitioning. Similar,
maybe, but not identical. Probably also the only way we can move
forwards without breaking all the existing user code in subtle ways.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Jaime Casanova <jaime(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Partitioning syntax
Date:	2010-07-15 15:35:05
Message-ID:	5768.1279208105@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Agreed. If all we are doing is adding synonyms for existing feature then
> its not good enough. We need a new syntax that does not need to be
> backwards compatible, allowing various code streamlining and more
> targeting to the desired use case. Inheritance != partitioning. Similar,
> maybe, but not identical. Probably also the only way we can move
> forwards without breaking all the existing user code in subtle ways.

My feeling about it is that partitioning should be a subset of
inheritance --- that is, a partitioned table is an inheritance tree,
but with additional constraints/properties/catalog information.

In the case at hand, that means that you couldn't use ALTER TABLE
INHERIT to install a new partition, but only because it would fail to
provide the additional information needed (partition key info).
ALTER TABLE ATTACH PARTITION is like INHERIT except it also provides
the extra partitioning info. OTOH, DETACH PARTITION is not really
significantly different from ALTER NO INHERIT --- you could allow them
to be used interchangeably. Though I'd still favor keeping them
separate just for consistency of the DDL language.

regards, tom lane