Quick Links

Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1

Lists:	pgsql-hackerspgsql-patches

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	pgsql-patches(at)postgresql(dot)org
Subject:	Auto Partitioning Patch - WIP version 1
Date:	2007-03-30 06:58:02
Message-ID:	d3c4af540703292358s8ed731el7771ab14083aa610@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Please find attached the WIP version 1 of the auto partitioning patch. There
was discussion on this a while back on -hackers at:
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00375.php

Please note that this patch tries to automate the activities that currently
are carried out manually. It does nothing fancy beyond that for now. There
were a lot of good suggestions, I have noted them down but for now I have
tried to stick to the initial goal of automating existing steps for
providing partitioning.

Things that this patch does:

i) Handle new syntax to provide partitioning:

CREATE TABLE tabname (
...
) PARTITION BY
RANGE(ColId)
| LIST(ColId)
(
PARTITION partition_name CHECK(...),
PARTITION partition_name CHECK(...)
...
);

ii) Create master table.
iii) Create children tables based on the number of partitions specified and
make them inherit from the master table.

The following things are TODOs:

iv) Auto generate rules using the checks mentioned for the partitions, to
handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child.
Note that checks specified directly on the master table will get inherited
automatically.
v) Based on the PRIMARY, UNIQUE information specified, pass it on to the
children tables.
vi) [stretch goal] Support HASH partitions

Will try to complete the above mentioned TODOs as soon as is possible.

Comments, feedback appreciated.

Thanks and Regards,
Nikhils
--

EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
auto-partition-v1.0.patch	text/x-patch	16.8 KB

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2007-04-03 15:36:10
Message-ID:	d3c4af540704030836o242c49cl2688ab4ec9808ac3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

>
> The following things are TODOs:
>
> iv) Auto generate rules using the checks mentioned for the partitions, to
> handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child.
> Note that checks specified directly on the master table will get inherited
> automatically.

Am planning to do the above by using the check constraint specified for each
partition. This constraint's raw_expr field ends up becoming the whereClause
for the rule specific to that partition.

One question is whether we should we allow auto creation of UPDATE rules
given that updates can end up spanning multiple partitions if the column on
which partitioning is specified gets updated?

Also if we decide to auto - add rules for UPDATE, the raw_expr will need to
be modified to refer to "OLD."col, which can be quite a headache. We do not
have parsetree walker/mutator functions as far as I could see in the code.

Regards,
Nikhils

--
EnterpriseDB http://www.enterprisedb.com

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-04 12:20:53
Message-ID:	46139825.6080204@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

NikhilS wrote:
>> The following things are TODOs:
>>
>> iv) Auto generate rules using the checks mentioned for the partitions, to
>> handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child.
>> Note that checks specified directly on the master table will get
>> inherited
>> automatically.
>
> Am planning to do the above by using the check constraint specified for
> each
> partition. This constraint's raw_expr field ends up becoming the
> whereClause
> for the rule specific to that partition.

I appreciate you efforts, but I'm not sure if this has been discussed
enough. There seem to be two ideas floating around:

- you are heading for automating the current kludge, which involves
creating partitions and constraints by hand. AFAICT, you want to
support list and range partitioning.

- Simon Riggs has proposed partitioning functions, which could easily
handle any type of partitioning (hash, list, range and any mix of
those).

Both proposals do not have much to do with the missing multi-table
indices. It's clear to me that we have to implement those someday, anyway.

AFAICT, the first proposal does not ease the task of writing correct
constraints, so that we are sure that each row ends up in only exactly
one partition. The second would.

But the second proposal makes it hard for the planner to choose the
right partitions, i.e. if you request a range of ids, the planner would
have to query the partitioning function for every possible value. The
first variant could use constraint exclusion for that.

None of the two has gone as far as thinking about switching from one
partitioning rule set to another. That gets especially hard if you
consider database restarts during re-partitioning.

Here are some thought I have come up with recently. This is all about
how to partition and not about how to implement multi-table indices.
Sorry if this got somewhat longish. And no, this is certainly not for
8.3 ;-)

I don't like partitioning rules, which leave open questions, i.e. when
there are values for which the system does not have an answer (and would
have to fall back to a default) or even worse, where it could give
multiple correct answers. Given that premise, I see only two basic
partitioning types:

- splits: those can be used for what's commonly known as list and range
partitioning. If you want customers A-M to end up on partition 1 and
customers N-Z on partition 2 you would split between M and N. (That
way, the system would still know what to do with a customer name
beginning with an @ sign, for example. The only requirement for a
split is that the underlying data type supports comparison
operators.)

- modulo: I think this is commonly known as hash partitioning. It
requires an integer input, possibly by hashing, and calculates the
remainder of a division by n. That should give an equal distribution
among n partitions.

Besides the expression to work on, a split always needs one argument,
the split point, and divides into two buckets. A modulo splits into two
or more buckets and needs the divisor as an argument.

Of course, these two types can be combined. I like to think of these
combinations as trees. Let me give you a simple examlpe:

table customers
|
|
split @ name >= 'N'
/ \
/ \
part1 part2

A combination of the two would look like:

table invoices
|
|
split @ id >= 50000
/ \
/ \
hash(id) modulo 3 part4
/ | \
/ | \
part1 part2 part3

Knowledge of these trees would allow the planner to choose more wisely,
i.e. given a comparative condition (WHERE id > 100000) it could check
the splits in the partitioning tree and only scan the partitions
necessary. Likewise with an equality condition (WHERE id = 1234).

As it's a better definition of the partitioning rules, the planner would
not have to check constraints of all partitions, as the current
constraint exclusion feature does. It might even be likely that querying
this partitioning tree and then scanning the single-table index will be
faster than an index scan on a multi-table index. At least, I cannot see
why it should be any slower.

Such partitioning rule sets would allow us to re-partition by adding a
split node on top of the tree. The split point would have to increment
together with the progress of moving around the rows among the
partitions, so that the database would always be in a consistent state
regarding partitioning.

Additionally, it's easy to figure out, when no or only few moving around
is necessary, i.e. when adding a split @ id >= 1000 to a table which
only has ids < 1000.

I believe that this is a well defined partitioning rule set, which has
more information for the planner than a partitioning function could ever
have. And it is less of a foot-gun than hand written constraints,
because it does not allow the user to specify illegal partitioning rules
(i.e. it's always guaranteed, that every row ends up in only one partition).

Of course, it's far more work than either of the above proposals, but
maybe we can go there step by step? Maybe, NikhilS proposal is more like
a step towards such a beast?

Feedback of any form is very welcome.

Regards

Markus

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-04 13:19:56
Message-ID:	d3c4af540704040619p6c99ee25q120f3edad5210c5e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

>
> I appreciate you efforts, but I'm not sure if this has been discussed

Thanks Markus.

enough. There seem to be two ideas floating around:
>
> - you are heading for automating the current kludge, which involves
> creating partitions and constraints by hand. AFAICT, you want to
> support list and range partitioning.
>
> - Simon Riggs has proposed partitioning functions, which could easily
> handle any type of partitioning (hash, list, range and any mix of
> those).

When I submitted the proposal, AFAIR there was no objection to going with
the first proposal. Yes there was a lot of forward looking discussion, but
since what I had proposed (atleast syntax wise) was similar/closer to Mysql,
Oracle I did not see any one objecting to it. I think SQL server provides
partitioning functions similar to Simon's proposal. And all along, I had
maintained that I wanted to automate as far as possible, the existing
mechanism for partitioning. To this too, I do not remember anyone objecting
to.

Our current partitioning solution is based on inheritance. With that in
mind, for 8.3 I thought an implementation based on auto rules creation would
be the way to go.

Having said that, obviously I would want to go with the consensus on this
list as to what we think is the *best* way to go forward with partitioning.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 13:42:51
Message-ID:	1175694172.3623.120.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, 2007-04-04 at 14:20 +0200, Markus Schiltknecht wrote:
> Both proposals do not have much to do with the missing multi-table
> indices. It's clear to me that we have to implement those someday,
> anyway.

I agree with much of your post, though this particular point caught my
eye. If you'll forgive me for jumping on an isolated point in your post:

Multi-table indexes sound like a good solution until you consider how
big they would be. The reason we "need" a multi-table index is because
we are using partitioning, which we wouldn't be doing unless the data
was fairly large. So the index is going to be (Num partitions *
fairly-large) in size, which means its absolutely enormous. Adding and
dropping partitions also becomes a management nightmare, so overall
multi-table indexes look unusable to me. Multi-table indexes also remove
the possibility of loading data quickly, then building an index on the
data, then adding the table as a partition - both the COPY and the
CREATE INDEX would be slower with a pre-existing multi-table index.

My hope is to have a mechanism to partition indexes or recognise that
they are partitioned, so that a set of provably-distinct unique indexes
can provide the exact same functionlity as a single large unique index,
just without the management nightmare.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 14:03:51
Message-ID:	87648ckzi0.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> On Wed, 2007-04-04 at 14:20 +0200, Markus Schiltknecht wrote:
>> Both proposals do not have much to do with the missing multi-table
>> indices. It's clear to me that we have to implement those someday,
>> anyway.
>
> I agree with much of your post, though this particular point caught my
> eye. If you'll forgive me for jumping on an isolated point in your post:
>
> Multi-table indexes sound like a good solution until you consider how
> big they would be.

Put another way, multi-table indexes defeat the whole purpose of having
partitioned the table in the first place. If you could have managed a single
massive index then you wouldn't have bothered partitioning.

However there is a use case that can be handled by a kind of compromise index.
Indexes that have leading columns which restrict all subtrees under that point
to a single partition can be handled by a kind of meta-index. So you have one
index which just points you to the right partition and corresponding index.

That lets you enforce unique constraints as long as the partition key is part
of the unique constraint. In practice people are usually pretty comfortable
not having the database enforce such a constraint since it's easy to have the
application enforce these types of constraints anyways.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Markus Schiltknecht <markus(at)bluegap(dot)ch>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 14:07:39
Message-ID:	4613B12B.8050805@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Simon Riggs wrote:
> My hope is to have a mechanism to partition indexes or recognise that
> they are partitioned, so that a set of provably-distinct unique indexes
> can provide the exact same functionlity as a single large unique index,
> just without the management nightmare.
>
>

Will this address the fairly common data design problem where we need to
ensure that a given value is unique across several tables (possibly
siblings, possibly not)? If so, then full steam ahead.

cheers

andrew

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-04 14:21:29
Message-ID:	4613B469.3040804@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

NikhilS wrote:
> Our current partitioning solution is based on inheritance. With that in
> mind, for 8.3 I thought an implementation based on auto rules creation
> would be the way to go.

That's completely reasonable. And as I've said, it's probably even a
step towards what I've outlined (automation of creation of partitions).

Regards

Markus

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 14:31:54
Message-ID:	4613B6DA.9010908@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Simon Riggs wrote:
> I agree with much of your post, though this particular point caught my
> eye. If you'll forgive me for jumping on an isolated point in your post:

No problem.

> Multi-table indexes sound like a good solution until you consider how
> big they would be. The reason we "need" a multi-table index is because
> we are using partitioning, which we wouldn't be doing unless the data
> was fairly large. So the index is going to be (Num partitions *
> fairly-large) in size, which means its absolutely enormous. Adding and
> dropping partitions also becomes a management nightmare, so overall
> multi-table indexes look unusable to me. Multi-table indexes also remove
> the possibility of loading data quickly, then building an index on the
> data, then adding the table as a partition - both the COPY and the
> CREATE INDEX would be slower with a pre-existing multi-table index.

I agree. (And thanks to TOAST, we never have very wide tables with
relatively few rows, right? I mean, something like pictures stored in
bytea columns or some such.)

> My hope is to have a mechanism to partition indexes or recognise that
> they are partitioned, so that a set of provably-distinct unique indexes
> can provide the exact same functionlity as a single large unique index,
> just without the management nightmare.

Uhm... I don't quite get what you mean by "provably-distinct unique
indexes".

As long as the first columns of an index are equal to all columns of the
partitioning columns, there is no problem. You could easily reduce to
simple per-table indexes and using the partitioning rule set to decide
which index to query.

But how to create an (unique) index which is completely different from
the partitioning key?

Regards

Markus

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 14:51:30
Message-ID:	4613BB72.1090602@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Gregory Stark wrote:
> Put another way, multi-table indexes defeat the whole purpose of having
> partitioned the table in the first place. If you could have managed a single
> massive index then you wouldn't have bothered partitioning.

That depends very much on the implementation of the multi-table index,
as you describe below. I think the major missing part is not *how* such
a meta-index should work - it's easily understandable, that one could
use the per-table indices - but a programming interface, similar to the
current index scan or sequential scan facility, which could return a
table and tuple pointer, no?

> However there is a use case that can be handled by a kind of compromise index.
> Indexes that have leading columns which restrict all subtrees under that point
> to a single partition can be handled by a kind of meta-index. So you have one
> index which just points you to the right partition and corresponding index.

Yeah.

> That lets you enforce unique constraints as long as the partition key is part
> of the unique constraint.

Is that already sufficient? That would alter the ordering of the columns
in the index, no? I mean:

CREATE INDEX x ON test(a, b, c);

isn't the same as

CRETAE INDEX x ON test(c, b, a);

That's why I'd say, the first column of an index would have to be equal
to all of the columns used in the partitioning key.

Regards

Markus

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 15:07:51
Message-ID:	1175699272.3623.171.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, 2007-04-04 at 16:31 +0200, Markus Schiltknecht wrote:

> But how to create an (unique) index which is completely different from
> the partitioning key?

Don't?

Most high volume tables are Fact tables with potentially more than 1 row
per Object/Dimension, so the unique index isn't appropriate in those
cases.

When partitioning a Major Entity its much easier to regard the PK as the
partitioning key + unique key, which is frequently possible, even if it
does break the exhortation against intelligent keys.

I wouldn't stand in the way of someone trying to add that functionality,
but I would describe the use case as fairly narrow.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 16:33:05
Message-ID:	4613D341.9020708@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Simon Riggs wrote:
> Most high volume tables are Fact tables with potentially more than 1 row
> per Object/Dimension, so the unique index isn't appropriate in those
> cases.
>
> When partitioning a Major Entity its much easier to regard the PK as the
> partitioning key + unique key, which is frequently possible, even if it
> does break the exhortation against intelligent keys.

Okay, so you are saying that a general purpose multi-table index isn't
needed, but instead something based on the partitioning rule set and the
per table indexes should be sufficient for the vast majority of cases?

Regards

Markus

From:	David Fetter <david(at)fetter(dot)org>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 16:43:28
Message-ID:	20070404164328.GA13030@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Apr 04, 2007 at 10:07:39AM -0400, Andrew Dunstan wrote:
> Simon Riggs wrote:
> >My hope is to have a mechanism to partition indexes or recognise
> >that they are partitioned, so that a set of provably-distinct
> >unique indexes can provide the exact same functionlity as a single
> >large unique index, just without the management nightmare.
>
> Will this address the fairly common data design problem where we
> need to ensure that a given value is unique across several tables
> (possibly siblings, possibly not)?

That would be where the provably-distinct part comes in, so yes.

> If so, then full steam ahead.

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 18:06:50
Message-ID:	871wj0ko91.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Markus Schiltknecht" <markus(at)bluegap(dot)ch> writes:

> CREATE INDEX x ON test(a, b, c);
>
> That's why I'd say, the first column of an index would have to be equal to all
> of the columns used in the partitioning key.

That's certainly the simple case. It would be nice to be able to create an
index like that and have the system automatically recognize that the leading
column is identical to the partition key of (a) and therefore build indexes on
each partition on (b,c).

However there are also cases such as where you have a=0..99 in one partition
and a=100..199 in partition two, etc. It could still automatically build
indexes on (a,b,c) on each partition and somehow note that the unique
constraint is guaranteed across the whole partitioned table.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 18:55:26
Message-ID:	4613F49E.1030901@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Gregory Stark wrote:
> However there are also cases such as where you have a=0..99 in one partition
> and a=100..199 in partition two, etc. It could still automatically build
> indexes on (a,b,c) on each partition and somehow note that the unique
> constraint is guaranteed across the whole partitioned table.

Uhm... yes, because 'a' is the partitioning key.

According to my outline for partitioning rule sets, you would have a
split @ a <= 100. Probably another one @ a <= 200, etc... but none the
less, 'a' is the only column needed to decide what partition a row has
to end up in, so 'a' is the only column in the partitioning key.

What I'm saying is, that given your example, it's not easily possible to
have an index on (b,a) even if 'a' is also in the partitioning key. It's
very well possible to emulate a multi-table index on (a,b), though.

Brainstorming about this somewhat more: how about having multiple
columns in the partitioning key, i.e. 'a' and 'b', and the following
rule set (which admittedly is somewhat special):

table sample
|
|
split @ a >= 100
/ \
/ \
split @ b >= 100 part3
/ \
/ \
part1 part2

An index on (a, b) could easily be 'emulated' by having such an index on
all the partitions, but can we have an index on (b, a) like that?
Probably not, because at the first split, we would have to duplicate.
I.e. for an index scan on 'b = 22', we would have to scan the index on
part3 as well as part1.

Thus one can say, that an multi-table index can only easily be
'emulated', if it has the same columns as the partitioning key, in the
same order. For the above example, these ones would be possible:

(a)
(a,b)
(a,b,...)

Yet another thought: the emulation of multi-table indexes, in this case,
is like concatenating the indexes of the partitions in the right order.
Asking for an index scan for 'WHERE a >= 95 AND a <= 105' when having a
split at a >= 100, you would have to start on the index in the left
bucket (with a < 100) and return everything until the end of the index,
then continue on the index in the right bucket (with a >= 100). So you
also have to be able to determine an order, which is easily possible for
splits, but not so simple for modulos (hash partitioning).

For such a modulo node, the executor would have to start multiple index
scans, i.e.:

When scanning for a range (i.e. 'WHERE id >= 5 AND id <= 17'), the
planner would have to request an index scan on each of the partition,
joining the results in the right order.

So, why not completely emulate all multi-table index scans? The above
restriction would disappear, if we could teach the planner and executor
how to join multiple index scan results, no?

Questioning the other way around: do we need any sort of multi-table
indexes at all, or isn't it enough to teach the planner and executor how
to intelligently scan through (possibly) multiple indexes to get what is
requested?

Regards

Markus

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:01:45
Message-ID:	1175713305.3623.197.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, 2007-04-04 at 20:55 +0200, Markus Schiltknecht wrote:

> Questioning the other way around: do we need any sort of multi-table
> indexes at all, or isn't it enough to teach the planner and executor how
> to intelligently scan through (possibly) multiple indexes to get what is
> requested?

No, I don't think we need multi-table indexes at all.

The planner already uses the Append node to put together multiple plans.
The great thing is it will put together IndexScans and SeqScans as
applicable. No need for multi-scans as a special node type.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Markus Schiltknecht <markus(at)bluegap(dot)ch>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:10:01
Message-ID:	4613F809.2010607@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Simon Riggs wrote:
> On Wed, 2007-04-04 at 20:55 +0200, Markus Schiltknecht wrote:
>
>> Questioning the other way around: do we need any sort of multi-table
>> indexes at all, or isn't it enough to teach the planner and executor how
>> to intelligently scan through (possibly) multiple indexes to get what is
>> requested?
>
> No, I don't think we need multi-table indexes at all.

If we don't have multi-table indexes how do we enforce a primary key
against a partitioned set? What about non primary keys that are just
UNIQUE? What about check constraints that aren't apart of the exclusion?

Joshua D. Drake

>
> The planner already uses the Append node to put together multiple plans.
> The great thing is it will put together IndexScans and SeqScans as
> applicable. No need for multi-scans as a special node type.
>

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:26:44
Message-ID:	4613FBF4.4060109@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

David Fetter wrote:
> On Wed, Apr 04, 2007 at 10:07:39AM -0400, Andrew Dunstan wrote:
>
>> Simon Riggs wrote:
>>
>>> My hope is to have a mechanism to partition indexes or recognise
>>> that they are partitioned, so that a set of provably-distinct
>>> unique indexes can provide the exact same functionlity as a single
>>> large unique index, just without the management nightmare.
>>>
>> Will this address the fairly common data design problem where we
>> need to ensure that a given value is unique across several tables
>> (possibly siblings, possibly not)?
>>
>
> That would be where the provably-distinct part comes in, so yes.
>
>

That assumes you can provide some provably distinct test. In the general
case I have in mind that isn't so.

cheers

andrew

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:34:03
Message-ID:	4613FDAB.6080501@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Joshua D. Drake wrote:
> If we don't have multi-table indexes how do we enforce a primary key
> against a partitioned set?

The executor would have to be clever enough to not do a single index
scan, but possibly scan through multiple indexes when asking for
uniqueness, depending on the partitioning rule set.

Regards

Markus

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:40:31
Message-ID:	4613FF2F.6020102@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Simon Riggs wrote:
> The planner already uses the Append node to put together multiple plans.
> The great thing is it will put together IndexScans and SeqScans as
> applicable. No need for multi-scans as a special node type.

Yes... only that mixing 'concurrent' index scans in the right order
would probably save us an extra sort step in some cases. Consider this
with hash partitioning on (id):

SELECT * FROM test WHERE id > 1 AND id < 9999999 ORDER BY id;

Every partition should have an index on (id), so we already have pretty
well sorted data, we just need to mix the results of the index scan in
the correct order, no?

Regards

Markus

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 19:41:34
Message-ID:	4613FF6E.2020804@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Andrew Dunstan wrote:
> David Fetter wrote:
>> That would be where the provably-distinct part comes in, so yes.
>
> That assumes you can provide some provably distinct test. In the general
> case I have in mind that isn't so.

Could you please give a somewhat more concrete example, I'm not
following here.

Thanks

Markus

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc:	David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 20:05:55
Message-ID:	46140523.2030307@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Markus Schiltknecht wrote:
> Andrew Dunstan wrote:
>> David Fetter wrote:
>>> That would be where the provably-distinct part comes in, so yes.
>>
>> That assumes you can provide some provably distinct test. In the
>> general case I have in mind that isn't so.
>
> Could you please give a somewhat more concrete example, I'm not
> following here.

What I'm asking about has nothing much to do with partitioning.

Say I have two tables, each with a field FKed to a field in a third
table. I'd like to create the values to be unique across the referring
tables. Now, there are various tricks that can be played either with
custom triggers or redundant data to do this, but there's no easy way.
However, a multi-table unique index would do it for me quite nicely, if
we could create such a thing.

However, I don't know how to set up a test for provable distinctness in
this general case.

I guess my point was really that multi-table indexes might have uses
beyond partitioning.

cheers

andrew

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-04 20:29:08
Message-ID:	1175718548.3623.234.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, 2007-04-04 at 12:10 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Wed, 2007-04-04 at 20:55 +0200, Markus Schiltknecht wrote:
> >
> >> Questioning the other way around: do we need any sort of multi-table
> >> indexes at all, or isn't it enough to teach the planner and executor how
> >> to intelligently scan through (possibly) multiple indexes to get what is
> >> requested?
> >
> > No, I don't think we need multi-table indexes at all.
>
> If we don't have multi-table indexes how do we enforce a primary key
> against a partitioned set? What about non primary keys that are just
> UNIQUE? What about check constraints that aren't apart of the exclusion?

What I've been saying is that there is a way to do this that avoids the
need for multi-table indexes (MTIs), see earlier discussion. That way
avoids the massive performance overheads of MTIs, and also covers most
use-cases I can personally imagine.

I can come up with arbitrary examples that require them, but I've not
seen one that makes sense in a real business app. Calling columns a, b
and c disguises the validity of the example, IMHO.

I'm not against someone else writing them and I'm sure its a great
intellectual challenge, but I doubt whether it is worth the trouble
anytime soon because the real range of uses for them is not that wide.
Sure, Oracle has them, but in my view they are welcome to them too.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-04 20:30:12
Message-ID:	46140AD4.20002@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Andrew Dunstan wrote:
> I guess my point was really that multi-table indexes might have uses
> beyond partitioning.

Aha, now I understand. Thanks for the clarification.

> Say I have two tables, each with a field FKed to a field in a third
> table. I'd like to create the values to be unique across the referring
> tables. Now, there are various tricks that can be played either with
> custom triggers or redundant data to do this, but there's no easy way.
> However, a multi-table unique index would do it for me quite nicely, if
> we could create such a thing.

Maybe going into a similar direction and better think of it as a
multi-table uniqueness constraint, which internally uses multiple,
single-table indexes?

Regards

Markus

From:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-04 20:55:54
Message-ID:	200704041655.55105.xzilla@users.sourceforge.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wednesday 04 April 2007 09:19, NikhilS wrote:
> Our current partitioning solution is based on inheritance. With that in
> mind, for 8.3 I thought an implementation based on auto rules creation
> would be the way to go.
>

The only problem I have with this is that the shops I know with big
partitioned tables favor triggers over rules for both performance reason and
a cleaner implementation. Even with automated rule creation this isnt going
to change afaics... not to mention we already create our rules & triggers
automatically, so really this just isn't exciting to me (though it may make
it easier for people getting in on the ground floor)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, NikhilS <nikkhils(at)gmail(dot)com>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-04 20:59:38
Message-ID:	461411BA.3040108@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Robert Treat wrote:
> On Wednesday 04 April 2007 09:19, NikhilS wrote:
>> Our current partitioning solution is based on inheritance. With that in
>> mind, for 8.3 I thought an implementation based on auto rules creation
>> would be the way to go.
>>
>
> The only problem I have with this is that the shops I know with big
> partitioned tables favor triggers over rules for both performance reason and
> a cleaner implementation. Even with automated rule creation this isnt going
> to change afaics... not to mention we already create our rules & triggers
> automatically, so really this just isn't exciting to me (though it may make
> it easier for people getting in on the ground floor)

I second this. The trigger route is much more maintainable than the rule
route. IMO what really needs to happen is something more low level where
there are no DBA visible changes. Triggers also have overhead, it would
be nice to get a little more bare metal with this.

Sincerely,

Joshua D. Drake

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 01:17:00
Message-ID:	87wt0rk4c3.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

>> If we don't have multi-table indexes how do we enforce a primary key
>> against a partitioned set? What about non primary keys that are just
>> UNIQUE? What about check constraints that aren't apart of the exclusion?
>
> I can come up with arbitrary examples that require them, but I've not
> seen one that makes sense in a real business app. Calling columns a, b
> and c disguises the validity of the example, IMHO.

Usually it comes with a situation where you want to do something like
"partition invoices by invoice_date" while simultaneously "use invoice_num" as
the primary key".

Normally the invoices numbers will be incrementing chronologically but there's
no constraint or any mechanism to enforce that or to enforce that an old
invoice number from an old partition isn't reused.

In practice I think this isn't really a serious problem though. The old
partitions are going to be read-only so you can just check that the invoice
number doesn't already exist without worrying about race conditions. And in
most cases it's being sequence-generated or something equally reliable so the
constraints are really just there as a backstop; you're not depending on them
for correctness.

At some level not having them is actually a nice thing for DBAs. It gives them
an excuse for not having the constraint that will only cause them maintenance
headaches down the road. But it's dangerous to go too far down that road.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-05 08:10:29
Message-ID:	20070405081029.GA17587@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Apr 04, 2007 at 09:34:03PM +0200, Markus Schiltknecht wrote:
> Joshua D. Drake wrote:
> >If we don't have multi-table indexes how do we enforce a primary key
> >against a partitioned set?
>
> The executor would have to be clever enough to not do a single index
> scan, but possibly scan through multiple indexes when asking for
> uniqueness, depending on the partitioning rule set.

But it's not the executor that checks uniqueness, it's built into the
btre code.

If someone manages to crack uniqueness for GiST indexes, we'll have our
answer, since it has exactly the same problem but on a different scale.
(Or vice-versa, if some gets uniqueness for multiple indexes, we can do
it for GiST also).

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	"Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, simon(at)2ndquadrant(dot)com
Subject:	Re: Auto Partitioning
Date:	2007-04-05 08:29:35
Message-ID:	d3c4af540704050129i3d7366bboa5faede88dc5d2bb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

> > The only problem I have with this is that the shops I know with big
> > partitioned tables favor triggers over rules for both performance reason
> and
> > a cleaner implementation. Even with automated rule creation this isnt
> going
> > to change afaics... not to mention we already create our rules &
> triggers
> > automatically, so really this just isn't exciting to me (though it may
> make
> > it easier for people getting in on the ground floor)
>
> I second this. The trigger route is much more maintainable than the rule
> route. IMO what really needs to happen is something more low level where
> there are no DBA visible changes. Triggers also have overhead, it would
> be nice to get a little more bare metal with this.

I had raised this issue about rules/triggers back then and the responses
seemed to be evenly split as to which ones to use.

I think the broad question really is how well we want to support the current
inheritance based partitioning mechanism. If we want to stick to it for a
while (and to which we will stick to unless something concrete/better/"bare
metal" comes up), IMHO we should try to make things simpler (by automating
things if possible) to make it easier for people getting in.

Regards,
Nikhils

--
EnterpriseDB http://www.enterprisedb.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"NikhilS" <nikkhils(at)gmail(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, <pgsql-hackers(at)postgresql(dot)org>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 09:03:59
Message-ID:	1175763839.3623.291.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, 2007-04-05 at 13:59 +0530, NikhilS wrote:
> Hi,
> > The only problem I have with this is that the shops I know
> with big
> > partitioned tables favor triggers over rules for both
> performance reason and
> > a cleaner implementation. Even with automated rule creation
> this isnt going
> > to change afaics... not to mention we already create our
> rules & triggers
> > automatically, so really this just isn't exciting to me
> (though it may make
> > it easier for people getting in on the ground floor)
>
> I second this. The trigger route is much more maintainable
> than the rule
> route. IMO what really needs to happen is something more low
> level where
> there are no DBA visible changes. Triggers also have overhead,
> it would
> be nice to get a little more bare metal with this.
>
> I had raised this issue about rules/triggers back then and the
> responses seemed to be evenly split as to which ones to use.

Presumably your implementation already uses Triggers for INSERTs though,
so why not use triggers for everything?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 11:05:31
Message-ID:	d3c4af540704050405q329553a5if2d5a9f98c6b0a99@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

>
> > I had raised this issue about rules/triggers back then and the
> > responses seemed to be evenly split as to which ones to use.
>
> Presumably your implementation already uses Triggers for INSERTs though,
> so why not use triggers for everything?

No I am using rules for all the 3 cases. I am done with the UPDATE stuff too
on which I was stuck with some help, so here is what the patch will do:

postgres=# create table test1 (a int unique , b int check (b > 0)) partition
by range(a) (partition child_1 check (a < 10));
NOTICE: CREATE TABLE / UNIQUE will create implicit index "test1_a_key" for
table "test1"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "child_1_a_key"
for table "child_1"
CREATE TABLE

A describe of the parent shows the rules added to it:
postgres=# \d test1
Table "public.test1"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
b | integer |
Indexes:
"test1_a_key" UNIQUE, btree (a)
Check constraints:
"test1_b_check" CHECK (b > 0)
Rules:
test1_child_1_delete AS
ON DELETE TO test1
WHERE old.a < 10 DO INSTEAD DELETE FROM child_1
WHERE child_1.a = old.a
test1_child_1_insert AS
ON INSERT TO test1
WHERE new.a < 10 DO INSTEAD INSERT INTO child_1 (a, b)
VALUES (new.a, new.b)
test1_child_1_update AS
ON UPDATE TO test1
WHERE old.a < 10 DO INSTEAD UPDATE child_1 SET a = new.a, b = new.b
WHERE child_1.a = old.a

Whereas a describe on the child shows the following:

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"NikhilS" <nikkhils(at)gmail(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, <pgsql-hackers(at)postgresql(dot)org>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 11:28:16
Message-ID:	1175772496.3623.301.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, 2007-04-05 at 16:35 +0530, NikhilS wrote:
> Hi,
>
> > I had raised this issue about rules/triggers back then and
> the
> > responses seemed to be evenly split as to which ones to
> use.
>
> Presumably your implementation already uses Triggers for
> INSERTs though,
> so why not use triggers for everything?
>
> No I am using rules for all the 3 cases.

So we are unable to load any of the tables using COPY.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 12:13:26
Message-ID:	d3c4af540704050513g1c2ad8e3nc4d5b65abd252c07@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

>
>
> So we are unable to load any of the tables using COPY.

Aww, guess should have stuck to triggers as a first choice. Mea culpa, since
I should have investigated some more before deciding on rules, or should
have prodded you more earlier:)

Regards,
Nikhils

--
EnterpriseDB http://www.enterprisedb.com

From:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "NikhilS" <nikkhils(at)gmail(dot)com>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 15:15:42
Message-ID:	200704051115.43104.xzilla@users.sourceforge.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wednesday 04 April 2007 21:17, Gregory Stark wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> >> If we don't have multi-table indexes how do we enforce a primary key
> >> against a partitioned set? What about non primary keys that are just
> >> UNIQUE? What about check constraints that aren't apart of the exclusion?
> >
> > I can come up with arbitrary examples that require them, but I've not
> > seen one that makes sense in a real business app. Calling columns a, b
> > and c disguises the validity of the example, IMHO.
>
> Usually it comes with a situation where you want to do something like
> "partition invoices by invoice_date" while simultaneously "use invoice_num"
> as the primary key".
>
> Normally the invoices numbers will be incrementing chronologically but
> there's no constraint or any mechanism to enforce that or to enforce that
> an old invoice number from an old partition isn't reused.
>
> In practice I think this isn't really a serious problem though. The old
> partitions are going to be read-only so you can just check that the invoice
> number doesn't already exist without worrying about race conditions.

In practice many people need a PK on the table not just as a unique identifier
for the row, but to act as a parent in a FK relationship. If you start your
schema with one table and have to break it up into partitions later, this
will raise a number of red flags.

> And in
> most cases it's being sequence-generated or something equally reliable so
> the constraints are really just there as a backstop; you're not depending
> on them for correctness.
>

With that argument why have unique constraints at all?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

From:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-05 15:24:08
Message-ID:	E1539E0ED7043848906A8FF995BDA57901E7B8D2@m0143.s-mxs.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> > That lets you enforce unique constraints as long as the partition
key
> > is part of the unique constraint.
>
> Is that already sufficient?

yes

> That would alter the ordering of
> the columns in the index, no? I mean:

It produces ordered blocks of append nodes for range queries that span
multiple partitions,
but one unique key can still only be in exactly one of the partitions.
e.g. If you range partition by b, only one partition is applicable
regardless
of the position of b in the index. This is sufficient for a working
unique constraint
with current pg versions.

> CREATE INDEX x ON test(a, b, c);
>
> isn't the same as
>
> CRETAE INDEX x ON test(c, b, a);

That is only a problem if you also want to avoid a sort (e.g. for an
order by),
it is not an issue for filtering rows.
And in some cases the sort could still be avoided with some range
proving logic,
if you can bring the append nodes of partitions into an order that
represents
the order by.
(an example would be a query "where c=5 and b between 0 and 20"
and two partitions one for 0 <= b < 10 and a second for 10 <= b)

> That's why I'd say, the first columns of an index would have
> to be equal to all of the columns used in the partitioning key.

No. It may change performance in some situations, but it is not needed
for unique constraints.

Andreas

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-05 16:29:27
Message-ID:	461523E7.4070908@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Zeugswetter Andreas ADI SD wrote:
>
>> CREATE INDEX x ON test(a, b, c);
>>
>> isn't the same as
>>
>> CRETAE INDEX x ON test(c, b, a);
>
> That is only a problem if you also want to avoid a sort (e.g. for an
> order by),

..or if you want to use that index for 'WHERE a = 5'. The first one is
probably helping you, the second isn't.

> (an example would be a query "where c=5 and b between 0 and 20"
> and two partitions one for 0 <= b < 10 and a second for 10 <= b)

Hm.. in that case, an index on (a, b, c) wouldn't help. An index on (c,
b, a) would be just perfect, agreed?

Now, for the partitioning: you simply have to scan two partitions in
that case, no matter how you arrange your indexes. And this is where we
need some sort of multi-table index scan functionality. (I'm not saying
a multi-table index. Such a thing would be too large on disk. That
functionality should probably better be realized by using the underlying
per-table indexes).

>> That's why I'd say, the first columns of an index would have
>> to be equal to all of the columns used in the partitioning key.

I correct my own statement somewhat, here: only in that case, a single
table index can satisfy your request. For other cases, you'd have to
query more than one partition's indexes and mix them correctly to
maintain the right order, if required.

> No. It may change performance in some situations, but it is not needed
> for unique constraints.

Agreed, for unique constraints. But indexes are used for some more
things than just unique constraints checking. ;-)

Regards

Markus

From:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-05 20:00:37
Message-ID:	46155565.2010605@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

Martijn van Oosterhout wrote:
>> The executor would have to be clever enough to not do a single index
>> scan, but possibly scan through multiple indexes when asking for
>> uniqueness, depending on the partitioning rule set.
>
> But it's not the executor that checks uniqueness, it's built into the
> btre code.

Well, it's the executor calling into the btree code. Couldn't the
executor choose which (btree-) indexes to query for uniqueness?

> If someone manages to crack uniqueness for GiST indexes, we'll have our
> answer, since it has exactly the same problem but on a different scale.
> (Or vice-versa, if some gets uniqueness for multiple indexes, we can do
> it for GiST also).

Uh.. can you elaborate on that? AFAICS, you would simply have to query
multiple btree indexes and make sure non of them is violated. How would
that help making unique GiST indexes possible? What's the problem there?

Regards

Markus

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-05 21:03:08
Message-ID:	20070405210308.GE17587@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Apr 05, 2007 at 10:00:37PM +0200, Markus Schiltknecht wrote:
> >If someone manages to crack uniqueness for GiST indexes, we'll have our
> >answer, since it has exactly the same problem but on a different scale.
> >(Or vice-versa, if some gets uniqueness for multiple indexes, we can do
> >it for GiST also).
>
> Uh.. can you elaborate on that? AFAICS, you would simply have to query
> multiple btree indexes and make sure non of them is violated. How would
> that help making unique GiST indexes possible? What's the problem there?

There's a race condition. What happens if someone else tries to insert
the same key at the same time. If you know it's going to be in the same
index you can lock the page. Across multiple indexes you run into
deadlock issues.

I'm not saying it's hard, just that it's not as easy as checking each
index...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc:	Martijn van Oosterhout <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-06 05:56:36
Message-ID:	28487.1175838996@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
> Uh.. can you elaborate on that? AFAICS, you would simply have to query
> multiple btree indexes and make sure non of them is violated.

That only works for the partition-key indexes, ie, ones where you can be
sure a-priori that there cannot be duplicate keys in two different indexes.
I believe the complaint here is that people would like to be able to
enforce uniqueness across the whole partitioned table on columns that
are not part of the partition key.

(But that sounds rather like pie in the sky, actually. Which other
databases can do that, and how do they do it?)

regards, tom lane

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 07:11:18
Message-ID:	1175843479.3623.463.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2007-04-06 at 01:56 -0400, Tom Lane wrote:
> Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
> > Uh.. can you elaborate on that? AFAICS, you would simply have to query
> > multiple btree indexes and make sure non of them is violated.
>
> That only works for the partition-key indexes, ie, ones where you can be
> sure a-priori that there cannot be duplicate keys in two different indexes.
> I believe the complaint here is that people would like to be able to
> enforce uniqueness across the whole partitioned table on columns that
> are not part of the partition key.

I see that as a logical modelling problem, not a physical one.

If you partition on invoice_date, but have PK=invoice_id then it seems
straightforward to change the model so that the PK is a compound key
(invoice_id, invoice_period). This works whether or nor invoice_id is
unique on its own. And this is typically the way things are modelled in
the real world anyway, since such things existed from the time of paper
filing systems where partitioning like that was required to quickly
locate a file in a paper archive/library.

If we partition on invoice_date only, there is an implication that
people will search for invoices on date range only too, otherwise why
not just partition on invoice_id. This still works with the compound key
approach.

> (But that sounds rather like pie in the sky, actually. Which other
> databases can do that, and how do they do it?)

Oracle does it, by building a big index. Few people use it.

There are significant problems with this idea that I have already
raised:
- how big would the index be?
- how would you add and remove partitions with any kind of performance?
If we partitioned on date range, that will surely increase over time.
- the index could almost certainly never be REINDEXed because of space
requirements and time considerations.
- if the indexed values were monotonically increasing the RHS of the
index would become a significant hotspot in load performance, assuming
high volume inserts into a large table

My argument is that there are significant real-world disadvantages to
having this feature, yet there exists a reasonable workaround to avoid
ever needing it. Why would we spend time building and supporting it?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 07:40:48
Message-ID:	87ircaj6gv.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> If we partition on invoice_date only, there is an implication that
> people will search for invoices on date range only too, otherwise why
> not just partition on invoice_id. This still works with the compound key
> approach.

Well there are practical problems with partitioning on invoice_id. It's
convenient to have a predictable partition definition that can be calculated
in advance.

I suspect what people did with paper invoices is look at the last invoice for
a period and note the invoice_id down to check all future invoice_ids against.
Essentially partitioning on two separate equivalent partition keys.

We could do the same sort of thing since we're looking at constraints, there's
nothing stopping the partitions from having two separate but effectively
equivalent constraints on them. I'm not sure how to describe "partition based
on this rule for dates but note the range of invoice_ids covering a partition
and generate a constraint for that as well"

But if we could find a way to represent that it would make a lot of common use
cases much more convenient to use.

>> (But that sounds rather like pie in the sky, actually. Which other
>> databases can do that, and how do they do it?)
>
> Oracle does it, by building a big index. Few people use it.

The people that use it are the people stuck by dogmatic rules about "every
table must have a primary key" or "every logical constraint must be protected
by a database constraint". Ie, database shops run by the CYA principle.

But if a database feature is hurting you more than it's helping you then
you're not doing yourself any favours by using it. The database is a tool to
make your life easier, not something to flog yourself with to prove how good
your database design skills are.

Oracle calls these "global" indexes and imho they defeat the whole purpose
behind partitioning your data in the first place.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 10:47:46
Message-ID:	E1539E0ED7043848906A8FF995BDA57901E7B93D@m0143.s-mxs.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> > (But that sounds rather like pie in the sky, actually. Which other
> > databases can do that, and how do they do it?)
>
> Oracle does it, by building a big index. Few people use it.

And others allow a different partitioning strategy for each index,
but that has the same problem of how to remove partitions without
a huge amount of index reorganization.

> There are significant problems with this idea that I have already
> raised:
> - how big would the index be?
> - how would you add and remove partitions with any kind of
> performance?
> If we partitioned on date range, that will surely increase over time.
> - the index could almost certainly never be REINDEXed because
> of space requirements and time considerations.
> - if the indexed values were monotonically increasing the RHS
> of the index would become a significant hotspot in load
> performance, assuming high volume inserts into a large table

yes

> My argument is that there are significant real-world
> disadvantages to having this feature, yet there exists a
> reasonable workaround to avoid ever needing it.

I'd say a workaround can mostly be found but not always.
But I agree, that the downsides of one large global index are
substantial enough to not make this path attractive.

> Why would we spend time building and supporting it?

What I think we would like to have is putting the append nodes into an
order that allows removing the sort node whenever that can be done. And
maybe a merge node (that replaces the append and sort node) that can
merge presorted partitions.

I have one real example where I currently need one large non unique
index in Informix.
It is a journal table that is partitioned by client timestamp,
but I need a select first 1000 (of possibly many mio rows) order by
server_timestamp in a range
that naturally sometimes needs more than one partition because client
and server timestamps diverge.
Here the merge facility would allow me to not use the global index and
still avoid sorting
millions of rows (which would not finish in time).
Problem with the global index is, that I have to delete all rows from
the oldest partition before removing it
to avoid rebuilding the global index.

Andreas

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Markus Schiltknecht" <markus(at)bluegap(dot)ch>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 13:09:19
Message-ID:	1175864959.3623.497.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2007-04-06 at 12:47 +0200, Zeugswetter Andreas ADI SD wrote:
> What I think we would like to have is putting the append nodes into an
> order that allows removing the sort node whenever that can be done.
> And
> maybe a merge node (that replaces the append and sort node) that can
> merge presorted partitions.

Yes, we definitely need a way to express the ordering of partitions. We
can use this in the way that you say, as well as being able to do faster
inclusion/exclusion:

i.e. if we have partitions for each year (2001, 2002, 2003 2004, 2005,
2006, 2007) AND we have already proved that 2005 is excluded when we
have a WHERE clause saying year >= 2006, then we should be able to use
the ordering to prove that partitions for 2004 and before are also
automatically excluded.

I'll think some more about the Merge node, but not right now.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-06 14:39:16
Message-ID:	46165B94.10809@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Zeugswetter Andreas ADI SD wrote:
>>> (But that sounds rather like pie in the sky, actually. Which other
>>> databases can do that, and how do they do it?)
>> Oracle does it, by building a big index. Few people use it.
>
> And others allow a different partitioning strategy for each index,
> but that has the same problem of how to remove partitions without
> a huge amount of index reorganization.

If you removed a partition, couldn't the index be cleaned up by VACUUM?

Joshua D. Drake

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-06 16:22:55
Message-ID:	461673DF.6050605@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> But if we could find a way to represent that it would make a lot of common use
> cases much more convenient to use.
>
>>> (But that sounds rather like pie in the sky, actually. Which other
>>> databases can do that, and how do they do it?)
>> Oracle does it, by building a big index. Few people use it.
>
> The people that use it are the people stuck by dogmatic rules about "every
> table must have a primary key" or "every logical constraint must be protected
> by a database constraint". Ie, database shops run by the CYA principle.

Or ones that actually believe that every table where possible should
have a primary key.

There are very, very few instances in good design where a table does not
have a primary key.

It has nothing to do with CYA.

Joshua D. Drake

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	David Fetter <david(at)fetter(dot)org>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-06 17:07:57
Message-ID:	20070406170757.GA3784@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, Apr 06, 2007 at 09:22:55AM -0700, Joshua D. Drake wrote:

> >The people that use it are the people stuck by dogmatic rules about
> >"every table must have a primary key" or "every logical constraint
> >must be protected by a database constraint". Ie, database shops run
> >by the CYA principle.
>
> Or ones that actually believe that every table where possible should
> have a primary key.
>
> There are very, very few instances in good design where a table does
> not have a primary key.
>
> It has nothing to do with CYA.

That depends on what you mean by CYA. If you mean, "taking a
precaution just so you can show it's not your fault when the mature
hits the fan," I agree. If you mean, "taking a precaution that will
actually prevent a problem from occurring in the first place," it
definitely does.

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, NikhilS <nikkhils(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Auto Partitioning
Date:	2007-04-06 17:12:59
Message-ID:	46167F9B.5060503@commandprompt.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

David Fetter wrote:
> On Fri, Apr 06, 2007 at 09:22:55AM -0700, Joshua D. Drake wrote:
>
>>> The people that use it are the people stuck by dogmatic rules about
>>> "every table must have a primary key" or "every logical constraint
>>> must be protected by a database constraint". Ie, database shops run
>>> by the CYA principle.
>> Or ones that actually believe that every table where possible should
>> have a primary key.
>>
>> There are very, very few instances in good design where a table does
>> not have a primary key.
>>
>> It has nothing to do with CYA.
>
> That depends on what you mean by CYA. If you mean, "taking a
> precaution just so you can show it's not your fault when the mature
> hits the fan," I agree. If you mean, "taking a precaution that will
> actually prevent a problem from occurring in the first place," it
> definitely does.

Heh, fair enough. When I think of CYA, I think of the former.

Joshua D. Drake

>
> Cheers,
> D

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

From:	Richard Troy <rtroy(at)ScienceTools(dot)com>
To:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc:	David Fetter <david(at)fetter(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Markus Schiltknecht <markus(at)bluegap(dot)ch>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, NikhilS <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 19:02:48
Message-ID:	Pine.LNX.4.33.0704061155280.10617-100000@denzel.in
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> David Fetter wrote:
> > On Fri, Apr 06, 2007 at 09:22:55AM -0700, Joshua D. Drake wrote:
> >
> >>> The people that use it are the people stuck by dogmatic rules about
> >>> "every table must have a primary key" or "every logical constraint
> >>> must be protected by a database constraint". Ie, database shops run
> >>> by the CYA principle.
> >> Or ones that actually believe that every table where possible should
> >> have a primary key.
> >>
> >> There are very, very few instances in good design where a table does
> >> not have a primary key.
> >>
> >> It has nothing to do with CYA.
> >
> > That depends on what you mean by CYA. If you mean, "taking a
> > precaution just so you can show it's not your fault when the mature
> > hits the fan," I agree. If you mean, "taking a precaution that will
> > actually prevent a problem from occurring in the first place," it
> > definitely does.
>
> Heh, fair enough. When I think of CYA, I think of the former.
>
> Joshua D. Drake

...I was thinking the point was more on "primary key" as in syntax, as
opposed to a table that has a/an attribute(s) that is acknowledged by DML
coders as the appropriate way to use the stored data. That is, I may very
well _not_ want the overhead of an index of any kind, forced uniqueness,
etc, but might also well think of a given attribute as the primary key.
Use of constraints in lieu of "primary key" come to mind...

'Course, maybe I missed the point! -smile-

'Nother thought: CYA _can_ have odeous performance costs if
over-implemented. It's a matter of using actual use-cases - or observed
behavior - to taylor the CYA solution to fit the need without undue
overhead.

Rgds,
Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy(at)ScienceTools(dot)com, http://ScienceTools.com/

From:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
To:	"Markus Schiltknecht" <markus(at)bluegap(dot)ch>
Cc:	"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "NikhilS" <nikkhils(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Auto Partitioning
Date:	2007-04-06 19:26:54
Message-ID:	1175887614.3623.501.camel@silverbirch.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2007-04-06 at 16:08 +0200, Markus Schiltknecht wrote:
> Simon Riggs wrote:
> > i.e. if we have partitions for each year (2001, 2002, 2003 2004, 2005,
> > 2006, 2007) AND we have already proved that 2005 is excluded when we
> > have a WHERE clause saying year >= 2006, then we should be able to use
> > the ordering to prove that partitions for 2004 and before are also
> > automatically excluded.
>
> Provided you've set up the right constraints, the current
> constraint_exclusion feature does exactly that, no?

The end result yes, the mechanism, no.

> > I'll think some more about the Merge node, but not right now.

> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2007-04-26 23:08:22
Message-ID:	200704262308.l3QN8MU02077@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

NikhilS wrote:
> Hi,
>
> Please find attached the WIP version 1 of the auto partitioning patch. There
> was discussion on this a while back on -hackers at:
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg00375.php
>
> Please note that this patch tries to automate the activities that currently
> are carried out manually. It does nothing fancy beyond that for now. There
> were a lot of good suggestions, I have noted them down but for now I have
> tried to stick to the initial goal of automating existing steps for
> providing partitioning.
>
> Things that this patch does:
>
> i) Handle new syntax to provide partitioning:
>
> CREATE TABLE tabname (
> ...
> ) PARTITION BY
> RANGE(ColId)
> | LIST(ColId)
> (
> PARTITION partition_name CHECK(...),
> PARTITION partition_name CHECK(...)
> ...
> );
>
> ii) Create master table.
> iii) Create children tables based on the number of partitions specified and
> make them inherit from the master table.
>
> The following things are TODOs:
>
> iv) Auto generate rules using the checks mentioned for the partitions, to
> handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child.
> Note that checks specified directly on the master table will get inherited
> automatically.
> v) Based on the PRIMARY, UNIQUE information specified, pass it on to the
> children tables.
> vi) [stretch goal] Support HASH partitions
>
> Will try to complete the above mentioned TODOs as soon as is possible.
>
> Comments, feedback appreciated.
>
> Thanks and Regards,
> Nikhils
> --
>
> EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 14:00:11
Message-ID:	1206108011.4285.666.camel@ebony.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2007-03-30 at 12:28 +0530, NikhilS wrote:

> Please find attached the WIP version 1 of the auto partitioning patch.
> There was discussion on this a while back on -hackers at:
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg00375.php
>
> Please note that this patch tries to automate the activities that
> currently are carried out manually. It does nothing fancy beyond that
> for now. There were a lot of good suggestions, I have noted them down
> but for now I have tried to stick to the initial goal of automating
> existing steps for providing partitioning.
>
> Things that this patch does:

I think this patch is a reasonable first step and clearly written, but
not yet ready for application to Postgres in this commit fest.

I would say we need:

* Clear explanation of the new syntax, with examples of each permutation
so we can see how that would work. In light of recent discussions on
-hackers we need to take a view on whether we should go with Gavin's
suggested syntax or this syntax.

* There are some additional syntax items I don't understand the need
for. So these need to be explained.

* I would be against using the term PARTITION BY since it is already a
phrase that is part of the SQL Standard. Perhaps PARTITIONED BY?

* We need regression tests for any new command syntax

* No docs - that might be the same thing as the first item

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	pgsql-patches(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 14:45:43
Message-ID:	d3c4af540803210745j5ded6107r5871095a3aeb91af@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi Simon,

On Fri, Mar 21, 2008 at 7:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Fri, 2007-03-30 at 12:28 +0530, NikhilS wrote:
>
> > Please find attached the WIP version 1 of the auto partitioning patch.
> > There was discussion on this a while back on -hackers at:
> > http://archives.postgresql.org/pgsql-hackers/2007-03/msg00375.php
> >
> > Please note that this patch tries to automate the activities that
> > currently are carried out manually. It does nothing fancy beyond that
> > for now. There were a lot of good suggestions, I have noted them down
> > but for now I have tried to stick to the initial goal of automating
> > existing steps for providing partitioning.
> >
> > Things that this patch does:
>
> I think this patch is a reasonable first step and clearly written, but
> not yet ready for application to Postgres in this commit fest.
>
> I would say we need:
>
> * Clear explanation of the new syntax, with examples of each permutation
> so we can see how that would work. In light of recent discussions on
> -hackers we need to take a view on whether we should go with Gavin's
> suggested syntax or this syntax.
>
> * There are some additional syntax items I don't understand the need
> for. So these need to be explained.
>
> * I would be against using the term PARTITION BY since it is already a
> phrase that is part of the SQL Standard. Perhaps PARTITIONED BY?
>
> * We need regression tests for any new command syntax
>
> * No docs - that might be the same thing as the first item
>
> --

Thanks for taking a look. But if I am not mistaken Gavin and co. are working
on a much exhaustive proposal. In light of that maybe this patch might not
be needed in the first place?

I will wait for discussion and a subsequent collective consensus here,
before deciding the further course of actions.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 15:18:41
Message-ID:	200803211518.m2LFIfJ12397@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

NikhilS wrote:
> > * Clear explanation of the new syntax, with examples of each permutation
> > so we can see how that would work. In light of recent discussions on
> > -hackers we need to take a view on whether we should go with Gavin's
> > suggested syntax or this syntax.
> >
> > * There are some additional syntax items I don't understand the need
> > for. So these need to be explained.
> >
> > * I would be against using the term PARTITION BY since it is already a
> > phrase that is part of the SQL Standard. Perhaps PARTITIONED BY?
> >
> > * We need regression tests for any new command syntax
> >
> > * No docs - that might be the same thing as the first item
>
> Thanks for taking a look. But if I am not mistaken Gavin and co. are working
> on a much exhaustive proposal. In light of that maybe this patch might not
> be needed in the first place?
>
> I will wait for discussion and a subsequent collective consensus here,
> before deciding the further course of actions.

I think it is unwise to wait on Gavin for a more complex implemention
--- we might end up with nothing for 8.4. As long as your syntax is
compatible with whatever Gavin proposed Gavin can add on to your patch
once it is applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 15:53:01
Message-ID:	23163.1206114781@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> NikhilS wrote:
>> Thanks for taking a look. But if I am not mistaken Gavin and co. are working
>> on a much exhaustive proposal. In light of that maybe this patch might not
>> be needed in the first place?
>>
>> I will wait for discussion and a subsequent collective consensus here,
>> before deciding the further course of actions.

> I think it is unwise to wait on Gavin for a more complex implemention
> --- we might end up with nothing for 8.4. As long as your syntax is
> compatible with whatever Gavin proposed Gavin can add on to your patch
> once it is applied.

It would be equally unwise to apply a stopgap patch if we're not certain
it will be upward compatible with what we want to do later.

I haven't been through the partitioning threads at all yet, but I think
what we probably want to have when we emerge from commit fest is some
consensus on what the road map is for partitioning.

regards, tom lane

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 16:53:44
Message-ID:	1206118424.4285.682.camel@ebony.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2008-03-21 at 20:15 +0530, NikhilS wrote:

> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> working on a much exhaustive proposal. In light of that maybe this
> patch might not be needed in the first place?

We should wait to apply, but not wait to discuss. Somebody must take the
initiative.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-21 17:01:35
Message-ID:	1206118895.4285.689.camel@ebony.site
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, 2008-03-21 at 11:18 -0400, Bruce Momjian wrote:

The patch as stands does little apart from bring together many DDL
statements into one. Partitioning is much much more than this so there
seems little reason to think we should rush to commit this, especially
before we get some good docs that explain what it does, and why.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

From:	NikhilS <nikkhils(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Bruce Momjian" <bruce(at)momjian(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-22 05:19:01
Message-ID:	d3c4af540803212219u58dd874bp8557df63285895c0@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

On Fri, Mar 21, 2008 at 9:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > NikhilS wrote:
> >> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> working
> >> on a much exhaustive proposal. In light of that maybe this patch might
> not
> >> be needed in the first place?
> >>
> >> I will wait for discussion and a subsequent collective consensus here,
> >> before deciding the further course of actions.
>
> > I think it is unwise to wait on Gavin for a more complex implemention
> > --- we might end up with nothing for 8.4. As long as your syntax is
> > compatible with whatever Gavin proposed Gavin can add on to your patch
> > once it is applied.
>
> It would be equally unwise to apply a stopgap patch if we're not certain
> it will be upward compatible with what we want to do later.
>
> I haven't been through the partitioning threads at all yet, but I think
> what we probably want to have when we emerge from commit fest is some
> consensus on what the road map is for partitioning.
>

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	NikhilS <nikkhils(at)gmail(dot)com>
Cc:	pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 00:55:42
Message-ID:	200803260055.m2Q0tgw24740@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

This has been saved for the next commit-fest:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 12:31:15
Message-ID:	20080326123115.GA5895@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian escribió:
>
> This has been saved for the next commit-fest:

I noticed you broke an URL that previously worked: what was
http://momjian.us/mhonarc/patches/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
is now
http://momjian.us/mhonarc/patches_hold/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html

May I suggest that the URLs with Message-Ids are stored outside the
particular patch queue directory? The script I showed you yesterday
could be used to do that.

Also I noticed that by moving it to the hold queue, the comments that
may have existed on the patch queue are now gone :-( I'm not sure if
there were any in this case, but it's better if we're aware of that
fact. I think this could be solved if the "namespace" of the comment
does not contain the patch queue name.

I moved it to the May commitfest on the wiki too.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 13:34:26
Message-ID:	200803261334.m2QDYQ216024@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Bruce Momjian escribi?:
> >
> > This has been saved for the next commit-fest:
>
> I noticed you broke an URL that previously worked: what was
> http://momjian.us/mhonarc/patches/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
> is now
> http://momjian.us/mhonarc/patches_hold/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
>
> May I suggest that the URLs with Message-Ids are stored outside the
> particular patch queue directory? The script I showed you yesterday
> could be used to do that.

My email are added/removed so I am unsure how to do that easily becuase
the lists would share the same directory. We don't have that problem
with the archives.

> Also I noticed that by moving it to the hold queue, the comments that
> may have existed on the patch queue are now gone :-( I'm not sure if
> there were any in this case, but it's better if we're aware of that
> fact. I think this could be solved if the "namespace" of the comment
> does not contain the patch queue name.

I specifically set things up so the comments should move with the email.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 13:42:02
Message-ID:	20080326134202.GB8100@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian escribió:
> Alvaro Herrera wrote:

> > I noticed you broke an URL that previously worked: what was
> > http://momjian.us/mhonarc/patches/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
> > is now
> > http://momjian.us/mhonarc/patches_hold/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
> >
> > May I suggest that the URLs with Message-Ids are stored outside the
> > particular patch queue directory? The script I showed you yesterday
> > could be used to do that.
>
> My email are added/removed so I am unsure how to do that easily becuase
> the lists would share the same directory. We don't have that problem
> with the archives.

My point is that you should only _add_ Message-Ids, not remove them.
You can move the messages from one queue to the other to your heart's
content, but the Message-Id URL should continue to work with no changes.

> > Also I noticed that by moving it to the hold queue, the comments that
> > may have existed on the patch queue are now gone :-( I'm not sure if
> > there were any in this case, but it's better if we're aware of that
> > fact. I think this could be solved if the "namespace" of the comment
> > does not contain the patch queue name.
>
> I specifically set things up so the comments should move with the email.

Yeah, I noticed that after sending the email -- the js-kit name seems to
be only "msgid-<foo>". I thought the "permalink=" attribute was part of
that, but perhaps not? In case you added that permalink attribute
because of my request the other day, let me clarify that what I was
actually thinking was having something like

<a href="http://momjian.us/msgid/123foobar(at)mylaptop">permalink</a>

after the name of the poster, so that it would be visible on the index
page and the user didn't have to open the page to get it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 13:48:40
Message-ID:	200803261348.m2QDme903600@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Bruce Momjian escribi?:
> > Alvaro Herrera wrote:
>
> > > I noticed you broke an URL that previously worked: what was
> > > http://momjian.us/mhonarc/patches/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
> > > is now
> > > http://momjian.us/mhonarc/patches_hold/d3c4af540703292358s8ed731el7771ab14083aa610(at)mail(dot)gmail(dot)com(dot)html
> > >
> > > May I suggest that the URLs with Message-Ids are stored outside the
> > > particular patch queue directory? The script I showed you yesterday
> > > could be used to do that.
> >
> > My email are added/removed so I am unsure how to do that easily becuase
> > the lists would share the same directory. We don't have that problem
> > with the archives.
>
> My point is that you should only _add_ Message-Ids, not remove them.
> You can move the messages from one queue to the other to your heart's
> content, but the Message-Id URL should continue to work with no changes.

Well, when an email is applied, it is deleted. How do I update the
message-id for that in an automated manner. Right now a rebuild deleted
all the links and recreates them.

> > > Also I noticed that by moving it to the hold queue, the comments that
> > > may have existed on the patch queue are now gone :-( I'm not sure if
> > > there were any in this case, but it's better if we're aware of that
> > > fact. I think this could be solved if the "namespace" of the comment
> > > does not contain the patch queue name.
> >
> > I specifically set things up so the comments should move with the email.
>
> Yeah, I noticed that after sending the email -- the js-kit name seems to
> be only "msgid-<foo>". I thought the "permalink=" attribute was part of
> that, but perhaps not? In case you added that permalink attribute
> because of my request the other day, let me clarify that what I was
> actually thinking was having something like
>
> <a href="http://momjian.us/msgid/123foobar(at)mylaptop">permalink</a>

The permalink is for people who get email --- it tells them which
message got the comment. (People were complaining before I fixed that.)

> after the name of the poster, so that it would be visible on the index
> page and the user didn't have to open the page to get it.

We can do that if people want. I used to show the message id on the
thread page but several felt it was too cluttered-looking.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 13:54:46
Message-ID:	20080326135446.GD8100@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian escribió:
> Alvaro Herrera wrote:

> > My point is that you should only _add_ Message-Ids, not remove them.
> > You can move the messages from one queue to the other to your heart's
> > content, but the Message-Id URL should continue to work with no changes.
>
> Well, when an email is applied, it is deleted. How do I update the
> message-id for that in an automated manner. Right now a rebuild deleted
> all the links and recreates them.

Well, remove the part that deletes links, and keep the part that
creates links. That way, links that used to work continue working.

I am assuming you use hard links -- obviously this doesn't work with
symlinks. (My script creates hard links.)

> > after the name of the poster, so that it would be visible on the index
> > page and the user didn't have to open the page to get it.
>
> We can do that if people want. I used to show the message id on the
> thread page but several felt it was too cluttered-looking.

Well, it would have helped me. I suggested "permalink" as text because
it is less clutter than the full Message-Id. Besides, the Message-Id by
itself is useless, whereas a link is useful.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	NikhilS <nikkhils(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: Auto Partitioning Patch - WIP version 1
Date:	2008-03-26 14:05:53
Message-ID:	200803261405.m2QE5rN21900@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Bruce Momjian escribi?:
> > Alvaro Herrera wrote:
>
> > > My point is that you should only _add_ Message-Ids, not remove them.
> > > You can move the messages from one queue to the other to your heart's
> > > content, but the Message-Id URL should continue to work with no changes.
> >
> > Well, when an email is applied, it is deleted. How do I update the
> > message-id for that in an automated manner. Right now a rebuild deleted
> > all the links and recreates them.
>
> Well, remove the part that deletes links, and keep the part that
> creates links. That way, links that used to work continue working.
>
> I am assuming you use hard links -- obviously this doesn't work with
> symlinks. (My script creates hard links.)

I was using symlinks. I try to avoid hardlinks that cross directories
--- not sure why, probably because I can't easily manage them to find
out what something is linked to without playing with inodes. You are
right that if I used hard-links I put things in a separate directory,
and the hard links would still contain the email, though if the email
has been removed, should we still be returning it from a query? I
suppose I could delete items with only one link.

Well, if I move the permanent links to another directory, as you
suggest, I am going to invalidate every link. Maybe we should wait for
this commit fest to end and then I can create a permanent link directory
and use hard links.

Or is everyone OK with invalidating all the permanent links now.

> > > after the name of the poster, so that it would be visible on the index
> > > page and the user didn't have to open the page to get it.
> >
> > We can do that if people want. I used to show the message id on the
> > thread page but several felt it was too cluttered-looking.
>
> Well, it would have helped me. I suggested "permalink" as text because
> it is less clutter than the full Message-Id. Besides, the Message-Id by
> itself is useless, whereas a link is useful.

Are you talking about the permalink in the js-kit comment or the
permalink at the top of each message? What text do you want to be the
permalink? How is that created in an automated manner?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-10-22 14:44:46
Message-ID:	3073cc9b0810220744y987173avd560a33fcb58ede5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

just remembering that -patches is a dead list, so i'm sending this to
-hackers where it will have more visibility...

---------- Forwarded message ----------
From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Date: Oct 22, 2008 9:43 AM
Subject: Re: [PATCHES] Auto Partitioning Patch - WIP version 1
To: Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, NikhilS <nikkhils(at)gmail(dot)com>,
Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org

On 10/22/08, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com> wrote:
> Hi,
>
> > >
> > >> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> working
> > >> on a much exhaustive proposal. In light of that maybe this patch might
> not
> > >> be needed in the first place?
> > >>
> > >> I will wait for discussion and a subsequent collective consensus here,
> > >> before deciding the further course of actions.
> > >
> > > I think it is unwise to wait on Gavin for a more complex implemention
> > > --- we might end up with nothing for 8.4. As long as your syntax is
> > > compatible with whatever Gavin proposed Gavin can add on to your patch
> > > once it is applied.
> > >
> >
> > seems like you're a prophet... or i miss something?
> >
>
> :)
>
> Maybe I will try to summarize the functionality of this patch, rebase it
> against latest CVS head and try to get it on the commitfest queue atleast
> for further feedback to keep the ball rolling on auto-partitioning...
>

yeah! i was thinking on doing that but still have no time... and
frankly you're the best man for the job ;)

one thing i was thinking of is to use triggers instead of rules just
as our current docs recommends
http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html

with the benefit that a trigger can check if the child table exists
for the range being inserted and if not it can create it first...
haven't looked at the code in the detail but seems that your patch is
still missing the "create rule" part so we are in time to change
that... no?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-10-22 21:49:53
Message-ID:	48FFA001.7030201@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Another advantage of triggers over rules is that it would work with COPY
which is probably a desired feature.

Emmanuel

Jaime Casanova wrote:
> just remembering that -patches is a dead list, so i'm sending this to
> -hackers where it will have more visibility...
>
> ---------- Forwarded message ----------
> From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
> Date: Oct 22, 2008 9:43 AM
> Subject: Re: [PATCHES] Auto Partitioning Patch - WIP version 1
> To: Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
> Cc: Bruce Momjian <bruce(at)momjian(dot)us>, NikhilS <nikkhils(at)gmail(dot)com>,
> Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-patches(at)postgresql(dot)org
>
>
> On 10/22/08, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com> wrote:
>
>> Hi,
>>
>>
>>>>> Thanks for taking a look. But if I am not mistaken Gavin and co. are
>>>>>
>> working
>>
>>>>> on a much exhaustive proposal. In light of that maybe this patch might
>>>>>
>> not
>>
>>>>> be needed in the first place?
>>>>>
>>>>> I will wait for discussion and a subsequent collective consensus here,
>>>>> before deciding the further course of actions.
>>>>>
>>>> I think it is unwise to wait on Gavin for a more complex implemention
>>>> --- we might end up with nothing for 8.4. As long as your syntax is
>>>> compatible with whatever Gavin proposed Gavin can add on to your patch
>>>> once it is applied.
>>>>
>>>>
>>> seems like you're a prophet... or i miss something?
>>>
>>>
>> :)
>>
>> Maybe I will try to summarize the functionality of this patch, rebase it
>> against latest CVS head and try to get it on the commitfest queue atleast
>> for further feedback to keep the ball rolling on auto-partitioning...
>>
>>
>
> yeah! i was thinking on doing that but still have no time... and
> frankly you're the best man for the job ;)
>
> one thing i was thinking of is to use triggers instead of rules just
> as our current docs recommends
> http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
>
> with the benefit that a trigger can check if the child table exists
> for the range being inserted and if not it can create it first...
> haven't looked at the code in the detail but seems that your patch is
> still missing the "create rule" part so we are in time to change
> that... no?
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-10-23 12:41:41
Message-ID:	a301bfd90810230541h50fd7778q5f84498f7461b6f8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

On Wed, Oct 22, 2008 at 8:14 PM, Jaime Casanova <
jcasanov(at)systemguards(dot)com(dot)ec> wrote:

> just remembering that -patches is a dead list, so i'm sending this to
> -hackers where it will have more visibility...
>
> ---------- Forwarded message ----------
> On 10/22/08, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com> wrote:
> > Hi,
> >
> > > >
> > > >> Thanks for taking a look. But if I am not mistaken Gavin and co. are
> > working
> > > >> on a much exhaustive proposal. In light of that maybe this patch
> might
> > not
> > > >> be needed in the first place?
> > > >>
> > > >> I will wait for discussion and a subsequent collective consensus
> here,
> > > >> before deciding the further course of actions.
> > > >
> > > > I think it is unwise to wait on Gavin for a more complex implemention
> > > > --- we might end up with nothing for 8.4. As long as your syntax is
> > > > compatible with whatever Gavin proposed Gavin can add on to your
> patch
> > > > once it is applied.
> > > >
> > >
> > > seems like you're a prophet... or i miss something?
> > >
> >
> > :)
> >
> > Maybe I will try to summarize the functionality of this patch, rebase it
> > against latest CVS head and try to get it on the commitfest queue atleast
> > for further feedback to keep the ball rolling on auto-partitioning...
> >
>
> yeah! i was thinking on doing that but still have no time... and
> frankly you're the best man for the job ;)
>
> one thing i was thinking of is to use triggers instead of rules just
> as our current docs recommends
> http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
>
> with the benefit that a trigger can check if the child table exists
> for the range being inserted and if not it can create it first...
> haven't looked at the code in the detail but seems that your patch is
> still missing the "create rule" part so we are in time to change
> that... no?
>

Yes triggers should be used instead of rules. Automatic generation of
rules/triggers would be kind of hard and needs some looking into. Also there
are issues like checking mutual exclusivity of the partition clauses
specified too (I have been maintaining that the onus of ensuring sane
partition ranges/clauses should rest with the users atleast initially..).

I will take a stab at this again whenever I get some free cycles.

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-10-31 14:50:46
Message-ID:	a301bfd90810310750pf108c69x36499546f406650f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

> > > >
>> > > >> Thanks for taking a look. But if I am not mistaken Gavin and co.
>> are
>> > working
>> > > >> on a much exhaustive proposal. In light of that maybe this patch
>> might
>> > not
>> > > >> be needed in the first place?
>> > > >>
>> > > >> I will wait for discussion and a subsequent collective consensus
>> here,
>> > > >> before deciding the further course of actions.
>> > > >
>> > > > I think it is unwise to wait on Gavin for a more complex
>> implemention
>> > > > --- we might end up with nothing for 8.4. As long as your syntax
>> is
>> > > > compatible with whatever Gavin proposed Gavin can add on to your
>> patch
>> > > > once it is applied.
>> > > >
>> > >
>> > > seems like you're a prophet... or i miss something?
>> > >
>> >
>> > :)
>> >
>> > Maybe I will try to summarize the functionality of this patch, rebase it
>> > against latest CVS head and try to get it on the commitfest queue
>> atleast
>> > for further feedback to keep the ball rolling on auto-partitioning...
>> >
>>
>> yeah! i was thinking on doing that but still have no time... and
>> frankly you're the best man for the job ;)
>>
>> one thing i was thinking of is to use triggers instead of rules just
>> as our current docs recommends
>> http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
>>
>> with the benefit that a trigger can check if the child table exists
>> for the range being inserted and if not it can create it first...
>> haven't looked at the code in the detail but seems that your patch is
>> still missing the "create rule" part so we are in time to change
>> that... no?
>>
>
> Yes triggers should be used instead of rules. Automatic generation of
> rules/triggers would be kind of hard and needs some looking into. Also there
> are issues like checking mutual exclusivity of the partition clauses
> specified too (I have been maintaining that the onus of ensuring sane
> partition ranges/clauses should rest with the users atleast initially..).
>
> I will take a stab at this again whenever I get some free cycles.
>

I have synced up and modified the patch against latest CVS sources. Am
attaching the latest WIP patch here.

Am restating that its a WIP patch, more so because we really need feedback
on this before trying to expend any energy trying to come up with a
commit-able patch.

As per me, the syntax introduced by this patch should be similar to what was
proposed by Gavin quite a while back and this patch essentially tries to
bring together a bunch of ddl that would otherwise have been performed
step-by-step in a manual fashion earlier. To summarize this patch provides a
one-shot mechanism to:

-- * create master table
-- * create several child tables that inherit from this master table
-- * add appropriate constraints to each of the child tables
-- * create a trigger function to redirect insert, updates, deletes to
-- appropriate child tables (plpgsql language)
-- * create the trigger using the trigger function

I have created a new file (src/test/regress/sql/partition.sql) to show a
couple of examples of the grammar and the working functionality:

There are TODOs like:
-- logic to ensure unique trigger function and trigger names
-- The trigger function body could raise an exception if the
insert/update/delete operation does not fit into any single partition
-- logic to check mutual exclusivity of ranges/lists
-- misc. issues to convert it from wip to commit-ready

If we think this is ok as a first step towards auto-partitioning then we can
do something more with this patch.

Regards,
Nikhils
--
http://www.enterprisedb.com

Attachment	Content-Type	Size
partitioning-nov-commitfest-wip-v1.0.patch.tar.gz	application/x-gzip	8.6 KB

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-10-31 18:42:00
Message-ID:	490B5178.7070600@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi Nikhil,

Here are a couple of questions:
- How do you ALTER the table to repartition it?
- The trigger function for inserts could be improved by using ELSE
instead of independent IFs. This would ensure that the row is inserted
in at most 1 partition. The last ELSE should raise an exception if there
was no match (that would solve point 2 of your TODO list).
- Another option is to have a separate trigger per child table and chain
them to the master table. For example something like:
CREATE OR REPLACE FUNCTION child_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF (NEW.date >= DATE(TG_ARGV[1]) AND NEW.date < DATE(TG_ARGV[2]) ) THEN
INSERT INTO TG_ARGV[0] VALUES (NEW.*);
RETURN NULL;
END IF;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS insert_child_trigger ON master;
CREATE TRIGGER insert_child_trigger_y2008m01
BEFORE INSERT ON master
FOR EACH ROW EXECUTE PROCEDURE
child_insert_trigger_date(child_y2008m01, '2008-01-01', '2008-02-01');

CREATE TRIGGER insert_child_trigger_y2008m02
BEFORE INSERT ON master
FOR EACH ROW EXECUTE PROCEDURE
child_insert_trigger_date(child_y2008m02, '2008-02-01', '2008-03-01');

CREATE TRIGGER insert_child_trigger_y2008m03
BEFORE INSERT ON master
FOR EACH ROW EXECUTE PROCEDURE
child_insert_trigger_date(child_y2008m03, '2008-03-01', '2008-04-01');

This might make it easier when you want to alter a specific partition
rather than rewriting the whole trigger. Performance-wise, I am not sure
how chained triggers will compare to the big if/then/else trigger.

- In the case of an insert, could it be possible to avoid the cost of a
new INSERT statement (parser, planner, executor, etc...) by moving
directly the tuple in the right table like the COPY code does? If we had
an INSERT trigger code in C, given a HeapTuple and a target Relation we
should be able to call heap_insert_tuple directly, with no parsing,
planning, etc. required.

Thanks for your time,
Emmanuel

> Hi,
>
>
> > > >
> > > >> Thanks for taking a look. But if I am not mistaken
> Gavin and co. are
> > working
> > > >> on a much exhaustive proposal. In light of that maybe
> this patch might
> > not
> > > >> be needed in the first place?
> > > >>
> > > >> I will wait for discussion and a subsequent collective
> consensus here,
> > > >> before deciding the further course of actions.
> > > >
> > > > I think it is unwise to wait on Gavin for a more complex
> implemention
> > > > --- we might end up with nothing for 8.4. As long as
> your syntax is
> > > > compatible with whatever Gavin proposed Gavin can add on
> to your patch
> > > > once it is applied.
> > > >
> > >
> > > seems like you're a prophet... or i miss something?
> > >
> >
> > :)
> >
> > Maybe I will try to summarize the functionality of this
> patch, rebase it
> > against latest CVS head and try to get it on the commitfest
> queue atleast
> > for further feedback to keep the ball rolling on
> auto-partitioning...
> >
>
> yeah! i was thinking on doing that but still have no time... and
> frankly you're the best man for the job ;)
>
> one thing i was thinking of is to use triggers instead of
> rules just
> as our current docs recommends
> http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html
>
> with the benefit that a trigger can check if the child table
> exists
> for the range being inserted and if not it can create it first...
> haven't looked at the code in the detail but seems that your
> patch is
> still missing the "create rule" part so we are in time to change
> that... no?
>
>
> Yes triggers should be used instead of rules. Automatic generation
> of rules/triggers would be kind of hard and needs some looking
> into. Also there are issues like checking mutual exclusivity of
> the partition clauses specified too (I have been maintaining that
> the onus of ensuring sane partition ranges/clauses should rest
> with the users atleast initially..).
>
> I will take a stab at this again whenever I get some free cycles.
>
>
> I have synced up and modified the patch against latest CVS sources. Am
> attaching the latest WIP patch here.
>
> Am restating that its a WIP patch, more so because we really need
> feedback on this before trying to expend any energy trying to come up
> with a commit-able patch.
>
> As per me, the syntax introduced by this patch should be similar to
> what was proposed by Gavin quite a while back and this patch
> essentially tries to bring together a bunch of ddl that would
> otherwise have been performed step-by-step in a manual fashion
> earlier. To summarize this patch provides a one-shot mechanism to:
>
> -- * create master table
> -- * create several child tables that inherit from this master table
> -- * add appropriate constraints to each of the child tables
> -- * create a trigger function to redirect insert, updates, deletes to
> -- appropriate child tables (plpgsql language)
> -- * create the trigger using the trigger function
>
> I have created a new file (src/test/regress/sql/partition.sql) to show
> a couple of examples of the grammar and the working functionality:
>
> There are TODOs like:
> -- logic to ensure unique trigger function and trigger names
> -- The trigger function body could raise an exception if the
> insert/update/delete operation does not fit into any single partition
> -- logic to check mutual exclusivity of ranges/lists
> -- misc. issues to convert it from wip to commit-ready
>
> If we think this is ok as a first step towards auto-partitioning then
> we can do something more with this patch.
>
> Regards,
> Nikhils
> --
> http://www.enterprisedb.com
> ------------------------------------------------------------------------
>
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>
Cc:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-06 04:47:29
Message-ID:	3073cc9b0811052047o4ebe24b4vd0ab24fd3095d342@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Fri, Oct 31, 2008 at 7:42 PM, Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:
> Hi Nikhil,
>

i'm looking at this one:
http://archives.postgresql.org/message-id/a301bfd90810310750pf108c69x36499546f406650f@mail.gmail.com

> Here are a couple of questions:
> - How do you ALTER the table to repartition it?

fair question. but the patch was advertized to only automate some
tasks that we do manually... so keeping the same limitations seems
reasonably to me...

> - Another option is to have a separate trigger per child table and chain
> them to the master table. For example something like:

that sounds like a lot of overhead...

---------------

Now, about the patch...

- seems strange the need to create plpgsql language before we can
create any partitioned table but given that the trigger is a plpgsql
function (and a c function can't be used because we could need to add
new partitions) it seems necesary... ideas?

- the update part of the trigger looks very simplistic... if the new
values isn't in the range accepted by the partition it errors out
because of the check constraint... can't we be a little smarter,
delete from the actual partition and insert in the new one...

for the rest, the patch passes all regression tests and seems to work
as advertized....

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>
Cc:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-26 21:35:23
Message-ID:	3073cc9b0811261335q5e35d336i33faa17eede7d417@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Nov 5, 2008 at 11:47 PM, Jaime Casanova
<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Fri, Oct 31, 2008 at 7:42 PM, Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:
>> Hi Nikhil,
>>
>
> i'm looking at this one:
> http://archives.postgresql.org/message-id/a301bfd90810310750pf108c69x36499546f406650f@mail.gmail.com
>

'cause the great interest this one has (i'm being ironic, just in case
;) can we safely say this was returned with feedback and remove it
from the list of pending patches?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-26 21:52:06
Message-ID:	603c8f070811261352ta6da0b8xcbb8aaf87f050e6e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> 'cause the great interest this one has (i'm being ironic, just in case
> ;) can we safely say this was returned with feedback and remove it
> from the list of pending patches?

Um... are you referring to lack of interest from the patch author, or
from the community?

If the patch author is no longer interested in the patch, of course it
should be withdrawn. But as for the community, the patch is on the
commitfest wiki[1] and you are listed as the reviewer, so I wouldn't
necessarily expect anyone else to comment at this point - although, in
fact, Emmanuel Cecchet wrote in as well, so I would say you have
exactly the opposite of a lack of interest.

If you think the patch needs further review from another reviewer, say
so. I'm sure someone else can be assigned to do an additional review.

If you think the patch is ready to commit, say so, and update the wiki
accordingly.

...Robert

[1] http://wiki.postgresql.org/wiki/CommitFest_2008-11

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 00:13:42
Message-ID:	3073cc9b0811261613l18e5283ft448ea74fed414dd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Nov 26, 2008 at 10:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> 'cause the great interest this one has (i'm being ironic, just in case
>> ;) can we safely say this was returned with feedback and remove it
>> from the list of pending patches?
>
> the patch is on the
> commitfest wiki[1] and you are listed as the reviewer, so I wouldn't
> necessarily expect anyone else to comment at this point - although, in
> fact, Emmanuel Cecchet wrote in as well, so I would say you have
> exactly the opposite of a lack of interest.
>

i review it on nov 6, and there were open questions by me and by
Emmanuel none of those has been answered:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 04:09:46
Message-ID:	603c8f070811262009v459d0701w151c16533d4ab947@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> i review it on nov 6, and there were open questions by me and by
> Emmanuel none of those has been answered:
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php

Hmm, there's only one actual question in that email, which is a
request for ideas about PL/pgsql vs. C. I suspect you didn't get any
responses because the rest of the email seems to indicate that the
patch is not very mature at this point: for example, being able to
handle updates that move rows between partitions would seem to me to
be an essential feature for a project of this type, even though there
are many practical scenarios were it's unimportant. Likewise, being
able to repartition sounds important.

With respect to the specific question about PL/pgsql vs C, I suspect
it's very unlikely that any patch of this type that relies on PL/pgsql
being loaded would be accepted into core. However, it's possible that
a useful contrib module or pgfoundry project could be spawned on that
basis, and that might be a good place to start.

I think having a useful toolkit, or a core language feature, that
supports table partitioning would be awesome and would find very broad
application... but it sounds like there is quite a bit of work left
to be done to get there.

...Robert

From:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 09:06:16
Message-ID:	a301bfd90811270106k39d60318s7fc871a477a2b58d@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

> i review it on nov 6, and there were open questions by me and by
> > Emmanuel none of those has been answered:
> > http://archives.postgresql.org/pgsql-hackers/2008-11/msg00362.php
>
> Hmm, there's only one actual question in that email, which is a
> request for ideas about PL/pgsql vs. C. I suspect you didn't get any
> responses because the rest of the email seems to indicate that the
> patch is not very mature at this point: for example, being able to
> handle updates that move rows between partitions would seem to me to
> be an essential feature for a project of this type, even though there
> are many practical scenarios were it's unimportant. Likewise, being
> able to repartition sounds important.
>
> With respect to the specific question about PL/pgsql vs C, I suspect
> it's very unlikely that any patch of this type that relies on PL/pgsql
> being loaded would be accepted into core. However, it's possible that
> a useful contrib module or pgfoundry project could be spawned on that
> basis, and that might be a good place to start.
>
> I think having a useful toolkit, or a core language feature, that
> supports table partitioning would be awesome and would find very broad
> application... but it sounds like there is quite a bit of work left
> to be done to get there.
>

This patch does introduce some basic syntax to help create partitions.

The status has always being WIP, because what has not happened is that we
have not had consensus on whether this is a logical first baby step ahead
with partitioning. I haven't seen core members commenting on whether trying
to aggregate the current set of manual operations together via this approach
is worth spending further efforts, to get it into commitable shape.

To summarize, the community should decide if this is indeed the first step
ahead.

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Emmanuel Cecchet <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 12:04:03
Message-ID:	20081127120402.GC4586@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Nikhil Sontakke escribió:

> The status has always being WIP, because what has not happened is that we
> have not had consensus on whether this is a logical first baby step ahead
> with partitioning. I haven't seen core members commenting on whether trying
> to aggregate the current set of manual operations together via this approach
> is worth spending further efforts, to get it into commitable shape.

There was a lenghty, interesting discussion about this topic in the
developer meeting in Ottawa.
http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 13:07:51
Message-ID:	603c8f070811270507i36e18e0bu5d461a4f957e24d3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 7:04 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Nikhil Sontakke escribió:
>
>> The status has always being WIP, because what has not happened is that we
>> have not had consensus on whether this is a logical first baby step ahead
>> with partitioning. I haven't seen core members commenting on whether trying
>> to aggregate the current set of manual operations together via this approach
>> is worth spending further efforts, to get it into commitable shape.
>
> There was a lenghty, interesting discussion about this topic in the
> developer meeting in Ottawa.
> http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap

Interesting - too bad there aren't some mode detailed notes.

The semantics of PARTITION ON (<expr>) are unclear to me. I was
thinking maybe it would make sense to do something like:

CREATE PARTITION <name> ON <table> WHERE <expr>

Then you could:

CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
BETWEEN '2008-11-01' AND '2008-11-30';

I like the idea of using table inheritance as a foundation for this
feature, but I think it's not going to be very useful for real-world
applications without cross-table indexes. Suppose for example that I
have five years worth of data (thus, 60 partitions) and each
transaction has a unique identifier of some sort that is unrelated to
the date. It's bad enough that a query like this has to check every
partition:

SELECT * FROM transaction WHERE uuid = ?

What's even worse (at least IMHO) is that there's no way to use
transaction (uuid) as a reference for a foreign key.

...Robert

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 13:31:04
Message-ID:	871vwxpa7b.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:

> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
> BETWEEN '2008-11-01' AND '2008-11-30';

I think the main advantage to a better partitioning method would be teaching
Postgres about the partition key. Instead of a collection of different
constraints Postgres would know that "record_date" is *always* the partition
key. So it wouldn't have to be specified every time you declare a partition.

> I like the idea of using table inheritance as a foundation for this
> feature, but I think it's not going to be very useful for real-world
> applications without cross-table indexes.

Well we could add support for cross-table indexes. It's not hard from the
point of low level implementation -- just include the table oid in the index
pointers. Figuring out how to represent such a thing at the index description
point of view would be quite tricky though.

*But*... in practice I would suggest that cross-table indexes are actually
very rarely useful. Having them defeats much of the advantage of partitioning
in the first place. Suddenly you would not be able to instantly drop and load
whole partitions. They're a big check-list item that people want to have
before they partition in case they need them but then they find out that the
down-sides of actually using them makes them quite useless.

Postgres's current architecture actually has a big advantage over more
methodical partitioning methods in this case. You can always add additional
constraints on other columns even if they aren't the "real" partitioning key.
So for example if you partition the invoice table by month once you close the
books for a previous month you can add a constraint WHERE invoice_id < 'xxx'.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

From:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 14:01:27
Message-ID:	a301bfd90811270601wb7d0cc3k397328d109b54dd7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

>
> > The status has always being WIP, because what has not happened is that we
> > have not had consensus on whether this is a logical first baby step ahead
> > with partitioning. I haven't seen core members commenting on whether
> trying
> > to aggregate the current set of manual operations together via this
> approach
> > is worth spending further efforts, to get it into commitable shape.
>
> There was a lenghty, interesting discussion about this topic in the
> developer meeting in Ottawa.
>
> http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#Partitioning_Roadmap
>

The discussion is indeed interesting. But again the notes do not indicate
any broad consensus on the roadmap :).

The current inheritance based mechanism has its pros-cons and there seem to
be a multitude of requests/expectations around partitioning from different
quarters. Even basic consensus about the syntax is missing. What we need is
a step-by-step approach (starting with fixing up the syntax - if it can be
done like that) and working our way downwards towards the underlying
representation/planning for partitions...

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 14:01:56
Message-ID:	3073cc9b0811270601m6a7691dcg5f2ea6d4172b0d38@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> The semantics of PARTITION ON (<expr>) are unclear to me. I was
> thinking maybe it would make sense to do something like:
>
> CREATE PARTITION <name> ON <table> WHERE <expr>
>

At first look seems nice but s Gregory said the ideal would be to
identify the key partition.

>
> I like the idea of using table inheritance as a foundation for this
> feature, but I think it's not going to be very useful for real-world
> applications without cross-table indexes. Suppose for example that I
> have five years worth of data (thus, 60 partitions) and each
> transaction has a unique identifier of some sort that is unrelated to
> the date. It's bad enough that a query like this has to check every
> partition:
>

you haven't. the WHERE clause in your hipotetical CREATE PARTITION
should create a check constraint on the child (inherited) table and if
you have constraint_exclusion to on you will check just the
partition(s) that match with the check constraint.

>
> What's even worse (at least IMHO) is that there's no way to use
> transaction (uuid) as a reference for a foreign key.
>

not directly, but you always can create a trigger instead of the
foreign key constraint...
mmm...the docs says that there is no good workaround, what about
mention a trigger?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 14:41:04
Message-ID:	603c8f070811270641u55bbf340p3383dcc6a6f1f6e8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
>> BETWEEN '2008-11-01' AND '2008-11-30';
>
> I think the main advantage to a better partitioning method would be teaching
> Postgres about the partition key. Instead of a collection of different
> constraints Postgres would know that "record_date" is *always* the partition
> key. So it wouldn't have to be specified every time you declare a partition.

Hmm... I thought the main advantage would be that you wouldn't have
to manually add constraints to all of the child tables, and you
wouldn't have to manually add rules/triggers to the parent table to
redirect DML operations.

What do you see as the advantage of pre-declaring record_date as the
partition key? The major advantage I can think of is that it should
simplify constraint exclusion calculations considerably. Also, you
can easily enforce that partitions are non-overlapping. The
disadvantage is that you can't support more complex partitioning
schemes that can't be expressed in terms of ranges on a single key (an
obvious case is when you want to partition by date AND transaction
type, though that could probably be made to work if you allow
specifying multiple partition keys; less tractable cases are
imaginable).

I guess we could decide we don't care about the more complex
scenarios. Or we could offer:

CREATE TABLE (...) WITH PARTITIONING; -- ad-hoc partitioning
CREATE TABLE (...) WITH PARTITIONING ON (...); -- partition keys must
be non-overlapping slices based only on the given columns

> *But*... in practice I would suggest that cross-table indexes are actually
> very rarely useful. Having them defeats much of the advantage of partitioning
> in the first place. Suddenly you would not be able to instantly drop and load
> whole partitions. They're a big check-list item that people want to have
> before they partition in case they need them but then they find out that the
> down-sides of actually using them makes them quite useless.

That's possible. My every attempt to use inheritance has been stymied
by lack of this feature, but my attempts may not be representative.
In any case, the projects are severable.

> Postgres's current architecture actually has a big advantage over more
> methodical partitioning methods in this case. You can always add additional
> constraints on other columns even if they aren't the "real" partitioning key.
> So for example if you partition the invoice table by month once you close the
> books for a previous month you can add a constraint WHERE invoice_id < 'xxx'.

That's cool.

...Robert

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	"Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 14:47:27
Message-ID:	603c8f070811270647g1f069ed5ucefbb755dac7aa58@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

>> I like the idea of using table inheritance as a foundation for this
>> feature, but I think it's not going to be very useful for real-world
>> applications without cross-table indexes. Suppose for example that I
>> have five years worth of data (thus, 60 partitions) and each
>> transaction has a unique identifier of some sort that is unrelated to
>> the date. It's bad enough that a query like this has to check every
>> partition:
> you haven't. the WHERE clause in your hipotetical CREATE PARTITION
> should create a check constraint on the child (inherited) table and if
> you have constraint_exclusion to on you will check just the
> partition(s) that match with the check constraint.

The problem is that constraint exclusion will not be able to exclude
anything for queries unrelated to the partition key. If my
transactions are identified by UUIDs or similar, there's no way to
predict which table will contain any particular value. You end up
having to scan them all, and even if they all have individual indices
on the column in question, that's still 60 index scans instead of 1.

>> What's even worse (at least IMHO) is that there's no way to use
>> transaction (uuid) as a reference for a foreign key.
> not directly, but you always can create a trigger instead of the
> foreign key constraint...
> mmm...the docs says that there is no good workaround, what about
> mention a trigger?

I think it's pretty hard to make this bulletproof. I think the
triggers that enforce ordinary foreign key constraints contain some
magical cross-checks on transaction commit that can't easily be
emulated by user-written triggers. In any case, it's a long way from
"Oh, yeah, that just works."

...Robert

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 15:10:38
Message-ID:	3073cc9b0811270710p3a2e5ae9s10f66b529550c03e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
>>> BETWEEN '2008-11-01' AND '2008-11-30';
>>
>> I think the main advantage to a better partitioning method would be teaching
>> Postgres about the partition key. Instead of a collection of different
>> constraints Postgres would know that "record_date" is *always* the partition
>> key. So it wouldn't have to be specified every time you declare a partition.
>
> Hmm... I thought the main advantage would be that you wouldn't have
> to manually add constraints to all of the child tables, and you
> wouldn't have to manually add rules/triggers to the parent table to
> redirect DML operations.
>

ok. what about let CREATE TABLE WITH PARTITIONING to create an entry
in a catalog indicating the key of the partition and install the
triggers and let the trigger decide if it has the partition to insert
the new row (making UPDATE working almost as DELETE+INSERT if it needs
to change of partitions) or create the new partition maybe with an
apropiate CREATE PARTITION...

that way i don't need to create triggers nor inherit tables
manually... and because of that maybe we can make possible to add
<expr> as partition key...

PS: i'm against using CREATE TABLE because we are inventing new syntax
but it seems like using ALTER TABLE is a *lot* of work altough ISTM
more usefull

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 15:12:13
Message-ID:	3073cc9b0811270712l6384c28awac4f0318c704df99@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 10:10 AM, Jaime Casanova
<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>>>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
>>>> BETWEEN '2008-11-01' AND '2008-11-30';
>>>
>>> I think the main advantage to a better partitioning method would be teaching
>>> Postgres about the partition key. Instead of a collection of different
>>> constraints Postgres would know that "record_date" is *always* the partition
>>> key. So it wouldn't have to be specified every time you declare a partition.
>>
>> Hmm... I thought the main advantage would be that you wouldn't have
>> to manually add constraints to all of the child tables, and you
>> wouldn't have to manually add rules/triggers to the parent table to
>> redirect DML operations.
>>
>
> ok. what about let CREATE TABLE WITH PARTITIONING to create an entry
> in a catalog indicating the key of the partition and install the
> triggers and let the trigger decide if it has the partition to insert
> the new row (making UPDATE working almost as DELETE+INSERT if it needs
> to change of partitions) or create the new partition maybe with an
> apropiate CREATE PARTITION...
>

i thik i have to clarify this...

i intend to say that, the trigger will insert or create the partition
and insert...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 16:09:43
Message-ID:	492EC647.4090706@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi all,

I have been following that discussion very closely but it seems that we
are debating solutions without a good specification of the
problem/requirements.
I would suggest that we collect all the partitioning requirements on a
dedicated Wiki page. There might not be a one size fits it all solution
for all requirements. We can also look at what other databases are
proposing to address these issues.
If we can prioritize features, that should also allow us to stage the
partitioning implementation.
I have a prototype insert trigger in C that directly move inserts in a
master table to the appropriate child table (directly moving the tuple).
Let me know if anyone is interested.

Emmanuel

Jaime Casanova wrote:
> On Thu, Nov 27, 2008 at 10:10 AM, Jaime Casanova
> <jcasanov(at)systemguards(dot)com(dot)ec> wrote:
>
>> On Thu, Nov 27, 2008 at 9:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>>> On Thu, Nov 27, 2008 at 8:31 AM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>>>
>>>>> CREATE PARTITION transaction_2008_11 ON transaction WHERE record_date
>>>>> BETWEEN '2008-11-01' AND '2008-11-30';
>>>>>
>>>> I think the main advantage to a better partitioning method would be teaching
>>>> Postgres about the partition key. Instead of a collection of different
>>>> constraints Postgres would know that "record_date" is *always* the partition
>>>> key. So it wouldn't have to be specified every time you declare a partition.
>>>>
>>> Hmm... I thought the main advantage would be that you wouldn't have
>>> to manually add constraints to all of the child tables, and you
>>> wouldn't have to manually add rules/triggers to the parent table to
>>> redirect DML operations.
>>>
>>>
>> ok. what about let CREATE TABLE WITH PARTITIONING to create an entry
>> in a catalog indicating the key of the partition and install the
>> triggers and let the trigger decide if it has the partition to insert
>> the new row (making UPDATE working almost as DELETE+INSERT if it needs
>> to change of partitions) or create the new partition maybe with an
>> apropiate CREATE PARTITION...
>>
>>
>
> i thik i have to clarify this...
>
> i intend to say that, the trigger will insert or create the partition
> and insert...
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	"Gregory Stark" <stark(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-11-27 16:38:39
Message-ID:	603c8f070811270838iebac4d4g172dd97303e79826@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> ok. what about let CREATE TABLE WITH PARTITIONING to create an entry
> in a catalog indicating the key of the partition and install the
> triggers and let the trigger decide if it has the partition to insert
> the new row (making UPDATE working almost as DELETE+INSERT if it needs
> to change of partitions) or create the new partition maybe with an
> apropiate CREATE PARTITION...
>
> that way i don't need to create triggers nor inherit tables
> manually... and because of that maybe we can make possible to add
> <expr> as partition key...
>
>
> PS: i'm against using CREATE TABLE because we are inventing new syntax
> but it seems like using ALTER TABLE is a *lot* of work altough ISTM
> more usefull

I think that's one of the useful things that could be done in this
area (not the only one, certainly), but I don't think we've defined
the semantics well enough to start talking about exactly which
commands to use. As to CREATE TABLE and ALTER TABLE, I suspect you'll
need both. We have to come to some consensus on whether predefining a
partition key is necessary, optional, or not supported. And we need
to define ways both to set things up and to change them later.

If there is no predefined partition key, there's probably nothing
terribly special that needs to be done to prepare a table for
partitioning. You could decide that all the data will live in the
parent table except for the partitions that are explicitly created.
When the user creates a partition, you create the new child table, set
it to inherit from the parent, add the necessary constraint,
create/update an automatically generated rule/trigger on the parent
that redirects DML to the appropriate partition, and move any EXISTING
tuples that belong in that partition into it. You'd also need
operations to merge a partition back into the parent table (moving the
data back), drop a partition (lose the data), and change the
definition of a partition (move data around).

A significant problem with this design is that you don't know that the
partition constraints are mutually exclusive. What do you do with
data that matches multiple partition constraints? You'll have to
devise some rule, like maybe picking the first partition
alphabetically, which will complicate the rearrangement of data when
partitions are added or removed.

If there IS a predefined partition key, then you'll need a way to tell
the parent table what it is (and a way to remove it later if you
change your mind). Then it should be possible to validate that child
partitions are defined only in terms of that key and that they are
mutually exclusive. You'll still need basically all the same
operations: create partition, modify partition, merge partition back
into parent, drop partition.

...Robert

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>
Cc:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-02 22:52:07
Message-ID:	603c8f070812021452m65ed412ckc416e14b8e8f3f77@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 27, 2008 at 11:09 AM, Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:
> I have been following that discussion very closely but it seems that we are
> debating solutions without a good specification of the problem/requirements.
> I would suggest that we collect all the partitioning requirements on a
> dedicated Wiki page. There might not be a one size fits it all solution for
> all requirements. We can also look at what other databases are proposing to
> address these issues.
> If we can prioritize features, that should also allow us to stage the
> partitioning implementation.

This might be a good idea. Want to take a crack at it?

> I have a prototype insert trigger in C that directly move inserts in a
> master table to the appropriate child table (directly moving the tuple). Let
> me know if anyone is interested.

Can't hurt to post it.

...Robert

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Gregory Stark <stark(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-10 00:51:05
Message-ID:	493F1279.8010804@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi all,

While I was trying to find the right place to add a new page on the
wiki, I found the document of Simon on partitioning requirements
(http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf)
referenced from http://wiki.postgresql.org/wiki/Development_projects
I think this is a good base to start from. Should we convert the doc
into a wiki page or get the source for the doc and go from there?

I attach what I have come up with so far for the C trigger I was talking
about for efficient automatic auto-partitioning of inserts in child tables.

Emmanuel

Robert Haas wrote:
> On Thu, Nov 27, 2008 at 11:09 AM, Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:
>
>> I have been following that discussion very closely but it seems that we are
>> debating solutions without a good specification of the problem/requirements.
>> I would suggest that we collect all the partitioning requirements on a
>> dedicated Wiki page. There might not be a one size fits it all solution for
>> all requirements. We can also look at what other databases are proposing to
>> address these issues.
>> If we can prioritize features, that should also allow us to stage the
>> partitioning implementation.
>>
>
> This might be a good idea. Want to take a crack at it?
>
>
>> I have a prototype insert trigger in C that directly move inserts in a
>> master table to the appropriate child table (directly moving the tuple). Let
>> me know if anyone is interested.
>>
>
> Can't hurt to post it.
>
> ...Robert
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

Attachment	Content-Type	Size
patch-trigger-autopartition-C.txt	text/plain	10.2 KB

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
Cc:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-15 17:49:08
Message-ID:	49469894.4070406@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hackers,

We don't yet seem to have a clear specification for this feature, and
the Other Open Source DB has shown us how problematic it is to get
auto-partitioning wrong.

Should we defer auto-partitioning to 8.5?

--Josh

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-15 18:46:43
Message-ID:	87abaxp97w.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Josh Berkus <josh(at)agliodbs(dot)com> writes:

> Hackers,
>
> We don't yet seem to have a clear specification for this feature, and the Other
> Open Source DB has shown us how problematic it is to get auto-partitioning
> wrong.
>
> Should we defer auto-partitioning to 8.5?

If we're serious about having a "next generation" partitioning with a concept
of partition keys then it seems to me to make more sense to do that first and
then add on a feature like this.

This is still very useful. I haven't looked at the actual patch, does it
require core changes or can it be stashed in a pgfoundry or contrib module?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

From:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Josh Berkus" <josh(at)agliodbs(dot)com>, "Emmanuel Cecchet" <manu(at)frogthinker(dot)org>, "Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-15 19:01:19
Message-ID:	3073cc9b0812151101h3db74130xb5a808422e48c3f3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>
>> Hackers,
>>
>> We don't yet seem to have a clear specification for this feature, and the Other
>> Open Source DB has shown us how problematic it is to get auto-partitioning
>> wrong.
>>
>> Should we defer auto-partitioning to 8.5?
>
> If we're serious about having a "next generation" partitioning with a concept
> of partition keys then it seems to me to make more sense to do that first and
> then add on a feature like this.
>

> This is still very useful. I haven't looked at the actual patch, does it
> require core changes or can it be stashed in a pgfoundry or contrib module?
>

what i don't like about this one is that it creates partitions at
create table time and to manually add all new partitions (inherit
tables and modify the trigger)... and what i want to see is an
automatic creation when it's needed...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 02:05:16
Message-ID:	49470CDC.7020707@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi all,

I will be working on a roadmap for the partitioning features. I think
that there are different needs and that we will not be able to address
them all in 8.5 or even 8.6.
The goal will be to get things done step by step but possibly with a
design that will not require major refactoring to support new features.
I will try to setup the new wiki page tomorrow.

In the meantime, I have made some more tests with the trigger in C (see
attached patch). To prevent duplicating too much code, it requires the
ExecRelCheck method to be exported (that would be nice to have this
function exported in 8.4 so that we can start experimenting in 8.4 and
don't have to wait another year for 8.5). If there is locality in the
inserts (which might be the case if you COPY sorted data), the
performance remains constant regardless the number of child tables.
My initial tests to insert 140k rows are as follows:
- direct inserts in a child table: 2 seconds
- pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds.
- C trigger: 4 seconds (actually the overhead is in the constraint check)

Right now if the row cannot be inserted in any child table, it is
inserted in the parent. If you want to fail, we can add a 'fail trigger'
(after all other triggers) that generates an error if previous triggers
did not capture the row. If you want to create a new partition, you can
have another trigger to handle that.

So I think that this trigger approach is pretty flexible like people
used AOP in J2EE servers to process requests. It has also the advantage
of allowing fast prototyping. It should also be easy to push that
functionality down in the core as needed.

Is it ok if I move Simon's requirement document under a more generic
'Table partitioning' page on the Wiki?

Thanks for your feedback,
manu

Jaime Casanova wrote:
> On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>
>>
>>> Hackers,
>>>
>>> We don't yet seem to have a clear specification for this feature, and the Other
>>> Open Source DB has shown us how problematic it is to get auto-partitioning
>>> wrong.
>>>
>>> Should we defer auto-partitioning to 8.5?
>>>
>> If we're serious about having a "next generation" partitioning with a concept
>> of partition keys then it seems to me to make more sense to do that first and
>> then add on a feature like this.
>>
>>
>
> +1
>
>
>> This is still very useful. I haven't looked at the actual patch, does it
>> require core changes or can it be stashed in a pgfoundry or contrib module?
>>
>>
>
> what i don't like about this one is that it creates partitions at
> create table time and to manually add all new partitions (inherit
> tables and modify the trigger)... and what i want to see is an
> automatic creation when it's needed...
>
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

Attachment	Content-Type	Size
patch-trigger-autopartition-C-v2.txt	text/plain	11.7 KB

From:	"Nikhil Sontakke" <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	"Emmanuel Cecchet" <manu(at)frogthinker(dot)org>
Cc:	"Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 06:35:11
Message-ID:	a301bfd90812152235j7af3bfabh5265c1e3ec813fd9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi,

> I will be working on a roadmap for the partitioning features. I think that
> there are different needs and that we will not be able to address them all
> in 8.5 or even 8.6.
> The goal will be to get things done step by step but possibly with a design
> that will not require major refactoring to support new features. I will try
> to setup the new wiki page tomorrow.
>

+1.

>
> In the meantime, I have made some more tests with the trigger in C (see
> attached patch). To prevent duplicating too much code, it requires the
> ExecRelCheck method to be exported (that would be nice to have this function
> exported in 8.4 so that we can start experimenting in 8.4 and don't have to
> wait another year for 8.5). If there is locality in the inserts (which might
> be the case if you COPY sorted data), the performance remains constant
> regardless the number of child tables.

A similar DELETE trigger should be pretty easy to write up in C. I think the
main challenge is with UPDATE triggers especially if the new row will fall
into another child table - but we can always throw an error for such a case
initially.

>
> Right now if the row cannot be inserted in any child table, it is inserted
> in the parent. If you want to fail, we can add a 'fail trigger' (after all
> other triggers) that generates an error if previous triggers did not capture
> the row. If you want to create a new partition, you can have another trigger
> to handle that.
>

One of the work items related to partitioning eventually is to avoid having
to APPEND the parent in all queries involving children. Maybe having an
overflow child table might help to catch failed triggers for those cases?

Regards,
Nikhils

>
> So I think that this trigger approach is pretty flexible like people used
> AOP in J2EE servers to process requests. It has also the advantage of
> allowing fast prototyping. It should also be easy to push that functionality
> down in the core as needed.
>
> Is it ok if I move Simon's requirement document under a more generic 'Table
> partitioning' page on the Wiki?
>
> Thanks for your feedback,
> manu
>
> Jaime Casanova wrote:
>
>> On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark <stark(at)enterprisedb(dot)com>
>> wrote:
>>
>>
>>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>
>>>
>>>
>>>> Hackers,
>>>>
>>>> We don't yet seem to have a clear specification for this feature, and
>>>> the Other
>>>> Open Source DB has shown us how problematic it is to get
>>>> auto-partitioning
>>>> wrong.
>>>>
>>>> Should we defer auto-partitioning to 8.5?
>>>>
>>>>
>>> If we're serious about having a "next generation" partitioning with a
>>> concept
>>> of partition keys then it seems to me to make more sense to do that first
>>> and
>>> then add on a feature like this.
>>>
>>>
>>>
>>
>> +1
>>
>>
>>
>>> This is still very useful. I haven't looked at the actual patch, does it
>>> require core changes or can it be stashed in a pgfoundry or contrib
>>> module?
>>>
>>>
>>>
>>
>> what i don't like about this one is that it creates partitions at
>> create table time and to manually add all new partitions (inherit
>> tables and modify the trigger)... and what i want to see is an
>> automatic creation when it's needed...
>>
>>
>>
>>
>
>
> --
> Emmanuel Cecchet
> FTO @ Frog Thinker Open Source Development & Consulting
> --
> Web: http://www.frogthinker.org
> email: manu(at)frogthinker(dot)org
> Skype: emmanuel_cecchet
>
>
> ### Eclipse Workspace Patch 1.0
> #P Postgres-HEAD
> Index: src/test/regress/regress.c
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/test/regress/regress.c,v
> retrieving revision 1.71
> diff -u -r1.71 regress.c
> --- src/test/regress/regress.c 25 Mar 2008 22:42:46 -0000 1.71
> +++ src/test/regress/regress.c 16 Dec 2008 01:42:50 -0000
> @@ -10,6 +10,9 @@
> #include "utils/geo_decls.h" /* includes <math.h> */
> #include "executor/executor.h" /* For GetAttributeByName */
> #include "commands/sequence.h" /* for nextval() */
> +#include "catalog/namespace.h"
> +#include "executor/executor.h"
> +#include "executor/tuptable.h"
>
> #define P_MAXDIG 12
> #define LDELIM '('
> @@ -732,3 +735,141 @@
> *--walk = '\0';
> PG_RETURN_CSTRING(result);
> }
> +
> +
> +/*
> + * Partition trigger test
> + *
> + * The trigger should be used this way:
> + * CREATE TRIGGER trigger_name
> + BEFORE INSERT ON master_table
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger(child_table_name, column_number, min_val, max_val);
> + */
> +
> +extern Datum partition_insert_trigger(PG_FUNCTION_ARGS);
> +
> +static Datum
> +check_constraints_and_insert_tuple(Relation child_table_relation,
> TriggerData *trigdata ,HeapTuple trigtuple)
> +{ // Check the constraints
> + ResultRelInfo *resultRelInfo;
> + TupleTableSlot *slot;
> + EState *estate = CreateExecutorState();
> + Datum result;
> +
> + result = PointerGetDatum(trigdata->tg_trigtuple);
> +
> + resultRelInfo = makeNode(ResultRelInfo);
> + resultRelInfo->ri_RangeTableIndex = 1; /* dummy */
> + resultRelInfo->ri_RelationDesc = child_table_relation;
> +
> + estate->es_result_relations = resultRelInfo;
> + estate->es_num_result_relations = 1;
> + estate->es_result_relation_info = resultRelInfo;
> +
> + /* Set up a tuple slot too */
> + slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);
> + ExecStoreTuple(trigtuple, slot, InvalidBuffer, false);
> +
> + if (ExecRelCheck(resultRelInfo, slot, estate) == NULL)
> + { // Constraints satisfied, insert the row in the child table
> + bool use_wal = true;
> + bool use_fsm=true;
> +
> + /* BEFORE ROW INSERT Triggers */
> + if (resultRelInfo->ri_TrigDesc &&
> +
> resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
> + {
> + HeapTuple newtuple;
> + newtuple = ExecBRInsertTriggers(estate,
> resultRelInfo, trigtuple);
> +
> + if (newtuple != trigtuple) /* modified by
> Trigger(s) */
> + {
> + heap_freetuple(trigtuple);
> + trigtuple = newtuple;
> + }
> + }
> +
> + /* Perform the insert
> + * TODO: Check that we detect constraint violation if
> before row insert does something bad */
> + heap_insert(child_table_relation, trigtuple,
> GetCurrentCommandId(true), use_wal, use_fsm);
> +
> + /* Update indices */
> + ExecOpenIndices(resultRelInfo);
> + if (resultRelInfo->ri_NumIndices > 0)
> + ExecInsertIndexTuples(slot, &(trigtuple->t_self),
> estate, false);
> +
> + /* AFTER ROW INSERT Triggers */
> + ExecARInsertTriggers(estate, resultRelInfo, trigtuple);
> +
> + result = PointerGetDatum(NULL);
> + }
> + // Free resources
> + FreeExecutorState(estate);
> + ExecDropSingleTupleTableSlot(slot);
> +
> + return result;
> +}
> +
> +
> +PG_FUNCTION_INFO_V1(partition_insert_trigger);
> +
> +static Relation last_inserted_relation;
> +
> +Datum
> +partition_insert_trigger(PG_FUNCTION_ARGS)
> +{
> + TriggerData *trigdata = (TriggerData *) fcinfo->context;
> + HeapTuple trigtuple= trigdata->tg_trigtuple;
> + TupleConstr *constr;
> + char *child_table_name;
> + Relation child_table_relation;
> + Oid relation_id;
> + Datum result;
> +
> + // Try to exploit locality for bulk inserts
> + // We expect consecutive insert to go to the same child table
> + if (last_inserted_relation != NULL)
> + { // Try the last table we used
> + result =
> check_constraints_and_insert_tuple(last_inserted_relation, trigdata,
> trigtuple);
> + if (result == PointerGetDatum(NULL))
> + return result;
> + // We got a miss
> + last_inserted_relation = NULL;
> + }
> +
> + /* make sure it's called as a trigger at all */
> + if (!CALLED_AS_TRIGGER(fcinfo))
> + elog(ERROR, "partition_insert_trigger: not called by trigger
> manager");
> +
> + /* Sanity checks */
> + if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event) ||
> !TRIGGER_FIRED_BEFORE(trigdata->tg_event))
> + elog(ERROR, "partition_insert_trigger: not called on insert
> before");
> +
> + // Child table name is either given as the unique parameter or it is
> the name of the trigger
> + if (trigdata->tg_trigger->tgnargs == 1)
> + child_table_name = trigdata->tg_trigger->tgargs[0];
> + else
> + child_table_name = trigdata->tg_trigger->tgname;
> +
> + // Lookup the child relation
> + relation_id = RelnameGetRelid(child_table_name);
> + if (relation_id == InvalidOid)
> + elog(ERROR, "partition_insert_trigger: Invalid child table %s",
> child_table_name);
> + child_table_relation = RelationIdGetRelation(relation_id);
> + if (child_table_relation == NULL)
> + elog(ERROR, "partition_insert_trigger: Failed to locate relation
> for child table %s", child_table_name);
> +
> + constr = child_table_relation->rd_att->constr;
> + if (constr->num_check == 0)
> + elog(ERROR, "partition_insert_trigger: No constraint found
> for child table %s", child_table_name);
> +
> + result = check_constraints_and_insert_tuple(child_table_relation,
> trigdata, trigtuple);
> +
> + if (result == PointerGetDatum(NULL))
> + last_inserted_relation = child_table_relation;
> +
> + RelationClose(child_table_relation);
> +
> + return result;
> +}
> +
> Index: src/backend/executor/execMain.c
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/backend/executor/execMain.c,v
> retrieving revision 1.314
> diff -u -r1.314 execMain.c
> --- src/backend/executor/execMain.c 31 Oct 2008 21:07:54 -0000
> 1.314
> +++ src/backend/executor/execMain.c 16 Dec 2008 01:42:50 -0000
> @@ -1947,7 +1947,7 @@
> /*
> * ExecRelCheck --- check that tuple meets constraints for result relation
> */
> -static const char *
> +const char *
> ExecRelCheck(ResultRelInfo *resultRelInfo,
> TupleTableSlot *slot, EState *estate)
> {
> Index: src/test/regress/output/create_function_1.source
> ===================================================================
> RCS file:
> /root/cvsrepo/pgsql/src/test/regress/output/create_function_1.source,v
> retrieving revision 1.34
> diff -u -r1.34 create_function_1.source
> --- src/test/regress/output/create_function_1.source 31 Oct 2008
> 19:37:56 -0000 1.34
> +++ src/test/regress/output/create_function_1.source 16 Dec 2008
> 01:42:50 -0000
> @@ -47,6 +47,10 @@
> RETURNS int4
> AS '@libdir@/regress(at)DLSUFFIX@'
> LANGUAGE C STRICT;
> +CREATE FUNCTION partition_insert_trigger ()
> + RETURNS trigger
> + AS '@libdir@/regress(at)DLSUFFIX@'
> + LANGUAGE C STRICT;
> -- Things that shouldn't work:
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
> AS 'SELECT ''not an integer'';';
> @@ -80,3 +84,70 @@
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
> AS 'nosuch';
> ERROR: there is no built-in function named "nosuch"
> +-- Partitioning trigger test
> +CREATE TABLE master (
> + id int not null,
> + date date not null,
> + value int
> +);
> +CREATE TABLE child_y2008m01 (
> + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
> +) INHERITS (master);
> +CREATE TABLE child_y2008m02 (
> + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
> +) INHERITS (master);
> +CREATE TABLE child_y2008m03 (
> + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
> +) INHERITS (master);
> +CREATE TRIGGER insert_child_y2008m01
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger('child_y2008m01');
> +CREATE TRIGGER child_y2008m02
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +CREATE TRIGGER child_y2008m03
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +INSERT INTO master VALUES (1, '2008-01-15', 1);
> +INSERT INTO master VALUES (2, '2008-02-15', 2);
> +INSERT INTO master VALUES (3, '2008-03-15', 3);
> +INSERT INTO master VALUES (4, '2008-04-15', 4);
> +COPY master FROM '/root/pg_partitions/copy_input.txt';
> +select * from master;
> + id | date | value
> +----+------------+-------
> + 4 | 04-15-2008 | 4
> + 1 | 01-15-2008 | 1
> + 11 | 01-10-2008 | 11
> + 2 | 02-15-2008 | 2
> + 12 | 02-15-2008 | 12
> + 3 | 03-15-2008 | 3
> + 13 | 03-15-2008 | 13
> +(7 rows)
> +
> + select * from child_y2008m01;
> + id | date | value
> +----+------------+-------
> + 1 | 01-15-2008 | 1
> + 11 | 01-10-2008 | 11
> +(2 rows)
> +
> + select * from child_y2008m02;
> + id | date | value
> +----+------------+-------
> + 2 | 02-15-2008 | 2
> + 12 | 02-15-2008 | 12
> +(2 rows)
> +
> + select * from child_y2008m03;
> + id | date | value
> +----+------------+-------
> + 3 | 03-15-2008 | 3
> + 13 | 03-15-2008 | 13
> +(2 rows)
> +
> +DROP TABLE master CASCADE;
> +NOTICE: drop cascades to 3 other objects
> +DETAIL: drop cascades to table child_y2008m01
> +drop cascades to table child_y2008m02
> +drop cascades to table child_y2008m03
> Index: src/include/executor/executor.h
> ===================================================================
> RCS file: /root/cvsrepo/pgsql/src/include/executor/executor.h,v
> retrieving revision 1.152
> diff -u -r1.152 executor.h
> --- src/include/executor/executor.h 31 Oct 2008 21:07:55 -0000
> 1.152
> +++ src/include/executor/executor.h 16 Dec 2008 01:42:50 -0000
> @@ -155,6 +155,8 @@
> extern bool ExecContextForcesOids(PlanState *planstate, bool *hasoids);
> extern void ExecConstraints(ResultRelInfo *resultRelInfo,
> TupleTableSlot *slot, EState *estate);
> +extern const char *ExecRelCheck(ResultRelInfo *resultRelInfo,
> + TupleTableSlot *slot, EState *estate);
> extern TupleTableSlot *EvalPlanQual(EState *estate, Index rti,
> ItemPointer tid, TransactionId priorXmax);
> extern PlanState *ExecGetActivePlanTree(QueryDesc *queryDesc);
> Index: src/test/regress/input/create_function_1.source
> ===================================================================
> RCS file:
> /root/cvsrepo/pgsql/src/test/regress/input/create_function_1.source,v
> retrieving revision 1.19
> diff -u -r1.19 create_function_1.source
> --- src/test/regress/input/create_function_1.source 1 Oct 2008 22:38:57
> -0000 1.19
> +++ src/test/regress/input/create_function_1.source 16 Dec 2008
> 01:42:50 -0000
> @@ -52,6 +52,12 @@
> AS '@libdir@/regress(at)DLSUFFIX@'
> LANGUAGE C STRICT;
>
> +CREATE FUNCTION partition_insert_trigger ()
> + RETURNS trigger
> + AS '@libdir@/regress(at)DLSUFFIX@'
> + LANGUAGE C STRICT;
> +
> +
> -- Things that shouldn't work:
>
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE SQL
> @@ -77,3 +83,48 @@
>
> CREATE FUNCTION test1 (int) RETURNS int LANGUAGE internal
> AS 'nosuch';
> +
> +-- Partitioning trigger test
> +
> +CREATE TABLE master (
> + id int not null,
> + date date not null,
> + value int
> +);
> +
> +CREATE TABLE child_y2008m01 (
> + CHECK ( date >= DATE '2008-01-01' AND date < DATE '2008-02-01' )
> +) INHERITS (master);
> +
> +CREATE TABLE child_y2008m02 (
> + CHECK ( date >= DATE '2008-02-01' AND date < DATE '2008-03-01' )
> +) INHERITS (master);
> +
> +CREATE TABLE child_y2008m03 (
> + CHECK ( date >= DATE '2008-03-01' AND date < DATE '2008-04-01' )
> +) INHERITS (master);
> +
> +CREATE TRIGGER insert_child_y2008m01
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE
> partition_insert_trigger('child_y2008m01');
> +
> +CREATE TRIGGER child_y2008m02
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +
> +CREATE TRIGGER child_y2008m03
> + BEFORE INSERT ON master
> + FOR EACH ROW EXECUTE PROCEDURE partition_insert_trigger();
> +
> +INSERT INTO master VALUES (1, '2008-01-15', 1);
> +INSERT INTO master VALUES (2, '2008-02-15', 2);
> +INSERT INTO master VALUES (3, '2008-03-15', 3);
> +INSERT INTO master VALUES (4, '2008-04-15', 4);
> +COPY master FROM '/root/pg_partitions/copy_input.txt';
> +
> +select * from master;
> +select * from child_y2008m01;
> +select * from child_y2008m02;
> +select * from child_y2008m03;
> +
> +DROP TABLE master CASCADE;
>
>

--
http://www.enterprisedb.com

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 07:02:17
Message-ID:	20081216153856.C013.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:

> In the meantime, I have made some more tests with the trigger in C (see
> attached patch).

Hmm... The inserting partition is passed by trigger arguments.
Users must replace triggers when the target is changed (ex. every month).
Is it possible to expand all of child paritions from pg_inherits and
determine a suitable parition by checking their constraints?
We can also use it when there are multiple inserting paritions,
something like hash or list paritioning. Fixed target is only applicable
to time-based range paritioning.

BTW, there is another issue in trigger approach. If INSERT commands
are interrupted by triggers, server says "INSERT 0 row" though
rows are inserted into child tables. Since using C, we could
use some back doors to modify a variable counting affected rows.
We could use partitioned tables more transparently if we have it.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Gregory Stark <stark(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 07:13:29
Message-ID:	49475519.9020000@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Nikhil Sontakke wrote:
> A similar DELETE trigger should be pretty easy to write up in C. I
> think the main challenge is with UPDATE triggers especially if the new
> row will fall into another child table - but we can always throw an
> error for such a case initially.
I agree. A first implementation could be restricted to updates within
the same partition.
>
> Right now if the row cannot be inserted in any child table, it is
> inserted in the parent. If you want to fail, we can add a 'fail
> trigger' (after all other triggers) that generates an error if
> previous triggers did not capture the row. If you want to create a
> new partition, you can have another trigger to handle that.
>
>
> One of the work items related to partitioning eventually is to avoid
> having to APPEND the parent in all queries involving children. Maybe
> having an overflow child table might help to catch failed triggers for
> those cases?
This is a good option too.

Emmanuel

>
> Regards,
> Nikhils
>
>
>
> So I think that this trigger approach is pretty flexible like
> people used AOP in J2EE servers to process requests. It has also
> the advantage of allowing fast prototyping. It should also be easy
> to push that functionality down in the core as needed.
>
> Is it ok if I move Simon's requirement document under a more
> generic 'Table partitioning' page on the Wiki?
>
> Thanks for your feedback,
> manu
>
>
> Jaime Casanova wrote:
>
> On Mon, Dec 15, 2008 at 1:46 PM, Gregory Stark
> <stark(at)enterprisedb(dot)com <mailto:stark(at)enterprisedb(dot)com>> wrote:
>
>
> Josh Berkus <josh(at)agliodbs(dot)com <mailto:josh(at)agliodbs(dot)com>>
> writes:
>
>
>
> Hackers,
>
> We don't yet seem to have a clear specification for
> this feature, and the Other
> Open Source DB has shown us how problematic it is to
> get auto-partitioning
> wrong.
>
> Should we defer auto-partitioning to 8.5?
>
>
> If we're serious about having a "next generation"
> partitioning with a concept
> of partition keys then it seems to me to make more sense
> to do that first and
> then add on a feature like this.
>
>
>
>
> +1
>
>
>
> This is still very useful. I haven't looked at the actual
> patch, does it
> require core changes or can it be stashed in a pgfoundry
> or contrib module?
>
>
>
>
> what i don't like about this one is that it creates partitions at
> create table time and to manually add all new partitions (inherit
> tables and modify the trigger)... and what i want to see is an
> automatic creation when it's needed...
>

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 07:33:36
Message-ID:	494759D0.1090002@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote
>> In the meantime, I have made some more tests with the trigger in C (see
>> attached patch).
>>
>
> Hmm... The inserting partition is passed by trigger arguments.
>
Actually this is just a fallback option. The preferred option is to name
the trigger after the child table name. This way the trigger retrieve
the table name directly from the trigger name and no argument has to be
passed to the trigger.
> Users must replace triggers when the target is changed (ex. every month).
>
I am not sure what you mean. There is one trigger per child table but
the trigger is always the same, it is just the name that is given to it
that changes.
> Is it possible to expand all of child paritions from pg_inherits and
> determine a suitable parition by checking their constraints?
>
Ideally it would be better to do this way. I have not found yet how to
automatically get all the child partitions of a parent table from the
trigger. This would simplify things by having a single trigger.
> We can also use it when there are multiple inserting paritions,
> something like hash or list paritioning. Fixed target is only applicable
> to time-based range paritioning.
>
I think there is a misunderstanding on how the trigger works. You have 1
trigger per child table and they are all chained on the parent table.
When a tuple is inserted on the parent table, the first trigger is
fired, if the constraints of the 1st child table are satisfied, the
tuple is moved in the 1st child table and that's it. If it is a miss,
the tuple is passed to the next trigger that checks the constraints of
the 2nd table. And so on.
This will work with any type of partitioning (hash or even UDF) as long
as the constraints on the child table reflect the partitioning.
> BTW, there is another issue in trigger approach. If INSERT commands
> are interrupted by triggers, server says "INSERT 0 row" though
> rows are inserted into child tables. Since using C, we could
> use some back doors to modify a variable counting affected rows.
> We could use partitioned tables more transparently if we have it.
>
Even if you don't abort the query, the query reports 0 row if it has
been moved to another table (you can COPY 100k lines and the server will
return 0 if they were all successfully moved to child tables).
Technically this is correct since 0 rows were inserted in the parent
table. Right now any number >0 is the number of rows that did not
satisfy any child table constraint and were inserted in the master table
(useful if you don't want the copy command to fail).

Regards,
Emmanuel
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 07:58:49
Message-ID:	20081216163936.C016.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Emmanuel Cecchet <manu(at)frogthinker(dot)org> wrote:

> I think there is a misunderstanding on how the trigger works. You have 1
> trigger per child table and they are all chained on the parent table.

Oops, I misunderstand your patch, sorry.

> > Is it possible to expand all of child paritions from pg_inherits and
> > determine a suitable parition by checking their constraints?
> >
> Ideally it would be better to do this way. I have not found yet how to
> automatically get all the child partitions of a parent table from the
> trigger. This would simplify things by having a single trigger.

The chained triggers would have better flexibilty, and the auto expanding
trigger would have better usability. I'm not sure about performance
because expanding child partitions is not always faster than chained
calls of triggers.

I think chained triggers are hard to maintain. If we drop one of partition
tables, we need to reconnect the single-linked-list of the triggers.

> > server says "INSERT 0 row" though rows are inserted into child tables.
> Technically this is correct since 0 rows were inserted in the parent
> table.

Yes, but users expect non-0 result normally. Some O/R mapping tools
also checks the result exactly and raises errors (it could be turned
off, but default is on).

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 08:19:31
Message-ID:	49476493.1080000@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> The chained triggers would have better flexibilty, and the auto expanding
> trigger would have better usability. I'm not sure about performance
> because expanding child partitions is not always faster than chained
> calls of triggers.
>
> I think chained triggers are hard to maintain. If we drop one of partition
> tables, we need to reconnect the single-linked-list of the triggers.
>
When you drop one child table, you would also have to drop the trigger
that has the same name on the parent table. This does not seem too hard
but I may be missing something.
>>> server says "INSERT 0 row" though rows are inserted into child tables.
>>>
>> Technically this is correct since 0 rows were inserted in the parent
>> table.
>>
> Yes, but users expect non-0 result normally. Some O/R mapping tools
> also checks the result exactly and raises errors (it could be turned
> off, but default is on).
>
If the O/R mapping tool is also creating the table it should be aware of
the semantics specifics to partition. But your comment is well taken,
this seems counterintuitive and against most API semantics to return 0
when the number of inserted rows is expected. This would certainly
require some additional hooks to return the proper value.

Best regards,
Emmanuel
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2008-12-16 17:17:02
Message-ID:	C2733311-4B26-4F06-B995-0D4896142D95@pointblue.com.pl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On 2008-12-16, at 07:58, ITAGAKI Takahiro wrote:
>
>
>>> server says "INSERT 0 row" though rows are inserted into child
>>> tables.
>> Technically this is correct since 0 rows were inserted in the parent
>> table.
>
> Yes, but users expect non-0 result normally. Some O/R mapping tools
> also checks the result exactly and raises errors (it could be turned
> off, but default is on).

this is a general problem with triggers on inserts/updates/deletes. To
be honest, I would love to see someone fixing it in 8.4, cos it is
quite annoying - that developer is unable to figure out number of rows
affected - just because there's trigger on that table.

From:	Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To:	PostgreSQL Hackers ML <pgsql-hackers(at)postgresql(dot)org>
Cc:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Subject:	Re: Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Date:	2009-01-06 17:19:59
Message-ID:	1231262399.3106.102.camel@laptop.gunduz.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

<snip>

Is there any progress on this patch? I was asked about this feature last
month, during a PostgreSQL talk. I am willing to spend time for testing
this patch, if needed.
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org