Quick Links

Partitioning feature ...

Lists:	pgsql-hackers

From:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Partitioning feature ...
Date:	2009-03-23 09:45:12
Message-ID:	bd8134a40903230245j56e2be77tbec2fd4e25cf52f1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

We are implementing table partitioning feature to support Range and Hash
partitions. Please find attached, the WIP patch and test-cases.

The syntax used conforms to most of the suggestions mentioned in
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring
the following:
-- Specification of partition names is optional. System will be able to
generate partition names in such cases.
-- Sub partitioning

We are maintaining a system catalog(pg_partition) for partition meta-data.
System will look-up this table to find appropriate partition to operate on.
System internally uses low-level 'C' triggers to row-movement.

Regards,
--
Kedar.

Attachment	Content-Type	Size
partition.patch	application/octet-stream	143.2 KB
partition.sql	application/octet-stream	6.2 KB

From:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-23 11:39:18
Message-ID:	a301bfd90903230439n963540n3998ea0bfca56f0f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Kedar,

>
>
> The syntax used conforms to most of the suggestions mentioned in
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php, barring
> the following:
> -- Specification of partition names is optional. System will be able to
> generate partition names in such cases.
> -- Sub partitioning
>

I was wondering if there is a need to mention the type of partition while
dropping it.

E.g
ALTER table x DROP RANGE PARTITION x_part;

The type of partition (RANGE, HASH) could be dropped according to me.

>
> We are maintaining a system catalog(pg_partition) for partition meta-data.
> System will look-up this table to find appropriate partition to operate on.
> System internally uses low-level 'C' triggers to row-movement.
>

Can you elaborate more on how do you handle updates with these triggers?

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
To:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-24 12:16:17
Message-ID:	bd8134a40903240516g51aa23e5g84226de7d5e6066c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Nikhil,

Update operation is performed as a combination of 'delete' and 'insert'.

In Update trigger, the row is deleted from relation according to it's
'ctid'. A look-up on system catalog for partitions is performed to identify
the target table by evaluating values of partition-key attributes, of the
given row. The constraints of this target table are evaluated for this new
row and if found valid, the row is inserted.

Regards,
--
Kedar.

On Mon, Mar 23, 2009 at 5:09 PM, Nikhil Sontakke <
nikhil(dot)sontakke(at)enterprisedb(dot)com> wrote:

> Hi Kedar,
>
>>
>>
>> The syntax used conforms to most of the suggestions mentioned in
>> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php,
>> barring the following:
>> -- Specification of partition names is optional. System will be able to
>> generate partition names in such cases.
>> -- Sub partitioning
>>
>
> I was wondering if there is a need to mention the type of partition while
> dropping it.
>
> E.g
> ALTER table x DROP RANGE PARTITION x_part;
>
> The type of partition (RANGE, HASH) could be dropped according to me.
>
>
>>
>> We are maintaining a system catalog(pg_partition) for partition meta-data.
>> System will look-up this table to find appropriate partition to operate on.
>> System internally uses low-level 'C' triggers to row-movement.
>>
>
> Can you elaborate more on how do you handle updates with these triggers?
>
> Regards,
> Nikhils
> --
> http://www.enterprisedb.com
>

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-26 22:08:02
Message-ID:	49CBFCC2.9050502@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Kedar,

First of all, congratulations for the excellent work.
I have some comments and questions.

In get_relevent_partition (btw, relevant is spelled with an a) you are
maintaining 2 lists. I guess this is only useful for multi-column
partitions, right?
If you have a single column partition (without subpartitions), I think
you could directly return on the first match (without maintaining any
list) since you guarantee that there is no overlap between partitions.
A simple but effective optimization for inserts consists of caching the
last partition used (consecutive inserts often go to the same partition)
and try it first before going through the whole loop.

The update trigger should first check if the tuple needs to be moved. If
the updated tuple still matches the constraints of the partitions it
will not have to be moved and will save a lot of overhead.

The COPY operation should probably be optimized to use the same code as
the one in the insert trigger for partitioned tables. I guess some code
could be factorized in COPY to make the inserts more efficient.

The current trigger approach should prevent other triggers to be added
to the table, or you should make sure that the partition trigger is
always the one to execute last.

As we don't have automatic partition creation, it would be interesting
to have an optional mechanism to deal with tuples that don't match any
partition (very useful when you do bulk insert and some new data require
a new partition). Having a simple overflow partition or an error logging
mechanism would definitely help to identify these tuples and prevent
things like large COPY operations to fail.

Looking forward to your responses,
Emmanuel

>
> We are implementing table partitioning feature to support Range and
> Hash partitions. Please find attached, the WIP patch and test-cases.
>
> The syntax used conforms to most of the suggestions mentioned in
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php,
> barring the following:
> -- Specification of partition names is optional. System will be able
> to generate partition names in such cases.
> -- Sub partitioning
>
> We are maintaining a system catalog(pg_partition) for partition
> meta-data. System will look-up this table to find appropriate
> partition to operate on.
> System internally uses low-level 'C' triggers to row-movement.
>
> Regards,
> --
> Kedar.
>
>
>
> ------------------------------------------------------------------------
>
>
>

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com

From:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-30 11:51:21
Message-ID:	bd8134a40903300451lfbc1cc5w5756140902f2689c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Emmanuel,

Thanks for your time. This is a WIP patch and we will integrate your
suggestions/comments as appropriate.

Regards,
--
Kedar.

On Fri, Mar 27, 2009 at 3:38 AM, Emmanuel Cecchet <manu(at)asterdata(dot)com>wrote:

> Hi Kedar,
>
> First of all, congratulations for the excellent work.
> I have some comments and questions.
>
> In get_relevent_partition (btw, relevant is spelled with an a) you are
> maintaining 2 lists.

> Oops! 'a' typographical error.

I guess this is only useful for multi-column partitions, right?
> If you have a single column partition (without subpartitions), I think you
> could directly return on the first match (without maintaining any list)
> since you guarantee that there is no overlap between partitions.
> A simple but effective optimization for inserts consists of caching the
> last partition used (consecutive inserts often go to the same partition) and
> try it first before going through the whole loop.

> Yep.

>
>
> The update trigger should first check if the tuple needs to be moved. If
> the updated tuple still matches the constraints of the partitions it will
> not have to be moved and will save a lot of overhead.

> Yes. We agree on that.

>
>
> The COPY operation should probably be optimized to use the same code as the
> one in the insert trigger for partitioned tables. I guess some code could be
> factorized in COPY to make the inserts more efficient.
>
> The current trigger approach should prevent other triggers to be added to
> the table, or you should make sure that the partition trigger is always the
> one to execute last.

> As triggers are executed in order of their names, we've prefixed the
trigger names with "zz". This should work fine as long as no-one uses
trigger-name which starts with "zz".

>
> As we don't have automatic partition creation, it would be interesting to
> have an optional mechanism to deal with tuples that don't match any
> partition (very useful when you do bulk insert and some new data require a
> new partition). Having a simple overflow partition or an error logging
> mechanism would definitely help to identify these tuples and prevent things
> like large COPY operations to fail.

> Will get back on this.

>
>
>
> Looking forward to your responses,
> Emmanuel
>
>
>> We are implementing table partitioning feature to support Range and Hash
>> partitions. Please find attached, the WIP patch and test-cases.
>>
>> The syntax used conforms to most of the suggestions mentioned in
>> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php,
>> barring the following:
>> -- Specification of partition names is optional. System will be able to
>> generate partition names in such cases.
>> -- Sub partitioning
>>
>> We are maintaining a system catalog(pg_partition) for partition meta-data.
>> System will look-up this table to find appropriate partition to operate on.
>> System internally uses low-level 'C' triggers to row-movement.
>>
>> Regards,
>> --
>> Kedar.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>>
>
>
> --
> Emmanuel Cecchet
> Aster Data Systems
> Web: http://www.asterdata.com
>
>

From:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-30 14:59:05
Message-ID:	3073cc9b0903300759o433f7fc7o15bec26498c12bcd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 30, 2009 at 6:51 AM, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com> wrote:
>
>> As triggers are executed in order of their names, we've prefixed the
>> trigger names with "zz". This should work fine as long as no-one uses
>> trigger-name which starts with "zz".
>>

this seems a lot fragile... why system generated triggers has to be
executed following the same rules (talking about order of execution)
as user triggers? can't we simply execute them first or last or maybe
be clever and mark one to be executed first and others last?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-30 15:42:53
Message-ID:	49D0E87D.7010808@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I agree with Jaime that system triggers should execute independently of
user triggers.
In the particular case of partitioning, the system trigger should
execute after the user triggers. However, as the partitioning trigger is
a row level trigger, it is not clear what is going to happen with user
triggers that work at the statement level.

Emmanuel

Jaime Casanova wrote:
> On Mon, Mar 30, 2009 at 6:51 AM, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com> wrote:
>
>>> As triggers are executed in order of their names, we've prefixed the
>>> trigger names with "zz". This should work fine as long as no-one uses
>>> trigger-name which starts with "zz".
>>>
>>>
>
> this seems a lot fragile... why system generated triggers has to be
> executed following the same rules (talking about order of execution)
> as user triggers? can't we simply execute them first or last or maybe
> be clever and mark one to be executed first and others last?
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

From:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc:	Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 14:03:44
Message-ID:	a301bfd90903310703o34e65653j27d9776605a381b9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

>
> >> As triggers are executed in order of their names, we've prefixed the
> >> trigger names with "zz". This should work fine as long as no-one uses
> >> trigger-name which starts with "zz".
> >>
>
> this seems a lot fragile... why system generated triggers has to be
> executed following the same rules (talking about order of execution)
> as user triggers? can't we simply execute them first or last or maybe
> be clever and mark one to be executed first and others last?
>

AFAICS, we do not have any category like system triggers. So yeah, it would
have been nice to generate triggers with names (starting with __ for
example) for such special triggers. But I don't think we disallow
user-triggers starting with underscores etc.

So some of the options could be:

- to add a new column in pg_trigger to indicate special or system triggers
which can be executed last (sorted order if multiple entries)

- invent a prefix "__partition__" or something and disallow user triggers to
use such a prefix for their names, plus introduce logic to execute them
(again sorted order if multiple entries) last.

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Cc:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 14:46:59
Message-ID:	20090331144659.GS23023@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Nikhil Sontakke escribió:

> > >> As triggers are executed in order of their names, we've prefixed the
> > >> trigger names with "zz". This should work fine as long as no-one uses
> > >> trigger-name which starts with "zz".
> >
> > this seems a lot fragile... why system generated triggers has to be
> > executed following the same rules (talking about order of execution)
> > as user triggers? can't we simply execute them first or last or maybe
> > be clever and mark one to be executed first and others last?
>
> AFAICS, we do not have any category like system triggers. So yeah, it would
> have been nice to generate triggers with names (starting with __ for
> example) for such special triggers. But I don't think we disallow
> user-triggers starting with underscores etc.

We already have system triggers -- the FK triggers. I don't think we've
had all that much trouble with them.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 15:21:01
Message-ID:	3073cc9b0903310821m55caed54l6b2763b92fdcf8cf@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 31, 2009 at 9:46 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
>>
>> AFAICS, we do not have any category like system triggers. So yeah, it would
>> have been nice to generate triggers with names (starting with __ for
>> example) for such special triggers. But I don't think we disallow
>> user-triggers starting with underscores etc.
>
> We already have system triggers -- the FK triggers. I don't think we've
> had all that much trouble with them.
>

yeah! but we mark them with pg_trigger.tgisconstraint, but i'm not
completely convinced that we should use that same field

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 15:40:16
Message-ID:	10774.1238514016@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Nikhil Sontakke escribi:
>>> As triggers are executed in order of their names, we've prefixed the
>>> trigger names with "zz". This should work fine as long as no-one uses
>>> trigger-name which starts with "zz".

>> this seems a lot fragile...

> We already have system triggers -- the FK triggers. I don't think we've
> had all that much trouble with them.

In the case of the FK triggers, it's intentional (and maybe even
documented) that users should be able to place their own triggers before
or after the FK triggers. Is there a good reason why partitioning
triggers should be different? If there is, maybe the feature shouldn't
be implemented via triggers in the first place.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 15:59:45
Message-ID:	49D23DF1.5000809@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>> We already have system triggers -- the FK triggers. I don't think we've
>> had all that much trouble with them.
>>
>
> In the case of the FK triggers, it's intentional (and maybe even
> documented) that users should be able to place their own triggers before
> or after the FK triggers.

If it's documented I think it's well hidden ;-) ISTM that the fact that
we implement FK constraints via triggers is really an implementation
detail, not something the user should be encouraged to mess with.

> Is there a good reason why partitioning
> triggers should be different?
>

Probably not. ISTM that the scheme should turn tgisconstraint into a
multi-valued item (tgkind: 'u' = userland, 'c'= constraint, 'p' =
partition or some such).

cheers

andrew

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 16:45:17
Message-ID:	49D2489D.1030100@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Yes, there is a good reason. As a trigger can update the tuple value,
this can change the routing decision. If you have a user trigger that
tries to change the key value after the partition choice has been made,
this will lead to an integrity constraint violation which is probably
not what the user expects.
Note that user triggers with partitions will be tricky anyway
(regardless of how partitioning is implemented, that is with triggers or
not). If 2 partitions have user triggers that update the key value to
bounce the tuple to the other partition you may end up with an infinite
loop.
I am not sure what the semantic of statement triggers (still user
triggers) should be on partitioned tables.
We will probably have to come up with restrictions on triggers so that
they can only be applied to the parent table and not on child tables to
prevent nasty issues.

Emmanuel

Tom Lane wrote:
> In the case of the FK triggers, it's intentional (and maybe even
> documented) that users should be able to place their own triggers before
> or after the FK triggers. Is there a good reason why partitioning
> triggers should be different? If there is, maybe the feature shouldn't
> be implemented via triggers in the first place.
>
> regards, tom lane
>

--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 16:55:38
Message-ID:	19670.1238518538@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
> Yes, there is a good reason. As a trigger can update the tuple value,
> this can change the routing decision. If you have a user trigger that
> tries to change the key value after the partition choice has been made,
> this will lead to an integrity constraint violation which is probably
> not what the user expects.

[ shrug... ] Badly written user triggers can break FK constraints,
too. We've tolerated that in the past because preventing it disables
useful capabilities.

I remain of the opinion that if you think you *have to* execute last,
you should not be writing this as a trigger; you'd be better off
embedding it lower in the system.

regards, tom lane

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-03-31 17:16:49
Message-ID:	49D25001.5080003@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/31/09 9:45 AM, Emmanuel Cecchet wrote:
> Yes, there is a good reason. As a trigger can update the tuple value,
> this can change the routing decision. If you have a user trigger that
> tries to change the key value after the partition choice has been made,
> this will lead to an integrity constraint violation which is probably
> not what the user expects.

Actually, it's worse. Depending on the timing of the triggers, it's
possible to bypass the FK check entirely, and you can end up with
inconsistent data.

--Josh

From:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Kedar Potdar <kedar(dot)potdar(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, Amit Gupta <amit(dot)pc(dot)gupta(at)gmail(dot)com>
Subject:	Re: Partitioning feature ...
Date:	2009-04-01 05:07:44
Message-ID:	a301bfd90903312207p732cc50dpb22978f989d1e6ff@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

> We already have system triggers -- the FK triggers. I don't think we've
>>> had all that much trouble with them.
>>>
>>>
>>
>> In the case of the FK triggers, it's intentional (and maybe even
>> documented) that users should be able to place their own triggers before
>> or after the FK triggers.
>>
>
> If it's documented I think it's well hidden ;-) ISTM that the fact that we
> implement FK constraints via triggers is really an implementation detail,
> not something the user should be encouraged to mess with.
>
> Is there a good reason why partitioning
>> triggers should be different?
>>
>
> Probably not. ISTM that the scheme should turn tgisconstraint into a
> multi-valued item (tgkind: 'u' = userland, 'c'= constraint, 'p' = partition
> or some such).
>

+1.

This seems to be the best way forward if we stick to triggers for
partitioning. I think they appear to serve the purpose well for this
use-case and maybe with this scheme they will be low-level enough too.

Regards,
Nikhils
--
http://www.enterprisedb.com