Quick Links

Re: Partitioning option for COPY

Lists:	pgsql-hackers

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Partitioning option for COPY
Date:	2009-11-11 15:51:06
Message-ID:	4AFADD6A.9070002@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi all,

I have extracted the partitioning option for COPY (removed the error
logging part) from the previous patch. The documentation and test suite
sample are provided as well.
More details are on the wiki page at
http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY. Ignore the
error logging related comments that do not apply here.

Looking forward to your feedback
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Attachment	Content-Type	Size
aster-copy-partitioning-patch-8.5-context.txt	text/plain	45.3 KB

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-12 00:25:24
Message-ID:	20091112092523.A949.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:

> I have extracted the partitioning option for COPY (removed the error
> logging part) from the previous patch.

We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY? Also, it would be nicer
that the router can works not only in COPY but also in INSERT.

BTW, I'm working on meta data of partitioning now. Your "partitioning"
option in COPY could be replaced with the catalog.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-12 00:53:58
Message-ID:	4AFB5CA6.80704@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,
>> I have extracted the partitioning option for COPY (removed the error
>> logging part) from the previous patch.
>>
>
> We can use an INSERT trigger to route tuples into partitions even now.
> Why do you need an additional router for COPY?
Tom has already explained on the list why using a trigger was a bad idea
(and I know we can use a trigger since I am the one who wrote it).
If you look at the code you will see that you can do optimizations in
the COPY code that you cannot do in the trigger.

> Also, it would be nicer
> that the router can works not only in COPY but also in INSERT.
>
As 8.5 will at best provide a syntactic hack on top of the existing
constraint implementation, I think that it will not hurt to have routing
in COPY since we will not have it anywhere otherwise.
> BTW, I'm working on meta data of partitioning now. Your "partitioning"
> option in COPY could be replaced with the catalog.
>
This implementation is only for the current 8.5 and it will not be
needed anymore once we get a fully functional partitioning in Postgres
which seems to be for a future version.

Best regards,
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-12 01:13:03
Message-ID:	20091112101303.A955.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:

> If you look at the code you will see that you can do optimizations in
> the COPY code that you cannot do in the trigger.

Since the optimizations is nice, I hope it will work not only in COPY
but also in INSERT. An idea is moving the partitioning cache into
Relation cache, and also moving the routing routines into heap_insert().
My concern is just in the modified position; I think you don't have to
change your logic itself.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-12 01:18:49
Message-ID:	4AFB6279.1070207@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> We can use an INSERT trigger to route tuples into partitions even now.
> Why do you need an additional router for COPY? Also, it would be nicer
> that the router can works not only in COPY but also in INSERT.

Yeah, but performance on an insert trigger is impractical for large
volumes of data.

--Josh Berkus

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-14 22:27:43
Message-ID:	4AFF2EDF.1080701@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet wrote:
> Hi all,

Hi!,

> partitioning option for COPY

Here's the review:

== Submission ==
The patch is contextual, applies cleanly to current HEAD, compiles fine.
The docs build cleanly.

== Docs ==
They're reasonably clear, although they still mention ERROR_LOGGING,
which was taken out of this patch. They could use some wordsmithing, but
I didn't go into details, as there were more severe issues with the patch.

One thing that made me cautious was the mention that triggers modifying
tuples will make random errors appear. As is demonstrated later,
triggers are a big issue.

== Regression tests ==
They ran fine, there's one additional regression test that exercises the
new option.

== Style/nitpicks ==
Minor gripes include:
o instead of using an ad-hoc data structure for the LRU cache list, I'd
suggest an OidList from pg_list.h.
o some mentions of "method" in comments should be changed to "function"
o trailing whitespace in the patch (it's not the end of the world, of
course)

== Issues ==
Attached are 3 files that demonstrate problems the patch has.
o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).
o test2.sql demonstrates, that indices on child tables are not being
updated, probably because after resultRelInfo in
check_tuple_constraints() gets created is never has ri_NumIndices set,
and so the code that was supposed to take care of indices is never
called. Looks like a copy-paste error.
o test3.sql demonstrates, that some triggers that I would expect to be
fired are in fact not fired. I guess it's the same reason as mentioned:
ri_TrigDesc never gets set, so the code that calls triggers is dead.

I stopped there, because unfortunately, apart from all that there's one
fundamental problem with this patch, namely "we probably don't want it".

As it stands it's more of a proof of concept than a really usable
solution, it feels like built from spare (copied from around copy.c)
parts. IMHO it's much too narrow for a general partitioning solution,
even if the design it's based upon would be accepted. It's assuming a
lot of things about the presence of child tables (with proper
constraints), the absence of triggers, and so on.

Granted, it solves a particular problem (bulk loading into a partitioned
table, with not extra features like triggers and with standard
inheritance/exclusive check constraints setup), but that's not good
enough in my opinion, even if all other issues would be addressed.

Now I'm not a real Postgres user, it's been a while since I worked in a
PG shop (or a DB shop for that matter), but from what I understand from
following this community for a while, a patch like that doesn't have a
lot of chances to be committed. That said, my puny experience with real
PG installations and their needs must be taken into account here.

I'll mark this patch as "Waiting on Author", but I have little doubt
that even after fixing those probably trivial segfaults etc. the patch
would be promptly rejected by a committer. I suggest withdrawing it from
this commitfest and trying to work out a more complete design first that
would address the needs of a bigger variety of users, or joining some of
the already underway efforts to bring full-featured partitioning into
Postgres.

Best,
Jan

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-14 22:31:58
Message-ID:	4AFF2FDE.9010002@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan Urbański wrote:
> Emmanuel Cecchet wrote:
>> Hi all,
>
> Hi!,
>
>> partitioning option for COPY

> Attached are 3 files that demonstrate problems the patch has.

And the click-before-you-think prize winner is... me.

Test cases attached, see the comments for expected/actual results.

Jan

Attachment	Content-Type	Size
test1.sql	text/plain	460 bytes
test2.sql	text/plain	659 bytes
test3.sql	text/plain	964 bytes

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-15 23:41:15
Message-ID:	4B00919B.4050309@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan,

Here is a new version of the patch. Find the response to your comments
embedded in the text.
>> partitioning option for COPY
>>
>
> Here's the review:
>
> == Submission ==
> The patch is contextual, applies cleanly to current HEAD, compiles fine.
> The docs build cleanly.
>
> == Docs ==
> They're reasonably clear, although they still mention ERROR_LOGGING,
> which was taken out of this patch. They could use some wordsmithing, but
> I didn't go into details, as there were more severe issues with the patch.
>
Removed the text related to ERROR_LOGGING.
> One thing that made me cautious was the mention that triggers modifying
> tuples will make random errors appear. As is demonstrated later,
> triggers are a big issue.
>
Whichever way routing is implemented we will have to decide what we want
to do with triggers. We can decide to fire them or not (there was
already a debate whether COPY is an insert statement or not and should
fire the statement trigger for insert). This is not a design problem
with this patch, we just have to chose what we want to do with triggers
when partitioning is involved. IMHO we should disable them altogether
but there are scenarios where one could argue that there are still useful.
> == Regression tests ==
> They ran fine, there's one additional regression test that exercises the
> new option.
>
> == Style/nitpicks ==
> Minor gripes include:
> o instead of using an ad-hoc data structure for the LRU cache list, I'd
> suggest an OidList from pg_list.h.
>
Will do if we decide to go further with this patch.
> o some mentions of "method" in comments should be changed to "function"
> o trailing whitespace in the patch (it's not the end of the world, of
> course)
>
I guess the committer will run pg_indent anyway so I'm not too worried
about spaces.
> == Issues ==
> Attached are 3 files that demonstrate problems the patch has.
> o test1.sql always segfaults for me, poking around with gdb suggests
> it's a case of an uninitialised cache list (another reason to use the
> builtin one).
>
I was never able to reproduce that problem. I don't know where this
comes from.
> o test2.sql demonstrates, that indices on child tables are not being
> updated, probably because after resultRelInfo in
> check_tuple_constraints() gets created is never has ri_NumIndices set,
> and so the code that was supposed to take care of indices is never
> called. Looks like a copy-paste error.
>
Fixed, actually there was a leak in relcache for the index.
> o test3.sql demonstrates, that some triggers that I would expect to be
> fired are in fact not fired. I guess it's the same reason as mentioned:
> ri_TrigDesc never gets set, so the code that calls triggers is dead.
>
There is a problem with after row triggers that I did not completely
figure out. For some reason, if I use the regular mechanism by calling
ExecARInsertTrigger that differ the execution of the trigger until the
after row event is triggered, the child relation is not closed and there
is a leak in the relcache. I forced the after row triggers to execute
synchronously after inserting in the child table to work around the
problem. If someone has an explanation, I am willing to do a cleaner
implementation!
> I stopped there, because unfortunately, apart from all that there's one
> fundamental problem with this patch, namely "we probably don't want it".
>
> As it stands it's more of a proof of concept than a really usable
> solution, it feels like built from spare (copied from around copy.c)
> parts. IMHO it's much too narrow for a general partitioning solution,
> even if the design it's based upon would be accepted. It's assuming a
> lot of things about the presence of child tables (with proper
> constraints), the absence of triggers, and so on.
>
> Granted, it solves a particular problem (bulk loading into a partitioned
> table, with not extra features like triggers and with standard
> inheritance/exclusive check constraints setup), but that's not good
> enough in my opinion, even if all other issues would be addressed.
>
Well, as Postgres does not have any support for real partitioning
besides inheritance, and so far it is unlikely that another
implementation will happen in the 8.5 timeframe, this feature fills the
need for people doing data warehouses. This is a scenario used with
every single Aster customer. Now if the Postgres community does not
think that the Aster use case is general enough or of interest to be
integrated in the code base, this is a different issue and I won't spent
time arguing if this is a philosophical/political issue.
Note that the new patch works with triggers but you can easily generate
corrupt data if your triggers are modifying the data on which the
routing decision is based.
> Now I'm not a real Postgres user, it's been a while since I worked in a
> PG shop (or a DB shop for that matter), but from what I understand from
> following this community for a while, a patch like that doesn't have a
> lot of chances to be committed. That said, my puny experience with real
> PG installations and their needs must be taken into account here.
>
I don't really understand why a new option of COPY should be solving a
general problem. It's an option, and like every option, it is to solve a
particular use case. I don't see what is wrong with that.
> I'll mark this patch as "Waiting on Author", but I have little doubt
> that even after fixing those probably trivial segfaults etc. the patch
> would be promptly rejected by a committer. I suggest withdrawing it from
> this commitfest and trying to work out a more complete design first that
> would address the needs of a bigger variety of users, or joining some of
> the already underway efforts to bring full-featured partitioning into
> Postgres.
>
I have integrated your tests in the regression test suite and I was
never able to reproduce the segfault you mentioned. What platform are
you using?

Thanks for your valuable feedback
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Attachment	Content-Type	Size
aster-copy-partitioning-patch-8.5-contextv2.txt	text/plain	49.8 KB

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 00:36:59
Message-ID:	4B01F02B.8000201@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I'll hopefully look at the next version of the patch tommorrow.

Emmanuel Cecchet wrote:
>> o test1.sql always segfaults for me, poking around with gdb suggests
>> it's a case of an uninitialised cache list (another reason to use the
>> builtin one).
>>
> I was never able to reproduce that problem. I don't know where this
> comes from.

> I have integrated your tests in the regression test suite and I was
> never able to reproduce the segfault you mentioned. What platform are
> you using?

In the meantime I tried the test1.sql file again and it still segfaulted
for me.
I'm using 32bit Linux, PG compiled with:

$ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl
--without-python --without-openssl --without-tcl

and then I start postmaster, fire up psql, attach gdb to the backend, do
\i test1.sql and get:

Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821 child_relation_id =
child_oid_cell->oid_value;
(gdb) bt
#0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
#1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480
#2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy
parent from stdin with (partitioning);") at copy.c:1227

(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f

(gdb) p child_oid_cell->oid_value
Cannot access memory at address 0x7f7f7f7f

That 0x7f7f7f7f looks like clobbered memory, the memory management funcs
do that when cassert is enabled, IIRC.

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 01:58:23
Message-ID:	17544.1258423103@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer(at)wulczer(dot)org> writes:
> Program received signal SIGSEGV, Segmentation fault.
> 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
> tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
> 1821 child_relation_id =
> child_oid_cell->oid_value;
> (gdb) p child_oid_cell
> $1 = (OidCell *) 0x7f7f7f7f

This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure. If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.

A good rule of thumb is to never do code development in a non-cassert
build. You're just setting yourself up for failure.

regards, tom lane

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 02:25:34
Message-ID:	4B02099E.6000609@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> A good rule of thumb is to never do code development in a non-cassert
> build.
And the same rule goes for review, too; I'll update the review
guidelines to spell that out more clearly. Basically, if you're doing
any work on new code, you should have cassert turned on, *except* if
you're doing performance testing. The asserts slow things down enough
(particularly with large shared_buffers values) to skew performance
tests, but in all other coding situations you should have them enabled.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 14:59:58
Message-ID:	4B02BA6E.4060303@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer(at)wulczer(dot)org> writes:
>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
>> tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
>> 1821 child_relation_id =
>> child_oid_cell->oid_value;
>> (gdb) p child_oid_cell
>> $1 = (OidCell *) 0x7f7f7f7f
>>
>
> This looks like the patch is trying to create a data structure in a
> memory context that's not sufficiently long-lived for the use of the
> structure. If you do this in a non-cassert build, it will seem to
> work, some of the time, if the memory in question happens to not
> get reallocated to something else.
>
I was using the CacheMemoryContext. Could someone tell me why this is
wrong and what should have been the appropriate context to use?

Thanks
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 15:03:39
Message-ID:	28321.1258470219@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
> Tom Lane wrote:
>> This looks like the patch is trying to create a data structure in a
>> memory context that's not sufficiently long-lived for the use of the
>> structure. If you do this in a non-cassert build, it will seem to
>> work, some of the time, if the memory in question happens to not
>> get reallocated to something else.
>>
> I was using the CacheMemoryContext. Could someone tell me why this is
> wrong and what should have been the appropriate context to use?

Well, (a) I doubt you really were creating the list in
CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
statement-local data structures in CacheMemoryContext is entirely
unacceptable anyway, because then they represent a permanent memory
leak.

The right context for statement-lifetime data structures is generally
the CurrentMemoryContext the statement code is called with.

regards, tom lane

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 15:31:11
Message-ID:	4B02C1BF.8060009@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
>
>> Tom Lane wrote:
>>
>>> This looks like the patch is trying to create a data structure in a
>>> memory context that's not sufficiently long-lived for the use of the
>>> structure. If you do this in a non-cassert build, it will seem to
>>> work, some of the time, if the memory in question happens to not
>>> get reallocated to something else.
>>>
>>>
>> I was using the CacheMemoryContext. Could someone tell me why this is
>> wrong and what should have been the appropriate context to use?
>>
>
> Well, (a) I doubt you really were creating the list in
> CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
> statement-local data structures in CacheMemoryContext is entirely
> unacceptable anyway, because then they represent a permanent memory
> leak.
>
Well I thought that this code would do it:

child_table_lru = (OidLinkedList *)MemoryContextAlloc(
+ CacheMemoryContext, sizeof(OidLinkedList));
...
+ /* Add the new entry in head of the list */
+ new_head = (OidCell *) MemoryContextAlloc(
+ CacheMemoryContext, sizeof(OidCell));

> The right context for statement-lifetime data structures is generally
> the CurrentMemoryContext the statement code is called with.
>
Actually the list is supposed to stay around between statement
executions. You don't want to restart with a cold cache at every
statement so I really want this structure to stay in memory at a more
global level.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 15:55:25
Message-ID:	29197.1258473325@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
> Actually the list is supposed to stay around between statement
> executions. You don't want to restart with a cold cache at every
> statement so I really want this structure to stay in memory at a more
> global level.

Cache? Why do you need a cache for COPY? Repeated bulk loads into the
same table within a single session doesn't seem to me to be a case that
is common enough to justify a cache.

(BTW, the quoted code seems to be busily reinventing OID Lists. Don't
do that.)

regards, tom lane

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 16:19:35
Message-ID:	4B02CD17.1000907@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
>
>> Actually the list is supposed to stay around between statement
>> executions. You don't want to restart with a cold cache at every
>> statement so I really want this structure to stay in memory at a more
>> global level.
>>
>
> Cache? Why do you need a cache for COPY? Repeated bulk loads into the
> same table within a single session doesn't seem to me to be a case that
> is common enough to justify a cache.
>
Actually the cache is only activated if you use the partitioning option.
It is just a list of oids of child tables where tuples were inserted.
It is common to have multiple COPY operations in the same session when
you are doing bulk loading in a warehouse.
> (BTW, the quoted code seems to be busily reinventing OID Lists. Don't
> do that.)
>
Yes, I understood that I should use an OidList instead. But I was trying
to understand what I did wrong here (besides reinventing the oid list ;-)).
Why do I get this segfault if I use memory from CacheMemoryContext?

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-17 16:41:48
Message-ID:	29961.1258476108@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> writes:
> Tom Lane wrote:
>> Cache? Why do you need a cache for COPY?

> Actually the cache is only activated if you use the partitioning option.
> It is just a list of oids of child tables where tuples were inserted.

Umm ... why is that useful enough to be cached?

> Why do I get this segfault if I use memory from CacheMemoryContext?

Well, CacheMemoryContext will never be reset, so either you freed the
data structure yourself or there's something wrong with the pointer
you think is pointing at the data structure ...

regards, tom lane

From:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-19 21:25:04
Message-ID:	4B05B7B0.7030205@frogthinker.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Jan,

Here is a new version of the patch with the following modifications:
- used oid list from pg_list.h
- properly handles triggers and generate an error if needed (updated doc
as well)
- added your test cases + extra bad trigger cases

Emmanuel

> Hi,
>
> I'll hopefully look at the next version of the patch tommorrow.
>
> Emmanuel Cecchet wrote:
>
>>> o test1.sql always segfaults for me, poking around with gdb suggests
>>> it's a case of an uninitialised cache list (another reason to use the
>>> builtin one).
>>>
>>>
>> I was never able to reproduce that problem. I don't know where this
>> comes from.
>>
>
>
>> I have integrated your tests in the regression test suite and I was
>> never able to reproduce the segfault you mentioned. What platform are
>> you using?
>>
>
> In the meantime I tried the test1.sql file again and it still segfaulted
> for me.
> I'm using 32bit Linux, PG compiled with:
>
> $ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl
> --without-python --without-openssl --without-tcl
>
> and then I start postmaster, fire up psql, attach gdb to the backend, do
> \i test1.sql and get:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
> tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
> 1821 child_relation_id =
> child_oid_cell->oid_value;
> (gdb) bt
> #0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
> tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
> #1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480
> #2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy
> parent from stdin with (partitioning);") at copy.c:1227
>
> (gdb) p child_oid_cell
> $1 = (OidCell *) 0x7f7f7f7f
>
> (gdb) p child_oid_cell->oid_value
> Cannot access memory at address 0x7f7f7f7f
>
>
> That 0x7f7f7f7f looks like clobbered memory, the memory management funcs
> do that when cassert is enabled, IIRC.
>
> Cheers,
> Jan
>
>

--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu(at)frogthinker(dot)org
Skype: emmanuel_cecchet

Attachment	Content-Type	Size
aster-copy-partitioning-patch-8.5-contextv4.txt	text/plain	53.5 KB

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-20 23:22:33
Message-ID:	4B0724B9.5030309@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet wrote:
> Hi Jan,
>
> Here is a new version of the patch with the following modifications:
> - used oid list from pg_list.h
> - properly handles triggers and generate an error if needed (updated doc
> as well)
> - added your test cases + extra bad trigger cases

Hi,

that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature, the code currently does not compile.

I'll continue reading, in the meantime could you send a updated patch?

Thanks,
Jan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)frogthinker(dot)org>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-20 23:40:53
Message-ID:	19669.1258760453@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer(at)wulczer(dot)org> writes:
> that got broken by the WHEN triggers patch
> (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
> TriggerEnabled function signature, the code currently does not compile.

[ squint... ] What is that patch doing touching the innards of
trigger.c in the first place? I can't see any reason for trigger.c
to be associated with partitioning.

regards, tom lane

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-21 00:32:58
Message-ID:	4B07353A.2050308@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer(at)wulczer(dot)org> writes:
>
>> that got broken by the WHEN triggers patch
>> (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
>> TriggerEnabled function signature, the code currently does not compile.
>>
>
> [ squint... ] What is that patch doing touching the innards of
> trigger.c in the first place? I can't see any reason for trigger.c
> to be associated with partitioning.
>
The problem I had is that if I used the standard trigger mechanism for
after row inserts on a child table where the trigger is called
asynchronously, I had a relcache leak on the child table. I tried to ask
for help on that earlier on but it got lost with other discussions on
the patch. So I tried to call the after trigger synchronously on the
child table and it worked.
So the patch is just adding a synchronous call to after row insert
triggers that is called when the tuple is moved to a child table (also
allows to detect for triggers that are messing with the routing).

I would be happy to follow any recommendation for a more elegant
solution to the problem.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-21 03:59:46
Message-ID:	4B0765B2.4040906@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Jan,

Here is the updated patch.
Note that the new code in trigger is a copy/paste of the before row
insert trigger code modified to use the pointers of the after row
trigger functions.

Emmanuel

> Emmanuel Cecchet wrote:
>
>> Hi Jan,
>>
>> Here is a new version of the patch with the following modifications:
>> - used oid list from pg_list.h
>> - properly handles triggers and generate an error if needed (updated doc
>> as well)
>> - added your test cases + extra bad trigger cases
>>
>
> Hi,
>
> that got broken by the WHEN triggers patch
> (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
> TriggerEnabled function signature, the code currently does not compile.
>
> I'll continue reading, in the meantime could you send a updated patch?
>
> Thanks,
> Jan
>
>

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Attachment	Content-Type	Size
aster-copy-partitioning-patch-8.5-contextv5.txt	text/plain	53.6 KB

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 00:00:05
Message-ID:	4B087F05.6060409@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet wrote:
> Hi Jan,
>
> Here is the updated patch.
> Note that the new code in trigger is a copy/paste of the before row
> insert trigger code modified to use the pointers of the after row
> trigger functions.

Hi,

ok, this version applied, compiled and ran the regression tests fine. I
tried a few things and was not able to break it this time.

A couple of nitpicks first:

o) the route_tuple_to_child recurses to child tables of child tables,
which is undocumented and requires a check_stack_depth() call if it's
really desirable

o) the error messages given when a trigger modifies the tuple should be
one sentence, I suggest dropping the "Aborting insert" part

o) there are two places with "Close the relation but keep the lock"
comments. Why is in necessary to keep the locks? I confess I don't know
why *wouldn't* it be necessary, but maybe the comment could explain
that? Or is it just my lack of understanding and it should be obvious
that the lock needs to be kept?

o) the result of relation_open is explicitly cast to Relation, the
result of try_relation_open is not (a minor gripe)

And a couple of more important things:

o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
from just above, I guess there was a reason why you needed that code,
but I also suspect that's a string indication that something's wrong
with the abstractions in your patch. Again I don't really know how else
you could achieve what you want. It just looks fishy if you need to
modify trigger.c to add an option to COPY.

o) publicizing ExecRelCheck might also indicate a problem, but I guess
that can be defended, as the patch is basically based on using that
function for each incoming tuple

o) the LRU OID cache is a separate optimisation that could be
separated from the patch. I didn't do any performance tests, and I trust
that a cache like that helps with some workloads, but I think we could
do a better effort that a simplistic cache like that. Also, I'm not 100%
sure it's OK to just stick it into CacheMemoryContext... Maybe it could
go into the COPY statement context? You said you don't want to start
with a cold cache always, but OTOH if you're loading into different
tables in the same backend, the cache will actually hurt...

[thinks of something really bad... types up a quick test...]

Oh, actually, the cache is outright *wrong*, as the attached test6.sql
shows. Ugh, let's just forget about that LRU cache for now.

o) the patch could use some more docs, especially about descending into
child tables.

o) my main concern is still valid: the design was never agreed upon.
The approach of using inheritance info for automatic partitioning is, at
least IMHO, too restricted. Regular INSERTs won't get routed to child
tables. Data from writable CTEs won't get routed. People wanting to do
partitioning on something else that constraints are stuffed.

I strongly suspect the patch will get rejected on the grounds of lack of
community agreement on partitioning, but I'd hate to see your work
wasted. It's not too late to open a discussion on how automatic
partitioning could work (or start working out a common proposal with the
people discussing in the "Syntax for partitioning" thread).

Marking as Waiting on Author, although I'd really like to see a solid
design being agreed upon, and then the code.

Cheers,
Jan

Attachment	Content-Type	Size
test6.sql	text/plain	468 bytes

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 05:28:39
Message-ID:	4B08CC07.20207@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 05:49:52
Message-ID:	4B08D100.7030105@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan Urbański wrote:
> o) my main concern is still valid: the design was never agreed upon.
> The approach of using inheritance info for automatic partitioning is, at
> least IMHO, too restricted. Regular INSERTs won't get routed to child
> tables. Data from writable CTEs won't get routed. People wanting to do
> partitioning on something else that constraints are stuffed.
>
Well, this patch does not claim to implement partitioning for Postgres,
it just offers partitioning as an option for COPY (and COPY only) based
on the existing mechanism in Postgres.
I have already participated in lengthy and relatively sterile
discussions on how to implement a full-blown partitioning but we never
reached the beginning of an agreement and it was decided that a
step-by-step approach would be better. I will propose another
implementation of partitioning in COPY once Postgres has another
representation than constraints on child tables to implement it.
> I strongly suspect the patch will get rejected on the grounds of lack of
> community agreement on partitioning, but I'd hate to see your work
> wasted. It's not too late to open a discussion on how automatic
> partitioning could work (or start working out a common proposal with the
> people discussing in the "Syntax for partitioning" thread).
>
This is not my call. Right now the syntax for partitioning does not
change anything to Postgres, it just adds syntactic sugar on top of the
existing implementation. It will not route anything or answer any of the
needs you mentioned in your previous point.
> Marking as Waiting on Author, although I'd really like to see a solid
> design being agreed upon, and then the code.
>
You are asking the wrong person if you want me to lead the partitioning
design discussions. I already tried once and I was unsuccessful. As
nothing as changed I don't see why I would be more successful this time.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 16:55:10
Message-ID:	4B096CEE.5090201@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jan,

> A couple of nitpicks first:
>
> o) the route_tuple_to_child recurses to child tables of child tables,
> which is undocumented and requires a check_stack_depth() call if it's
> really desirable
>
The recursive call is as deep as the inheritance hierarchy. I am not
sure what we are supposed to do if check_stack_depth() fails.
> o) the error messages given when a trigger modifies the tuple should be
> one sentence, I suggest dropping the "Aborting insert" part
>
Where are those rules about error messages specified?
> o) there are two places with "Close the relation but keep the lock"
> comments. Why is in necessary to keep the locks? I confess I don't know
> why *wouldn't* it be necessary, but maybe the comment could explain
> that? Or is it just my lack of understanding and it should be obvious
> that the lock needs to be kept?
>
As we did write to the table, we must maintain the lock on it until the
operation or transaction is complete.
> o) the result of relation_open is explicitly cast to Relation, the
> result of try_relation_open is not (a minor gripe)
>
The first cast was unnecessary, I removed it.
> And a couple of more important things:
>
> o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
> from just above, I guess there was a reason why you needed that code,
> but I also suspect that's a string indication that something's wrong
> with the abstractions in your patch. Again I don't really know how else
> you could achieve what you want. It just looks fishy if you need to
> modify trigger.c to add an option to COPY.
>
As I explained to Tom, if the after row trigger is called asynchronously
I get a relcache leak on the child table at the end of the copy
operation. If the trigger is called synchronously (like a before row
trigger) it works fine. Also calling the after row trigger synchronously
allows me to detect any potential problem between the actions of the
trigger and the routing decision. I am open to any suggestion for a more
elegant solution.
> o) publicizing ExecRelCheck might also indicate a problem, but I guess
> that can be defended, as the patch is basically based on using that
> function for each incoming tuple
>
The only exposed method for checking constraints (ExecConstraints) goes
directly into an error (ereport) if the constraint checking fails.
Another option would be to add a new parameter to ExecConstraint to tell
it whether to generate an ereport or not but that would impact all
callers of that method.
> o) the LRU OID cache is a separate optimisation that could be
> separated from the patch. I didn't do any performance tests, and I trust
> that a cache like that helps with some workloads, but I think we could
> do a better effort that a simplistic cache like that. Also, I'm not 100%
> sure it's OK to just stick it into CacheMemoryContext... Maybe it could
> go into the COPY statement context? You said you don't want to start
> with a cold cache always, but OTOH if you're loading into different
> tables in the same backend, the cache will actually hurt...
>
> [thinks of something really bad... types up a quick test...]
>
> Oh, actually, the cache is outright *wrong*, as the attached test6.sql
> shows. Ugh, let's just forget about that LRU cache for now.
>
Point taken, I have removed the cache from the GUC variables and it is
now only used for the duration of the COPY operation.
> o) the patch could use some more docs, especially about descending into
> child tables.
>
Do you mean an overall comment explaining the design? Otherwise there is
a comment for every single 'if' and block of code in the patch. Be more
specific if you have a special location where you think comments are
missing or too vague.
> o) my main concern is still valid: the design was never agreed upon.
> The approach of using inheritance info for automatic partitioning is, at
> least IMHO, too restricted. Regular INSERTs won't get routed to child
> tables. Data from writable CTEs won't get routed. People wanting to do
> partitioning on something else that constraints are stuffed.
>
> I strongly suspect the patch will get rejected on the grounds of lack of
> community agreement on partitioning, but I'd hate to see your work
> wasted. It's not too late to open a discussion on how automatic
> partitioning could work (or start working out a common proposal with the
> people discussing in the "Syntax for partitioning" thread).
>
> Marking as Waiting on Author, although I'd really like to see a solid
> design being agreed upon, and then the code.
>
I already commented on that part in another message and this is not
related to that patch but to the politics of implementing partitioning
in Postgres. Now if the rejection of the patch is based on political
stances rather than technical once, I can understand that too.

Please find the new patch attached.
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Attachment	Content-Type	Size
aster-copy-partitioning-patch-8.5-contextv6.txt	text/plain	39.4 KB

From:	Jan Urbański <wulczer(at)wulczer(dot)org>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 17:35:36
Message-ID:	4B097668.1000006@wulczer.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet wrote:
> Jan,
>
>> A couple of nitpicks first:
>>
>> o) the route_tuple_to_child recurses to child tables of child tables,
>> which is undocumented and requires a check_stack_depth() call if it's
>> really desirable
>>
> The recursive call is as deep as the inheritance hierarchy. I am not
> sure what we are supposed to do if check_stack_depth() fails.

I think that check_stack_depth() just throws an elog(ERROR) when the
stack depth is exceeded, so you can just add a check_stack_depth() call
somewhere at the beginning of route_tuple_to_child and that's it.

>> o) the error messages given when a trigger modifies the tuple should be
>> one sentence, I suggest dropping the "Aborting insert" part
>>
> Where are those rules about error messages specified?

http://www.postgresql.org/docs/current/static/error-style-guide.html

Dropping "Aborting insert" is just a suggestion, it's possible the error
message will sound OK to a native English speaker.

>> o) there are two places with "Close the relation but keep the lock"
>> comments. Why is in necessary to keep the locks? I confess I don't know
>> why *wouldn't* it be necessary, but maybe the comment could explain
>> that? Or is it just my lack of understanding and it should be obvious
>> that the lock needs to be kept?
>>
> As we did write to the table, we must maintain the lock on it until the
> operation or transaction is complete.

OK, understood.

>> o) the result of relation_open is explicitly cast to Relation, the
>> result of try_relation_open is not (a minor gripe)
>>
> The first cast was unnecessary, I removed it.

OK.

>> o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted
>> from just above, I guess there was a reason why you needed that code,
>> but I also suspect that's a string indication that something's wrong
>> with the abstractions in your patch.

> As I explained to Tom, if the after row trigger is called asynchronously
> I get a relcache leak on the child table at the end of the copy
> operation. If the trigger is called synchronously (like a before row
> trigger) it works fine. Also calling the after row trigger synchronously
> allows me to detect any potential problem between the actions of the
> trigger and the routing decision. I am open to any suggestion for a more
> elegant solution.

OK, my competence ends here :-) Someone with a better knowledge of the
code should comment on that, I certainly don't have a better proposal.

>> Oh, actually, the cache is outright *wrong*, as the attached test6.sql
>> shows. Ugh, let's just forget about that LRU cache for now.
>>
> Point taken, I have removed the cache from the GUC variables and it is
> now only used for the duration of the COPY operation.

OK, that looks better.

>> o) the patch could use some more docs, especially about descending into
>> child tables.
>>
> Do you mean an overall comment explaining the design? Otherwise there is
> a comment for every single 'if' and block of code in the patch. Be more
> specific if you have a special location where you think comments are
> missing or too vague.

I was thinking more about SGML docs. They could mention that BEFORE
triggers are fired both for the parent table and for the child table,
while AFTER triggers will only be called on the target table. I'd add a
sentence or two explaining what happens if you have a three-level
inheritance hierarchy (that the tuple will be inserted in the bottommost
table of the hierarchy).

>> o) my main concern is still valid: the design was never agreed upon.

> I already commented on that part in another message and this is not
> related to that patch but to the politics of implementing partitioning
> in Postgres. Now if the rejection of the patch is based on political
> stances rather than technical once, I can understand that too.

Sure, sorry if it sounded harsh. As I said I have virtually no field
experience with PG, so I might have a wrong perspective. I also don't
feel particulary eligible to judge your approach of handling automatic
partitioning designwise.

Except for the really minor things like checking stack depth and adding
a few sentences to the SGML docs, I think it's time someone more
qualified looked at the patch. If you'd like to send a new version, I'll
wait for it and mark it as ready for committer review. Thanks for
persisting with the patch and sorry for nitpicking so much :-)

Cheers,
Jan

From:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 20:02:08
Message-ID:	20091122115850.A96919@megazone.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

Well, I think there are still some issues there that at least need to be
better documented.

For example,
create or replace function fi() returns trigger as '
begin
if (NEW.p is not null) then
if (select count(*) from i where i.i = NEW.p) = 0 then
raise exception ''No parent'';
end if;
end if;
return NEW;
end;
' language 'plpgsql';

create or replace function fc() returns trigger as '
begin
if (NEW.p is not null) then
if (select count(*) from c where c.i = NEW.p) = 0 then
raise exception ''No parent'';
end if;
end if;
return NEW;
end;
' language 'plpgsql';

create or replace function fp() returns trigger as '
begin
if (NEW.p is not null) then
if (select count(*) from p where p.i = NEW.p) = 0 then
raise exception ''No parent'';
end if;
end if;
return NEW;
end;
' language 'plpgsql';

drop table i;
drop table c;
drop table p cascade;

create table i(i int, p int);
create trigger tri after insert on i for each row execute procedure fi();

create table c(i int, p int);
create trigger trc after insert on c for each row execute procedure fc();

create table p(i int, p int);
create table p1 (check (i > 0 and i <= 10)) inherits (p);
create table p2 (check (i > 10 and i <= 20)) inherits (p);
create table p3 (check (i > 20 and i <= 30)) inherits (p);
create trigger trp1 after insert on p1 for each row execute procedure fp();
create trigger trp2 after insert on p2 for each row execute procedure fp();
create trigger trp3 after insert on p3 for each row execute procedure fp();

insert into i values (1,3),(2,1),(3,NULL);
copy c from stdin;
1 3
2 1
3 \N
\.
copy p from stdin with (partitioning);
1 3
2 1
3 \N
\.

gives me a successful load into i and c, but not into p with the current
patch AFAICS while a load where the 3 row is first does load.

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 22:20:08
Message-ID:	4B09B918.1020702@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stephan Szabo wrote:
> On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:
>
>
>> As I explained to Tom, if the after row trigger is called asynchronously
>> I get a relcache leak on the child table at the end of the copy
>> operation. If the trigger is called synchronously (like a before row
>> trigger) it works fine. Also calling the after row trigger synchronously
>> allows me to detect any potential problem between the actions of the
>> trigger and the routing decision. I am open to any suggestion for a more
>> elegant solution.
>>
>
> Well, I think there are still some issues there that at least need to be
> better documented.
>
> For example,
> create or replace function fi() returns trigger as '
> begin
> if (NEW.p is not null) then
> if (select count(*) from i where i.i = NEW.p) = 0 then
> raise exception ''No parent'';
> end if;
> end if;
> return NEW;
> end;
> ' language 'plpgsql';
>
> create or replace function fc() returns trigger as '
> begin
> if (NEW.p is not null) then
> if (select count(*) from c where c.i = NEW.p) = 0 then
> raise exception ''No parent'';
> end if;
> end if;
> return NEW;
> end;
> ' language 'plpgsql';
>
> create or replace function fp() returns trigger as '
> begin
> if (NEW.p is not null) then
> if (select count(*) from p where p.i = NEW.p) = 0 then
> raise exception ''No parent'';
> end if;
> end if;
> return NEW;
> end;
> ' language 'plpgsql';
>
> drop table i;
> drop table c;
> drop table p cascade;
>
> create table i(i int, p int);
> create trigger tri after insert on i for each row execute procedure fi();
>
> create table c(i int, p int);
> create trigger trc after insert on c for each row execute procedure fc();
>
> create table p(i int, p int);
> create table p1 (check (i > 0 and i <= 10)) inherits (p);
> create table p2 (check (i > 10 and i <= 20)) inherits (p);
> create table p3 (check (i > 20 and i <= 30)) inherits (p);
> create trigger trp1 after insert on p1 for each row execute procedure fp();
> create trigger trp2 after insert on p2 for each row execute procedure fp();
> create trigger trp3 after insert on p3 for each row execute procedure fp();
>
> insert into i values (1,3),(2,1),(3,NULL);
> copy c from stdin;
> 1 3
> 2 1
> 3 \N
> \.
> copy p from stdin with (partitioning);
> 1 3
> 2 1
> 3 \N
> \.
>
> gives me a successful load into i and c, but not into p with the current
> patch AFAICS while a load where the 3 row is first does load.
>
Well, if you don't insert anything in p (the table, try to avoid using
the same name for the table and the column in an example), copy will
insert (1,3) in p1 and then the trigger will evaluate

select count(*) from p where p.i = NEW.p => NEW.p is 3 and the only p.i available is 1.

This should return 0 rows and raise the exception. This seems normal to me.
The only reason it works for i is because you inserted the values before
the copy.

Am I missing something?
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-22 22:49:52
Message-ID:	20091122143651.T2381@megazone.bigpanda.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:

> Stephan Szabo wrote:
> > On Sun, 22 Nov 2009, Emmanuel Cecchet wrote:
> >
> >
> >> As I explained to Tom, if the after row trigger is called asynchronously
> >> I get a relcache leak on the child table at the end of the copy
> >> operation. If the trigger is called synchronously (like a before row
> >> trigger) it works fine. Also calling the after row trigger synchronously
> >> allows me to detect any potential problem between the actions of the
> >> trigger and the routing decision. I am open to any suggestion for a more
> >> elegant solution.
> >>
> >
> > Well, I think there are still some issues there that at least need to be
> > better documented.
> >
> > For example,
> > create or replace function fi() returns trigger as '
> > begin
> > if (NEW.p is not null) then
> > if (select count(*) from i where i.i = NEW.p) = 0 then
> > raise exception ''No parent'';
> > end if;
> > end if;
> > return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > create or replace function fc() returns trigger as '
> > begin
> > if (NEW.p is not null) then
> > if (select count(*) from c where c.i = NEW.p) = 0 then
> > raise exception ''No parent'';
> > end if;
> > end if;
> > return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > create or replace function fp() returns trigger as '
> > begin
> > if (NEW.p is not null) then
> > if (select count(*) from p where p.i = NEW.p) = 0 then
> > raise exception ''No parent'';
> > end if;
> > end if;
> > return NEW;
> > end;
> > ' language 'plpgsql';
> >
> > drop table i;
> > drop table c;
> > drop table p cascade;
> >
> > create table i(i int, p int);
> > create trigger tri after insert on i for each row execute procedure fi();
> >
> > create table c(i int, p int);
> > create trigger trc after insert on c for each row execute procedure fc();
> >
> > create table p(i int, p int);
> > create table p1 (check (i > 0 and i <= 10)) inherits (p);
> > create table p2 (check (i > 10 and i <= 20)) inherits (p);
> > create table p3 (check (i > 20 and i <= 30)) inherits (p);
> > create trigger trp1 after insert on p1 for each row execute procedure fp();
> > create trigger trp2 after insert on p2 for each row execute procedure fp();
> > create trigger trp3 after insert on p3 for each row execute procedure fp();
> >
> > insert into i values (1,3),(2,1),(3,NULL);
> > copy c from stdin;
> > 1 3
> > 2 1
> > 3 \N
> > \.
> > copy p from stdin with (partitioning);
> > 1 3
> > 2 1
> > 3 \N
> > \.
> >
> > gives me a successful load into i and c, but not into p with the current
> > patch AFAICS while a load where the 3 row is first does load.
> >
> Well, if you don't insert anything in p (the table, try to avoid using
> the same name for the table and the column in an example), copy will
> insert (1,3) in p1 and then the trigger will evaluate
>
> select count(*) from p where p.i = NEW.p => NEW.p is 3 and the only p.i available is 1.
>
> This should return 0 rows and raise the exception. This seems normal to me.
>
> The only reason it works for i is because you inserted the values before
> the copy.
>
> Am I missing something?

I believe so unless I am.

There are three separate cases being run for comparison purposes.
Multi-row insert on i where an after trigger on i checks the parents
within i, a copy on c where an after trigger on c checks the parents
within c, a copy on p (with inheritance) where an after trigger on p*
checks the parents within the p hierarchy.

So, in the case of the multi-row insert, it's inserting (1,3), but it
doesn't immediately check, it inserts (2,1) and (3,NULL) before running
the checks. The same seems to happen for the base copy. Copy with
inheritance seems to be working differently. That may or may not be okay,
but if it's different it needs to be prominently mentioned in
documentation.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jan Urbański <wulczer(at)wulczer(dot)org>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 04:02:49
Message-ID:	603c8f070911222002k31aafaa6q523a001eb7b53566@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Nov 22, 2009 at 12:35 PM, Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
> I was thinking more about SGML docs. They could mention that BEFORE
> triggers are fired both for the parent table and for the child table,
> while AFTER triggers will only be called on the target table. I'd add a
> sentence or two explaining what happens if you have a three-level
> inheritance hierarchy (that the tuple will be inserted in the bottommost
> table of the hierarchy).

I have a hard time believing this is OK, even with documentation.
While it might be OK in some (many?) particular use cases to fire
triggers in this way, making COPY with the partitioning option fire
different triggers at different times than COPY without the
partitioning option - and in fact every other method of getting data
into a table, all of which are consistent with each other and with
COPY without the partitioning option - seems like a bad idea to me. I
don't think the behavior described above is OK, and I also don't think
that the changes in the timing of AFTER-trigger firing are OK. I
understand that without that change there was a relcache leak, but I
think that just means that bug needs to be found and fixed.

I would also like to see some more discussion of the basic mechanism
of this patch. Essentially, what it's trying to do is traverse the
inheritance hierarchy looking for a table whose constraints match the
current tuple, and then inserting the tuple there. First - is this a
good idea? It's embeds some assumptions about how inheritance
hierarchies are set up which don't seem totally unreasonable, but even
so I'm not sure we want to go that route. Second - in lieu of
accepting this approach, do we want to wait for Itagaki Takahiro's
partitioning syntax patch to go in (as I am hoping that it will) and
then do something more structured based on the notation introduced
there?

One thing that biases me toward thinking that maybe we should wait is
the fact that this patch relies on an MRU list to determine into which
child table a particular tuple should be inserted. If the constraints
on the child tables are not mutually exclusive, the tuple routing
won't be deterministic, which seems undesirable to me. On the other
hand, if we got rid of the MRU cache and made the order deterministic
(say, alphabetical by partition name) then I'm guessing this would be
quite slow for large numbers of partitions when most of the tuples
need to go into the later partitions.

...Robert

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 09:36:43
Message-ID:	1258969003.27757.4259.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
> Hi,
> >> I have extracted the partitioning option for COPY (removed the error
> >> logging part) from the previous patch.
> >>
> >
> > We can use an INSERT trigger to route tuples into partitions even now.
> > Why do you need an additional router for COPY?
> Tom has already explained on the list why using a trigger was a bad idea
> (and I know we can use a trigger since I am the one who wrote it).
> If you look at the code you will see that you can do optimizations in
> the COPY code that you cannot do in the trigger.
>
> > Also, it would be nicer
> > that the router can works not only in COPY but also in INSERT.
> >
> As 8.5 will at best provide a syntactic hack on top of the existing
> constraint implementation, I think that it will not hurt to have routing
> in COPY since we will not have it anywhere otherwise.
> > BTW, I'm working on meta data of partitioning now. Your "partitioning"
> > option in COPY could be replaced with the catalog.
> >
> This implementation is only for the current 8.5 and it will not be
> needed anymore once we get a fully functional partitioning in Postgres
> which seems to be for a future version.

Yes, the trigger way of doing this is a bad way.

I regret to say that the way proposed here isn't much better, AFAICS.
Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last visited partitions to reduce
the overhead of data routing.

What I've requested is that partitioning work by using a data structure
held in relcache for inheritance parents. This differs in 3 ways from
this patch
a) it has a clearly defined location for the cached metadata, with
clearly identified and well worked out mechanisms for cache invalidation
b) the cache can be built once when it is first needed, not slowly grown
as parts of the metadata are used
c) it would be available for all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning
info is available. i.e. the best next action is to complete and commit
Itagaki's partitioning syntax patch. Then we can easily build the
metadata for partitioning, which can then be used in COPY for data
routing.

Anyway, I want data routing, as is the intention of this patch. I just
don't think this patch is a useful way to do it. It is too narrow in its
scope and potentially buggy in its approach to developing a cache and
using trigger-like stuff.

ISTM that with the right metadata in the right place, a cleaner and
easier solution is still possible for 8.5. The code within COPY should
really just reduce to a small piece of code to derive the correct
relation for the desired row and then use that during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know
his plans. Itagaki-san and Manu, please can you work together to make
this work for 8.5?

---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held on the relcache for a table
that has child partitions. Currently, a table does not cache info about
its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a
PartitioningMetadata struct. We can fill in this information when we
construct the relcache for a relation, or we can populate it on demand
the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values. This will then
allow us to use bsearch to find the partition that a specific value
applies to. Thus it can be used for routing data from INSERTs or COPY,
can be used for identifying which partitions need to be
included/excluded from an APPEND node. Using this will be O(logN) rather
than O(N), so allowing us to have much larger number of partitions when
required. Note that it can also be used within the executor to perform
dynamic partition elimination, thus allowing us to easily implement
partition aware joins etc.

To construct the array we must sort the partition boundary values and
prove that the partition definitions do not overlap. That is much easier
to do when the partitions are explicitly defined. (Plus, there is no
requirement to have, or mechanism to specify, unique partitions
currently, although most users assume this in their usage).

I imagine we would have an API called something like
RelationIdentifyPartition() where we provide value(s) for the
PartitioningKey column(s) and we then return the Oid of the partition
that holds that value. That function would build the metadata, if not
already cached, then bsearch it to provide the Oid.

--
Simon Riggs www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 14:39:33
Message-ID:	4B0A9EA5.4020708@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon,

I think you should read the thread and the patch before making any false
statements like you did in your email.

1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations
in the datawarehouse world. It is not meant to implement full
partitioning as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a
meta-data representation of partitions which does not exist yet and will
probably not exist in 8.5

You should justify your statements when you say 'potentially buggy in
its approach to developing a cache and using trigger-like stuff'. I
understand that you don't like it because this is not what you want but
this is not my fault. This is not an implementation of partitioning like
COPY does not do update/delete/alter/...
And yes the use case is 'narrow' like any option in COPY. It is like
complaining that the CSV option is not useful because you want to load
binary dumps.

If Itagaki gets the support of the community to get his implementation
accepted, I will gladly use it. Contributing? If Aster is willing to
contribute a code monkey to implement your specs, why not but you will
have to convince them.

You should really think twice about the style of your emails that cast a
detestable tone to discussions on pg-hackers.

Emmanuel

> On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
>
>> Hi,
>>
>>>> I have extracted the partitioning option for COPY (removed the error
>>>> logging part) from the previous patch.
>>>>
>>>>
>>> We can use an INSERT trigger to route tuples into partitions even now.
>>> Why do you need an additional router for COPY?
>>>
>> Tom has already explained on the list why using a trigger was a bad idea
>> (and I know we can use a trigger since I am the one who wrote it).
>> If you look at the code you will see that you can do optimizations in
>> the COPY code that you cannot do in the trigger.
>>
>>
>>> Also, it would be nicer
>>> that the router can works not only in COPY but also in INSERT.
>>>
>>>
>> As 8.5 will at best provide a syntactic hack on top of the existing
>> constraint implementation, I think that it will not hurt to have routing
>> in COPY since we will not have it anywhere otherwise.
>>
>>> BTW, I'm working on meta data of partitioning now. Your "partitioning"
>>> option in COPY could be replaced with the catalog.
>>>
>>>
>> This implementation is only for the current 8.5 and it will not be
>> needed anymore once we get a fully functional partitioning in Postgres
>> which seems to be for a future version.
>>
>
> Yes, the trigger way of doing this is a bad way.
>
> I regret to say that the way proposed here isn't much better, AFAICS.
> Let me explain why I think that, but -1 to anyone applying this patch.
>
> This patch proposes keeping a cache of last visited partitions to reduce
> the overhead of data routing.
>
> What I've requested is that partitioning work by using a data structure
> held in relcache for inheritance parents. This differs in 3 ways from
> this patch
> a) it has a clearly defined location for the cached metadata, with
> clearly identified and well worked out mechanisms for cache invalidation
> b) the cache can be built once when it is first needed, not slowly grown
> as parts of the metadata are used
> c) it would be available for all parts of the server, not just COPY.
>
> The easiest way to build that metadata is when structured partitioning
> info is available. i.e. the best next action is to complete and commit
> Itagaki's partitioning syntax patch. Then we can easily build the
> metadata for partitioning, which can then be used in COPY for data
> routing.
>
> Anyway, I want data routing, as is the intention of this patch. I just
> don't think this patch is a useful way to do it. It is too narrow in its
> scope and potentially buggy in its approach to developing a cache and
> using trigger-like stuff.
>
> ISTM that with the right metadata in the right place, a cleaner and
> easier solution is still possible for 8.5. The code within COPY should
> really just reduce to a small piece of code to derive the correct
> relation for the desired row and then use that during heap_insert().
>
> I have just discussed partitioning with Itagaki-san at JPUG, so I know
> his plans. Itagaki-san and Manu, please can you work together to make
> this work for 8.5?
>
> ---
>
> A more detailed explanation of Partitioning Metadata:
>
> Partitioning Metadata is information held on the relcache for a table
> that has child partitions. Currently, a table does not cache info about
> its children, which prevents various optimisations.
>
> We would have an extra pointer on the Relation struct that points to a
> PartitioningMetadata struct. We can fill in this information when we
> construct the relcache for a relation, or we can populate it on demand
> the first time we attempt to use that information (if it exists).
>
> We want to hold an array of partition boundary values. This will then
> allow us to use bsearch to find the partition that a specific value
> applies to. Thus it can be used for routing data from INSERTs or COPY,
> can be used for identifying which partitions need to be
> included/excluded from an APPEND node. Using this will be O(logN) rather
> than O(N), so allowing us to have much larger number of partitions when
> required. Note that it can also be used within the executor to perform
> dynamic partition elimination, thus allowing us to easily implement
> partition aware joins etc.
>
> To construct the array we must sort the partition boundary values and
> prove that the partition definitions do not overlap. That is much easier
> to do when the partitions are explicitly defined. (Plus, there is no
> requirement to have, or mechanism to specify, unique partitions
> currently, although most users assume this in their usage).
>
> I imagine we would have an API called something like
> RelationIdentifyPartition() where we provide value(s) for the
> PartitioningKey column(s) and we then return the Oid of the partition
> that holds that value. That function would build the metadata, if not
> already cached, then bsearch it to provide the Oid.
>
>

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 14:59:18
Message-ID:	603c8f070911230659rba97935s68e72398ef40a83b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:
> I think you should read the thread and the patch before making any false
> statements like you did in your email.
>
> 1. The patch does not use any trigger for routing.

Whoa, whoa! I don't think that Simon said that it did. But even if I
am wrong and he did...

> You should really think twice about the style of your emails that cast a
> detestable tone to discussions on pg-hackers.

...I certainly don't think this comment is justified. This is a
technical discussion about the best way of solving a certain problem,
and I don't believe that any of the discussion up to this point has
been anything other than civil. I can tell that you are frustrated
that your patch is not getting the support you would like to see it
get, but launching ad hominem attacks on Simon or anyone else is not
going to help.

...Robert

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 15:23:54
Message-ID:	1258989834.27757.5745.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote:

> I think you should read the thread and the patch

I did read the thread and patch in full before posting. My opinions are
given to help you and the community towards a desirable common goal.

I was unaware you were developing these ideas and so was unable to
provide comments until now. My review of Kedar's patch in July did lay
out in general terms a specific implementation route for future work on
partitioning. I had thought I might not have made those comments clearly
enough, so gave a more specific description of what I consider to be a
more workable and general solution for cacheing and using partitioning
metadata.

--
Simon Riggs www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 15:24:26
Message-ID:	4B0AA92A.6030507@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:
>
>> I think you should read the thread and the patch before making any false
>> statements like you did in your email.
>>
>> 1. The patch does not use any trigger for routing.
>>
>
> Whoa, whoa! I don't think that Simon said that it did. But even if I
> am wrong and he did...
>
Quote from Simon's email: "It is too narrow in its scope and potentially
buggy in its approach to developing a cache and using trigger-like stuff."
>> You should really think twice about the style of your emails that cast a
>> detestable tone to discussions on pg-hackers.
>>
> ...I certainly don't think this comment is justified. This is a
> technical discussion about the best way of solving a certain problem,
> and I don't believe that any of the discussion up to this point has
> been anything other than civil. I can tell that you are frustrated
> that your patch is not getting the support you would like to see it
> get, but launching ad hominem attacks on Simon or anyone else is not
> going to help
We certainly don't live in the same civilization then.

I am not frustrated if my patch does not get in because of technical
considerations and I am happy so far with Jan's feedback that helped a
lot. I think there is a misunderstanding between what Simon wants
('Anyway, I want data routing, as is the intention of this patch.') and
what this patch is about. This patch is just supposed to load tuples in
a hierarchy of tables as this is a recurrent use case in datawarehouse
scenarios. It is not supposed to solve data routing in general
(otherwise that would be integrated standard in COPY and not as an option).

But it looks like it is a waste of everybody's time to continue this
discussion further. Just move the patch to the rejected patches and
let's wait for Itagaki's implementation.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 15:26:16
Message-ID:	1258989976.27757.5756.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote:

> I think you should read the thread and the patch

I did read the thread and patch in full before posting. My opinions are
given to help you and the community towards a desirable common goal.

--
Simon Riggs www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 15:43:24
Message-ID:	4B0AAD9C.1040109@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> I was unaware you were developing these ideas and so was unable to
> provide comments until now.
The first patch was published to this list on September 10 (almost 2.5
months ago) along with the wiki page describing the problem and the
solution.
What should I have done to raise awareness further?

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 15:49:54
Message-ID:	1258991394.27757.5898.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote:

> I think there is a misunderstanding between what Simon wants
> ('Anyway, I want data routing, as is the intention of this patch.') and
> what this patch is about. This patch is just supposed to load tuples in
> a hierarchy of tables as this is a recurrent use case in datawarehouse
> scenarios. It is not supposed to solve data routing in general
> (otherwise that would be integrated standard in COPY and not as an option).

I have not misunderstood. You wish to solve a very specific problem,
with very specific code. I've done that myself on occasion. My opinion
is that we should solve many of the partitioning problems with one set
of central, common code. If we do not do this we will need 3-4 times as
much code, most of which will be similar and yet must be exactly the
same. That alone is enough to block the patch's proposed method (IMHO).

> But it looks like it is a waste of everybody's time to continue this
> discussion further. Just move the patch to the rejected patches and
> let's wait for Itagaki's implementation.

The lack of discussion and design in this area has held back the last
few patches, by various authors; we should learn from that. Also,
working in isolation on narrow problems will not move us forwards as
fast as if we all work together on pieces of the whole vision for
partitioning. My piece was to think through how to link each of the
different aspects of partitioning and to propose a solution. Please join
with Itagaki to move this forwards - your further contributions will be
valuable.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 16:02:14
Message-ID:	1258992134.27757.5969.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 10:43 -0500, Emmanuel Cecchet wrote:
> Simon Riggs wrote:
> > I was unaware you were developing these ideas and so was unable to
> > provide comments until now.

> The first patch was published to this list on September 10 (almost 2.5
> months ago) along with the wiki page describing the problem and the
> solution.

> What should I have done to raise awareness further?

...Read my detailed comments in response to Kedar's patch and post
comments on that thread to say you didn't agree with that proposal and
that you were thinking of another way entirely. ~14 July. >4 months ago.

...Contact me personally when you saw that I hadn't responded to your
later posts, knowing that I have recent track record as a reviewer of
partitioning patches.

--
Simon Riggs www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 17:18:56
Message-ID:	18802.1258996736@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Anyway, I want data routing, as is the intention of this patch. I just
> don't think this patch is a useful way to do it. It is too narrow in its
> scope and potentially buggy in its approach to developing a cache and
> using trigger-like stuff.

FWIW, I agree --- there are two really fundamental problems with this
patch:

* It only applies to COPY. You'd certainly want routing for INSERT as
well. And it shouldn't be necessary to specify an option.

* Building this type of infrastructure on top of independent, not
guaranteed consistent table constraints is just throwing more work
into a dead end. The patch is already full of special-case errors
for possible inconsistency of the constraints, and I don't think it's
bulletproof even so (what if someone is altering the constraints
concurrently? What if there's more than one legal destination?)
And the performance necessarily sucks.

What we need first is an explicit representation of partitioning, and
then to build routing code on top of that. I haven't looked at
Itagaki-san's syntax patch at all, but I think it's at least starting
in a sensible place.

regards, tom lane

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 17:23:21
Message-ID:	4B0AC509.20104@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> ...Read my detailed comments in response to Kedar's patch and post
> comments on that thread to say you didn't agree with that proposal and
> that you were thinking of another way entirely.
Useful background here is:

http://wiki.postgresql.org/wiki/Table_partitioning
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php
http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d5d7@mail.gmail.com
http://archives.postgresql.org/message-id/1247564358.11347.1308.camel@ebony.2ndQuadrant

The basic problem here is that Emmanuel and Aster developed a useful
answer to one of the more pressing implementation details needed here,
but did so without being involved in the much larger discussion of how
to implement general, more automated partitioning in PostgreSQL that (as
you can see from the date of the first links there) has been going on
for years already. What we did wrong as a community is not more
explicitly tell Emmanuel the above when he first submitted code a few
months ago, before he'd invested more time on a subset implementation
that was unlikely to be committed. As I already commented upthread, I
was just happy to see coding progress being made on part of the design
that nobody had hacked on before to my knowledge; I didn't consider then
how Emmanuel was going to be disappointed by the slow rate that code
would be assimilated into the design going on in this area.

What would probably be helpful here is to take the mess of raw data
above and turn it into a simpler partitioning roadmap. There's a stack
of useful patches here, multiple contributors who have gotten familiar
with the implementation details required, and enough time left that it's
possible to pull something together in time for 8.5--but only if
everyone is clear on exactly what direction to push toward. I'm going
to reread the history here myself and see if I can write something
helpful here.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-23 17:39:16
Message-ID:	1258997956.27757.6481.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 12:23 -0500, Greg Smith wrote:

> What would probably be helpful here is to take the mess of raw data
> above and turn it into a simpler partitioning roadmap.

Thanks for summarising.

I briefly tried to do that on the thread for Itagaki-san's patch. That's
a first stab at things, at least.

--
Simon Riggs www.2ndQuadrant.com

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 04:27:03
Message-ID:	20091124132703.B070.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What we need first is an explicit representation of partitioning, and
> then to build routing code on top of that. I haven't looked at
> Itagaki-san's syntax patch at all, but I think it's at least starting
> in a sensible place.

I have the following development plan for partitioning.
I'll continue to use inherits-based partitioning... at least in 8.5.

8.5 Alpha 3:
Syntax and catalog changes (on-disk structure).
I think pg_dump is the biggest stopper in the phase.

8.5 Alpha 4:
Internal representation (on-memory structure), that will replace
insert-triggers first, and also replace CHECK constraints if
possible (but probably non-INSERT optimizations will slide to 8.6).

The internal representation of RANGE partitions will be an array of
pairs of { upper-value, parition-relid } for each parent table.
An insert target partition are determined using binary search on insert.
It will be faster than sequential checks of CHECK constraint
especially in large number of child tables. The array will be kept
in CacheMemoryContext or query context to reduce access to the system
catalog. RelationData or TupleDesc will have an additional field for it.

> * It only applies to COPY. You'd certainly want routing for INSERT as
> well. And it shouldn't be necessary to specify an option.

Sure. We need the routingin both INSERT and COPY. Even if Emmanuel-san's
patch will be committed in Alpha 3, the code would be discarded in Alpha 4.

> * Building this type of infrastructure on top of independent, not
> guaranteed consistent table constraints is just throwing more work
> into a dead end.

I think the current approach is not necessarily wrong for CHECK-based
partitioning, but I'd like to have more specialized or generalized
functionality for the replacement of triggers.

If we will take specialized approach, triggers will be replaced with
built-in feature. We can only use RANGE and LIST partitions.

On the other hand, it might be interesting to take some generalized
approach; For example, spliting BEFORE INSERT triggers into 3 phases:
1. Can cancel the insert and modify the new tuple.
2. Can cancel the insert, but cannot modify tuple.
3. Neigher can cancel nor modify.
We call triggers in the number order. INSERT TRIGGERs are implemented
in 2nd phase, so we're not afraid of modifing partition keys.
(3rd phase will be used for replication trigger.)

However, I think generalized one is overkill.
A specialized approach would be enough.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Nikhil Sontakke <nikhil(dot)sontakke(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 06:45:31
Message-ID:	a301bfd90911232245t35e8a804g6f163d8c4d3ceb77@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

>> What would probably be helpful here is to take the mess of raw data
>> above and turn it into a simpler partitioning roadmap.
>
> Thanks for summarising.
>

Yeah, excellent summary Greg. As you rightly pointed out, partitioning
needs a broad roadmap so that the community can contribute in unison.
That ways we can in future avoid decent efforts like Manu's which
might not bear any fruit because of the prevailing confusion today..

> I briefly tried to do that on the thread for Itagaki-san's patch. That's
> a first stab at things, at least.

+1. Itagaki-san's patch seems like a firm foot forward.

Regards,
Nikhils
--
http://www.enterprisedb.com

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 08:17:29
Message-ID:	1259050649.30357.52.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote:

> But it looks like it is a waste of everybody's time to continue this
> discussion further. Just move the patch to the rejected patches and
> let's wait for Itagaki's implementation.

Emmanuel, please try to work together with Itagaki san on getting the
bigger vision implemented, as this is a thing that can benefit a lot
from more people who have taken the time to learn about the parts of
code involved.

Even though this patch will not get in, most of the effort in developing
it is not actual coding, but familiarizing yourself with the other code
involved.

Coding actual patches should be easy once you know the code _and_ the
desired result.

You probably already know a lot of what is required to help us to common
goal of a clean implementation of partitioning.

> --
> Emmanuel Cecchet
> Aster Data
> Web: http://www.asterdata.com
>
>

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 08:30:40
Message-ID:	20091124173040.B086.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> wrote:

> Even though this patch will not get in, most of the effort in developing
> it is not actual coding, but familiarizing yourself with the other code
> involved.

I just edited a wiki page for this discussion.
I hope it can be a help.
http://wiki.postgresql.org/wiki/Table_partitioning

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 10:15:53
Message-ID:	1259057753.27757.8783.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-11-24 at 17:30 +0900, Itagaki Takahiro wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> wrote:
>
> > Even though this patch will not get in, most of the effort in developing
> > it is not actual coding, but familiarizing yourself with the other code
> > involved.
>
> I just edited a wiki page for this discussion.
> I hope it can be a help.
> http://wiki.postgresql.org/wiki/Table_partitioning
>

Good job. Looks like a clear path forwards to me.

I've made a couple of minor clarifications.

--
Simon Riggs www.2ndQuadrant.com

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-24 15:08:33
Message-ID:	4B0BF6F1.8050300@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Itagaki Takahiro wrote:
> I just edited a wiki page for this discussion.
> I hope it can be a help.
> http://wiki.postgresql.org/wiki/Table_partitioning
>
I guess the problem of handling user triggers is still open.
If we allow triggers on partitions, badly written logic could lead to
infinite loops in routing. In the case of COPY, an after statement
trigger could change all the routing decisions taken for each row. I am
not sure what the semantic should be if you have triggers defined on the
parent and child tables. Which triggers do you fire if the insert is on
the parent table but the tuple ends up in a child table?
If the new implementation hides the child tables, it might be safer to
not allow triggers on child tables altogether and use the parent table
as the single point of entry to access the partition (and define
triggers). With the current proposed implementation, would it be
possible to define a view using child tables?

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 02:16:14
Message-ID:	20091125111614.9284.52131E4D@oss.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:

> I guess the problem of handling user triggers is still open.
> If we allow triggers on partitions, badly written logic could lead to
> infinite loops in routing.

Infinite loops are not a partition-related problem, no?
We can also find infinite loops in user defined functions,
recursive queries, etc. I think the only thing we can do for it
is to *stop* loops instead of prevention, like max_stack_depth.

> With the current proposed implementation, would it be
> possible to define a view using child tables?

No, if you mean using a partition-view. I'm thinking we are moving
our implementation of partitioning from view-based to built-in feature.
Do you have any use-cases that requires view-based partitioning?
Was the inheritance-based partitioning not enough for it?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 03:31:40
Message-ID:	4B0CA51C.5010809@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Itagaki Takahiro wrote:
> Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:
>
>
>> I guess the problem of handling user triggers is still open.
>> If we allow triggers on partitions, badly written logic could lead to
>> infinite loops in routing.
>>
> Infinite loops are not a partition-related problem, no?
> We can also find infinite loops in user defined functions,
> recursive queries, etc. I think the only thing we can do for it
> is to *stop* loops instead of prevention, like max_stack_depth.
>
I was thinking a trigger on child1 updating the partition key forcing
the tuple to move to child2. And then a trigger on child2 updating the
key again to move the tuple back to child1. You end up with an infinite
loop.
>> With the current proposed implementation, would it be
>> possible to define a view using child tables?
>>
>
> No, if you mean using a partition-view. I'm thinking we are moving
> our implementation of partitioning from view-based to built-in feature.
> Do you have any use-cases that requires view-based partitioning?
> Was the inheritance-based partitioning not enough for it?
>
Nevermind, I was thinking about the implications of materialized views
but Postgres does not have materialized views!

I have other questions related to create table but I will post them in
the 'syntax for partitioning' thread.

Thanks
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 10:03:17
Message-ID:	1259143397.30357.170.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
> Itagaki Takahiro wrote:
> > I just edited a wiki page for this discussion.
> > I hope it can be a help.
> > http://wiki.postgresql.org/wiki/Table_partitioning
> >
> I guess the problem of handling user triggers is still open.
> If we allow triggers on partitions, badly written logic could lead to
> infinite loops in routing. In the case of COPY, an after statement
> trigger could change all the routing decisions taken for each row.

A simple update to the row can cause it to move between partitions, no ?

> I am not sure what the semantic should be if you have triggers defined on the
> parent and child tables. Which triggers do you fire if the insert is on
> the parent table but the tuple ends up in a child table?

I'd propose that triggers on both parent table and selected child are
executed.

1. first you execute before triggers on parent table, which may
change which partition the row belongs to

2. then you execute before triggers on selected child table

2.1 if this changes the child table selection repeat from 2.

3. save the tuple in child table

4. execute after triggers of the final selected child table

5. execute after triggers of parent table

order of 4. and 5. is selected arbitrarily, others are determined by
flow.

> If the new implementation hides the child tables,

If you hide child tables, you suddenly need a lot of new syntax to
"unhide" them, so that partitions can be manipulated. Currently it is
easy to do it with INHERIT / NO INHERIT.

> it might be safer to
> not allow triggers on child tables altogether and use the parent table
> as the single point of entry to access the partition (and define
> triggers). With the current proposed implementation, would it be
> possible to define a view using child tables?

the child tables are there, and they _are_ defined, either implicitly
(using constraints, which "constraint exclusion" resolves to a set of
child tables) or explicitly, using child table names directly.

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 13:39:27
Message-ID:	4B0D338F.1090705@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing wrote:
> On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
>
>> Itagaki Takahiro wrote:
>>
>>> I just edited a wiki page for this discussion.
>>> I hope it can be a help.
>>> http://wiki.postgresql.org/wiki/Table_partitioning
>>>
>>>
>> I guess the problem of handling user triggers is still open.
>> If we allow triggers on partitions, badly written logic could lead to
>> infinite loops in routing. In the case of COPY, an after statement
>> trigger could change all the routing decisions taken for each row.
>>
>
> A simple update to the row can cause it to move between partitions, no ?
>
Yes.
>> I am not sure what the semantic should be if you have triggers defined on the
>> parent and child tables. Which triggers do you fire if the insert is on
>> the parent table but the tuple ends up in a child table?
>>
>
> I'd propose that triggers on both parent table and selected child are
> executed.
>
> 1. first you execute before triggers on parent table, which may
> change which partition the row belongs to
>
> 2. then you execute before triggers on selected child table
>
> 2.1 if this changes the child table selection repeat from 2.
>
> 3. save the tuple in child table
>
> 4. execute after triggers of the final selected child table
>
What if that trigger changes again the child table selection?
> 5. execute after triggers of parent table
>
Same here, what if the trigger changes the child table selection. Do we
re-execute triggers on the new child table?
Also it is debatable whether we should execute an after trigger on a
table where nothing was really inserted.
> order of 4. and 5. is selected arbitrarily, others are determined by
> flow.
>
Also the description omits the execution of before and after statement
triggers. While those can apply to the parent table (but the same
question about what happens if the after statement modifies routing
decision still applies), what does it mean in the case of COPY to have
statement triggers on the child tables? You cannot know in advance where
the tuples are going to go and fire the before statement triggers. If
you had to fire after statement triggers, in which order would you fire
them?
>> If the new implementation hides the child tables,
>>
>
> If you hide child tables, you suddenly need a lot of new syntax to
> "unhide" them, so that partitions can be manipulated. Currently it is
> easy to do it with INHERIT / NO INHERIT.
>
Agreed, but I think that we will discover some restrictions that will
apply to child tables.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 14:14:57
Message-ID:	603c8f070911250614v9a4af61v6910f362896fc20@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> I'd propose that triggers on both parent table and selected child are
> executed.

I was thinking we should make the partitioning decision FIRST, before
any triggers are fired, and then fire only those triggers relevant to
the selected partition. If the BEFORE triggers on the partition
modify the tuple in a way that makes it incompatible with the table
constraints on that partition, the insert (or update) fails.

Firing triggers on more than one table is pretty substantially
incompatible with what we do elsewhere and I'm not clear what we get
in exchange. What is the use case for this?

...Robert

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 14:21:21
Message-ID:	4B0D3D61.2070204@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>
>> I'd propose that triggers on both parent table and selected child are
>> executed.
>>
>
> I was thinking we should make the partitioning decision FIRST, before
> any triggers are fired, and then fire only those triggers relevant to
> the selected partition. If the BEFORE triggers on the partition
> modify the tuple in a way that makes it incompatible with the table
> constraints on that partition, the insert (or update) fails.
>
> Firing triggers on more than one table is pretty substantially
> incompatible with what we do elsewhere and I'm not clear what we get
> in exchange. What is the use case for this?
>
I don't have a use case for this but I was puzzled with that when I had
to implement trigger support in COPY with partitioning.
I came to the same conclusion as you and made the operation fail if the
trigger was trying to move the tuple to another partition. However, I
had a problem with after row triggers that I had to call synchronously
to be able to detect the change. We will need something to tell us that
an after row trigger did not mess with the routing decision.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 14:35:43
Message-ID:	603c8f070911250635o614edd97v54a818884b8a4243@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 25, 2009 at 9:21 AM, Emmanuel Cecchet <manu(at)asterdata(dot)com> wrote:
> Robert Haas wrote:
>> On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
>> wrote:
>>>
>>> I'd propose that triggers on both parent table and selected child are
>>> executed.
>>
>> I was thinking we should make the partitioning decision FIRST, before
>> any triggers are fired, and then fire only those triggers relevant to
>> the selected partition. If the BEFORE triggers on the partition
>> modify the tuple in a way that makes it incompatible with the table
>> constraints on that partition, the insert (or update) fails.
>>
>> Firing triggers on more than one table is pretty substantially
>> incompatible with what we do elsewhere and I'm not clear what we get
>> in exchange. What is the use case for this?
>
> I don't have a use case for this but I was puzzled with that when I had to
> implement trigger support in COPY with partitioning.
> I came to the same conclusion as you and made the operation fail if the
> trigger was trying to move the tuple to another partition. However, I had a
> problem with after row triggers that I had to call synchronously to be able
> to detect the change. We will need something to tell us that an after row
> trigger did not mess with the routing decision.

*scratches head*

I'm confused. Only a BEFORE ROW trigger can possibly change
anything... the return value of an AFTER ROW trigger is ignored.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 16:30:47
Message-ID:	19645.1259166647@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

It seems like the easiest way to resolve this without weird corner
cases is to say that we fire triggers belonging to the parent table.
The individual partition child tables either shouldn't have triggers
at all, or we should restrict the cases in which those are considered
applicable.

As an example, what are you going to do with statement-level triggers?
Fire them for *every* child whether it receives a row or not? Doesn't
seem like the right thing.

Again, this solution presupposes an explicit concept of partitioned
tables within the system...

regards, tom lane

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 22:58:30
Message-ID:	1259189910.30357.200.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-11-25 at 08:39 -0500, Emmanuel Cecchet wrote:
> Hannu Krosing wrote:
> > On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote:
> >
> >> Itagaki Takahiro wrote:
> >>
> >>> I just edited a wiki page for this discussion.
> >>> I hope it can be a help.
> >>> http://wiki.postgresql.org/wiki/Table_partitioning
> >>>
> >>>
> >> I guess the problem of handling user triggers is still open.
> >> If we allow triggers on partitions, badly written logic could lead to
> >> infinite loops in routing. In the case of COPY, an after statement
> >> trigger could change all the routing decisions taken for each row.
> >>
> >
> > A simple update to the row can cause it to move between partitions, no ?
> >
> Yes.
> >> I am not sure what the semantic should be if you have triggers defined on the
> >> parent and child tables. Which triggers do you fire if the insert is on
> >> the parent table but the tuple ends up in a child table?
> >>
> >
> > I'd propose that triggers on both parent table and selected child are
> > executed.
> >
> > 1. first you execute before triggers on parent table, which may
> > change which partition the row belongs to
> >
> > 2. then you execute before triggers on selected child table
> >
> > 2.1 if this changes the child table selection repeat from 2.
> >
> > 3. save the tuple in child table
> >
> > 4. execute after triggers of the final selected child table
> >
> What if that trigger changes again the child table selection?
> > 5. execute after triggers of parent table
> >
> Same here, what if the trigger changes the child table selection. Do we
> re-execute triggers on the new child table?

After triggers can't change tuple, thus cant change routing.

> Also it is debatable whether we should execute an after trigger on a
> table where nothing was really inserted.
> > order of 4. and 5. is selected arbitrarily, others are determined by
> > flow.
> >
> Also the description omits the execution of before and after statement
> triggers. While those can apply to the parent table (but the same
> question about what happens if the after statement modifies routing
> decision still applies), what does it mean in the case of COPY to have
> statement triggers on the child tables?

What statement triggers do you mean ?

I don't think we have ON COPY triggers ?

> You cannot know in advance where
> the tuples are going to go and fire the before statement triggers. If
> you had to fire after statement triggers, in which order would you fire
> them?
> >> If the new implementation hides the child tables,
> >>
> >
> > If you hide child tables, you suddenly need a lot of new syntax to
> > "unhide" them, so that partitions can be manipulated. Currently it is
> > easy to do it with INHERIT / NO INHERIT.
> >
> Agreed, but I think that we will discover some restrictions that will
> apply to child tables.

I think we should keep the possibility to populate partitions offline
and then plug then into table as partitions (current INHERIT) and also
to extract partition into separate table (NO INHERIT).

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 23:04:36
Message-ID:	1259190276.30357.206.camel@hvost1700
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-11-25 at 11:30 -0500, Tom Lane wrote:
> It seems like the easiest way to resolve this without weird corner
> cases is to say that we fire triggers belonging to the parent table.
> The individual partition child tables either shouldn't have triggers
> at all, or we should restrict the cases in which those are considered
> applicable.

Agreed. maybe allow only ROW-level AFTER triggers (for logging late
arrivals and updates on tables partitioned on time for example )

> As an example, what are you going to do with statement-level triggers?
> Fire them for *every* child whether it receives a row or not? Doesn't
> seem like the right thing.
>
> Again, this solution presupposes an explicit concept of partitioned
> tables within the system...

For explicit partitioned tables with hidden partitions it is of course
best to not add extra effort for allowing triggers to be defined on
those (hidden) partitions.

If the partition tables are visible, some trigger support would be good.

> regards, tom lane

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Emmanuel Cecchet <manu(at)asterdata(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-25 23:28:29
Message-ID:	603c8f070911251528rd254525t4c13e05621a571e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 25, 2009 at 11:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It seems like the easiest way to resolve this without weird corner
> cases is to say that we fire triggers belonging to the parent table.
> The individual partition child tables either shouldn't have triggers
> at all, or we should restrict the cases in which those are considered
> applicable.
>
> As an example, what are you going to do with statement-level triggers?
> Fire them for *every* child whether it receives a row or not? Doesn't
> seem like the right thing.

Just the tables that get a row? I don't know, your way may be best,
but it seems like tables on individual partitions might be useful in
some situations.

> Again, this solution presupposes an explicit concept of partitioned
> tables within the system...

...Robert

From:	Emmanuel Cecchet <manu(at)asterdata(dot)com>
To:	Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc:	Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Partitioning option for COPY
Date:	2009-11-26 15:16:57
Message-ID:	4B0E9BE9.1020904@asterdata.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing wrote:
> After triggers can't change tuple, thus cant change routing.
>
An after trigger can always issue an update of the tuple but that should
be trapped by the regular mechanism that will deal with updates (when we
have it available).
>> Also the description omits the execution of before and after statement
>> triggers. While those can apply to the parent table (but the same
>> question about what happens if the after statement modifies routing
>> decision still applies), what does it mean in the case of COPY to have
>> statement triggers on the child tables?
>>
>
> What statement triggers do you mean ?
>
> I don't think we have ON COPY triggers ?
>
I mean

CREATE TRIGGER /name/ { BEFORE | AFTER } /event/
ON /table/ FOR EACH STATEMENT

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com