a performence question

Lists: pgsql-general
From: Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: a performence question
Date: 2008-09-04 06:30:34
Message-ID: 1220509834.1366.298.camel@zorro.isa-geek.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

Maybe someone on this list actually have already tried this:

I'm planning to make a partitioned database. From Postgres documentation
I can see, that there are basically two methods to route INSERTS into
partitioned table:
one. is a TRIGGER
other. is a RULE

My Table will have over 1000 partitions. Some not so big, but
significant number of them will be of multimillion rows. Partitioning
will be done using a single column, on equality.... meaning:

CREATE TABLE mainlog (sel int, tm timestamp, info text,...);
CREATE TABLE mainlog_p1 (CHECK (sel=1)) INHERITS (mainlog);
CREATE TABLE mainlog_p2 (CHECK (sel=2)) INHERITS (mainlog);
...etc.

If I route INSERT with a TRIGGER, the function would look like:
CREATE .... TRIGGER...AS $$ DECLARE x RECORD; BEGIN
SELECT id INTO x FROM current_route; NEW.sel := x.id;
IF NEW.sel = 1 THEN INSERT INTO mainlog_p1 VALUES (NEW.*);
ELSE IF NEW.sel = 2 THEN INSERT INTO mainlog_p2 VALUES (NEW.*);
....
END IF;
RETURN NULL;
$$;

If I route INSETS with a RULE, I'd have something like 1000 rules hooked
up to MAINLOG, all looking like:
CREATE RULE .... ON INSERT ... WHERE EXISTS(SELECT 1 FROM current_route
WHERE id = 1) DO INSTEAD INSERT INTO mainlog_p1 VALUES SELECT
x.id,new.tm... FROM (SELECT id FROM current_route) x;
... and similar RULES for cases "WHERE id = 2", etc.

My question is, where should I expect better performance on those
INSERTS).

I would prefer a set of RULES (as I wouldn't like to rewrite TRIGGER
function every time I add a partition ... a thousand lines function),
but since they all must make a select query on CURRENT_ROUTE table, may
be that will not be particularly effective? The TRIGGER function does a
single query - may be it'll be faster? I was planning to generate some
dummy data and run a simulation, but may be someone already has that
experience? Or maybe the TRIGGER should look differently? Or the set of
RULES?

And on a similar token: Is there a way to RAISE an exception *within*
RULE definition? The only way I could imagine to achieve that is to
create a function raise_exception(), and make a "SELECT
raise_eception();" in such case. Within a RULE I cannot "CREATE TABLE",
EXECUTE, RAISE, etc .... or there is a way but I just don't know it....
without artificial use of "SELECT function()"?

-R


From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: "Rafal Pietrak" <rafal(at)zorro(dot)isa-geek(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: a performence question
Date: 2008-09-05 15:04:00
Message-ID: 92869e660809050803o5340c4c2se4d7e593c5533d63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2008/9/4 Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>:
> Hi,
>
> Maybe someone on this list actually have already tried this:
>
> I'm planning to make a partitioned database. From Postgres documentation
> I can see, that there are basically two methods to route INSERTS into
> partitioned table:
> one. is a TRIGGER
> other. is a RULE
>
> My Table will have over 1000 partitions. Some not so big, but
> significant number of them will be of multimillion rows. Partitioning
> will be done using a single column, on equality.... meaning:
>
> CREATE TABLE mainlog (sel int, tm timestamp, info text,...);
> CREATE TABLE mainlog_p1 (CHECK (sel=1)) INHERITS (mainlog);
> CREATE TABLE mainlog_p2 (CHECK (sel=2)) INHERITS (mainlog);
> ...etc.
>
> If I route INSERT with a TRIGGER, the function would look like:
> CREATE .... TRIGGER...AS $$ DECLARE x RECORD; BEGIN
> SELECT id INTO x FROM current_route; NEW.sel := x.id;
> IF NEW.sel = 1 THEN INSERT INTO mainlog_p1 VALUES (NEW.*);
> ELSE IF NEW.sel = 2 THEN INSERT INTO mainlog_p2 VALUES (NEW.*);
> ....
> END IF;
> RETURN NULL;
> $$;
>
> If I route INSETS with a RULE, I'd have something like 1000 rules hooked
> up to MAINLOG, all looking like:
> CREATE RULE .... ON INSERT ... WHERE EXISTS(SELECT 1 FROM current_route
> WHERE id = 1) DO INSTEAD INSERT INTO mainlog_p1 VALUES SELECT
> x.id,new.tm... FROM (SELECT id FROM current_route) x;
> ... and similar RULES for cases "WHERE id = 2", etc.
>
> My question is, where should I expect better performance on those
> INSERTS).
>
> I would prefer a set of RULES (as I wouldn't like to rewrite TRIGGER
> function every time I add a partition ... a thousand lines function),
> but since they all must make a select query on CURRENT_ROUTE table, may
> be that will not be particularly effective? The TRIGGER function does a
> single query - may be it'll be faster? I was planning to generate some
> dummy data and run a simulation, but may be someone already has that
> experience? Or maybe the TRIGGER should look differently? Or the set of
> RULES?
>

I had a bit spare time so I tested this

see http://filip.rembialkowski.net/postgres-partitioning-performance-rules-vs-triggers/

seems that in your scenario trigger will be better.

but If I had to do this, and if performance was very important, I
would move "partition selection" logic out of the INSERT phase. the
application can know this before the actual insert. unless you want to
shift selections very often...

--
Filip Rembiałkowski


From: Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>
To: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: a performence question
Date: 2008-09-07 12:46:49
Message-ID: 1220791609.14522.93.camel@zorro.isa-geek.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thenx,

The preparation of such simulation would take me much longer :(. Thenx!

But out of the results, I was wandering if this is an exotic RDMB use,
or may be it would be worth pondering at this point, of improvements to
the Postgres RULE system.

I'm not really up to any technical discussion regarding those
improvements, I can only try to express my ideas "at the syntax
level" (hoping that those might be more readable then "not so plain
English"), with the following wish-list:

wish nr.1:
The RULE version (of the "routing table" case) could possibly get
performance improvement if only one could make a single query for ALL
branches of the RULE-list, *before* evaluation of the WHERE clausure of
the RULE-list. An SQL extension would be necessary.

Syntactically, this could be expressed by a phrase: "CREATE RULE ... AS
BEFORE INSERT TO .... DO SELECT sel INTO NEW.sel FROM
current_route" (or: "... DO INSERT INTO NEW.sel FROM (SELECT sel FROM
current_route)" or: "...DO UPDATE *NEW* SET sel=cur.sel FROM
current_route cur") - thus defining a RULE which could initialize some
of the NEW.* fields with values retrieved from the database. This would
allow for omission of the "SELECT 1 FORM..." and have that replaced by
simple NEW.sel = NNN clausure .... thus bringing the "routing_table
case" performance of the RULE system in line with the case of "routing
INSERT based solely on inserted entity".

In other words, I wish there was a stage of "readying" data (in fact,
readying data *for* the WHERE clausure), before RULEs evaluate.

And: Would such "readying" be usefull for UPDATES or DELETES, too? I
think: possibly.

wish nr.2:
Should there be a way to pass "schema element" reference (meaning: a
name of the effected table) into plpsql function, writing (and
maintaining the code) of such TRIGGERs would be immensely simpler.

And thenx again for the help in evaluating the routing performance. This
helped me a lot!

-R

On Fri, 2008-09-05 at 17:04 +0200, Filip Rembiałkowski wrote:
> 2008/9/4 Rafal Pietrak <rafal(at)zorro(dot)isa-geek(dot)com>:
> > Hi,
> >
> > Maybe someone on this list actually have already tried this:
> >
> > I'm planning to make a partitioned database. From Postgres documentation
> > I can see, that there are basically two methods to route INSERTS into
> > partitioned table:
> > one. is a TRIGGER
> > other. is a RULE
> >
> > My Table will have over 1000 partitions. Some not so big, but
> > significant number of them will be of multimillion rows. Partitioning
> > will be done using a single column, on equality.... meaning:
> >
> > CREATE TABLE mainlog (sel int, tm timestamp, info text,...);
> > CREATE TABLE mainlog_p1 (CHECK (sel=1)) INHERITS (mainlog);
> > CREATE TABLE mainlog_p2 (CHECK (sel=2)) INHERITS (mainlog);
> > ...etc.
> >
> > If I route INSERT with a TRIGGER, the function would look like:
> > CREATE .... TRIGGER...AS $$ DECLARE x RECORD; BEGIN
> > SELECT id INTO x FROM current_route; NEW.sel := x.id;
> > IF NEW.sel = 1 THEN INSERT INTO mainlog_p1 VALUES (NEW.*);
> > ELSE IF NEW.sel = 2 THEN INSERT INTO mainlog_p2 VALUES (NEW.*);
> > ....
> > END IF;
> > RETURN NULL;
> > $$;
> >
> > If I route INSETS with a RULE, I'd have something like 1000 rules hooked
> > up to MAINLOG, all looking like:
> > CREATE RULE .... ON INSERT ... WHERE EXISTS(SELECT 1 FROM current_route
> > WHERE id = 1) DO INSTEAD INSERT INTO mainlog_p1 VALUES SELECT
> > x.id,new.tm... FROM (SELECT id FROM current_route) x;
> > ... and similar RULES for cases "WHERE id = 2", etc.
> >
> > My question is, where should I expect better performance on those
> > INSERTS).
> >
> > I would prefer a set of RULES (as I wouldn't like to rewrite TRIGGER
> > function every time I add a partition ... a thousand lines function),
> > but since they all must make a select query on CURRENT_ROUTE table, may
> > be that will not be particularly effective? The TRIGGER function does a
> > single query - may be it'll be faster? I was planning to generate some
> > dummy data and run a simulation, but may be someone already has that
> > experience? Or maybe the TRIGGER should look differently? Or the set of
> > RULES?
> >
>
> I had a bit spare time so I tested this
>
> see http://filip.rembialkowski.net/postgres-partitioning-performance-rules-vs-triggers/
>
> seems that in your scenario trigger will be better.
>
> but If I had to do this, and if performance was very important, I
> would move "partition selection" logic out of the INSERT phase. the
> application can know this before the actual insert. unless you want to
> shift selections very often...
>
>
>
>
>
> --
> Filip Rembiałkowski
>