Re: On partitioning

From: "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Amit Kapila'" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Bruce Momjian'" <bruce(at)momjian(dot)us>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-12 04:43:46
Message-ID: 007b01d015c6$373c21e0$a5b465a0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> On Thu, Dec 11, 2014 at 12:00 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > Yeah either this way or what Josh has suggested upthread, the main
> > point was that if at all we want to support multi-column list partitioning
> > then we need to have slightly different syntax, however I feel that we
> > can leave multi-column list partitioning for first version.
>
> Yeah, possibly.
>
> I think we could stand to have a lot more discussion about the syntax
> here. So far the idea seems to be to copy what Oracle has, but it's
> not clear if we're going to have exactly what Oracle has or something
> subtly different. I personally don't find the Oracle syntax very
> PostgreSQL-ish. Stuff like "VALUES LESS THAN 500" doesn't sit
> especially well with me - less than according to which opclass? Are
> we going to insist that partitioning must use the default btree
> opclass so that we can use that syntax? That seems kind of lame.
>

Syntax like VALUES LESS THAN 500 also means, we then have to go figure out what's that partition's lower bound based on upper bound of the previous one. Forget holes in the range if they matter. I expressed that concern elsewhere in favour of having available both a range's lower and upper bounds.

> There are lots of interesting things we could do here, e.g.:
>
> CREATE TABLE parent_name PARTITION ON (column [ USING opclass ] [, ... ]);

So, no PARTITION BY [RANGE | LIST] clause huh?

What we are calling pg_partitioned_rel would obtain following bits of information from such a definition of a partitioned relation:

* column(s) to partition on and respective opclass(es)
* the level this partitioned relation lies in the partitioning hierarchy
(determining its relkind and storage qualification)

By the way, I am not sure how we define a partitioning key on a partition (in other words, a subpartitioning key on the corresponding partitioned relation). Perhaps (only) via ALTER TABLE on a partition relation?

> CREATE TABLE child_name PARTITION OF parent_name
> FOR { (value, ...) [ TO (value, ...) ] } [, ...];
>

So it's still a CREATE "TABLE" but the part 'PARTITION OF' turns this "table" into something having characteristics of a partition relation getting all kinds of new treatments at various places. It appears there is a redistribution of table-characteristics between a partitioned relation and its partition. We take away storage from the former and instead give it to the latter. On the other hand, the latter's data is only accessible through the former perhaps with escape routes for direct access via some special syntax attached to various access commands. We also stand to lose certain abilities with a partitioned relation such as not able to define a unique constraint (other than what partition key could potentially help ensure) or use it as target of foreign key constraint (just reiterating).

What we call pg_partition_def obtains following bits of information from such a definition of a partition relation:

* parent relation (partitioned relation this is partition of)
* partition kind (do we even want to keep carrying this
around as a separate field in catalog?)
* values this partition holds

The last part being the most important.

In case of what we would have called a 'LIST' partition, this could look like

... FOR VALUES (val1, val2, val3, ...)

Assuming we only support partition key to contain only one column in such a case.

In case of what we would have called a 'RANGE' partition, this could look like

... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)

How about BETWEEN ... AND ... ?

Here we allow a partition key to contain more than one column.

> So instead of making a hard distinction between range and list
> partitioning, you can say:
>
> CREATE TABLE child_name PARTITION OF parent_name FOR (3), (5), (7);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (8) TO (12);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (20) TO (30),
> (120) TO (130);
>

I would include the noise keyword VALUES just for readability if anything.

> Now that might be a crappy idea for various reasons, but the point is
> there are a lot of details to be hammered out with the syntax, and
> there are several ways we can go wrong. If we choose an
> overly-limiting syntax, we're needlessly restricting what can be done.
> If we choose an overly-permissive syntax, we'll restrict the
> optimization opportunities.
>

I am not sure but perhaps RANGE and LIST as partitioning kinds may as well just be noise keywords. We can parse those values into a parse node such that we don’t have to care about whether they describe partition as being one kind or the other. Say a List of something like,

typedef struct PartitionColumnValue
{
NodeTag type,
Oid *partitionid,
char *partcolname,
Node *partrangelower,
Node *partrangeupper,
List *partlistvalues
};

Or we could still add a (char) partkind just to say which of the fields matter.

We don't need any defining values here for hash partitions if and when we add support for the same. We would either be using a system-wide common hash function or we could add something with partitioning key definition.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-12-12 04:55:08 Re: On partitioning
Previous Message Michael Paquier 2014-12-12 03:33:24 Re: Compression of full-page-writes