Re: On partitioning

From: "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Bruce Momjian'" <bruce(at)momjian(dot)us>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-11-26 01:20:35
Message-ID: 04da01d00917$2dde1010$899a3030$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

> > I'm wondering here if it's better to keep partition values per partition
> > wherein we have two catalogs, say, pg_partitioned_rel and pg_partition_def.
> >
> > pg_partitioned_rel stores information like partition kind, key (attribute
> > number(s)?), key opclass(es). Optionally, we could also say here if a given
> > record (in pg_partitioned_rel) represents an actual top-level partitioned table
> > or a partition that is sub-partitioned (wherein this record is just a dummy for
> > keys of sub-partitioning and such); something like partisdummy...
> >
> > pg_partition_def stores information of individual partitions (/sub-partitions,
> > too?) such as its parent (either an actual top level partitioned table or a sub-
> > partitioning template), whether this is an overflow/default partition, and
> > partition values.
>
> Yeah, you could do something like this. There's a certain overhead to
> adding additional system catalogs, though. It means more inodes on
> disk, probably more syscaches, and more runtime spent probing those
> additional syscache entries to assemble a relcache entry. On the
> other hand, it's got a certain conceptual cleanliness to it.
>

Hmm, this could be a concern.

> I do think at a very minimum it's important to have a Boolean flag in
> pg_class so that we need not probe what you're calling
> pg_partitioned_rel if no partitioning information is present there. I
> might be tempted to go further and add the information you are
> proposing to put in pg_partitioned_rel in pg_class instead, and just
> add one new catalog. But it depends on how many columns we end up
> with.
>

I think something like pg_class.relispartitioned would be good as a minimum like you said.

> Before going too much further with this I'd mock up schemas for your
> proposed catalogs and a list of DDL operations to be supported, with
> the corresponding syntax, and float that here for comment.
>

I came up with something like the following:

* Catalog schema:

CREATE TABLE pg_catalog.pg_partitioned_rel
(
partrelid oid NOT NULL,
partkind oid NOT NULL,
partissub bool NOT NULL,
partkey int2vector NOT NULL, -- partitioning attributes
partopclass oidvector,

PRIMARY KEY (partrelid, partissub),
FOREIGN KEY (partrelid) REFERENCES pg_class (oid),
FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
)
WITHOUT OIDS ;

CREATE TABLE pg_catalog.pg_partition_def
(
partitionid oid NOT NULL,
partitionparentrel oid NOT NULL,
partitionisoverflow bool NOT NULL,
partitionvalues anyarray,

PRIMARY KEY (partitionid),
FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
)
WITHOUT OIDS;

ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;

pg_partitioned_rel stores the partitioning information for a partitioned relation. A pg_class relation has pg_partitioned_rel entry if pg_class.relispartitioned is 'true'. Though this can be challenged by saying we will want to store sub-partitioning key here too. Do we want a partition relation to be called partitioned itself for the purpose of underlying subpartitions? 'partissub' would be true in that case.

pg_partition_def has a row for each relation that has defined restrictions on the data that partkey column can take, aka a partition. The data is known to be within the bounds defined by partitionvalues. Perhaps we could divide this into two viz. rangeupperbound and listvalues for two partition types. When we will get to multi-level partitioning (sub-partitioning), the partitions described here would actually be either data containing relations (lowest level) or placeholder relations (upper-level). The parentrel is supposed to make it easier to scan for all partitions of a given partitioned relation. The partitioning hierarchy also stays in the form of inheritance stored elsewhere (pg_inherits).

The main reasoning behind two separate catalogs (or at least keeping partition definitions separate) is to make life easier during future enhancements like sub-partitioning.

* DDL syntax (no multi-column partitioning, sub-partitioning support as yet):

-- create partitioned table and child partitions at once.
CREATE TABLE parent (...)
PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
[ (
PARTITION child
{
VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
| VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
}
[ WITH ( ... ) ] [ TABLESPACE tbs ]
[, ...]
) ] ;

-- define partitioning key on a table
ALTER TABLE parent PARTITION BY [ RANGE | LIST ] ( key_column ) [ opclass ] [ (...) ] ;

-- create a new partition on a partitioned table with specified values
CREATE PARTITION child ON parent VALUES ...;

-- drop a partition of a partitioned table with specified values
DROP PARTITION child ON parent VALUES ...;

-- attach table as a partition to a partitioned table
ALTER TABLE parent ATTACH PARTITION child VALUES ... ;

-- detach a partition (child continues to exist as a regular table)
ALTER TABLE parent DETACH PARTITION child ;

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2014-11-26 01:26:37 Re: On partitioning
Previous Message Adam Brightwell 2014-11-26 00:55:46 Re: Role Attribute Bitmask Catalog Representation