Re: On partitioning

From: "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Bruce Momjian'" <bruce(at)momjian(dot)us>, "'Pg Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-03 03:43:45
Message-ID: 015b01d00eab$56d37700$047a6500$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi Robert,

From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> > * Catalog schema:
> >
> > CREATE TABLE pg_catalog.pg_partitioned_rel
> > (
> > partrelid oid NOT NULL,
> > partkind oid NOT NULL,
> > partissub bool NOT NULL,
> > partkey int2vector NOT NULL, -- partitioning attributes
> > partopclass oidvector,
> >
> > PRIMARY KEY (partrelid, partissub),
> > FOREIGN KEY (partrelid) REFERENCES pg_class (oid),
> > FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)
> > )
> > WITHOUT OIDS ;
>
> So, we're going to support exactly two levels of partitioning?
> partitions with partissub=false and subpartitions with partissub=true?
> Why not support only one level of partitioning here but then let the
> children have their own pg_partitioned_rel entries if they are
> subpartitioned? That seems like a cleaner design and lets us support
> an arbitrary number of partitioning levels if we ever need them.
>

Yeah, that's what I thought at some point in favour of dropping partissub altogether.

However, not that this design solves it, there is one question - if we would want to support defining for a table both partition key and sub-partition key in advance? That is, without having defined a first level partition yet; in that case, what level do we associate sub-(sub-) partitioning key with or more to the point where do we keep it? One way is to replace partissub by partkeylevel with level 0 being the topmost-level partitioning key and so on while keeping the partrelid equal to the pg_class.oid of the parent. That brings us to next question of managing hierarchies in pg_partition_def corresponding to partkeylevel in the definition of topmost partitioned relation. But I guess those are implementation details rather than representational unless I am being too naïve.

> > CREATE TABLE pg_catalog.pg_partition_def
> > (
> > partitionid oid NOT NULL,
> > partitionparentrel oid NOT NULL,
> > partitionisoverflow bool NOT NULL,
> > partitionvalues anyarray,
> >
> > PRIMARY KEY (partitionid),
> > FOREIGN KEY (partitionid) REFERENCES pg_class(oid)
> > )
> > WITHOUT OIDS;
> >
> > ALTER TABLE pg_catalog.pg_class ADD COLUMN relispartitioned;
>
> What is an overflow partition and why do we want that?
>

That would be a default partition. That is, where the tuples that don't belong elsewhere (other defined partitions) go. VALUES clause of the definition for such a partition would look like:

(a range partition) ... VALUES LESS THAN MAXVALUE
(a list partition) ... VALUES DEFAULT

There has been discussion about whether there shouldn't be such a place for tuples to go. That is, it should generate an error if a tuple can't go anywhere (or support auto-creating a new one like in interval partitioning?)

> What are you going to do if the partitioning key has two columns of
> different data types?
>

Sorry, this totally eluded me. Perhaps, the 'values' needs some more thought. They are one of the most crucial elements of the scheme.

I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I am thinking it's a concern only for range partitions, no? (that is, a multicolumn partition key)

I think partkind switches the interpretation of the field as appropriate. Am I missing something? By the way, I had mentioned we could have two values fields each for range and list partition kind.

> > * DDL syntax (no multi-column partitioning, sub-partitioning support as yet):
> >
> > -- create partitioned table and child partitions at once.
> > CREATE TABLE parent (...)
> > PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
> > [ (
> > PARTITION child
> > {
> > VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
> > | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
> > }
> > [ WITH ( ... ) ] [ TABLESPACE tbs ]
> > [, ...]
> > ) ] ;
>
> How are you going to dump and restore this, bearing in mind that you
> have to preserve a bunch of OIDs across pg_upgrade? What if somebody
> wants to do pg_dump --table name_of_a_partition?
>

Assuming everything's (including partitioned relation and partitions at all levels) got a pg_class entry of its own, would OIDs be a problem? Or what is the nature of this problem if it's possible that it may be.

If someone pg_dump's an individual partition as a table, we could let it be dumped as just a plain table. I am thinking we should be able to do that or should be doing just that (?)

> I actually think it will be much cleaner to declare the parent first
> and then have separate CREATE TABLE statements that glue the children
> in, like CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1,
> 10000).
>

Oh, do you mean to do away without any syntax for defining partitions with CREATE TABLE parent?

By the way, do you mean the following:

CREATE TABLE child PARTITION OF parent VALUES LESS THAN (1, 10000)

Instead of,

CREATE PARTITION child ON parent VALUES LESS THAN 10000?

And as for the dump of a partitioned table, it does sound cleaner to do it piece by piece starting with the parent and its partitioning key (as ALTER on it?) followed by individual partitions using either of the syntax above. Moreover we dump a sub-partition as a partition on its parent partition.

Thanks for your time and valuable input.

Regards,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2014-12-03 04:56:55 pgsql: Fix whitespace
Previous Message Michael Paquier 2014-12-03 03:38:26 Re: [REVIEW] Re: Compression of full-page-writes