Re: On partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-11-20 16:13:48
Message-ID: CA+TgmoZ-HJOTPfZcnPOh27881xn4N1QGtY7J4nG+v6tV9cJjNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 19, 2014 at 10:27 PM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Maybe as anyarray, but I think pg_node_tree
>> might even be better. That can also represent data of some arbitrary
>> type, but it doesn't enforce that everything is uniform. So you could
>> have a list of objects of the form {RANGEPARTITION :lessthan {CONST
>> ...} :partition 16982} or similar. The relcache could load that up
>> and convert the list to a C array, which would then be easy to
>> binary-search.
>>
>> As you say, you also need to store the relevant operator somewhere,
>> and the fact that it's a range partition rather than list or hash,
>> say.
>
> I'm wondering here if it's better to keep partition values per partition wherein we have two catalogs, say, pg_partitioned_rel and pg_partition_def.
>
> pg_partitioned_rel stores information like partition kind, key (attribute number(s)?), key opclass(es). Optionally, we could also say here if a given record (in pg_partitioned_rel) represents an actual top-level partitioned table or a partition that is sub-partitioned (wherein this record is just a dummy for keys of sub-partitioning and such); something like partisdummy...
>
> pg_partition_def stores information of individual partitions (/sub-partitions, too?) such as its parent (either an actual top level partitioned table or a sub-partitioning template), whether this is an overflow/default partition, and partition values.

Yeah, you could do something like this. There's a certain overhead to
adding additional system catalogs, though. It means more inodes on
disk, probably more syscaches, and more runtime spent probing those
additional syscache entries to assemble a relcache entry. On the
other hand, it's got a certain conceptual cleanliness to it.

I do think at a very minimum it's important to have a Boolean flag in
pg_class so that we need not probe what you're calling
pg_partitioned_rel if no partitioning information is present there. I
might be tempted to go further and add the information you are
proposing to put in pg_partitioned_rel in pg_class instead, and just
add one new catalog. But it depends on how many columns we end up
with.

Before going too much further with this I'd mock up schemas for your
proposed catalogs and a list of DDL operations to be supported, with
the corresponding syntax, and float that here for comment.

> Such a scheme would be similar to what Greenplum [1] has.

Interesting.

> Perhaps this duplicates inheritance and can be argued in that sense, though.
>
> Do you think keeping partition defining values with the top-level partitioned table would make some partitioning schemes (multikey, sub- , etc.) a bit complicated to implement? I cannot offhand imagine the actual implementation difficulties that might be involved myself but perhaps you have a better idea of such details and would have a say...

I don't think this is a big deal one way or the other. We're all
database folks here, so deciding to normalize for performance or
denormalize for conceptual cleanliness shouldn't tax our powers
unduly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-11-20 16:18:03 Re: Bugfix and new feature for PGXS
Previous Message Tom Lane 2014-11-20 16:05:32 Re: Doing better at HINTing an appropriate column within errorMissingColumn()