Re: On partitioning

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: 'Robert Haas' <robertmhaas(at)gmail(dot)com>, 'Andres Freund' <andres(at)2ndquadrant(dot)com>, 'Alvaro Herrera' <alvherre(at)2ndquadrant(dot)com>, 'Bruce Momjian' <bruce(at)momjian(dot)us>, 'Pg Hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-05 19:52:03
Message-ID: 54820CE3.7030706@BlueTreble.com
Lists: pgsql-hackers

On 12/5/14, 1:22 PM, Jim Nasby wrote:
> On 12/5/14, 3:42 AM, Amit Langote wrote:
>>> > I think you are right. I think in this case we need something similar
>>> > to the column pg_index.indexprs, which is of type pg_node_tree (which
>>> > seems to have already been suggested by Robert). So maybe we can proceed
>>> > with this type and see if anyone else has a better idea.
>> One point raised against pg_node_tree was that the values represented therein would turn out to be too generalized to use effectively during planning. But it seems we could deserialize it in advance, back into the internal form (say, an array of structs), as part of the cached relation data. This overhead would only be incurred for partitioned tables. Perhaps this is what Robert suggested elsewhere.
>
> In order to store a composite type in a catalog, we would need one field holding the typid of the composite, and the field storing the actual composite data would need to be a "dumb" varlena containing the composite's HeapTupleHeader.
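
To make that concrete, a rough sketch of the catalog layout being described (relation and column names purely illustrative):

CREATE TABLE pg_partition_sketch (
    partrelid     regclass NOT NULL, -- the partitioned table
    partboundtype regtype  NOT NULL, -- typid of the composite used for bound values
    partbound     bytea    NOT NULL  -- "dumb" varlena holding the composite datum
);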

On further thought: if we disallow NULL as a partition boundary value, we don't need a separate rowtype; we could just use the one associated with the relation itself. Presumably that would make comparing tuples against the partition boundary list a lot easier.
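
As a minimal illustration (table and values entirely made up), with bounds reusing the table's own rowtype, checking a tuple against a bound boils down to ordinary record comparison:

CREATE TABLE sales (
    product_key text NOT NULL,
    sales_date  date NOT NULL
);

-- Hypothetical upper bound for one partition, expressed in sales' own rowtype;
-- an incoming row belongs below that bound if it compares less than it.
SELECT ROW('ABCD', DATE '2014-12-05')::sales
     < ROW('ABCE', DATE '2015-01-01')::sales; -- true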

I was hung up on how that would work in the case of ALTER TABLE, but we'd have the same problem with using pg_node_tree: if you alter a table in such a way that *might* affect your partitioning, you have to do some kind of revalidation anyway.

The other option would be to use some custom rowtype to store boundary values and have a method that can form a boundary tuple from a real one. Either way, I suspect this is better than frequently evaluating pg_node_trees.
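
Continuing the made-up sales example, the separate-rowtype version might look roughly like this (the type and function names are invented):

CREATE TYPE sales_bound AS (
    key_prefix  text,
    sales_month date
);

-- Forms a boundary tuple from a real row by applying the partitioning expression.
CREATE FUNCTION sales_to_bound(s sales) RETURNS sales_bound
    LANGUAGE sql IMMUTABLE AS $$
    SELECT ROW(substr(s.product_key, 1, 4),
               date_trunc('month', s.sales_date)::date)::sales_bound;
$$;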

There may be one other option. If range partitions are defined in terms of an expression whose value is different for every partition (e.g. (substr(product_key, 1, 4), date_trunc('month', sales_date))), then we could use a hash of that expression's value to identify a partition. In other words, range partitioning becomes a special case of hash partitioning. I do think we need a programmatic means to identify the range of an individual partition, and a hash won't solve that; but the performance of that case isn't critical, so we could use pretty much whatever we wanted there.
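
Rough sketch of what I mean, using hashtext() purely as a stand-in for whatever hash function we would actually use:

-- The evaluated partitioning expression is hashed to identify the target partition.
SELECT hashtext(substr(product_key, 1, 4)
                || ':' || date_trunc('month', sales_date)::date) AS part_id
FROM   sales;
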
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
