Re: On partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On partitioning
Date: 2014-12-17 03:35:27
Message-ID: CA+TgmoYuRLE-H1c2sXh8KNnHca71FWg6w4N0Wg8Zvvp0m6_NHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 16, 2014 at 9:01 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 12/16/2014 05:52 PM, Robert Haas wrote:
>> But in a more complicated case where the value there isn't known until
>> runtime, yeah, it scans everything. I'm not sure what the best way to
>> fix that is. If the partition bounds were stored in a structured way,
>> as we've been discussing, then the Append or Merge Append node could,
>> when initialized, check which partition the id = X qual routes to and
>> ignore the rest. But that's more iffy with the current
>> representation, I think.
>
> Huh. I was just testing:
>
> WHERE event_time BETWEEN timestamptz '2014-12-01' and ( timestamptz
> '2014-12-01' + interval '1 month')
>
> In that case, the expression above got folded to constants by the time
> Postgres did the index scans, but it scanned all partitions. So somehow
> (timestamptz + interval) doesn't get constant-folded until after
> planning, at least not on 9.3.
>
> And of course this leaves out common patterns like "now() - interval '30
> days'" or "to_timestamp('20141201','YYYYMMDD')"
>
> Anyway, what I'm saying is that I personally regard the inability to
> handle even moderately complex expressions a major failing of our
> existing partitioning scheme (possibly its worst single failing), and I
> would regard any new partitioning feature which didn't address that
> issue as suspect.

I understand, but I think you need to be careful not to stonewall all
progress in the name of getting what you want. Getting the
partitioning metadata into the system catalogs in a suitable format
will be a huge step forward regardless of whether it solves this
particular problem right away or not, because it will make it possible
to solve this problem in a highly-efficient way, which is quite hard
to do right now.

For example, we could (right now) write code that would do run-time
partition pruning by taking the final filter clause, with all values
substituted in, and re-checking for partitions that can be pruned via
constraint exclusion. But that would be expensive and would often
fail to find anything useful. Even in the best case where it works
out it's O(n) in the number of partitions, and will therefore perform
badly for large numbers of partitions (even, say, 1000). But once the
partitioning metadata is stored in the catalog, we can implement this
as a binary search -- O(lg n) time -- and the constant factor should
be lower -- and it will be pretty easy to skip it in cases where it's
useless so that we don't waste cycles spinning our wheels. Whether
the initial patch covers all the cases you care about or not, and it
probably won't, it will be a really big step towards making it
POSSIBLE to handle those cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-12-17 03:39:22 Re: Escaping from blocked send() reprised.
Previous Message Kyotaro HORIGUCHI 2014-12-17 03:31:41 Re: [Bug] Inconsistent result for inheritance and FOR UPDATE.