Re: non-overlapping, consecutive partitions

Lists: pgsql-hackers
From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: non-overlapping, consecutive partitions
Date: 2010-07-23 20:04:00
Message-ID: 89719AEB-B23A-42A4-90D7-B26D7BE49EDA@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hello everybody,

i have just come across some issue which has been bugging me for a while.
consider:

SELECT * FROM foo ORDER BY bar;

if we have an index on bar, we can nicely optimize away the sort step by consulting the index - a btree will return sorted output.
under normal circumstances it will be seq->sort but doing some config settings we can turn this into an index scan nicely to avoid to the sort (disk space is my issue here).

this is not so easy anymore:

create table foo ( x date );
create table foo_2010 () INHERITS (foo)
create table foo_2009 () INHERITS (foo)
create table foo_2008 () INHERITS (foo)

now we add constraints to make sure that data is only in 2008, 2009 and 2010.
we assume that everything is indexed:

SELECT * FROM foo ORDER BY bar will now demand an ugly sort for this data.
this is not an option if you need more than a handful of rows ...

if constraints are non overlapping and if they are based on a "sortable" data type, we might be able to scan one index after the other and get a sorted list.
why is this an issue? imagine a case where you want to do billing, eg. some phone calls. the job now is: the last 10 calls of a customer are free and you want to sum up those which are not free.
to do that you basically need a sorted list per customer. if you have data here which is partitioned over time you are screwed up because you want to return a sorted list taken from X partitions to some higher level operation (windowing or whatever).
resorting vast amounts of data is a killer here. in the particular case i am talking about my problem is roughly 2 TB scaled out to some PL/proxy farm.

does anybody see a solution to this problem?
what are the main showstoppers to make something like this work?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-23 20:15:41
Message-ID: 4C49F86D.9030301@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/23/2010 11:04 PM, Hans-Jürgen Schönig wrote:
> does anybody see a solution to this problem?
> what are the main showstoppers to make something like this work?

I think we should absolutely make this work when we have a good
partitioning implementation.

That said, I don't think it's wise to put a lot of effort into making
this work with our current partitioning method when the partitioning
patches are just around the corner. The developer time should be
directed at those patches instead.

Regards,
Marko Tiikkaja


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-25 09:56:38
Message-ID: 20100725095638.GA24914@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
> create table foo ( x date );
> create table foo_2010 () INHERITS (foo)
> create table foo_2009 () INHERITS (foo)
> create table foo_2008 () INHERITS (foo)
>
> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
> we assume that everything is indexed:
>
> SELECT * FROM foo ORDER BY bar will now demand an ugly sort for this data.
> this is not an option if you need more than a handful of rows ...

I think the right way to approach this is to teach the planner about
merge sorts. This is, if the planner has path to foo_* all ordered by
the same key (because they have the same indexes) then it has a path to
the UNION of those tables simply by merging the results of those paths.

This would be fairly straight forward to implement I think, you may
even be able to reuse the merge sort in the normal sort machinery.
(You'll need to watch out for UNION vs UNION ALL.)

The real advantage of this approach is that you no longer have to prove
anything about the constraints or various datatypes and it is more
general. Say you have partitioned by start_date but you want to sort by
end_date, simple index scanning won't work while a merge sort will work
beautifully.

You're also not limited to how the partitioning machinery will
eventually work.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-25 14:41:03
Message-ID: 3FF38A84-45D0-434B-9510-3CD4F28FA617@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:

> On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>> create table foo ( x date );
>> create table foo_2010 () INHERITS (foo)
>> create table foo_2009 () INHERITS (foo)
>> create table foo_2008 () INHERITS (foo)
>>
>> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
>> we assume that everything is indexed:
>>
>> SELECT * FROM foo ORDER BY bar will now demand an ugly sort for this data.
>> this is not an option if you need more than a handful of rows ...
>
> I think the right way to approach this is to teach the planner about
> merge sorts. This is, if the planner has path to foo_* all ordered by
> the same key (because they have the same indexes) then it has a path to
> the UNION of those tables simply by merging the results of those paths.
>
> This would be fairly straight forward to implement I think, you may
> even be able to reuse the merge sort in the normal sort machinery.
> (You'll need to watch out for UNION vs UNION ALL.)
>
> The real advantage of this approach is that you no longer have to prove
> anything about the constraints or various datatypes and it is more
> general. Say you have partitioned by start_date but you want to sort by
> end_date, simple index scanning won't work while a merge sort will work
> beautifully.
>
> You're also not limited to how the partitioning machinery will
> eventually work.
>
> Hope this helps,

i think this is excellent input.
i will do some research going into that direction.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-25 21:32:57
Message-ID: AANLkTinE=Dcqv0tPY9yyG8Mg9fVq1eEBy=EDSsBovRy6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>
> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>
>> On Fri, Jul 23, 2010 at 10:04:00PM +0200, Hans-Jürgen Schönig wrote:
>>>      create table foo ( x date );
>>>      create table foo_2010 () INHERITS (foo)
>>>      create table foo_2009 () INHERITS (foo)
>>>      create table foo_2008 () INHERITS (foo)
>>>
>>> now we add constraints to make sure that data is only in 2008, 2009 and 2010.
>>> we assume that everything is indexed:
>>>
>>> SELECT * FROM foo ORDER BY bar  will now demand an ugly sort for this data.
>>> this is not an option if you need more than a handful of rows ...
>>
>> I think the right way to approach this is to teach the planner about
>> merge sorts. This is, if the planner has path to foo_* all ordered by
>> the same key (because they have the same indexes) then it has a path to
>> the UNION of those tables simply by merging the results of those paths.
>>
>> This would be fairly straight forward to implement I think, you may
>> even be able to reuse the merge sort in the normal sort machinery.
>> (You'll need to watch out for UNION vs UNION ALL.)
>>
>> The real advantage of this approach is that you no longer have to prove
>> anything about the constraints or various datatypes and it is more
>> general. Say you have partitioned by start_date but you want to sort by
>> end_date, simple index scanning won't work while a merge sort will work
>> beautifully.
>>
>> You're also not limited to how the partitioning machinery will
>> eventually work.
>>
>> Hope this helps,
>
>
> i think this is excellent input.
> i will do some research going into that direction.

Greg Stark had a patch to do this a while back called merge append,
but it never got finished...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-25 22:40:43
Message-ID: AANLkTimdmLiSgJmnFS+WHQ6cK7s0UtVRwayUKDLP37Ty@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>>
>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>
>>> I think the right way to approach this is to teach the planner about
>>> merge sorts.

For what it's worth I think this is a belt-and-suspenders type of
situation where we want two solutions which overlap somewhat.

I would really like to have merge-append nodes because there are all
sorts of plans where append nodes destroying the ordering of their
inputs eliminates a lot of good plans. Those cases can be UNION ALL
nodes, or partitions where there's no filter on the partition key at
all.

But for partitioned tables like the OPs the "real" solution would be
to have more structured meta-data about the partitions that allows the
planner to avoid needing the merge at all. It would also means the
planner wouldn't need to look at every node; it could do a binary
search or equivalent for the right partitions.

> Greg Stark had a patch to do this a while back called merge append,
> but it never got finished...

I was basically in over my head with the planner. I don't understand
how equivalent classes are used or should be used and didn't
understand the code I was pointed at as being analogous. It's probably
not so complicated as all that, but I never really wrapped my head
around it and moved onto tasks I could make more progress on.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-25 23:14:20
Message-ID: AANLkTim1FBZ638MLUpRk6ZU_EVKiea5XCvLQ+VHHUZzv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> 2010/7/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>>>
>>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>>
>>>> I think the right way to approach this is to teach the planner about
>>>> merge sorts.
>
> For what it's worth I think this is a belt-and-suspenders type of
> situation where we want two solutions which overlap somewhat.
>
> I would really like to have merge-append nodes because there are all
> sorts of plans where append nodes destroying the ordering of their
> inputs eliminates a lot of good plans. Those cases can be UNION ALL
> nodes, or partitions where there's no filter on the partition key at
> all.
>
> But for partitioned tables like the OPs the "real" solution would be
> to have more structured meta-data about the partitions that allows the
> planner to avoid needing the merge at all. It would also means the
> planner wouldn't need to look at every node; it could do a binary
> search or equivalent for the right partitions.

Agreed on all points.

>> Greg Stark had a patch to do this a while back called merge append,
>> but it never got finished...
>
> I was basically in over my head with the planner. I don't understand
> how equivalent classes are used or should be used and didn't
> understand the code I was pointed at as being analogous. It's probably
> not so complicated as all that, but I never really wrapped my head
> around it and moved onto tasks I could make more progress on.

Yeah, I don't fully understand those either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: non-overlapping, consecutive partitions
Date: 2010-07-29 15:12:09
Message-ID: AB3126F0-BF5B-4D29-B43E-7F3172C4F492@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hello ...

yeah, this is fairly complicated.

greg:
can you send me how far you got?
i would be curious to see how you have attacked this issue.

i am still in the process of checking the codes.
we somehow have to find a solution for that. otherwise we are in slight trouble here.
it seems we have to solve it no matter what it takes.

many thanks,

hans

On Jul 26, 2010, at 1:14 AM, Robert Haas wrote:

> On Sun, Jul 25, 2010 at 6:40 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
>> 2010/7/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> 2010/7/25 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>>>>
>>>> On Jul 25, 2010, at 11:56 AM, Martijn van Oosterhout wrote:
>>>>
>>>>> I think the right way to approach this is to teach the planner about
>>>>> merge sorts.
>>
>> For what it's worth I think this is a belt-and-suspenders type of
>> situation where we want two solutions which overlap somewhat.
>>
>> I would really like to have merge-append nodes because there are all
>> sorts of plans where append nodes destroying the ordering of their
>> inputs eliminates a lot of good plans. Those cases can be UNION ALL
>> nodes, or partitions where there's no filter on the partition key at
>> all.
>>
>> But for partitioned tables like the OPs the "real" solution would be
>> to have more structured meta-data about the partitions that allows the
>> planner to avoid needing the merge at all. It would also means the
>> planner wouldn't need to look at every node; it could do a binary
>> search or equivalent for the right partitions.
>
> Agreed on all points.
>
>>> Greg Stark had a patch to do this a while back called merge append,
>>> but it never got finished...
>>
>> I was basically in over my head with the planner. I don't understand
>> how equivalent classes are used or should be used and didn't
>> understand the code I was pointed at as being analogous. It's probably
>> not so complicated as all that, but I never really wrapped my head
>> around it and moved onto tasks I could make more progress on.
>
> Yeah, I don't fully understand those either.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de