Re: Sorting. When?

Lists: pgsql-hackers
From: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Sorting. When?
Date: 2011-02-10 22:38:15
Message-ID: 4D5468D7.4080709@yahoo.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi.

Which operations invoke the sorting algorithms implemented in the
sorting module (tuplesort.c) ?
Of course, one of those operations invoking sorting is the ORDER BY
clause and the DISTINCT too.

Moreover, the Merge Join should be implemented invoking sorting.

Is there any other operation invoking sorting?

Thanks.
Regards.

Fava


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-10 23:21:50
Message-ID: AANLkTikg9_iUd+aU8nnmuPrtfx6FgugO4NCfCoQiRcK4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/2/10 mac_man2008(at)yahoo(dot)it <mac_man2008(at)yahoo(dot)it>:

> Which operations invoke the sorting algorithms implemented in the sorting
> module (tuplesort.c) ?
> Of course, one of those operations invoking sorting is the ORDER BY clause
> and the DISTINCT too.
>
> Moreover, the Merge Join should be implemented invoking sorting.
>
> Is there any other operation invoking sorting?

AFAIK, all set operators except for UNION ALL. (I am probably missing
a whole boatload of other things.)

Nicolas


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-11 01:17:43
Message-ID: AANLkTinD1WdwQ7g0xDxfK_LqnbECcqcqiLtKfyC_Z58U@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 10, 2011 at 6:21 PM, Nicolas Barbier
<nicolas(dot)barbier(at)gmail(dot)com> wrote:
> 2011/2/10 mac_man2008(at)yahoo(dot)it <mac_man2008(at)yahoo(dot)it>:
>
>> Which operations invoke the sorting algorithms implemented in the sorting
>> module (tuplesort.c) ?
>> Of course, one of those operations invoking sorting is the ORDER BY clause
>> and the DISTINCT too.
>>
>> Moreover, the Merge Join should be implemented invoking sorting.
>>
>> Is there any other operation invoking sorting?
>
> AFAIK, all set operators except for UNION ALL. (I am probably missing
> a whole boatload of other things.)

Merge joins don't necessarily involve a sort - you could do a merge
over a pair of index scans, for example.

Set operations can be implemented using hashing or sorting, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-11 10:03:37
Message-ID: 4D550979.7040704@yahoo.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you all for your replies.

So, is there any precise way to discover when sorting is invoked?

Thanks.
Regards.

Fava

Il 11/02/2011 01:17, Robert Haas ha scritto:
> On Thu, Feb 10, 2011 at 6:21 PM, Nicolas Barbier
> <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>> 2011/2/10 mac_man2008(at)yahoo(dot)it<mac_man2008(at)yahoo(dot)it>:
>>
>>> Which operations invoke the sorting algorithms implemented in the sorting
>>> module (tuplesort.c) ?
>>> Of course, one of those operations invoking sorting is the ORDER BY clause
>>> and the DISTINCT too.
>>>
>>> Moreover, the Merge Join should be implemented invoking sorting.
>>>
>>> Is there any other operation invoking sorting?
>> AFAIK, all set operators except for UNION ALL. (I am probably missing
>> a whole boatload of other things.)
> Merge joins don't necessarily involve a sort - you could do a merge
> over a pair of index scans, for example.
>
> Set operations can be implemented using hashing or sorting, too.
>


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-11 11:50:19
Message-ID: AANLkTik4dEGn=xEsruvowEcrpNv0CzbdmzRLZ=cKxcWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ Please don't top-post. <URL:http://en.wikipedia.org/wiki/Posting_style> ]

2011/2/11 mac_man2008(at)yahoo(dot)it <mac_man2008(at)yahoo(dot)it>:

> So, is there any precise way to discover when sorting is invoked?

EXPLAIN shows how a query would be executed; explicit sorts should be
mostly obvious.
<URL:http://www.postgresql.org/docs/9.0/static/sql-explain.html>

Nicolas


From: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-11 12:50:53
Message-ID: 4D5530AD.7020801@yahoo.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Nicolas, thanks.

Unfortunately I don't think I can get precise infos from that link. That
"explains" how the EXPLAIN works, while I need to know, from an
algorithmic point of view, in which cases sorting is invoked.
Actually, maybe I can spend some time in trying to perform samples
queries and trying to deduce which operations calls the sorting module.
But I think it is not the most effective way to do that, since that
would mean running a bounch of queries for different values of work_mem,
or for different size of the involved tables. Even if I try to do that,
some cases can not be evident to my sight.

I am searching for someone telling me (how to get) a list of operations
invoking sorting, and in which cases they do that.
Just for example:
- ORDER BY, always invokes sorting.
- DISTINCT, always invokes sorting
- Merge Join, just in case (..bla bla bla..)
- ...

Is it possible?
Any other suggestion?

Thanks for your time.
Best regards.

Fava

Il 11/02/2011 11:50, Nicolas Barbier ha scritto:
> [ Please don't top-post.<URL:http://en.wikipedia.org/wiki/Posting_style> ]
>
> 2011/2/11 mac_man2008(at)yahoo(dot)it<mac_man2008(at)yahoo(dot)it>:
>
>> So, is there any precise way to discover when sorting is invoked?
> EXPLAIN shows how a query would be executed; explicit sorts should be
> mostly obvious.
> <URL:http://www.postgresql.org/docs/9.0/static/sql-explain.html>
>
> Nicolas
>


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Nicolas Barbier" <nicolas(dot)barbier(at)gmail(dot)com>, "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sorting. When?
Date: 2011-02-11 20:46:09
Message-ID: 4D554BB1020000250003A8CC@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it> wrote:

> I need to know, from an algorithmic point of view, in which cases
> sorting is invoked.

Well, I think the only accurate answer to that is "when the
estimated cost of a plan using a sort is lower than the estimated
cost of any alternatives". There are cases where you can be sure a
sort will be used, like when you have an ORDER BY clause and there
is no index which can scan rows in that order; but there are a lot
of situations where plans which involve sorts are in competition
with plans which don't. The plan chosen can depend on such things
as how bloated your table is, how close the order of the tuples in
the heap corresponds to the order of a particular index, how many
rows are in which tables, what work_mem is set to, etc.

Are your really looking to categorize the types of queries where
sorting is *invoked*, or the ones where it is *considered*? Or
perhaps only those where it is *required*, since there are no
possible plans without sorting?

-Kevin


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-11 22:49:27
Message-ID: AANLkTimd-aUg-Zhu5kcxBRdSn7vi5KOaYOtMOmZwRd4p@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/2/11 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:

> "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it> wrote:
>
>> I need to know, from an algorithmic point of view, in which cases
>> sorting is invoked.

[..]

> Are your really looking to categorize the types of queries where
> sorting is *invoked*, or the ones where it is *considered*?  Or
> perhaps only those where it is *required*, since there are no
> possible plans without sorting?

Or, if you are seeking the exact rules that are used by the planner to
determine all possible plans from which the one with minimum cost is
chosen (and hence all ways in which sorts can be used), I think that
the source code is the only complete reference. A non-complete
introduction is:

<URL:http://www.postgresql.org/docs/9.0/static/planner-optimizer.html>

Basically, physical operators (seq scan, index scan, hash join, merge
join, nested loop, filter, set operation, etc) may require their input
to satisfy certain sort constraints (for example, both inputs of a
merge join need to be sorted on the join attribute(s)). If it happens
to be of lowest cost to explicitly sort the inputs right before
consuming them, that will be done. If there is a way to get the same
input in an already-ordered way (for example an index scan, or the
output of a merge join), so that the cost is less than the non-ordered
way + an explicit sort, then that already-ordered way will be chosen.

Super-basic example:

SELECT * FROM t ORDER BY a;

This may either perform a seq scan of table t and then do an explicit
sort, or perform a full index scan of an index on attribute a
(provided that such an index exists), in which case the explicit sort
is not needed because the index scan will deliver the rows in
already-sorted order. Which option is chosen depends on the cost: The
costs of both plans are calculated and the least costly plan is
chosen. See the (non-exhaustive) list of things that influence the
costs provided by Kevin to get a feeling for how many variables there
are that influence this choice.

Nicolas


From: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-12 11:13:04
Message-ID: 4D566B40.7020400@yahoo.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

So, invoking or not invoking sorting depends on different parameters of
the specific DBMS, does it?

This also means that it depends on the specific implementation of the
Planner and, as a consequence, *on the specific DBMS*?
I mean, different DBMS can chose differently on invoking sorting even if
they are executing the same query over the same set of data?

Fava.

Il 11/02/2011 22:49, Nicolas Barbier ha scritto:
> 2011/2/11 Kevin Grittner<Kevin(dot)Grittner(at)wicourts(dot)gov>:
>
>> "mac_man2008(at)yahoo(dot)it"<mac_man2008(at)yahoo(dot)it> wrote:
>>
>>> I need to know, from an algorithmic point of view, in which cases
>>> sorting is invoked.
> [..]
>
>> Are your really looking to categorize the types of queries where
>> sorting is *invoked*, or the ones where it is *considered*? Or
>> perhaps only those where it is *required*, since there are no
>> possible plans without sorting?
> Or, if you are seeking the exact rules that are used by the planner to
> determine all possible plans from which the one with minimum cost is
> chosen (and hence all ways in which sorts can be used), I think that
> the source code is the only complete reference. A non-complete
> introduction is:
>
> <URL:http://www.postgresql.org/docs/9.0/static/planner-optimizer.html>
>
> Basically, physical operators (seq scan, index scan, hash join, merge
> join, nested loop, filter, set operation, etc) may require their input
> to satisfy certain sort constraints (for example, both inputs of a
> merge join need to be sorted on the join attribute(s)). If it happens
> to be of lowest cost to explicitly sort the inputs right before
> consuming them, that will be done. If there is a way to get the same
> input in an already-ordered way (for example an index scan, or the
> output of a merge join), so that the cost is less than the non-ordered
> way + an explicit sort, then that already-ordered way will be chosen.
>
> Super-basic example:
>
> SELECT * FROM t ORDER BY a;
>
> This may either perform a seq scan of table t and then do an explicit
> sort, or perform a full index scan of an index on attribute a
> (provided that such an index exists), in which case the explicit sort
> is not needed because the index scan will deliver the rows in
> already-sorted order. Which option is chosen depends on the cost: The
> costs of both plans are calculated and the least costly plan is
> chosen. See the (non-exhaustive) list of things that influence the
> costs provided by Kevin to get a feeling for how many variables there
> are that influence this choice.
>
> Nicolas
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting. When?
Date: 2011-02-12 18:58:06
Message-ID: AANLkTi=woCr0xj3SADgq9=3Z42E9oONZQw0+xcUGF0Mn@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/2/12 mac_man2008(at)yahoo(dot)it <mac_man2008(at)yahoo(dot)it>:
> So, invoking or not invoking sorting depends on different parameters of the
> specific DBMS, does it?
>
> This also means that it depends on the specific implementation of the
> Planner and, as a consequence, on the specific DBMS?
> I mean, different DBMS can chose differently on invoking sorting even if
> they are executing the same query over the same set of data?
>

exactly

Regards

Pavel Stehule

> Fava.
>
> Il 11/02/2011 22:49, Nicolas Barbier ha scritto:
>
> 2011/2/11 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
>
> "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it> wrote:
>
> I need to know, from an algorithmic point of view, in which cases
> sorting is invoked.
>
> [..]
>
> Are your really looking to categorize the types of queries where
> sorting is *invoked*, or the ones where it is *considered*?  Or
> perhaps only those where it is *required*, since there are no
> possible plans without sorting?
>
> Or, if you are seeking the exact rules that are used by the planner to
> determine all possible plans from which the one with minimum cost is
> chosen (and hence all ways in which sorts can be used), I think that
> the source code is the only complete reference. A non-complete
> introduction is:
>
> <URL:http://www.postgresql.org/docs/9.0/static/planner-optimizer.html>
>
> Basically, physical operators (seq scan, index scan, hash join, merge
> join, nested loop, filter, set operation, etc) may require their input
> to satisfy certain sort constraints (for example, both inputs of a
> merge join need to be sorted on the join attribute(s)). If it happens
> to be of lowest cost to explicitly sort the inputs right before
> consuming them, that will be done. If there is a way to get the same
> input in an already-ordered way (for example an index scan, or the
> output of a merge join), so that the cost is less than the non-ordered
> way + an explicit sort, then that already-ordered way will be chosen.
>
> Super-basic example:
>
> SELECT * FROM t ORDER BY a;
>
> This may either perform a seq scan of table t and then do an explicit
> sort, or perform a full index scan of an index on attribute a
> (provided that such an index exists), in which case the explicit sort
> is not needed because the index scan will deliver the rows in
> already-sorted order. Which option is chosen depends on the cost: The
> costs of both plans are calculated and the least costly plan is
> chosen. See the (non-exhaustive) list of things that influence the
> costs provided by Kevin to get a feeling for how many variables there
> are that influence this choice.
>
> Nicolas
>
>
>


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it>
Cc: "Nicolas Barbier" <nicolas(dot)barbier(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sorting. When?
Date: 2011-02-12 19:05:20
Message-ID: 4D568590020000250003A916@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"mac_man2008(at)yahoo(dot)it" <mac_man2008(at)yahoo(dot)it> wrote:

> So, invoking or not invoking sorting depends on different
> parameters of the specific DBMS, does it?

In the right circumstances (where autovacuum or other maintenance
jobs happen to run in the background at the right moment), two
back-to-back runs on the same database with no data or configuration
changes could do this differently. You wouldn't see that too often,
but if the estimated costs of sorting versus non-sorting plans were
close, the random sampling of one statistics sampling run versus
another could cause a change.

-Kevin