Order by optimisations?

Lists: pgsql-hackers
From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Order by optimisations?
Date: 2005-07-13 08:08:43
Message-ID: 42D4CC0B.3070805@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Does PostgreSQL do the following optimisation:

SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;

or in fact even better (for my situation)

SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
ORDER BY date;

Does it know that the input to the sort routine is already sorted and
hence is a no-op?

Chris


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-13 20:12:58
Message-ID: 1121285578.5551.0.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
> Hi,
>
> Does PostgreSQL do the following optimisation:
>
> SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;
>
> or in fact even better (for my situation)
>
> SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
> ORDER BY date;
>
> Does it know that the input to the sort routine is already sorted and
> hence is a no-op?

Yes

try EXPLAIN ;)

--
Hannu Krosing <hannu(at)skype(dot)net>


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-14 09:06:06
Message-ID: 42D62AFE.7090109@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing wrote:
> On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
>
>>Hi,
>>
>>Does PostgreSQL do the following optimisation:
>>
>>SELECT * FROM diary WHERE date = '2005-05-01' ORDER BY date;
>>
>>or in fact even better (for my situation)
>>
>>SELECT * FROM diary WHERE date BETWEEN '2005-05-01' AND '2005-05-01'
>>ORDER BY date;
>>
>>Does it know that the input to the sort routine is already sorted and
>>hence is a no-op?
>
>
> Yes
>
> try EXPLAIN ;)

Doesn't seem like it does:

usatest=# explain select * from users_myfoods_map where
date='2004-11-21' order by date;
QUERY PLAN
---------------------------------------------------------------------------
Sort (cost=17.17..17.48 rows=123 width=22)
Sort Key: date
-> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
Filter: (date = '2004-11-21'::date)
(4 rows)

The sort cost is non-zero. Or am I not looking at the right thing...

Chris


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>, "Hannu Krosing" <hannu(at)skype(dot)net>
Cc: "Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-14 09:32:44
Message-ID: 012501c58856$fe0f0790$d501a8c0@zaphod
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne wrote:
> Doesn't seem like it does:
>
> usatest=# explain select * from users_myfoods_map where date='2004-11-21'
> order by date;
> QUERY PLAN
> ---------------------------------------------------------------------------
> Sort (cost=17.17..17.48 rows=123 width=22)
> Sort Key: date
> -> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
> Filter: (date = '2004-11-21'::date)
> (4 rows)
>
> The sort cost is non-zero. Or am I not looking at the right thing...

You are looking at the right thing, AFAIK. Well, it seems the planner cannot
reason that if a field should have only one value, sorting on that field is
not needed.

I remember there are examples where the planner will know that the input to
a sort is already sorted and will skip the sort. Tom will be able to explain
if this here is a reasonable optimization. I *guess* it could be done, with
some restrictions.

Best Regards,
Michael Paesold


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <chriskl(at)familyhealth(dot)com(dot)au>
Cc: <hannu(at)skype(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-14 11:38:20
Message-ID: 1996.24.211.165.134.1121341100.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne said:
>

> usatest=# explain select * from users_myfoods_map where
> date='2004-11-21' order by date;

I assume that this is program generated SQL, as I hope a human would know
better than to write this. In which case, isn't the answer to improve the
generator rather than expect postgres to make up for its defficiencies?

cheers

andrew


From: Jochem van Dieten <jochemd(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Order by optimisations?
Date: 2005-07-14 12:52:22
Message-ID: f96a9b83050714055271e38dbd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/14/05, Michael Paesold wrote:
> Christopher Kings-Lynne wrote:
>>
>> usatest=# explain select * from users_myfoods_map where date='2004-11-21'
>> order by date;
>> QUERY PLAN
>> ---------------------------------------------------------------------------
>> Sort (cost=17.17..17.48 rows=123 width=22)
>> Sort Key: date
>> -> Seq Scan on users_myfoods_map (cost=0.00..12.90 rows=123 width=22)
>> Filter: (date = '2004-11-21'::date)
>> (4 rows)
>>
>> The sort cost is non-zero. Or am I not looking at the right thing...
>
> You are looking at the right thing, AFAIK. Well, it seems the planner cannot
> reason that if a field should have only one value, sorting on that field is
> not needed.

For the planner to deduct that, it should first deduct that the field
should only have one value. Is that a deduction the planner can even
make for this query if we consider for instance implicit timestamp to
date casting?

> I remember there are examples where the planner will know that the input to
>a sort is already sorted and will skip the sort.

The planner knows the output of an indexscan is sorted. With a proper
index on the "date" field (I hope that is not really the name) and
favourable statistics the planner should switch to an indexscan and
the order node should disappear.

Jochem


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-14 13:32:16
Message-ID: 14216.1121347936@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing <hannu(at)skype(dot)net> writes:
> On K, 2005-07-13 at 16:08 +0800, Christopher Kings-Lynne wrote:
>> Does it know that the input to the sort routine is already sorted and
>> hence is a no-op?

> Yes

No, but in most cases this will use an index and hence will assume that
the index is responsible for ordering.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: hannu(at)skype(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Order by optimisations?
Date: 2005-07-15 02:36:00
Message-ID: 42D72110.8010001@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I assume that this is program generated SQL, as I hope a human would know
> better than to write this. In which case, isn't the answer to improve the
> generator rather than expect postgres to make up for its defficiencies?

Well, the issue in my case is we have user food diaries. Usually,
99.9999% of the time we pull up a single date of their diary. However,
for printing purposes we need a range.

It's a large query, so it's implemented as a simple PL/PSQL stored
procedure.

I was trying to avoid having to c&p the entire stored proc to make a
'range version'.

If PostgreSQL was smart enough to deal with a range of 1 day and a sort
on it efficiently, I'd just use the range stored proc exclusively....

Chris


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-15 02:39:37
Message-ID: 42D721E9.8090906@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>Does it know that the input to the sort routine is already sorted and
>>>hence is a no-op?
>
>
>>Yes
>
>
> No, but in most cases this will use an index and hence will assume that
> the index is responsible for ordering.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-15 02:40:39
Message-ID: 42D72227.5050607@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>>>Does it know that the input to the sort routine is already sorted and
>>>hence is a no-op?
>
>>Yes
>
> No, but in most cases this will use an index and hence will assume that
> the index is responsible for ordering.

OK, so what's going on here?

usa=> explain select * from users_myfoods_map where user_id=1 and
date='2003-11-03' order by date;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Sort (cost=4.84..4.85 rows=2 width=22)
Sort Key: date
-> Index Scan using users_myfoods_map_user_id_date_key on
users_myfoods_map (cost=0.00..4.83 rows=2 width=22)
Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date))
(4 rows)

(That's on our enormous live table)

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-15 04:59:44
Message-ID: 7590.1121403584@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> OK, so what's going on here?

> usa=> explain select * from users_myfoods_map where user_id=1 and
> date='2003-11-03' order by date;
> QUERY PLAN

> -------------------------------------------------------------------------------------------------------------------
> Sort (cost=4.84..4.85 rows=2 width=22)
> Sort Key: date
> -> Index Scan using users_myfoods_map_user_id_date_key on
> users_myfoods_map (cost=0.00..4.83 rows=2 width=22)
> Index Cond: ((user_id = 1) AND (date = '2003-11-03'::date))
> (4 rows)

Well, date evidently isn't the high-order key of this index. But why
exactly are you worried about a sort of 2 rows?

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Order by optimisations?
Date: 2005-07-15 05:42:38
Message-ID: 42D74CCE.3030609@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Well, date evidently isn't the high-order key of this index. But why
> exactly are you worried about a sort of 2 rows?

Aha that's nailed it:

usa=> explain select * from users_myfoods_map where user_id=1 and date
between '2003-11-03' and '2003-11-03' order by user_id, date;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------
Index Scan using users_myfoods_map_user_id_date_key on
users_myfoods_map (cost=0.00..3.78 rows=1 width=22)
Index Cond: ((user_id = 1) AND (date >= '2003-11-03'::date) AND
(date <= '2003-11-03'::date))
(2 rows)

I don't care about this particular result. But imagine it running
thousands of times a minute, with result sets between 0 and 50 rows...

Chris