Trimming the cost of ORDER BY in a simple query

Lists: pgsql-sql
From: <ogjunk-pgjedan(at)yahoo(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Trimming the cost of ORDER BY in a simple query
Date: 2005-05-03 04:35:31
Message-ID: 20050503043531.23516.qmail@web31106.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hello,

I have a simple query with a pretty high cost (EXPLAIN ...), and I'm
wondering if I can somehow trim it.

Query (shows the last 7 dates):

=> SELECT DISTINCT date_part('year', uu.add_date), date_part('month',
uu.add_date), date_part('day', uu.add_date) FROM user_url uu WHERE
uu.user_id=1 ORDER BY date_part('year', uu.add_date) DESC,
date_part('month', uu.add_date) DESC, date_part('day', uu.add_date)
DESC LIMIT 7;

QUERY PLAN:
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4510.14..4522.93 rows=2 width=8) (actual
time=19.924..20.160 rows=7 loops=1)
-> Unique (cost=4510.14..4522.93 rows=2 width=8) (actual
time=19.919..20.139 rows=7 loops=1)
-> Sort (cost=4510.14..4513.34 rows=1279 width=8) (actual
time=19.915..20.004 rows=78 loops=1)
Sort Key: date_part('year'::text, add_date),
date_part('month'::text, add_date), date_part('day'::text, add_date)
-> Index Scan using foo on user_url uu
(cost=0.00..4444.14 rows=1279 width=8) (actual time=0.095..14.761
rows=1225 loops=1)
Index Cond: (user_id = 1)
Total runtime: 20.313 ms
(7 rows)

It looks like the cost is all in ORDER BY, and if I remove ORDER BY the
execution time goes from 20-90 ms to less than 1 ms.

I do need the 7 most recent add_dates. Is there a more efficient way
of grabbing them?

Thanks,
Otis


From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: ogjunk-pgjedan(at)yahoo(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Trimming the cost of ORDER BY in a simple query
Date: 2005-05-03 06:31:05
Message-ID: 1115101865.42771aa9ee0d3@webmail.telus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Quoting ogjunk-pgjedan(at)yahoo(dot)com:

> I have a simple query with a pretty high cost (EXPLAIN ...), and I'm
> wondering if I can somehow trim it.
>
> Query (shows the last 7 dates):
>
> => SELECT DISTINCT date_part('year', uu.add_date),
> date_part('month',
> uu.add_date), date_part('day', uu.add_date) FROM user_url uu
> WHERE
> uu.user_id=1 ORDER BY date_part('year', uu.add_date) DESC,
> date_part('month', uu.add_date) DESC, date_part('day',
> uu.add_date)
> DESC LIMIT 7;
>
> QUERY PLAN:
>
---------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=4510.14..4522.93 rows=2 width=8) (actual
> time=19.924..20.160 rows=7 loops=1)
> -> Unique (cost=4510.14..4522.93 rows=2 width=8) (actual
> time=19.919..20.139 rows=7 loops=1)
> -> Sort (cost=4510.14..4513.34 rows=1279 width=8) (actual
> time=19.915..20.004 rows=78 loops=1)
> Sort Key: date_part('year'::text, add_date),
> date_part('month'::text, add_date), date_part('day'::text, add_date)
> -> Index Scan using foo on user_url uu
> (cost=0.00..4444.14 rows=1279 width=8) (actual time=0.095..14.761
> rows=1225 loops=1)
> Index Cond: (user_id = 1)
> Total runtime: 20.313 ms
> (7 rows)
>
>
> It looks like the cost is all in ORDER BY, and if I remove ORDER BY
> the
> execution time goes from 20-90 ms to less than 1 ms.
>
> I do need the 7 most recent add_dates. Is there a more efficient
> way
> of grabbing them?

The query analyzer is using the sort to detect and return distinct
values, as well. So there's not much point in trying to remove it.


From: Ragnar Hafstað <gnari(at)simnet(dot)is>
To: ogjunk-pgjedan(at)yahoo(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Trimming the cost of ORDER BY in a simple query
Date: 2005-05-03 07:22:18
Message-ID: 1115104938.5853.25.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Mon, 2005-05-02 at 21:35 -0700, ogjunk-pgjedan(at)YAHOO(dot)COM wrote:

> Query (shows the last 7 dates):
>
> => SELECT DISTINCT date_part('year', uu.add_date), date_part('month',
> uu.add_date), date_part('day', uu.add_date) FROM user_url uu WHERE
> uu.user_id=1 ORDER BY date_part('year', uu.add_date) DESC,
> date_part('month', uu.add_date) DESC, date_part('day', uu.add_date)
> DESC LIMIT 7;
>

i assume add_date is a timestamp, because otherwise you could
ORDER by add_date, and use an index on (desc, add_date):
SELECT DISTINCT .... WHERE uu.user_id=1
ORDER BY uu.user_id DESC ,uu.add_date DESC LIMIT 7;

this will not work for a timestamp add_date

> QUERY PLAN:
...
> Total runtime: 20.313 ms

20 ms does not sound like a big problem.

if 20 ms is really too much for you, and add_date IS a timestamp,
you might think about adding a date column , maintained
by triggers of by your application, and add an index on (user_id, x)

another possibility (total guess) is a functional index
create index func_id_date on user_url(user_id,(add_date::date));
SELECT ... ORDER BY uu.user_id DESC ,uu.add_date::date DESC LIMIT 7;
I have no idea if this will work, or if the planner will use such
an index.

gnari


From: Ezequiel Tolnay <mail(at)etolnay(dot)com(dot)ar>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Trimming the cost of ORDER BY in a simple query
Date: 2005-05-03 08:08:23
Message-ID: d57bhc$2nfr$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Ragnar Hafstað wrote:
> another possibility (total guess) is a functional index
> create index func_id_date on user_url(user_id,(add_date::date));
> SELECT ... ORDER BY uu.user_id DESC ,uu.add_date::date DESC LIMIT 7;
> I have no idea if this will work, or if the planner will use such
> an index.
To make sure the index is used you could build the query using a
subquery like this:

SELECT DISTINCT date_part('year', add_date), date_part('month',
add_date), date_part('day', add_date)
FROM (
SELECT uu.add_date::date as add_date
FROM user_url uu
GROUP BY uu.user_id, uu.add_date
HAVING uu.user_id=1
ORDER BY uu.add_date::date DESC
LIMIT 7
) x;

Perhaps a select distinct instead of the group by in the subquery would
make use the index?

Cheers,

Ezequiel Tolnay


From: Harald Fuchs <use_reply_to(at)protecting(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Trimming the cost of ORDER BY in a simple query
Date: 2005-05-03 10:59:48
Message-ID: puu0lk60d7.fsf@srv.protecting.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

In article <1115104938(dot)5853(dot)25(dot)camel(at)localhost(dot)localdomain>,
Ragnar Hafstað <gnari(at)simnet(dot)is> writes:

> 20 ms does not sound like a big problem.

> if 20 ms is really too much for you, and add_date IS a timestamp,
> you might think about adding a date column , maintained
> by triggers of by your application, and add an index on (user_id, x)

Maybe a functional index on "date_trunc('day', add_date)" would be helpful?