Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions

From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: David Johnston <polobo(at)yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions
Date: 2013-11-05 09:08:02
Message-ID: CADp-Sm5EzGp5qryZMf=eKMtF3tG4rmN1SGf3uYib9WjgqduOSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Hello,
>
> > > With this index, you will get a different plan like this,
> > >
> > Exactly my point, can we look at making windowing functions
> > smart and make use of available indexes?
>
> I might have guessed..
>
>
> > > Does this satisfies your needs?
> > >
> > Not exactly. If I have missed to mention, this is not a
> > production issue for me. I am trying to see if PostgreSQL
> > planner produces best plans for Data Warehouse and mining
> > oriented queries.
>
> I agree to the point.
>
> > I think Hashes can be efficiently used for sorting (and I
> > believe they are used for joins too when a pre-sorted data set
> > is not available via indexes). This again could my
> > misinterpretation.
>
> It is true if 'Sorting' means 'key classification without
> orderings'. Hashes should always appear at inner side of a join,
> I'm convinced. The "ordered' nature is not required for the case
> if outer side is already ordered. If not, separate sorting will
> needed.
>
> > I lost you somewhere here. My be this is above my pay-grade :-)
>
> Sorry for my crumsy english :-<
>
>
No, it was not your English. :-)
When I read it again and try to relate, I get your point. Actually true,
hashes must always be performed as last option (if that is what you too
meant) and if there are few other operations they must be the last one to
be performed especially after sorting/grouping. Hashes must somehow make
use of already sorted data (I think this something even you indicated)

> Well, at least with Oracle and DB2 planners I have seen that
> > the plan produced with dense_rank performs better than a series
> > of nested SELECT MAX().
>
> I see your point. Although I don't know what plans they
> generates, and I don't see how to ordering and ranking without
> sorting. Could you let me see what they look like?
>
> # Nevertheless, I don't have the confidence that I can be of some
> # help..
>
> I will do that if I get a DB2 system or Oracle system running. I will try
to replicate the same 2 test cases and share the plan. One thing which I am
sure is, the below part of the plan

QUERY PLAN | Subquery Scan on __unnamed_subquery_0
(cost=12971.39..16964.99 rows=614 width=43) (actual
time=2606.075..2953.937 rows=558 loops=1)

would be generated as RID scan in DB2 (which I have seen to perform better
than normal subquery scans in DB2).

Regards
Sameer
Ashnik Pte Ltd

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajeev rastogi 2013-11-05 09:50:56 TODO: Split out pg_resetxlog output into pre- and post-sections
Previous Message Simon Riggs 2013-11-05 09:06:57 Re: Fast insertion indexes: why no developments