JOIN with ORDER on both tables does a sort when it souldn't

Lists: pgsql-general
From: Dániel Dénes <panther-d(at)freemail(dot)hu>
To: pgsql-general(at)postgresql(dot)org
Subject: JOIN with ORDER on both tables does a sort when it souldn't
Date: 2007-05-27 16:49:32
Message-ID: freemail.20070427184932.8285@fm06.freemail.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I have three tables involved in my problem:

forums_grps [means: Forum-Groups]
- id (PRIMARY KEY)
- title

forums [means: Forums]
- id (PRIMARY KEY)
- forum_group_id (NOT NULL, FOREIGN KEY)
- order (defines listing order of forums in the same forum_group)
INDEX: (forum_group_id, order)

sit_shw_fgr [means: Sites Show Forum-Groups]
- site_id (PRIMARY KEY)
- forum_group_id (PRIMARY KEY, FOREIGN KEY)
- order (defines listing order of shown forum_groups on a site)
INDEX: (site_id, order)

What I want to do is SELECT the forums shown on a given site,
ordered by sit_shw_fgr.order ASC, forums.order ASC. So the query is:

SELECT * FROM sit_shw_fgr JOIN forums
ON forums.forum_group_id = sit_shw_fgr.forum_group_id
WHERE sit_shw_fgr.site_id = 1
ORDER BY sit_shw_fgr.order ASC, forums.order ASC

If the plan uses a nestloop with both indexes I mentioned, it will get
the results in the correct order. But the planner will only choose this
plan, if I disable all other choices:
SET enable_seqscan TO false;
SET enable_hashjoin TO false;
SET enable_mergejoin TO false;
But even then, it won't realize that the result are in correct order, and
does a sort! Why?

Sort
Sort Key: sit_shw_fgr.order, forums.order
-> Nested Loop
-> Index Scan using sit_shw_fgr_idx_siteid_order on sit_shw_fgr
Index Cond: (sitid = 1)
-> Index Scan using forums_idx_forumgroupid_order on forums
Index Cond: (forums.fgrid = "outer".fgrid)

I'm using PostgreSQL 8.1.8.

Thanks for the answer in advance,
Denes Daniel

Végleges lézeres szőrtelenítés:jún. 30-ig most mindkét hónalj kezelése csak 79 000 Ft! Klikk ide a részleteketért!
http://www.webdesign.hu/aesthetica/flash_microsite/?id=8;p_code=2029


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dániel Dénes <panther-d(at)freemail(dot)hu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: JOIN with ORDER on both tables does a sort when it souldn't
Date: 2007-05-27 19:01:44
Message-ID: 241.1180292504@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

=?ISO-8859-2?Q?D=E1niel_D=E9nes?= <panther-d(at)freemail(dot)hu> writes:
> But even then, it won't realize that the result are in correct order, and
> does a sort! Why?

In general the output of a nestloop doesn't derive any ordering
properties from the inner scan. It might happen to work in your
particular case because on the outer side (site_id, order) is unique and
so the "order" values must be strictly increasing. But if there could
be multiple rows with the same "order" value coming from the outer side,
then it would be incorrect to claim that the join output is sorted by
(outer.order, inner.order).

It's possible that the planner could be taught to recognize this
situation, but it looks to me like doing that would result in drastic
increases in planning time for many queries (due to having to consider
a lot more Paths) with a resulting win in only a very few.

regards, tom lane


From: Dániel Dénes <panther-d(at)freemail(dot)hu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: JOIN with ORDER on both tables does a sort when it souldn't
Date: 2007-05-27 22:26:06
Message-ID: freemail.20070428002606.68542@fm06.freemail.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> írta:

> Daniel Denes <panther-d(at)freemail(dot)hu> writes:
> > But even then, it won't realize that the result are in correct order,
> > and does a sort! Why?
>
> In general the output of a nestloop doesn't derive any ordering
> properties from the inner scan. It might happen to work in your
> particular case because on the outer side (site_id, order) is unique
> and so the "order" values must be strictly increasing. But if there
> could be multiple rows with the same "order" value coming from the
> outer side, then it would be incorrect to claim that the join output is
> sorted by (outer.order, inner.order).
>
> It's possible that the planner could be taught to recognize this
> situation, but it looks to me like doing that would result in drastic
> increases in planning time for many queries (due to having to consider
> a lot more Paths) with a resulting win in only a very few.
>
> regards, tom lane
>

Aww, you're right... I absolutely forgot that scenario (rows with
same "order" in the outer scan).
But that led me to another question: if there could be rows with the
same "order" value in the outer scan, how could I get the list where
forums in distinct forum_groups don't mix, they are sorted by "order"
inside the group, and groups are sorted according to the
outer "order"? (Groups with the same outer "order" can be listed in
either way, but don't mix them!)
The query I wrote would return something like this in such a situation:

group_id | group_ord | forum_id | forum_ord
----------+-----------+----------+-----------
... | ... | ... | ...
41 | 6 | 761 | 1
27 | 6 | 763 | 1
41 | 6 | 762 | 2
27 | 6 | 764 | 2
... | ... | ... | ...

But what if I wanted one of these (either one is OK)?

group_id | group_ord | forum_id | forum_ord
----------+-----------+----------+-----------
... | ... | ... | ...
41 | 6 | 761 | 1
41 | 6 | 762 | 2
27 | 6 | 763 | 1
27 | 6 | 764 | 2
... | ... | ... | ...

group_id | group_ord | forum_id | forum_ord
----------+-----------+----------+-----------
... | ... | ... | ...
27 | 6 | 763 | 1
27 | 6 | 764 | 2
41 | 6 | 761 | 1
41 | 6 | 762 | 2
... | ... | ... | ...

How would I do this? The only thing I can think of is inserting another
ORDER BY column in the middle (assuming group_id is a PRIMARY KEY):
group_ord ASC, group_id ASC, forum_ord ASC

Of course this would now be the same situation as before... :)

Végleges lézeres szőrtelenítés:jún. 30-ig most mindkét hónalj kezelése csak 79 000 Ft! Klikk ide a részleteketért!
http://www.webdesign.hu/aesthetica/flash_microsite/?id=8;p_code=2029


From: Dániel Dénes <panther-d(at)freemail(dot)hu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: JOIN with ORDER on both tables does a sort when it souldn't
Date: 2007-09-18 10:22:22
Message-ID: freemail.20070818122222.64777@fm17.freemail.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Dániel Dénes <panther-d(at)freemail(dot)hu> writes:
> > But even then, it won't realize that the result are in correct
> > order, and does a sort! Why?
>
> In general the output of a nestloop doesn't derive any ordering
> properties from the inner scan. It might happen to work in your
> particular case because on the outer side (site_id, order) is unique
> and so the "order" values must be strictly increasing. But if there
> could be multiple rows with the same "order" value coming from the
> outer side, then it would be incorrect to claim that the join output is
> sorted by (outer.order, inner.order).
>
> It's possible that the planner could be taught to recognize this
> situation, but it looks to me like doing that would result in drastic
> increases in planning time for many queries (due to having to consider
> a lot more Paths) with a resulting win in only a very few.
>
> regards, tom lane

When you wrote this answer, I thought maybe it's really a one-time
problem, and it's not worth spending much time on it, because the
tables involved had 10-100 rows, so a sort wasn't really that scary; I
just wanted to know the cause.

But now I ran into this again. There are 2 tables involved (simplified):

banners_places:
- id integer (PKEY)
- pageid integer (FKEY to a table not involved now)
- place text
UNIQUE KEY: (pageid, place)

banners_show:
- id integer (PKEY)
- bplid integer (FKEY to banners_places.id)
- uptime timestamp
INDEX: (bplid, uptime)

My query is:
SELECT *
FROM banners_places AS bpl
JOIN banners_show AS bsh ON bsh.bplid = bpl.id
WHERE bpl.pageid = 123
ORDER BY bpl.place, bsh.uptime

To me it looks like the best plan would be to get the desired rows from
banners_places and then do a NestLoop join using the index on
banners_show. This way no sorting should be necessary.
But even though I forced PG to do my plan (disabled almost every
alternative), the sort is there:

Sort
Sort Key: bpl.place, bsh.uptime
-> Nested Loop
-> Index Scan using bpl_UNIQUE on banners_places bpl
Index Cond: (pageid = 123)
-> Index Scan using bsh_INDEX on banners_show bsh
Index Cond: (bsh.bplid = "outer".id)

Are you sure this can't be fixed without drastically increasing planning
time?
Or is there a way I can make this query not to do a sort?

Regards,
Denes Daniel

___________________________________________________________
Légy mindig trendi és naprakész - olvass magazinokat a mobilodon Mobizinnel!
www.t-mobile.hu/mobizin