Re: 'distinct on' and 'order by' conflicts of interest

From: stephen(at)thunkit(dot)com
To: "Michael Fuhr" <mike(at)fuhr(dot)org>
Cc: stephen(at)thunkit(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: 'distinct on' and 'order by' conflicts of interest
Date: 2004-12-31 21:02:56
Message-ID: 3030.67.21.238.59.1104526976.squirrel@mail.thunkit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Define the problem, not how you think it should be solved. What
> are you trying to do? If you can't get the query to work, then
> please post SQL statements to create and populate a table and
> describe the query results you'd like to see.

the situation is i have a set of records in a table (actually, their
values span multiple tables, but that is not important for this example).
I then have a second table that allows me to draw n-ary relationships
between them. so:

create table nodes (
id sequence not null,
property1 varchar(12),
property2 varchar(9),
etc....,

primary key (id)
);

create table arcs (
arcstart integer not null,
arcend integer not null,
type varchar(12) not null,
ordinal integer,

primary key( arcstart,arcend,type )
);

if i have one node loaded, and i want to retrieve nodes related to it via
the arcs table where the 'type' field is of a certain value, and ordered
by the 'ordinal' field, i'd do something like this:

select id,property1,property2,ordinal from nodes,arcs where id=arcstart
and type='contains' order by ordinal;

which would be great, except that arc types can have inverses. That is,
if type is 'contains', there's also a 'contained_by' where the arcstart
and arcend fields are flipped. This isn't data duplication, because
depending on which way the arc is drawn between the two nodes, the ordinal
information is different. So now i have:

select id,property1,property2,ordinal from nodes,arcs where (id=arcstart
and type='contains') or (id=arcend and type='contained_by').

This way I don't have to worry about which end of the arc got defined. As
long as it's defined from one node's point of view, the other one can find
it. But, if it's been defined by both nodes, because they needed to both
specify ordinal information, then I get duplicates. actual results from my
db:

select nodes.title, nodes.name, nodes.id, arcs.ordinal, from Nodes,Arcs
where (Arcs.ArcEnd=Nodes.id and Arcs.ArcStart in ('638') and Arcs.Type=
'contains') or (Arcs.ArcStart=Nodes.id and Arcs.ArcEnd in ('638') and
Arcs.type = 'contained_by' ) order by arcs.ordinal

title | name | id | ordinal
------------------+------------------------------+-----+---------
Recent | Public: Recent Exhibitions | 870 | 0
Upcoming | Public: Upcoming Exhibitions | 852 | 0
Current | Public: Current Exhibitions | 802 | 0
Upcoming | Public: Upcoming Exhibitions | 852 | 1 *
Recent | Public: Recent Exhibitions | 870 | 2 *
Hands-on History | Public: Hands-on History | 931 | 3 *
Current | Public: Current Exhibitions | 802 | 4 *
Hands-on History | Public: Hands-on History | 931 | 5

I've put an '*' next to the rows I want. So my dilemma is two part.
First, I want to sort by the ordinal information only when the arc is
pointing from the source object (id 638) to the other objects. Well, it's
pretty easy to determine which arcs are pointing the right way with this
addition:

select nodes.title, nodes.name, nodes.id, arcs.ordinal, CASE WHEN
nodes.id=arcstart THEN '1' ELSE '0' END as direction from Nodes,Arcs where
(Arcs.ArcEnd=Nodes.id and Arcs.ArcStart in ('638') and Arcs.Type=
'contains') or (Arcs.ArcStart=Nodes.id and Arcs.ArcEnd in ('638') and
Arcs.type = 'contained_by' ) order by direction, arcs.ordinal

title | name | id | ordinal | direction
------------------+------------------------------+-----+---------+-----------
Upcoming | Public: Upcoming Exhibitions | 852 | 1 | 0 *
Recent | Public: Recent Exhibitions | 870 | 2 | 0 *
Hands-on History | Public: Hands-on History | 931 | 3 | 0 *
Current | Public: Current Exhibitions | 802 | 4 | 0 *
Current | Public: Current Exhibitions | 802 | 0 | 1
Recent | Public: Recent Exhibitions | 870 | 0 | 1
Upcoming | Public: Upcoming Exhibitions | 852 | 0 | 1
Hands-on History | Public: Hands-on History | 931 | 5 | 1

So i've got a sort order i like, now i just want to use 'distinct on (id)'
to give me back the first four rows. But that won't work, because it will
sort by id first, giving me:

title | name | id | ordinal | direction
------------------+------------------------------+-----+---------+-----------
Current | Public: Current Exhibitions | 802 | 4 | 0 *
Upcoming | Public: Upcoming Exhibitions | 852 | 1 | 0 *
Recent | Public: Recent Exhibitions | 870 | 2 | 0 *
Hands-on History | Public: Hands-on History | 931 | 3 | 0 *

thus destroying the 'ordinal' field's ability to sort them in the
direction i wish.

I hope that clearly defines the problem defined. I'm not able to modify
the overal db schema because it is central to how the software depending
on it operates. So i really need a query-level solution for this problem.

-Stephen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruno Wolff III 2004-12-31 21:51:23 Re: 'distinct on' and 'order by' conflicts of interest
Previous Message Jeff Davis 2004-12-31 20:14:38 Re: postgresql 8.0 Rc1