Re: Ordering in an aggregate -- points to paths

Lists: pgsql-sql
From: "Julian Scarfe" <julian(dot)scarfe(at)ntlworld(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Ordering in an aggregate -- points to paths
Date: 2003-06-15 15:13:14
Message-ID: 036301c33350$a4bd5e60$0600a8c0@Wilbur
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

OK, I know relying on ordering in an aggregate is sinful, but I don't know
if it's mortal or venial.

Long explanation, please bear with me, here's the background:

---
CREATE TABLE "foo" (
"a" point,
"b" int
);
INSERT INTO ...

SELECT * FROM foo;
a | b
---------+---
(1,1) | 1
(1,2) | 3
(1.5,3) | 5
(4,4) | 2
(-1,-2) | 4
(5 rows)

---

So I want to create paths from the points in field a ordered by b. The
first step is the rather laborious construction of an append_point function
(am I missing something, BTW? -- seems like an obvious function to have but
I couldn't find a built-in).

---
CREATE FUNCTION "path" (point) RETURNS path AS 'select
path_add_pt(''[(0,0)]''::path,$1)' LANGUAGE 'sql';

CREATE FUNCTION "append_point" (path,point) RETURNS path AS 'select case
WHEN $1 is null THEN path($2)
WHEN $2 is null THEN $1
ELSE path_add($1,path_add_pt(''[(0,0)]''::path,$2))
END' LANGUAGE 'sql';

---

and then the aggregate itself follows in the obvious way

---
CREATE AGGREGATE create_path ( BASETYPE = point, SFUNC = append_point, STYPE
= path);

SELECT create_path(a) FROM foo;
create_path
-------------------------------------
[(1,1),(1,2),(1.5,3),(4,4),(-1,-2)]
(1 row)

---

and moreover, with subselect for ordering following examples from this
mailing list

---
SELECT create_path(c.a) FROM (SELECT a FROM foo ORDER BY b) c;
create_path
-------------------------------------
[(1,1),(4,4),(1,2),(-1,-2),(1.5,3)]
(1 row)

---

So far so good. Now for the real data. The points are an ordered (by
seq_no) set of latitude, longitude pairs defining a "fir", the boundary of a
region on the surface of the earth. (fir_ident, fir_indicator, seq_no) is
unique.

---
CREATE TABLE "fir_coords" (
"node" point,
...
"fir_ident" character(4),
"fir_indicator" character(4),
"seq_no" character(4),
);

SELECT c.fir_ident, c.fir_indicator, create_path (c.node) AS fir_edge
INTO fir_e
FROM
(SELECT fir_ident, fir_indicator, node
FROM fir_coords
ORDER BY fir_ident, fir_indicator,seq_no) c
GROUP BY fir_ident, fir_indicator;
---

The fir_e table should contain the paths for the fir. And for simple shapes
(a few dozen points) it works fine.

But the problem is that e.g. Austria's fir is defined by 1577 points, and
the path that I construct appears to be in the wrong order.

foo=# SELECT fir_ident, fir_indicator, seq_no, node FROM fir_coords WHERE
fir_ident = 'LOVV' LIMIT 10;
fir_ident | fir_indicator | seq_no | node
-----------+---------------+--------+---------------------------------------
LOVV | B | 0005 | (0.241534175928771,0.851255253839368)
LOVV | B | 0010 | (0.241844456684681,0.851240709428934)
LOVV | B | 0015 | (0.242135344893347,0.851167987376768)
LOVV | B | 0020 | (0.242368055460279,0.851022543272435)
LOVV | B | 0025 | (0.242571677206345,0.850935276809835)
LOVV | B | 0030 | (0.242862565415011,0.850717110653336)
LOVV | B | 0035 | (0.24312436480281,0.850528033317703)
LOVV | B | 0040 | (0.243327986548876,0.850368044802937)
LOVV | B | 0045 | (0.243560697115809,0.850324411571637)
LOVV | B | 0050 | (0.243633419167975,0.850208056288171)

whereas my path fir_edge looks like:

((0.268140750748062,0.854920445268556),(0.195244165656432,0.819810238482603)
,(0.238688319620658,0.812014434490362),(0.286597607587902,0.82373722929959),
(0.184975811890532,0.817861287484543),(0.192917059987107,0.816959534037679),
(0.244753338771338,0.849015414632642),(0.298204047113664,0.838528894710242),
(0.277478262246232,0.852418806674031),...

The ordering has gone awry. And since I'm going to draw the fir by 'joining
the dots' that's a Bad Thing.

Is this a problem with my functions, or is there something going on in the
internals that makes it dangerous to rely on the
ordered-subselect-with-aggregate construction above?

Thanks

Julian Scarfe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Julian Scarfe" <julian(dot)scarfe(at)ntlworld(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Ordering in an aggregate -- points to paths
Date: 2003-06-15 15:49:57
Message-ID: 13056.1055692197@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

"Julian Scarfe" <julian(dot)scarfe(at)ntlworld(dot)com> writes:
> OK, I know relying on ordering in an aggregate is sinful, but I don't know
> if it's mortal or venial.
> ...
> SELECT c.fir_ident, c.fir_indicator, create_path (c.node) AS fir_edge
> INTO fir_e
> FROM
> (SELECT fir_ident, fir_indicator, node
> FROM fir_coords
> ORDER BY fir_ident, fir_indicator,seq_no) c
> GROUP BY fir_ident, fir_indicator;

Yeah, this is a fairly obvious thing to want to do with a user-written
aggregate. It does not work in released versions, because the planner
does not notice that the inner SELECT's output ordering matches what
the GROUP BY needs, and so it inserts an additional Sort plan step
above the sub-select (you can see this if you look at EXPLAIN output).
Unfortunately, on most platforms qsort() isn't stable and will not
preserve the ordering of its input for equal keys. So you lose the
minor ordering by seq_no in the re-sort.

We have fixed this in CVS tip by teaching the planner to notice the
subselect's result ordering and avoid the redundant Sort step. The
patch is probably too large to consider back-patching into 7.3,
unfortunately. Here's the log entry if you want to pursue that:

2003-02-15 15:12 tgl

* src/: backend/optimizer/path/allpaths.c,
backend/optimizer/path/pathkeys.c,
backend/optimizer/plan/planner.c,
backend/optimizer/util/pathnode.c,
backend/optimizer/util/relnode.c, backend/optimizer/util/tlist.c,
include/optimizer/pathnode.h, include/optimizer/paths.h,
include/optimizer/tlist.h: Teach planner how to propagate pathkeys
from sub-SELECTs in FROM up to the outer query. (The
implementation is a bit klugy, but it would take nontrivial
restructuring to make it nicer, which this is probably not worth.)
This avoids unnecessary sort steps in examples like SELECT
foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.

regards, tom lane


From: "Julian Scarfe" <julian(at)avbrief(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Ordering in an aggregate -- points to paths
Date: 2003-06-15 16:48:40
Message-ID: 039401c3335d$f94d1760$0600a8c0@Wilbur
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Yeah, this is a fairly obvious thing to want to do with a user-written
> aggregate. It does not work in released versions, because the planner
> does not notice that the inner SELECT's output ordering matches what
> the GROUP BY needs, and so it inserts an additional Sort plan step
> above the sub-select (you can see this if you look at EXPLAIN output).
> Unfortunately, on most platforms qsort() isn't stable and will not
> preserve the ordering of its input for equal keys. So you lose the
> minor ordering by seq_no in the re-sort.

Most grateful for the rapid response Tom. Knowing that, I can work around by
iterating through the firs at the application level.

Regards

Julian Scarfe

PS: you shouldn't be working on a Sunday, it's bad for you ;-)