Re: how to write an optimized sql with two same subsql?

Lists: pgsql-general
From: sunpeng <bluevaley(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: how to write an optimized sql with two same subsql?
Date: 2010-10-14 23:34:10
Message-ID: AANLkTin6fUFO3mO-AMEX=-xWQV16gOzmyWcRdBWL+SSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

We have a table A:
CREATE TABLE A(
uid integer,
groupid integer
)
Now we use this subsql to get each group's count:
SELECT count(*) as count
FROM A
GROUP BY groupid
ORDER BY groupid

Then we try to find the group pair with following conditions:
SELECT c.groupid as groupid1,d.groupid as groupid2
FROM subsql as c, subsql as d
WHERE d.groupid > c.groupid
and d.count > c.count;

Does that mean subsql will be executed twice? or how to write the optimized
sql?


From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: how to write an optimized sql with two same subsql?
Date: 2010-10-15 00:06:11
Message-ID: 4CB79AF3.1050207@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/14/2010 05:34 PM, sunpeng wrote:
> We have a table A:
> CREATE TABLE A(
> uid integer,
> groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2
> FROM subsql as c, subsql as d
> WHERE d.groupid > c.groupid
> and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write the
> optimized sql?
>
>
What are you trying to discover about groups and their frequency in
tablea? Does the numberical value of groupid have any meaning in your
system?


From: sunpeng <bluevaley(at)gmail(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: how to write an optimized sql with two same subsql?
Date: 2010-10-15 00:18:16
Message-ID: AANLkTi=PsGVWJ-JbVGmr_ujsAdbyf++Bu4k1KT8Snf93@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Actually I've simplied my original sql to the previous version, since it's
simple yet reveals the same problem.
My original sql is to get two instersected cluster(as same concept as group
) and its commonarea:

SELECT a.clusterid AS clusterida,
b.clusterid AS clusteridb,
*St_astext*(*St_intersection*(a.bufferbox, b.bufferbox)) AS
commonarea
FROM (SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000)
AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) a,
(SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000)
AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid) b
WHERE b.clusterid > a.clusterid
AND *St_intersects*(a.bufferbox, b.bufferbox)
ORDER BY a.clusterid;

The DDL for _mcir_2347694 and _mcir_2347694_clusterid2 is:
CREATE TABLE _mcir_2347579
(
a0 geometry,
uid integer
)
CREATE TABLE _mcir_2347579_clusterid2
(
uid integer NOT NULL,
clusterid integer
)
In these two tables, _mcir_2347579_clusterid2.uid = mcir_2347579.uid, just
like a forign key.

The same question is how to avoid the following subquery be executed twice:
SELECT *St_buffer*(*St_convexhull*(*St_collect*(c.a0)), 2100.000000) AS
bufferbox,
d.clusterid AS
clusterid
FROM _mcir_2347694 c,
_mcir_2347694_clusterid2 d
WHERE c.uid = d.uid
GROUP BY d.clusterid

2010/10/14 Rob Sargent <robjsargent(at)gmail(dot)com>

>
>
> On 10/14/2010 05:34 PM, sunpeng wrote:
> > We have a table A:
> > CREATE TABLE A(
> > uid integer,
> > groupid integer
> > )
> > Now we use this subsql to get each group's count:
> > SELECT count(*) as count
> > FROM A
> > GROUP BY groupid
> > ORDER BY groupid
> >
> > Then we try to find the group pair with following conditions:
> > SELECT c.groupid as groupid1,d.groupid as groupid2
> > FROM subsql as c, subsql as d
> > WHERE d.groupid > c.groupid
> > and d.count > c.count;
> >
> > Does that mean subsql will be executed twice? or how to write the
> > optimized sql?
> >
> >
> What are you trying to discover about groups and their frequency in
> tablea? Does the numberical value of groupid have any meaning in your
> system?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


From: "Igor Neyman" <ineyman(at)perceptron(dot)com>
To: "sunpeng" <bluevaley(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: how to write an optimized sql with two same subsql?
Date: 2010-10-15 17:56:25
Message-ID: F4C27E77F7A33E4CA98C19A9DC6722A2069F346B@EXCHANGE.corp.perceptron.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> -----Original Message-----
> From: sunpeng [mailto:bluevaley(at)gmail(dot)com]
> Sent: Thursday, October 14, 2010 7:34 PM
> To: pgsql-general(at)postgresql(dot)org
> Subject: how to write an optimized sql with two same subsql?
>
> We have a table A:
> CREATE TABLE A(
> uid integer,
> groupid integer
> )
> Now we use this subsql to get each group's count:
> SELECT count(*) as count
> FROM A
> GROUP BY groupid
> ORDER BY groupid
>
> Then we try to find the group pair with following conditions:
> SELECT c.groupid as groupid1,d.groupid as groupid2 FROM
> subsql as c, subsql as d WHERE d.groupid > c.groupid
> and d.count > c.count;
>
> Does that mean subsql will be executed twice? or how to write
> the optimized sql?
>

Is that what you want:

WITH gr_counts AS (
SELECT groupid, COUNT(*) AS CNT
FROM A
GROUP BY groupid)
SELECT C.groupid AS groupid1, D.groupid AS groupid2
FROM gr_counts C, gr_counts D
WHERE D.groupid > C.groupid
AND D.count > C.count;

This will execute:

SELECT groupid, COUNT(*) AS CNT
FROM A
GROUP BY groupid

only once.

Regards,
Igor Neyman