hpw to Count without group by

Lists: pgsql-general
From: Yudie Pg <yudiepg(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: hpw to Count without group by
Date: 2005-06-01 21:16:43
Message-ID: e460d0c0506011416d0bd27c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello,
I have a table, structure like this:
create table product(
sku, int4 not null,
category int4 null,
display_name varchar(100) null,
rank int4 null
)
let say example data:
sku, category, display_name
=======================
10001, 5, postgresql, 132
10002, 5, mysql, 243
10003, 5, oracle, 323
10006, 7, photoshop, 53
10007, 7, flash mx, 88
10008, 9, Windows XP, 44
10008, 9, Linux, 74
Expected query result:
sku, category, display_name, category_count
====================================
10001, 5, postgresql, 3
10006, 7, photoshop, 2
10008, 9, Windows XP, 2
The idea is getting getting highest ranking each product category and COUNT
how many products in the category with SINGLE query.
the first 3 columns can be done with select distinct on (category) ...
order by category, rank desc but it still missing the category_count. I wish
no subquery needed for having simplest query plan.
Thank you.
Yudie G.


From: Edmund Bacon <ebacon-xlii(at)onesystem(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: hpw to Count without group by
Date: 2005-06-01 22:49:56
Message-ID: m3zmu9sne3.fsf@elb-lx.onesystem.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

yudiepg(at)gmail(dot)com (Yudie Pg) writes:

> Hello,
> I have a table, structure like this:
> create table product(
> sku, int4 not null,
> category int4 null,
> display_name varchar(100) null,
> rank int4 null
> )
> let say example data:
> sku, category, display_name
> =======================
> 10001, 5, postgresql, 132
> 10002, 5, mysql, 243
> 10003, 5, oracle, 323
> 10006, 7, photoshop, 53
> 10007, 7, flash mx, 88
> 10008, 9, Windows XP, 44
> 10008, 9, Linux, 74
> Expected query result:
> sku, category, display_name, category_count
> ====================================
> 10001, 5, postgresql, 3
> 10006, 7, photoshop, 2
> 10008, 9, Windows XP, 2
> The idea is getting getting highest ranking each product category and COUNT
> how many products in the category with SINGLE query.
> the first 3 columns can be done with select distinct on (category) ...
> order by category, rank desc but it still missing the category_count. I wish
> no subquery needed for having simplest query plan.
> Thank you.
> Yudie G.

I do not believe you can do this without a subquery - you are trying
to get 2 separate pieces of information from your data
* some data about the record having MAX(rank) for each category
and
* the count of records in each category

Note, however that you can get MAX(rank) and COUNT(category) in one
sequential pass of the data: e.g
SELECT category, MAX(rank), COUNT(category) FROM product;

Joining this with the orignal table is not too dificult :

SELECT sku, category, display_name, category_count
FROM product
JOIN (SELECT category, MAX(rank) AS rank, COUNT(category) AS category_count
FROM product
GROUP BY category) subq
USING(category, rank)
ORDER BY sku;

Depending on what your data looks like, you might improve things by
having an index on category, and perhaps on (category, rank).

Note that there is may be a problem with this query: If you have more
than one product with the same rank in the same category, you may get
more than one record for that category. Apply distinct on as
neccessary.

--
Remove -42 for email


From: Ragnar HafstaĆ° <gnari(at)simnet(dot)is>
To: Yudie Pg <yudiepg(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: hpw to Count without group by
Date: 2005-06-01 22:52:50
Message-ID: 1117666370.31064.5.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2005-06-01 at 16:16 -0500, Yudie Pg wrote:
> Hello,
> I have a table, structure like this:

[...]

> Expected query result:
>
> sku, category, display_name, category_count
> ====================================
> 10001, 5, postgresql, 3
> 10006, 7, photoshop, 2
> 10008, 9, Windows XP, 2
>
> The idea is getting getting highest ranking each product category and
> COUNT how many products in the category with SINGLE query.
>
> the first 3 columns can be done with select distinct on (category) ...
> order by category, rank desc but it still missing the category_count.
> I wish no subquery needed for having simplest query plan.

how about a simple join ?

select sku,category,display_name,count
from
(select distinct on (category) category, sku,display_name
from product order by category,rank
) as foo
natural join
(select category,count(*) as count
from product group by category
) as bar;

gnari


From: Yudie Pg <yudiepg(at)gmail(dot)com>
To: Edmund Bacon <ebacon-xlii(at)onesystem(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: hpw to Count without group by
Date: 2005-06-02 15:30:25
Message-ID: e460d0c05060208304fa51cc5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> I do not believe you can do this without a subquery - you are trying
> to get 2 separate pieces of information from your data
> * some data about the record having MAX(rank) for each category
> and
> * the count of records in each category

Hi, I guess i try to answer my own question which end up with creating
stored procedure.
Unless you have direct query idea.
This function cut the half of query time, as my concern about postgres
count agregate function is always slower than I expected.
SQL:

CREATE TYPE product_type as
(sku int4, category int4, display_name varchar(100),rank int4,
category_count);
CREATE OR REPLACE FUNCTION get_toprank_product_category (text) returns setof
product_type
as '
DECLARE
kwd ALIAS for $1;
mrow RECORD;
retrow prdtcat_searchresult;
tempcount int4;
prevcatnum int4 ;
i int4;
BEGIN
tempcount = 0;
prevcatnum := 0;
I:=0;
FOR tbrow IN
select * from product order by category, rank
LOOP
i := i+1;

IF prevcatnum != mrow.catnum OR i = 1 THEN
prevcatnum := mrow.catnum;
if i > 1 THEN

RETURN NEXT retrow;

END IF;
retrow.catnum := mrow.catnum;
retrow.corenum :=mrow.corenum;
retrow. mernum := mrow.mernum;
retrow.mersku := mrow.mersku;

tempcount = 1;
retrow.catcount := tempcount;
prevcatnum := mrow.catnum;
ELSE
tempcount := tempcount + 1;
retrow.catcount := tempcount;

END IF;
END LOOP;
RETURN NEXT retrow;

RETURN;
END'
language 'PLPGSQL';