rewrite count distinct query

From: Chris Smith <chris(at)interspire(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: rewrite count distinct query
Date: 2004-12-06 07:00:50
Message-ID: 41B403A2.20000@interspire.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

'Scuse the long post :) I'm trying to include all relevant info..

I'm trying to work out a better way to approach a query, any tips are
greatly appreciated.

The relevant tables:

db=# \d tp_conversions
Table "public.tp_conversions"
Column | Type | Modifiers
---------------+------------------------+--------------------
conversionid | integer | not null default 0
type | character varying(10) |
name | character varying(255) |
amount | double precision |
cookieid | character varying(32) |
currtime | integer |
ip | character varying(20) |
origintype | character varying(20) |
originfrom | character varying(255) |
origindetails | character varying(255) |
userid | integer |
Indexes:
"tp_conversions_pkey" primary key, btree (conversionid)
"conv_origindetails" btree (origindetails)
"conv_originfrom" btree (originfrom)
"conv_origintype" btree (origintype)
"conv_time" btree (currtime)
"conv_userid" btree (userid)

trackpoint=# SELECT count(*) from tp_conversions;
count
-------
261
(1 row)

db=# \d tp_search
Table "public.tp_search"
Column | Type | Modifiers
------------------+------------------------+--------------------
searchid | integer | not null default 0
searchenginename | character varying(255) |
keywords | character varying(255) |
currtime | integer |
ip | character varying(20) |
landingpage | character varying(255) |
cookieid | character varying(32) |
userid | integer |
Indexes:
"tp_search_pkey" primary key, btree (searchid)
"search_cookieid" btree (cookieid)
"search_keywords" btree (keywords)
"search_searchenginename" btree (searchenginename)
"search_userid" btree (userid)

trackpoint=# SELECT count(*) from tp_search;
count
-------
5086
(1 row)

What I'm trying to do...

Work out the number of conversions for each search origin.

This query works:

select
count(distinct conversionid) as convcount,
count(distinct searchid) as searchcount,
(count(distinct conversionid) / count(distinct searchid)) as perc,
s.searchenginename
from tp_conversions c, tp_search s
where
c.origintype='search' and s.searchenginename=c.originfrom and
s.userid=c.userid and c.userid=1
group by searchenginename
order by convcount desc;

convcount | searchcount | perc | searchenginename
-----------+-------------+------+------------------
15 | 2884 | 0 | Google
1 | 110 | 0 | Google AU
2 | 308 | 0 | Google CA
1 | 25 | 0 | Google CL
1 | 143 | 0 | Google DE
1 | 117 | 0 | Google IN
1 | 26 | 0 | Google NZ
3 | 49 | 0 | Google RO
1 | 60 | 0 | Google TH
2 | 174 | 0 | Yahoo
(10 rows)

However the percentage is wrong.

I can cast one to a float:

(count(distinct conversionid) / count(distinct searchid)::float)

and it'll give me a better percentage:

convcount | searchcount | perc | searchenginename
-----------+-------------+---------------------+------------------
15 | 2884 | 0.00520110957004161 | Google
3 | 49 | 0.0612244897959184 | Google RO
2 | 308 | 0.00649350649350649 | Google CA
2 | 174 | 0.0114942528735632 | Yahoo
1 | 110 | 0.00909090909090909 | Google AU
1 | 25 | 0.04 | Google CL
1 | 143 | 0.00699300699300699 | Google DE
1 | 117 | 0.00854700854700855 | Google IN
1 | 26 | 0.0384615384615385 | Google NZ
1 | 60 | 0.0166666666666667 | Google TH
(10 rows)

(I think the answer to this is 'no' but I'm going to ask anyway :P)
Is there an easier way to get the more-detailed percentage (it's meant
to work in multiple databases - so casting to a float won't work for
other db's) ?

More importantly... Is there a better way to write the query (I don't
like the count(distinct...) but it works and gives the right info) ?

I tried to do it with a union:

SELECT
count(searchid),
searchenginename
from tp_search s
where userid=1
group by searchenginename
union
select
count(conversionid),
originfrom
from tp_conversions c
where c.userid=1
group by originfrom;

but then realised that getting the data out with php would be a
nightmare (plus I can't get the percentages).

Lastly:

db=# SELECT version();
version
---------------------------------------------------------------
PostgreSQL 7.4.3 on i386-pc-linux-gnu, compiled by GCC 2.95.4
(1 row)

(I know it's a little out of date, upgrading later this week).

Any suggestions/hints/tips welcome :)

Thanks,
Chris.

Browse pgsql-general by date

  From Date Subject
Next Message Rolf stvik 2004-12-06 07:01:47 Re: 3rd RFD: comp.databases.postgresql (was:
Previous Message Michael Fuhr 2004-12-06 06:51:03 Re: [HACKERS] DBD::PgSPI 0.02