Re: people who buy A, also buy C, D, E

From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Christoph Haller <ch(at)rodos(dot)fzk(dot)de>
Cc: Dan Langille <dan(at)langille(dot)org>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: people who buy A, also buy C, D, E
Date: 2005-04-26 12:25:12
Message-ID: Pine.LNX.4.44.0504261523390.7746-100000@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

O Christoph Haller έγραψε στις Apr 26, 2005 :

> Dan Langille wrote:
> >
> > The goal of my query is: given a book, what did other people who
> > bought this book also buy? I plan the list the 5 most popular such
> > books. In reality, this isn't about books, but that makes it easier
> > to understand I think.
> >
> > We have a table of customer_id (watch_list_id) and book_id
> > (element_id).
> >
> > freshports.org=# \d watch_list_element
> > Table "public.watch_list_element"
> > Column | Type | Modifiers
> > ---------------+---------+-----------
> > watch_list_id | integer | not null
> > element_id | integer | not null
> > Indexes:
> > "watch_list_element_pkey" primary key, btree (watch_list_id,
> > element_id)
> > "watch_list_element_element_id" btree (element_id)
> > Foreign-key constraints:
> > "$2" FOREIGN KEY (watch_list_id) REFERENCES watch_list(id) ON
> > UPDATE CASCADE ON DELETE CASCADE
> > "$1" FOREIGN KEY (element_id) REFERENCES element(id) ON UPDATE
> > CASCADE ON DELETE CASCADE
> >
> > freshports.org=#
> >
> > I have a query which returns the needed results:
> >
> > SELECT W.element_id
> > FROM watch_list_element W
> > WHERE w.watch_list_id in (select watch_list_id from
> > watch_list_element where element_id = 54968)
> > GROUP BY W.element_id
> > ORDER BY count(W.watch_list_id) DESC
> > LIMIT 5;
> >
> > But performance is an issue here. So I'm planning to calculate all
> > the possible values and cache them. That is, given each element_id in
> > a watch_list, what are the top 5 element_id values on all the lists
> > on which the original element_id appears?
> >
> > I'm having trouble constructing the query. I'm not even sure I can
> > do this in one select, but that would be nice. Examples and clues
> > are appreciated.
> >
> > Any ideas?
> >
> > Thank you.
> > --
>
> Just two ideas.
>
> 1) Older Postgres versions are notorious for being slow
> on "IN" clauses.
> Does this one (untested) perform better:
>
> SELECT W.element_id, count(W.watch_list_id)
> FROM watch_list_element W
> WHERE EXISTS
> (SELECT * FROM watch_list_element E
> WHERE E.element_id = 54968 AND W.watch_list_id = E.watch_list_id)
> GROUP BY W.element_id
> ORDER BY 2 DESC
> LIMIT 5;
>
> 2) I suspect calculating all possible values would require time and
> an enormous cache buffer in size as well as re-calculating pretty often.
> So my approach would be trying to tune the query before introducing
> cached results.

AFAIK, problems like this fall into the "Data Mining" field,
and often their solution go beyond some DB arrangments.
A little research wouldn't hurt, IMO.

>
> HTH
>
> Regards, Christoph
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
-Achilleus

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Dan Langille 2005-04-26 12:35:20 Re: people who buy A, also buy C, D, E
Previous Message Christoph Haller 2005-04-26 12:24:37 Re: people who buy A, also buy C, D, E