Re: people who buy A, also buy C, D, E

From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Cc: dan(at)langille(dot)org
Subject: Re: people who buy A, also buy C, D, E
Date: 2005-06-25 03:35:40
Message-ID: 62b4e0aae1e53b542337e18ec2b6b4a8@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> The goal of my query is: given a book, what did other people who
> bought this book also buy? I plan the list the 5 most popular such
> books.

I've been playing with this a little bit, and I don't think you are
going to get better than you already have. Certainly, the caching
won't work either as any insert into the watch_list_element has
the potential to change a very large number of pre-compiled lists.
However, there are some minor optimizations that can be made to
speed up the existing query quite a bit. One note first: the LIMIT
should be 6 not 5 if you really want the five other "books" and the
book itself will more than likely appear in the list. Picking it
out is something the client app can do.

* Make sure the tables are freshly analyzed. Might want to bump
up the default stats a bit too.

* Looks like you already have indexes on the watch_list_element
table. The watch_list_element_element_id index could be broken
into multiple conditional indexes, but your explain shows this
would not really gain us much:

actual time=37.957..41.789

* One big gain would be to cluster the table on watch_list_id:

CREATE INDEX watch_index ON watch_list_element (watch_list_id);
CLUSTER watch_index ON watch_list_element;

I got about a 25% speedup on my queries by doing this. YMMV, as I
don't know enough about your conditions to do more than make an
approximate test database. But it should help this query out.

* Finally, you should upgrade if at all possible. Going from
7.4.7 to 8.0.1 gave me a 10% speed increase, while going from
8.0.1 to 8.1.0 (e.g. the upcoming version) gave me an additional
25% speed boost, mostly due to the new bitmap stuff. So, making
the jump to 8.0.1 will be good practice for the 8.1.0 jump, right? :)

Overall, I was able to get the query to go about a third faster
than when I started. Hope this helps.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200506242328
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCvNCrvJuQZxSWSsgRAmkDAJ44z/Ei27HuEBqx/htmCRHJZWi8VQCfV2mm
upeE0p3z4h11NJzl5aOqCkc=
=LVqI
-----END PGP SIGNATURE-----

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Greg Sabino Mullane 2005-06-25 04:22:54 Re: Grouping Too Closely
Previous Message Bruce Momjian 2005-06-25 01:50:23 Re: [SQL] ARRAY() returning NULL instead of ARRAY[]