Performance penalty when using WITH

From: Li Jin <ljin(at)tripadvisor(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance penalty when using WITH
Date: 2011-07-28 21:00:06
Message-ID: 339959BC-3A27-4269-8A3E-4C157EBABB86@tripadvisor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi guys,

I met with the problem that when I was using WITH clause to reuse a subquery, I got a huge performance penalty because of query planner.

Here are the details, the original query is

EXPLAIN ANALYZE WITH latest_identities AS
(
SELECT DISTINCT ON (memberid) memberid, username, changedate
FROM t_username_history
WHERE memberid IN (SELECT memberid FROM t_member WHERE firstname || ' ' || substring(lastname,1,1) = 'Eddie T')
ORDER BY memberid, changedate DESC
)
SELECT t_member.email as email, t_member.username as username, t_member.location as location, t_member.locale as locale, t_member.status as status, t_member.creationdate as creationdate, t_forum_member.pos\
ts as posts, t_forum_member.expertlocations as expertlocations, t_nexus_member.pages_created as pages_created, t_nexus_member.pages_edited as pages_edited, t_member_contributions.hotel_reviews as hotel_rev\
iews, t_member_contributions.restaurant_reviews as restaurant_reviews, t_member_contributions.attraction_reviews as attraction_reviews, t_member_contributions.geo_reviews as geo_reviews, t_member_contribut\
ions.photos as photos, t_member_contributions.videos as videos, t_recent_contribution.recent_contribution_date as recent_contribution_date, t_recent_contribution.recent_contribution_type as recent_contribu\
tion_type, t_owner_member.memberid as owner_memberid, t_member_interaction.flags as interaction_flags, t_media.path as ta_avatar_path, t_external_member.externalid as facebookid, latest_identities.username\
as latest_identity
FROM t_member left join t_forum_member on (t_member.memberid = t_forum_member.memberid) left join t_nexus_member on (t_member.memberid = t_nexus_member.memberid) left join t_member_contributions on (t_memb\
er.memberid = t_member_contributions.memberid) left join t_recent_contribution on (t_member.memberid = t_recent_contribution.memberid) left join t_owner_member on (t_member.memberid = t_owner_member.member\
id) left join t_member_interaction on (t_member.memberid = t_member_interaction.memberid) left join t_media on (t_member.avatar = t_media.id) left join t_external_member on (t_member.memberid = t_external_\
member.memberid AND t_external_member.idtype = 'FB') left join latest_identities on (t_member.memberid = latest_identities.memberid)
WHERE t_member.firstname || ' ' || substring(t_member.lastname,1,1) = 'Eddie T';

The may seems scary, but what it really does is searching for members with certain name and joining with a bunch of other tables on memberid. The t_username_history table has multiple rows for a memberid therefore I just get the most recent record for each memberid that I am interested in before the join.

Here is the link to explain:
http://explain.depesz.com/s/ZKb

Since the red part looks suboptimal to me, I changed it using WITH subquery:

EXPLAIN WITH memberids AS
(
SELECT memberid FROM t_member WHERE firstname || ' ' || substring(lastname,1,1) = 'Eddie T'
),
latest_identities AS
(
SELECT DISTINCT ON (memberid) memberid, username, changedate
FROM t_username_history
WHERE memberid IN (SELECT memberid FROM memberids)
ORDER BY memberid, changedate DESC
)
SELECT t_member.email as email, t_member.username as username, t_member.location as location, t_member.locale as locale, t_member.status as status, t_member.creationdate as creationdate, t_forum_member.pos\
ts as posts, t_forum_member.expertlocations as expertlocations, t_nexus_member.pages_created as pages_created, t_nexus_member.pages_edited as pages_edited, t_member_contributions.hotel_reviews as hotel_rev\
iews, t_member_contributions.restaurant_reviews as restaurant_reviews, t_member_contributions.attraction_reviews as attraction_reviews, t_member_contributions.geo_reviews as geo_reviews, t_member_contribut\
ions.photos as photos, t_member_contributions.videos as videos, t_recent_contribution.recent_contribution_date as recent_contribution_date, t_recent_contribution.recent_contribution_type as recent_contribu\
tion_type, t_owner_member.memberid as owner_memberid, t_member_interaction.flags as interaction_flags, t_media.path as ta_avatar_path, t_external_member.externalid as facebookid, latest_identities.username\
as latest_identity
FROM t_member left join t_forum_member on (t_member.memberid = t_forum_member.memberid) left join t_nexus_member on (t_member.memberid = t_nexus_member.memberid) left join t_member_contributions on (t_memb\
er.memberid = t_member_contributions.memberid) left join t_recent_contribution on (t_member.memberid = t_recent_contribution.memberid) left join t_owner_member on (t_member.memberid = t_owner_member.member\
id) left join t_member_interaction on (t_member.memberid = t_member_interaction.memberid) left join t_media on (t_member.avatar = t_media.id) left join t_external_member on (t_member.memberid = t_external_\
member.memberid AND t_external_member.idtype = 'FB') left join latest_identities on (t_member.memberid = latest_identities.memberid)
WHERE t_member.memberid IN (SELECT memberid FROM memberids)

However, this query runs forever because (I think) the planner join the tables before filter by where clause.

Here is the explain link:
http://explain.depesz.com/s/v2K

Anyone knows why the planner is doing this?

Regards,
Li

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2011-07-29 00:25:20 Re: very large record sizes and ressource usage
Previous Message Terry Schmitt 2011-07-27 16:02:53 Re: Hardware advice for scalable warehouse db