From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | dgrowleyml(at)gmail(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Performance improvement for joins where outer side is unique |
Date: | 2015-03-24 12:11:20 |
Message-ID: | 20150324.211120.120654019.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, thanks for the new patch.
I made an additional shrink from your last one. Do you have a
look on the attached?
At Sun, 22 Mar 2015 19:42:21 +1300, David Rowley <dgrowleyml(at)gmail(dot)com> wrote in <CAApHDvrKwMmTwkXfn4uazYZA9jQL1c7UwBjBtuwFR69rqLVKfA(at)mail(dot)gmail(dot)com>
> On 20 March 2015 at 21:11, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >
> > I can continue working on your patch if you like? Or are you planning to
> > go further with it?
It's fine that you continue to work on this.
# Sorry for the hardly baked patch which had left many things alone:(
> I've been working on this more over the weekend and I've re-factored things
> to allow LEFT JOINs to be properly marked as unique.
> I've also made changes to re-add support for detecting the uniqueness of
> sub-queries.
I don't see the point of calling mark_unique_joins for every
iteration on join_info_list in remove_useless_joins.
The loop already iteraltes on whole the join_info_list so
mark_unique_join as an individual function is needless.
Finally, simply marking uniqueness of join in join_is_removable
seems to be enough, inhibiting early bailing out by the checks on
attribute usage and placeholder let it work as expected.
Reducing changes to this extent, I can easily see what is added
to planning computations. It consists of mainly two factors.
- Unique-join chekcs for every candidate inner joins in
add_paths_to_joinrel.
- Uniqueness check of mergejoinable clause in join-removability
check for every left join, some of which would be skipped
by other checks before.
> Also, I've added modified the costing for hash and nested loop joins to
> reduce the cost for unique inner joins to cost the join the same as it does
> for SEMI joins. This has tipped the scales on a few plans in the regression
> tests.
I've forgotten it, but quite important.
> Also, please see attached unijoin_analysis.patch. This just adds some code
> which spouts out notices when join nodes are initialised which states if
> the join is unique or not. Running the regression tests with this patch in
> places gives:
>
> Unique Inner: Yes == 753 hits
> Unique Inner: No == 1430 hits
>
> So it seems we can increase the speed of about 1 third of joins by about
> 10%.
> A quick scan of the "No"s seems to show quite a few cases which do not look
> that real world like. e.g cartesian join.
I don't have an idea how many queries in the reality hit this but
I suppose it's not a few.
> It would be great if someone could run some PostgreSQL application with
> these 2 patches applied, and then grep the logs for the Unique Inner
> results... Just to get a better idea of how many joins in a real world case
> will benefit from this patch.
Wow. I think the second patch should be DEBUGx, not NOTICE:)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
0001-Removing-individual-phase-to-mark-unique-joins.patch | text/x-patch | 14.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | ktm@rice.edu | 2015-03-24 13:00:20 | Re: Abbreviated keys for Numeric |
Previous Message | Thom Brown | 2015-03-24 12:03:32 | Re: Error with index on unlogged table |