Re: why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-08-08 07:50:23
Message-ID: CAA4eK1LW4xLj8HkevV3nsrr3Y1JNYK5NSWkfJRKixcRXtA82fQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 2, 2017 at 11:12 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
>> >> wrote:
>> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
>> >> > wrote:
>> >> >>
>> >> >> So because of this high projection cost, the seqpath and parallel
>> >> >> path both have fuzzily the same cost, but the seqpath is winning
>> >> >> because it's parallel safe.
>> >> >
>> >> >
>> >> > I think you are correct. However, unless parallel_tuple_cost is set
>> >> > very low, apply_projection_to_path never gets called with the Gather
>> >> > path as an argument. It gets ruled out at some earlier stage,
>> >> > presumably because it assumes the projection step cannot make it win
>> >> > if it is already behind by enough.
>> >> >
>> >>
>> >> I think that is genuine because tuple communication cost is very high.
>> >
>> >
>> > Sorry, I don't know which you think is genuine, the early pruning or my
>> > complaint about the early pruning.
>> >
>>
>> Early pruning. Currently, we don't have a way to maintain both
>> parallel and non-parallel paths until a later stage and then decide
>> which one is better. If we want to maintain both parallel and
>> non-parallel paths, it can increase planning cost substantially in
>> the case of joins. Still, it would surely be beneficial in many
>> cases, so it is a worthwhile direction to pursue.
>
>
> If I understand it correctly, we have a way; it can just lead to an
> exponential explosion, so we are afraid to use it, correct? If I just
> lobotomize the path domination code (make the test at pathnode.c line
> 466 always false)
>
> if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)
>
> Then it keeps the parallel plan and later chooses to use it (after applying
> your other patch in this thread) as the overall best plan. It doesn't even
> slow down "make installcheck-parallel" by very much, which I guess just
> means the regression tests don't have a lot of complex joins.
>
> But what is an acceptable solution? Is there a heuristic for when retaining
> a parallel path could be helpful, the same way there is for fast-start
> paths? It seems like the best thing would be to include the evaluation
> costs in the first place at this step.
>
> Why is the path-cost domination code run before the cost of the function
> evaluation is included?

Because the function evaluation is part of the target list, and we
create the path target only after the base paths have been created
(see the call to create_pathtarget at planner.c:1696).

> Is that because the information needed to compute
> it is not available at that point,

Right.
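To make the ordering problem concrete, here is a deliberately simplified
cost model in C. It is not PostgreSQL's costsize.c. It assumes the
default values of seq_page_cost, cpu_tuple_cost, parallel_setup_cost and
parallel_tuple_cost, no quals, a parallel divisor of 2.4 (two workers
plus a participating leader), and that a slow target-list function is
evaluated in the workers, so its per-row cost is split by the same
divisor. The helper names serial_cost and gather_cost are hypothetical.

```c
/* Default values of the corresponding PostgreSQL GUCs. */
#define SEQ_PAGE_COST        1.0
#define CPU_TUPLE_COST       0.01
#define PARALLEL_SETUP_COST  1000.0
#define PARALLEL_TUPLE_COST  0.1

/* Assumption: 2 workers plus the leader, i.e. 2 + (1 - 0.3 * 2). */
#define PARALLEL_DIVISOR     2.4

/* Serial seq scan: page I/O, per-tuple CPU, and per-tuple target list
 * evaluation (0 when only the base path is being costed). */
static double serial_cost(double pages, double rows, double tlist_per_tuple)
{
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + tlist_per_tuple);
}

/* Gather over a partial seq scan: I/O is not shared among processes,
 * per-tuple CPU work (including the target list, assumed to run in the
 * workers) is split by the divisor, and every row pays the transfer cost. */
static double gather_cost(double pages, double rows, double tlist_per_tuple)
{
    double partial = pages * SEQ_PAGE_COST
                   + rows * (CPU_TUPLE_COST + tlist_per_tuple) / PARALLEL_DIVISOR;

    return partial + PARALLEL_SETUP_COST + rows * PARALLEL_TUPLE_COST;
}
```

With 1000 pages and 100000 rows, serial_cost(1000, 100000, 0) is 2000
while gather_cost(1000, 100000, 0) is about 12416.7, so the Gather path
loses when only base-path costs are known. Adding a per-row function
cost of 0.25 (COST 100 times the default cpu_operator_cost of 0.0025)
gives 27000 for the serial path versus about 22833.3 for the Gather
path, which is the reversal that early pruning never gets to see.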

I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) don't reject parallel paths
(Gather/GatherMerge) during add_path, which risks a path explosion;
(b) for parallel paths, somehow try to identify that the path has a
costly target list (maybe just check whether the target list has
anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.
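Approach (a) can be pictured with a reduced version of add_path's
dominance test. This sketch is simplified and hypothetical: SimplePath,
reject_new_path and the keep_gather switch are invented for
illustration, and pathkeys, parameterization and row counts are
ignored. The fuzzy comparison follows the shape of
compare_path_costs_fuzzily with a 1% fuzz factor, and on a tie the
parallel-safe path is kept, which is why a plain seqpath beats a Gather
path of nearly the same cost.

```c
#include <stdbool.h>

/* Simplified stand-ins for the COSTS_* results used in pathnode.c. */
typedef enum { COSTS_EQUAL, COSTS_BETTER1, COSTS_BETTER2, COSTS_DIFFERENT } CostCmp;

/* Hypothetical, reduced path: only what the dominance test looks at. */
typedef struct {
    double startup_cost;
    double total_cost;
    bool   parallel_safe;
    bool   is_gather;      /* Gather/GatherMerge path */
} SimplePath;

#define STD_FUZZ_FACTOR 1.01

/* Fuzzy comparison in the shape of compare_path_costs_fuzzily(). */
static CostCmp compare_costs_fuzzily(const SimplePath *p1, const SimplePath *p2,
                                     double fuzz)
{
    if (p1->total_cost > p2->total_cost * fuzz)
        /* p1 fuzzily worse on total; DIFFERENT if p2 is fuzzily worse on startup */
        return p2->startup_cost > p1->startup_cost * fuzz ? COSTS_DIFFERENT
                                                          : COSTS_BETTER2;
    if (p2->total_cost > p1->total_cost * fuzz)
        return p1->startup_cost > p2->startup_cost * fuzz ? COSTS_DIFFERENT
                                                          : COSTS_BETTER1;
    /* Fuzzily equal on total cost: let startup cost break the tie. */
    if (p1->startup_cost > p2->startup_cost * fuzz)
        return COSTS_BETTER2;
    if (p2->startup_cost > p1->startup_cost * fuzz)
        return COSTS_BETTER1;
    return COSTS_EQUAL;
}

/* Should new_path be discarded because old_path dominates it?
 * keep_gather = true models approach (a): never prune Gather paths here. */
static bool reject_new_path(const SimplePath *new_path, const SimplePath *old_path,
                            bool keep_gather)
{
    if (keep_gather && new_path->is_gather)
        return false;

    switch (compare_costs_fuzzily(new_path, old_path, STD_FUZZ_FACTOR)) {
    case COSTS_BETTER2:
        return true;    /* old path is fuzzily cheaper */
    case COSTS_EQUAL:
        /* Tie: keep the old path unless only the new one is parallel safe. */
        return !(new_path->parallel_safe && !old_path->parallel_safe);
    default:
        return false;   /* new path is better, or the costs trade off */
    }
}
```

Passing keep_gather = true corresponds to approach (a). Applied to
every Gather path at every join level, that is exactly where the
path-explosion risk comes from.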

I think the preference will be to do something along the lines of
approach (b), but I am not sure whether we can easily do that.
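Approach (b) could be as coarse as the "anything other than Vars" check.
A minimal sketch, using an invented ExprKind enum in place of the
planner's real node tags and assuming the check runs over the query's
final target list; tlist_has_non_vars and retain_parallel_path are
hypothetical names, not existing planner functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for planner expression nodes. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_FUNCEXPR, EXPR_OPEXPR } ExprKind;

typedef struct {
    ExprKind kind;
} TargetExpr;

/* True if the target list contains anything other than plain column
 * references (Vars), i.e. it might carry per-tuple evaluation cost. */
static bool tlist_has_non_vars(const TargetExpr *tlist, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tlist[i].kind != EXPR_VAR)
            return true;
    return false;
}

/* Hypothetical hook for add_path: retain a Gather path that would
 * otherwise be pruned only when the target list might be expensive. */
static bool retain_parallel_path(bool is_gather, const TargetExpr *tlist, size_t n)
{
    return is_gather && tlist_has_non_vars(tlist, n);
}
```

Because the rule is "anything other than Vars", a constant or a cheap
operator also marks the target list as costly, so this errs toward
retaining parallel paths; comparing the target list's estimated
per-tuple cost against a threshold would be a finer variant.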

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
