Re: [HACKERS] [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-04-09 15:56:10
Message-ID: CAPpHfdt14t5fsLQ2Pd5m2WQcRZHaHHPNyx9Fko6x_4Uk-Buq4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 7, 2018 at 11:57 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On 04/07/2018 06:23 PM, Tom Lane wrote:
> > Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> >>> I dunno, how would you estimate whether this is actually a win or not?
> >>> I don't think our model of sort costs is anywhere near refined enough
> >>> or accurate enough to reliably predict whether this is better than
> >>> just doing it in one step. Even if the cost model is good, it's not
> >>> going to be better than our statistics about the number/size of the
> >>> groups in the first column(s), and that's a notoriously unreliable
> stat.
> >
> >> I think that improvement in cost calculation of sort should be a
> >> separate patch, not directly connected to this one. Postpone patches
> >> till other part will be ready to get max improvement for postponed ones
> >> doesn't seem to me very good, especially if it suggests some improvement
> >> right now.
> >
> > No, you misunderstand the point of my argument. Without a reasonably
> > reliable cost model, this patch could easily make performance *worse*
> > not better for many people, due to choosing incremental-sort plans
> > where they were really a loss.
> >
>
> Yeah. Essentially the patch could push the planner to pick a path that
> has low startup cost (and very high total cost), assuming it'll only
> need to read small part of the input. But if the number of groups in the
> input is low (perhaps just one huge group), that would be a regression.
>

Yes, I think the biggest risk here is too small number of groups. More
precisely the risk is too large groups while total number of groups might
be large enough.

> If we were at the start of a development cycle and work were being
> > promised to be done later in the cycle to improve the planning aspect,
> > I'd be more charitable about it. But this isn't merely the end of a
> > cycle, it's the *last day*. Now is not the time to commit stuff that
> > needs, or even just might need, follow-on work.
> >
>
> +1 to that
>
> FWIW I'm willing to spend some time on the patch for PG12, particularly
> on the planner / costing part. The potential gains are too interesting
>

Thank you very much for your efforts on reviewing this patch.
I'm looking forward work with you on this feature for PG12.

FWIW, I think that we're moving this patch in the right direction by
providing separate paths for incremental sort. It's much better than
deciding between full or incremental sort in-place. For sure, considereing
extra paths might cause planning time regression. But I think the
same statement is true about many other planning optimizations.
One thing be can do is to make enable_incrementalsort = off by
default. Then only users who understand improtance of incremental
sort will get extra planning time.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-04-09 16:06:43 Re: [HACKERS] Runtime Partition Pruning
Previous Message Greg Stark 2018-04-09 15:29:36 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS