Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-06-14 17:37:25
Message-ID: 20190614173725.vu6xrkx4iu4ghjdz@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 13, 2019 at 11:38:12PM -0400, James Coleman wrote:
>On Wed, Jun 5, 2019 at 12:14 PM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> wrote:
>> > > 2) Provide some fallback at execution time. For example, we might watch
>> > > the size of the group, and if we run into an unexpectedly large one we
>> > > might abandon the incremental sort and switch to a "full sort" mode.
>> >
>> > Are there good examples of our doing this in other types of nodes
>> > (whether the fallback is an entirely different algorithm/node type)? I
>> > like this idea in theory, but I also think it's likely it would add a
>> > very significant amount of complexity. The other problem is knowing
>> > where to draw the line: you end up creating these kinds of cliffs
>> > where pulling one more tuple through the incremental sort would give
>> > you your batch and result in not having to pull many more tuples in a
>> > regular sort node, but the fallback logic kicks in anyway.
>> >
>> What about having some simple mechanism for this, like if we encounter
>> the group with more tuples than the one estimated then simply switch
>> to normal sort for the remaining tuples, as the estimates does not
>> hold true anyway. Atleast this will not give issues of having
>> regressions of incremental sort being too bad than the normal sort.
>> I mean having something like this, populate the tuplesortstate and
>> keep on counting the number of tuples in a group, if they are within
>> the budget call tuplesort_performsort, otherwise put all the tuples in
>> the tuplesort and then call tuplesort_performsort. We may have an
>> additional field in IncrementalSortState to save the estimated size of
>> each group. I am assuming that we use MCV lists to approximate better
>> the group sizes as suggested above by Tomas.
>
>I think the first thing to do is get some concrete numbers on performance if we:
>
>1. Only sort one group at a time.
>2. Update the costing to prefer traditional sort unless we have very
>high confidence we'll win with incremental sort.
>
>It'd be nice not to have to add additional complexity if at all possible.
>

+1 to that

>> > Unrelated to all of the above: if I read the patch properly it
>> > intentionally excludes backwards scanning. I don't see any particular
>> > reason why that ought to be the case, and it seems like an odd
>> > limitation for the feature should it be merged. Should that be a
>> > blocker to merging?
>>
>> Regarding this, I came across this,
>> /*
>> * Incremental sort can't be used with either EXEC_FLAG_REWIND,
>> * EXEC_FLAG_BACKWARD or EXEC_FLAG_MARK, because we hold only current
>> * bucket in tuplesortstate.
>> */
>> I think that is quite understandable. How are you planning to support
>> backwards scan for this? In other words, when will incremental sort be
>> useful for backward scan.
>
>For some reason I was thinking we'd need it to support backwards scans
>to be able to handle DESC sort on the index, but I've tested and
>confirmed that already works. I suppose that's because the index scan
>provides that ordering and the sort node doesn't need to reverse the
>direction of what's provided to it. That's not particularly obvious to
>someone newer to the codebase; I'm not sure if that's documented
>anywhere.
>

Yeah, backward scans are not about ASC/DESC, it's about being able to walk
back through the result. And we can't do that with incremental sorts
without materialization.

>> On a different note, I can't stop imagining this operator on the lines
>> similar to parallel-append, wherein multiple workers can sort the
>> different groups independently at the same time.
>
>That is an interesting idea. I suppose it'd be particularly valuable
>if somehow there a node that was generating each batch in parallel
>already, though I'm not sure at first though what kind of query or
>node would result in that. I also wonder if (assuming that weren't the
>case) it would be much of an improvement since a single thread would
>have to generate each batch anyway; I'm not sure if the overhead of
>farming each batch out to a worker would actually be useful or if the
>real blocker is the base scan.
>
>At the very least it's an interesting idea.
>

I kinda doubt that'd be very valuable. Or more precisely, we kinda already
have that capability because we can do things like this:

-> Gather Merge
-> Sort
-> ... scan ...

so I imagine we'd just do an Incremental Sort here. Granted, it's not
distributed by prefix groups (I assume that's what you mean by batches
here), but I don't think that's a big problem.

>---
>
>I've been writing down notes here, and I realized that my test case
>from far upthread is actually a useful setup to see how much overhead
>is involved in sorting each batch individually, since it sets up data
>with each batch only containing 1 tuple. That particular case is one
>we could easily optimize anyway in the code and skip sorting
>altogether -- might be a useful enhancement.
>
>I hope to do some more testing and then report back with the results.
>
>James Coleman

OK.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-06-14 18:11:35 Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Previous Message Jesper Pedersen 2019-06-14 17:24:20 Re: Index Skip Scan