Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Partial sort
Date: 2013-12-18 11:53:06
Message-ID: CAPpHfdvih8Q04Kd_ezcGoFOFa4mYEgw8Ci5eT3gS41+BVhHa-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> Hi,
>
> > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > > time=0.097..0.099 rows=10 loops=1)
> > > > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > > time=0.096..0.097 rows=10 loops=1)
> > > > Sort Key: v1, v2
> > > > Sort Method: top-N heapsort Memory: 25kB
> > > > -> Index Scan using test_v1_idx on test
> (cost=0.42..47604.42
> > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > > > Total runtime: 0.125 ms
> > > > (6 rows)
> > >
> > > Is that actually all that beneficial when sorting with a bog standard
> > > qsort() since that doesn't generally benefit from data being
> pre-sorted?
> > > I think we might need to switch to a different algorithm to really
> > > benefit from mostly pre-sorted input.
> > >
> >
> > In this patch I don't do full sort of dataset. For instance, index
> returns
> > data ordered by first column and we need to order them also by second
> > column.
>
> Ah, that makes sense.
>
> > But, I don't think we should expect pre-sorted values of second column
> > inside a group.
>
> Yes, if you do it that way, there doesn't seem to any need to assume
> that any more than we usually do.
>
> I think you should make the explain output reflect the fact that we're
> assuming v1 is presorted and just sorting v2. I'd be happy enough with:
> Sort Key: v1, v2
> Partial Sort: v2
> or even just
> "Partial Sort Key: [v1,] v2"
> but I am sure others disagree.
>

Sure, I just didn't change explain output yet. It should look like what you
propose.

> > > *partial-knn-1.patch*
>
> > > Rechecking from the heap means adding a sort node though, which I don't
> > > see here? Or am I misunderstanding something?
>
> > KNN-GiST contain RB-tree of scanned items. In this patch item is
> rechecked
> > inside GiST and reinserted into same RB-tree. It appears to be much
> easier
> > implementation for PoC and also looks very efficient. I'm not sure what
> is
> > actually right design for it. This is what I like to discuss.
>
> I don't have enough clue about gist to say wether it's the right design,
> but it doesn't look wrong to my eyes. It'd probably be useful to export
> the knowledge that we are rechecking and how often that happens to the
> outside.
> While I didn't really look into the patch, I noticed in passing that you
> pass a all_dead variable to heap_hot_search_buffer without using the
> result - just pass NULL instead, that performs a bit less work.

Useful notice, thanks.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-12-18 11:59:12 Re: [PATCH] SQL assertions prototype
Previous Message Alexander Korotkov 2013-12-18 11:51:08 Re: PoC: Partial sort