Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
Date: 2014-01-22 21:20:07
Message-ID: 52E03607.2020501@wizmail.org
Lists: pgsql-hackers

On 22/01/14 03:16, Jon Nelson wrote:
> Greetings -hackers:
>
> I have worked up a patch to PostgreSQL which elides tuples during an
> external sort. The primary use case is when sorted input is being used
> to feed a DISTINCT operation. The idea is to throw out tuples that
> compare as identical whenever it's convenient, predicated on the
> assumption that even a single I/O is more expensive than some number
> of (potentially extra) comparisons. Obviously, this is where a cost
> model comes in, which has not been implemented. This patch is a
> work-in-progress.

Dedup-in-sort is also done by my WIP internal merge sort, and is
extended (in much the same way as Jon's patch) to the external merge.

https://github.com/j47996/pgsql_sorb
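
For readers unfamiliar with the idea, here is a minimal sketch of
dedup-in-merge. It is not taken from either patch: it merges two sorted
runs of plain ints rather than tuples, and the name merge_dedup is
hypothetical. An element is dropped whenever it compares equal to the
last one emitted, so no separate uniq pass is needed afterwards.

```c
#include <stddef.h>

/*
 * Illustrative only: merge two sorted runs a[] and b[] into out[],
 * dropping any element equal to the previously emitted one.
 * out must have room for na + nb elements.
 * Returns the number of elements written.
 */
static size_t
merge_dedup(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < na || j < nb)
    {
        int next;

        /* Take the smaller head; fall back to a[] once b[] is exhausted. */
        if (j >= nb || (i < na && a[i] <= b[j]))
            next = a[i++];
        else
            next = b[j++];

        /* Elide the element if it duplicates the last one written. */
        if (n == 0 || out[n - 1] != next)
            out[n++] = next;
    }
    return n;
}
```

The extra cost is one comparison per element against the last output,
which is the trade-off Jon describes: more comparisons in exchange for
fewer tuples written and read back.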

I've not done a cost model either, but the dedup capability is
exposed from tuplesort.c to the executor, and downstream Unique
nodes are removed.

I've not yet worked out how to eliminate upstream hashagg nodes,
which my test results suggest would be worthwhile.

--
Cheers,
Jeremy
