Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Harris <jgh(at)wizmail(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Duplicate Tuple Elidation during External Sort for DISTINCT
Date: 2014-01-22 21:49:14
Message-ID: CAKuK5J07k3rEWq6QT0_i7pTT3OSBK9ReQwQfi5LXNp8dmeokEQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 22, 2014 at 3:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeremy Harris <jgh(at)wizmail(dot)org> writes:
>> On 22/01/14 03:53, Tom Lane wrote:
>>> Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> writes:
>>>> - in createplan.c, eliding duplicate tuples is enabled if we are
>>>> creating a unique plan which involves sorting first
>
>>> [ raised eyebrow ... ] And what happens if the planner drops the
>>> unique step and then the sort doesn't actually go to disk?
>
>> I don't think Jon was suggesting that the planner drop the unique step.
>
> Hm, OK, maybe I misread what he said there. Still, if we've told
> tuplesort to remove duplicates, why shouldn't we expect it to have
> done the job? Passing the data through a useless Unique step is
> not especially cheap.

That's correct - I do not propose to drop the unique step. Duplicates
are only dropped if it's convenient to do so. In one case, it's a
zero-cost drop (no extra comparison is made). In most other cases, an
extra comparison is made, typically right before writing a tuple to
tape. If it compares as identical to the previously-written tuple,
it's thrown out instead of being written.

The output of the modified code is still sorted. It *might* still
contain duplicates (and in most cases probably will), but it will
probably contain fewer of them, so the unique step is still needed.
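
To illustrate the write-time check described above, here is a minimal
sketch in C. It is not the code from the patch: `Tape`,
`tape_write_elide`, and `write_run` are hypothetical names, plain
integers stand in for tuples, and a fixed-size array stands in for a
tape. The only point is the single extra comparison against the
previously written item.

```c
#include <assert.h>
#include <stddef.h>

#define TAPE_CAPACITY 64

/* Hypothetical stand-in for a sort tape: a fixed-size array of ints. */
typedef struct
{
    int    items[TAPE_CAPACITY];
    size_t count;
} Tape;

/*
 * Write one value to the tape unless it compares equal to the value
 * written immediately before it. Returns 1 if the value was written,
 * 0 if it was thrown out as a duplicate.
 */
static int
tape_write_elide(Tape *tape, int value)
{
    if (tape->count > 0 && tape->items[tape->count - 1] == value)
        return 0;               /* same as previous item: drop it */

    assert(tape->count < TAPE_CAPACITY);
    tape->items[tape->count++] = value;
    return 1;
}

/*
 * Write an already-sorted run to the tape, eliding adjacent duplicates.
 * Returns the number of values dropped.
 */
static size_t
write_run(Tape *tape, const int *sorted, size_t n)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!tape_write_elide(tape, sorted[i]))
            dropped++;
    }
    return dropped;
}
```

In this sketch, writing the sorted run {1, 1, 2, 3, 3} to one tape
leaves 1, 2, 3 on it, and writing {2, 2, 3, 4} to a second tape leaves
2, 3, 4. Each run is free of duplicates, but 2 and 3 appear on both
tapes, so whatever consumes the two runs can still see duplicates.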

--
Jon
