Re: Optimizer on sort aggregate

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Feng Tian <ftian(at)vitessedata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer on sort aggregate
Date: 2014-10-19 09:35:51
Message-ID: CAApHDvpG4FOxwH5dBExY=n1oJpUDJ=5ZA5RW3UAaF85EH5Dsbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 18, 2014 at 2:25 PM, Feng Tian <ftian(at)vitessedata(dot)com> wrote:

> Hi, David,
>
> Yes, switch sorting order would loose an interesting order so if user
> dictates order by t, i; planner need to resort to its cost model.
> Estimating cardinality of groupby is a much bigger topic than this thread.
>
>
Well I don't want to jump in and make the idea more complex than it needs
to be, More research on this would be really good!

Just to make my thoughts a bit more clear, I had thought that you'd likely
use attwidth from pg_statistic to determine the average width of the column
in order to know if you'd want to play about with the order or not. If not,
then how would you know which column to put last in the group by if there
happened to be 2 text type columns in the grouping list?

Maybe this could be determined based on some ratio between stadistinct and
stawidth in pg_statistic. If for example 2 columns had the same width, then
you'd likely want to use the one with more distinct values estimated.
Perhaps the heuristics for determining this would be more complex as, if
you had an bigint with 1000 distinct values, then it perhaps would be
better to group by that first before some int with, say 5 distinct values.
Or maybe not? Some series of benchmarks might some sort of indication if
there is any sort of magical tipping point to be sought after here. Of
course the width alone might not be a great thing to base any benchmarks on
as a multibyte text type with, for example, 6 in the stawidth column, would
likely be slower to group by first than a bigint, which would have 8 in the
stawidth column. People that had carefully written their group bys a
certain way for performance might get upset if we made their query slower
by messing with them too, so I guess the logic should likely only kick in
if it's a clear win to swap the order.

Regards

David Rowley

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-10-19 10:22:39 Re: [PATCH] Simplify EXISTS subqueries containing LIMIT
Previous Message Magnus Hagander 2014-10-19 09:14:29 Re: [PATCH] add ssl_protocols configuration option