Re: Yet another abort-early plan disaster on 9.3

From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-02 09:30:02
Message-ID: CAEYLb_UTz=dtxYOCYF_D1s7wQJfoZUrey6Fjxq08VFueXWoEKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> I disagree that (1) is not worth fixing just because we've provided
>> users with an API to override the stats. It would unquestionably be
>> better for us to have a better n_distinct estimate in the first place.
>> Further, this is an easier problem to solve, and fixing n_distinct
>> estimates would fix a large minority of currently pathological queries.
>> It's like saying "hey, we don't need to fix the leak in your radiator,
>> we've given you a funnel in the dashboard you can pour water into."
>
> Having read papers on it, I believe the problem is intractable. Coding
> is not the issue. To anyone: please prove me wrong, in detail, with
> references so it can be coded.

I think it might be close to intractable if you're determined to use a
sampling model. HyperLogLog looks very interesting for n_distinct
estimation, though. My abbreviated key patch estimates the cardinality
of abbreviated keys (and original strings that are to be sorted) with
high precision and fixed overhead. Maybe we can figure out a way to
do opportunistic streaming of HLL. Believe it or not, the way I use
HLL for estimating cardinality is virtually free. Hashing is really
cheap when the CPU is bottlenecked on memory bandwidth.

If you're interested, download the patch, and enable the debug traces.
You'll see HyperLogLog accurately indicate the cardinality of text
datums as they're copied into local memory before sorting.

--
Regards,
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-02 09:30:06 Re: Replication identifiers, take 3
Previous Message Andres Freund 2014-10-02 09:02:50 Re: Escaping from blocked send() reprised.

Browse pgsql-performance by date

  From Date Subject
Next Message Ryan Johnson 2014-10-02 13:59:00 Re: Yet another abort-early plan disaster on 9.3
Previous Message Simon Riggs 2014-10-02 08:19:30 Re: Yet another abort-early plan disaster on 9.3