Re: Yet another abort-early plan disaster on 9.3

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-30 21:11:20
Message-ID: 542B1C78.5000602@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 01/10/14 05:54, Jeff Janes wrote:
> On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower
> <GavinFlower(at)archidevsys(dot)co(dot)nz <mailto:GavinFlower(at)archidevsys(dot)co(dot)nz>>
> wrote:
>
>
> Would it be feasible to get a competent statistician to advise what data to collect, and to analyze it? Maybe it is possible to get a better estimate on how much of a table needs to be scanned, based on some fairly simple statistics. But unless research is done, it is probably impossible to determine what statistics might be useful, and how effective a better estimate could be.
>
> I have a nasty feeling that assuming a uniform distribution, may
> still end up being the best we can do - but I maybe being unduly
> pessimistic!.
>
>
> As a semi-competent statistician, my gut feeling is that our best bet
> would be not to rely on the competence of statisticians for too much,
> and instead try to give the executor the ability to abandon a
> fruitless path and pick a different plan instead. Of course this
> option is foreclosed once a tuple is returned to the client (unless
> the ctid is also cached, so we can make sure not to send it again on
> the new plan).
>
> I think that the exponential explosion of possibilities is going to be
> too great to analyze in any rigorous way.
>
> Cheers,
>
> Jeff
Many moons ago, I passed several 300 level statistics papers.

I looked at this problem and found it was too hard to even properly
characterise the problem (looks 'simple' - if you don't look too
closely), and ended up feeling it was definitely 'way above my pay
grade'! :-)

It might be possible to tackle it more pragmatically, instead of trying
to be all analytic and rigorously list all the possible influences, have
a look at queries of this nature that are taking far too long. Then get
a feel for combinations of issues involved and how they contribute. If
you have enough data, you might be able to use something like Principle
Component Analysis (I was fortunate to meet a scientist who had got
heavily into this area of statistics). Such an approach might yield
valuable insights, even if the problem is not fully characterised, let
alone 'solved'.

Cheers,
Gavin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-09-30 21:15:33 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Andres Freund 2014-09-30 21:03:28 Re: libpq-dev: pg_config_manual.h redefines CACHE_LINE_SIZE

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2014-09-30 22:14:18 Re: Yet another abort-early plan disaster on 9.3
Previous Message Jeff Janes 2014-09-30 17:28:02 Re: Yet another abort-early plan disaster on 9.3