Re: Yet another abort-early plan disaster on 9.3

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-30 23:01:46
Message-ID: CA+U5nMLj0iyYpOM+uF934-joy+6S64=yR+d6cnp94ZWzk2xkug@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On 30 September 2014 18:28, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>> Anyway, in the particular case I posted fixing n_distinct to realistic
>> numbers (%) fixed the query plan.
>
>
> But wouldn't fixing the absolute number also have fixed the plan?

There are two causes of this issue.

1. Poor estimates of n_distinct. Fixable by user.

2. Poor assumption of homogeneous distribution. There is no way for
the user to fix this, and the current planner doesn't keep detailed
enough stats to solve it.

I see (2) as the main source of issues since, as we observe, (1) is fixable.
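
For anyone hitting (1), there is a per-column escape hatch. A minimal
sketch, using made-up table/column names:

    -- Hypothetical names; "posts" and "user_id" are illustrative only.
    -- A negative n_distinct is read as a fraction of the row count,
    -- so -0.05 means n_distinct is taken to be 5% of the table's rows.
    ALTER TABLE posts ALTER COLUMN user_id SET (n_distinct = -0.05);
    ANALYZE posts;  -- the override takes effect at the next ANALYZE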

An example is a social media application where the business query is
"Display the last 10 posts". If the user is a frequent, recent user
then the query comes back very quickly, so a reverse scan on post_id
works great. If the user hasn't logged on for ages, that same plan has
to scan lots and lots of data before it finds 10 posts. The result is
that only certain users experience poor performance. Even the data
isn't consistent in its distribution from one user to the next, so
stats wouldn't help much, even if we could capture the profile of the
"typical user".

>> > The problem, as I see it, is different. We assume that if there are
>> > 100 distinct values and you use LIMIT 1 that you would only need to
>> > scan 1% of rows. We assume that the data is arranged in the table in a
>> > very homogenous layout. When data is not, and it seldom is, we get
>> > problems.
>> >
>> > Simply put, assuming that LIMIT will reduce the size of all scans is
>> > just way wrong. I've seen many plans where increasing the LIMIT
>> > dramatically improves the plan.
>> >
>> > If we can at least agree it is a problem, we can try to move forwards.
>
>
> I don't think anyone doubts there is a problem (many more than one of them),
> there is just disagreement about the priority and what can be done about it.

>> That is certainly another problem. Does correlation stat figure in the
>> LIMIT calculation at all, currently? That's what correlation stat is
>> for, no?
>
>
> I don't think correlation is up to the task as a complete solution, although
> it might help a little. There is no way a simple correlation can encode
> that John retired 15 years ago and hasn't logged on since, while Johannes
> was hired yesterday and never logged on before then.

Ah, OK, essentially the same example.

Which is why I ruled out approaches based on correlation stats and
suggested a risk-weighted cost approach instead.
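
To sketch what I mean by that (an illustration only, not a worked-out
proposal): rather than ranking plans purely on expected cost, fold in
how badly the estimate can miss, something like

    risk_adjusted_cost = expected_cost
                       + k * (worst_case_cost - expected_cost)

where k expresses how little we trust the homogeneity assumption. An
abort-early plan whose worst case is scanning the whole table would
then lose to a plan with a bounded worst case, even when its expected
cost looks lower.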

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
