Re: Incorrect assumptions with low LIMITs

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect assumptions with low LIMITs
Date: 2012-03-16 21:55:30
Message-ID: CA+U5nMLSvw5wuickWAoHu629BQHUt26ws=3MEAe+0x_788PwUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 16, 2012 at 9:39 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Fri, 2012-03-16 at 18:25 +0000, Simon Riggs wrote:
>> Any time we apply a LIMIT clause to a plan with a SeqScan or
>> unqualified IndexScan, we shouldn't assume the scan will do less than
>> say 10% of the table. It might, but its an unsafe assumption because
>> as the selectivity decreases so does the safety of the assumption that
>> rows are uniformly distributed.
>
> Just trying to follow along. You mean "as the selectivity _increases_
> the safety of the assumption that the rows are uniformly distributed
> decreases", right?

Selectivity meaning the value between 0 and 1 that describes, in the
planner, the fraction of rows we will get back from a scan. 1.0 means
100% of rows. When selectivity is low, that means very few rows will
come back. I think you are using "high selectivity" as meaning
"returns few of the rows", so you understand me, but just flip the
meaning of the words.

When you have lots of rows, its a good assumption they are spread out
and a scan will find some of them quickly.

When you have very few rows, assuming they are evenly spaced is just
weird. Clumpiness of some kind seems much more likely. Much more
easily understood if the values are dates, for example.

Given the estimated number of rows is deliberately a worst case (i,e.
high), that sounds like the scan will work. Yet the reality is that
doing the scan is incredibly costly when those assumptions break,
which they do, often. Especially when the values don't exist at all
because the table is sparse.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tareq Aljabban 2012-03-16 22:02:21 Re: Storage Manager crash at mdwrite()
Previous Message Tom Lane 2012-03-16 21:52:55 Re: Command Triggers, patch v11