Re: [HACKERS] BETWEEN optimizer problems with single-value range

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] BETWEEN optimizer problems with single-value range
Date: 2006-03-15 23:05:08
Message-ID: 1142463908.3859.188.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
> Attached is a simplified example of a performance problem we have seen,
> with a workaround and a suggestion for enhancement (hence both the
> performance and hackers lists).
>
> Our software is allowing users to specify the start and end dates for a
> query. When they enter the same date for both, the optimizer makes a
> very bad choice. We can work around it in application code by using an
> equality test if both dates match. I think the planner should be able
> to make a better choice here.

> (One obvious way to fix it would be to
> rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems
> like there is some underlying problem which should be fixed instead (or
> in addition to) this.

That might work, but I'm not sure if that is in itself the problem and
it would be mostly wasted overhead in 99% of cases.

The main issue appears to be that the planner chooses "Cal_CalDate"
index rather than "Cal_CtofcNo" index when the BETWEEN values match.

It seems that the cost of the first and third EXPLAINs is equal, yet for
some reason it chooses different indexes in each case. My understanding
was that it would pick the first index created if plan costs were equal.
Is that behaviour repeatable with each query?

ISTM that if we have equal plan costs then we should be choosing the
index for which we have more leading columns, since that is more likely
to lead to a more selective answer. But the plan selection is a simple
"pick the best, or if they're equal pick the best sort order".

> The first query uses BETWEEN with the same date for both min and max
> values. The second query uses an equality test for the same date. The
> third query uses BETWEEN with a two-day range. In all queries, there
> are less than 4,600 rows for the specified cotfcNo value out of over 18
> million rows in the table. We tried boosting the statistics samples for
> the columns in the selection, which made the estimates of rows more
> accurate, but didn't change the choice of plans.

The selectivity seems the same in both - clamped to a minimum of 1 row,
so changing that doesn't look like it would help.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2006-03-15 23:34:33 Re: [HACKERS] BETWEEN optimizer problems with single-value
Previous Message Merlin Moncure 2006-03-15 22:11:16 Re: [PERFORM] BETWEEN optimizer problems with single-value range

Browse pgsql-performance by date

  From Date Subject
Next Message Chris 2006-03-15 23:11:55 Re: Slow SELECTS after large update cycle
Previous Message Merlin Moncure 2006-03-15 22:11:16 Re: [PERFORM] BETWEEN optimizer problems with single-value range