Re: Planner hints in Postgresql

Lists: pgsql-hackers
From: Rajmohan C <csrajmohan(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Planner hints in Postgresql
Date: 2014-03-17 15:52:21
Message-ID: CAHaqV0jD3+g-jyGUJneyHodPq_fJNHZvoa6BD3Vmm+vLYk2f6w@mail.gmail.com

I am implementing planner hints in PostgreSQL to force the optimizer to
select a particular plan for a query, on request from the SQL input. I am
having trouble modifying the planner code. I want to create a path node for
the hinted plan and make it the plan used by the executor. How do I enforce
this? Should I create a new Plan node for this? How do I create a plan node
that can then be given directly to the executor for a particular query?


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Rajmohan C <csrajmohan(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 15:55:41
Message-ID: CAOeZVifr=qsqPRtUYRj05GjVzy_FXoOSdc7o42jSKPe6OU0F3w@mail.gmail.com

On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C <csrajmohan(at)gmail(dot)com> wrote:

> I am implementing planner hints in PostgreSQL to force the optimizer to
> select a particular plan for a query, on request from the SQL input. I am
> having trouble modifying the planner code. I want to create a path node for
> the hinted plan and make it the plan used by the executor. How do I enforce
> this? Should I create a new Plan node for this? How do I create a plan node
> that can then be given directly to the executor for a particular query?
>

Planner hints have been discussed a lot before, and AFAIK there is a wiki
page that explains why we shouldn't implement them. Have you referred to it?

Please share if you have any new points on the same.

Regards,

Atri


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 16:09:01
Message-ID: 1395072541308-5796353.post@n5.nabble.com

Atri Sharma wrote:
> On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C <csrajmohan(at)gmail(dot)com> wrote:
>
>> I am implementing planner hints in PostgreSQL to force the optimizer to
>> select a particular plan for a query, on request from the SQL input. I am
>> having trouble modifying the planner code. I want to create a path node for
>> the hinted plan and make it the plan used by the executor. How do I enforce
>> this? Should I create a new Plan node for this? How do I create a plan node
>> that can then be given directly to the executor for a particular query?
>
> Planner hints have been discussed a lot before, and AFAIK there is a wiki
> page that explains why we shouldn't implement them. Have you referred to it?
>
> Please share if you have any new points on the same.
>
> Regards,
>
> Atri

http://wiki.postgresql.org/wiki/Todo

(I got to it via the "FAQ" link on the homepage and the "Developer FAQ"
section therein. You should make sure you've scanned that as well.)

Note the final section titled: "Features We Do Not Want"

Also, you need to consider what you are doing when you cross-post (generally
a bad thing) to "-hackers" and "-novice". As there is, rightly IMO, no
"-novice-hackers" list, you should probably have just posted to "-general".

Need to discuss the general "why" before any meaningful help on the "how" is
going to be considered by hackers.

David J.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 16:15:07
Message-ID: 18192.1395072907@sss.pgh.pa.us

David Johnston <polobo(at)yahoo(dot)com> writes:
> Need to discuss the general "why" before any meaningful help on the "how" is
> going to be considered by hackers.

Possibly worth noting is that in past discussions, we've concluded that
the most sensible type of hint would not be "use this plan" at all, but
"here's what to assume about the selectivity of this WHERE clause".
That seems considerably less likely to break than any attempt to directly
specify plan details.

regards, tom lane


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:15:56
Message-ID: CAOeZVieK0JT4tDZr8wTxEZHwD0byvTTP4J7n+CJ-6R-MU+m4wg@mail.gmail.com

On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Johnston <polobo(at)yahoo(dot)com> writes:
> > Need to discuss the general "why" before any meaningful help on the
> "how" is
> > going to be considered by hackers.
>
> Possibly worth noting is that in past discussions, we've concluded that
> the most sensible type of hint would not be "use this plan" at all, but
> "here's what to assume about the selectivity of this WHERE clause".
> That seems considerably less likely to break than any attempt to directly
> specify plan details.
>
>
Isn't using a user-given value for selectivity a pretty risky situation, as
it can horribly screw up the plan selection?

Why not allow the user to specify an alternate plan and have the planner
assign it a higher preference during plan evaluation? That would still allow
a fair evaluation of all possible plans, as we do right now, while giving
the user-given plan a higher preference during evaluation.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:24:21
Message-ID: 23149.1395077061@sss.pgh.pa.us

Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Possibly worth noting is that in past discussions, we've concluded that
>> the most sensible type of hint would not be "use this plan" at all, but
>> "here's what to assume about the selectivity of this WHERE clause".
>> That seems considerably less likely to break than any attempt to directly
>> specify plan details.

> Isn't using a user-given value for selectivity a pretty risky situation,
> as it can horribly screw up the plan selection?

And forcing a plan to be used *isn't* that? Please re-read the older
threads, since you evidently have not.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:28:21
Message-ID: 20140317172821.GK26394@tamriel.snowman.net

* Atri Sharma (atri(dot)jiit(at)gmail(dot)com) wrote:
> Isn't using a user-given value for selectivity a pretty risky situation,
> as it can horribly screw up the plan selection?
>
> Why not allow the user to specify an alternate plan and have the planner

Uh, you're worried about the user giving us a garbage selectivity, but
they're going to give us a full-blown plan that's perfect?

> assign it a higher preference during plan evaluation? That would still
> allow a fair evaluation of all possible plans, as we do right now, while
> giving the user-given plan a higher preference during evaluation.

What exactly would such a "preference" look like? A cost modifier?
We'd almost certainly have to make that a GUC or a value passed in as
part of the query, with a high likelihood of users figuring out how to
use it to say "use my plan forever and always".

Thanks,

Stephen
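
For reference, the coarsest form of such a cost modifier already exists
today: the session-level enable_* GUCs, which discourage an entire plan node
type by adding a huge cost penalty rather than expressing a preference for
one specific plan. A sketch (the catalog join is just for illustration):

```sql
-- Existing "broad brush" planner knobs: enable_* GUCs penalize a whole
-- node type for the session or transaction, not one join in one query.
BEGIN;
SET LOCAL enable_nestloop = off;  -- adds a large cost penalty to every
                                  -- nested-loop join the planner considers
EXPLAIN SELECT *
  FROM pg_class c
  JOIN pg_attribute a ON a.attrelid = c.oid;
ROLLBACK;                         -- SET LOCAL scopes the change to this
                                  -- transaction
```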


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:28:40
Message-ID: 1395077320055-5796378.post@n5.nabble.com

Atri Sharma wrote:
> On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> David Johnston <polobo(at)yahoo(dot)com> writes:
>> > Need to discuss the general "why" before any meaningful help on the
>> "how" is
>> > going to be considered by hackers.
>>
>> Possibly worth noting is that in past discussions, we've concluded that
>> the most sensible type of hint would not be "use this plan" at all, but
>> "here's what to assume about the selectivity of this WHERE clause".
>> That seems considerably less likely to break than any attempt to directly
>> specify plan details.
>>
>>
> Isn't using a user-given value for selectivity a pretty risky situation,
> as it can horribly screw up the plan selection?
>
> Why not allow the user to specify an alternate plan and have the planner
> assign it a higher preference during plan evaluation? That would still
> allow a fair evaluation of all possible plans, as we do right now, while
> giving the user-given plan a higher preference during evaluation.

The larger question to answer first is whether we want to implement
something that is deterministic...

How about just dropping the whole concept of "hinting" and providing a way
for someone to say "use this plan, or die trying"? Maybe require that it be
used in conjunction with named PREPAREd statements:

PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
SELECT ...;

Aside from whole-plan specification I can definitely see where join/where
specification could be useful if it can overcome the current limitation of
not being able to calculate inter-table estimations.

David J.



From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:38:36
Message-ID: CAOeZVickyb3xvzjPdz_ZT6+iDz6ANDsvkbcV8i-0VR=3OWycXA@mail.gmail.com

On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Atri Sharma (atri(dot)jiit(at)gmail(dot)com) wrote:
> > Isn't using a user-given value for selectivity a pretty risky
> > situation, as it can horribly screw up the plan selection?
> >
> > Why not allow the user to specify an alternate plan and have the planner
>
> Uh, you're worried about the user given us a garbage selectivity, but
> they're going to get a full-blown plan perfect?
>
>
>
I never said that the user plan would be perfect. The entire point of
planner hints is based on the assumption that the user knows more about the
data than the planner does, hence the user's ideas about the plan should be
given a preference. A garbage selectivity can screw up the cost estimation
of *all* our possible plans, and we could end up preferring a sequential
scan over an index-only scan, for example. I am trying to think of ways that
give some preference to a user plan but do not interfere with the cost
estimation of our other potential plans.

> What exactly would such a "preference" look like? A cost modifier?
> We'd almost certainly have to make that a GUC or a value passed in as
> part of the query, with a high likelihood of users figuring out how to
> use it to say "use my plan forever and always".
>
>
A factor, experimentally determined, by which we decrease the cost of the
user-specified plan so that it gets a higher preference in the plan
evaluation.

Of course, this is not a nice hack. Especially after our discussion on IRC
the other day, I am against planner hints; but if we are just discussing how
it could be done, these are some ways I could think of.

Regards,

Atri
--
Regards,

Atri
*l'apprenant*


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:43:38
Message-ID: 20140317174337.GL26394@tamriel.snowman.net

* Atri Sharma (atri(dot)jiit(at)gmail(dot)com) wrote:
> Of course, this is not a nice hack. Especially after our discussion on
> IRC the other day, I am against planner hints; but if we are just
> discussing how it could be done, these are some ways I could think of.

There's lots of ways to implement planner hints, but I fail to see the
point in discussing how to implement something we actively don't want.

Thanks,

Stephen


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:44:39
Message-ID: CAOeZVieF6BfRB6XYMDfGB+ofqgcZgWdrLitmAfZF4VJBgFp0bQ@mail.gmail.com

> The larger question to answer first is whether we want to implement
> something that is deterministic...
>
> How about just dropping the whole concept of "hinting" and provide a way
> for
> someone to say "use this plan, or die trying." Maybe require it be used in
> conjunction with named PREPAREd statements:
>

You mean taking away the entire concept of query planning and cost
estimation? That's like replacing the optimizer with DBA decisions, and I am
not at all comfortable with that idea. Those are only my thoughts, though.

>
> PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
> SELECT ...;
>
> Aside from whole-plan specification I can definitely see where join/where
> specification could be useful if it can overcome the current limitation of
> not being able to calculate inter-table estimations.
>

PREPAREd plans use a generic plan for execution. Replacing that with a
totally user-defined plan does not seem clean.

The crux is that, IMHO, planner hints are a bad way of trying to circumvent
the need for cross-column statistics. We should get cross-column statistics
done and ignore planner hints completely.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:45:55
Message-ID: CAOeZVidUBEKiUgyygpdsPRLq8QU_8uLh+7hZbmUENxeGPyZX8w@mail.gmail.com

> There's lots of ways to implement planner hints, but I fail to see the
> point in discussing how to implement something we actively don't want.
>
>
>
+1. The original poster wanted a way to implement it as a personal project
or something (I think he only replied to me, not the entire list).

Planner hints should be ignored :)

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:49:03
Message-ID: CAOeZVieod1S4C42r8Zq3341j=AxhSYDagO1P=sbqBUeEfyMoVQ@mail.gmail.com

On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> > On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Possibly worth noting is that in past discussions, we've concluded that
> >> the most sensible type of hint would not be "use this plan" at all, but
> >> "here's what to assume about the selectivity of this WHERE clause".
> >> That seems considerably less likely to break than any attempt to
> directly
> >> specify plan details.
>
> > Isn't using a user-given value for selectivity a pretty risky
> > situation, as it can horribly screw up the plan selection?
>
> And forcing a plan to be used *isn't* that? Please re-read the older
> threads, since you evidently have not.
>
>
I never said that we should force a plan to be used. I just said that we
should increase the preference for a user-given plan without interfering in
the cost estimation of the other potential plans or the evaluation of the
finally selected plan.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:49:25
Message-ID: CAHyXU0w4DKKq0V+soax+OGWNf0dc5Vk7DJftdtH0cMoDfd=c4Q@mail.gmail.com

On Mon, Mar 17, 2014 at 11:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Johnston <polobo(at)yahoo(dot)com> writes:
>> Need to discuss the general "why" before any meaningful help on the "how" is
>> going to be considered by hackers.
>
> Possibly worth noting is that in past discussions, we've concluded that
> the most sensible type of hint would not be "use this plan" at all, but
> "here's what to assume about the selectivity of this WHERE clause".
> That seems considerably less likely to break than any attempt to directly
> specify plan details.

Yeah -- the most common case I see is outlier culling, where several
repeated low-selectivity, non-deterministic quals stack, reducing the row
count estimate to 1. For example:
SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2;

The user may have special knowledge that the above is very (or very un-)
selective, which is difficult or not cost-effective to gather in the general
case. IIRC, in the archives (heh) there is a special workaround using
indexes, and some discussion regarding how a hypothetical feature involving
user-input selectivity estimates might look. I don't think that discussion
is complete: the syntax for user-input selectivity is an unsolved problem.

There's a big difference between saying to the planner, "Use plan X"
vs "Here's some information describing the data supporting choosing
plan X intelligently". The latter allows for better plans in the face
of varied/changing data, integrates with the planner in a natural way,
and encourages users to understand how the planner works.

merlin
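
The misestimation above, and the index-based workaround from the archives,
can be sketched as follows (the table and data are invented for
illustration; exact estimates vary by version and statistics target):

```sql
-- A toy table where nearly every row satisfies both length() quals.
CREATE TABLE foo (bar text);
INSERT INTO foo
  SELECT repeat('x', 2 + (random() * 50)::int)
  FROM generate_series(1, 100000);
ANALYZE foo;

-- With no statistics on the expression length(bar), the planner falls
-- back on default selectivity constants for the inequality pair, so the
-- row estimate comes out far below the ~100000 rows actually returned.
EXPLAIN SELECT * FROM foo
 WHERE length(bar) <= 1000 AND length(bar) >= 2;

-- The workaround: an expression index makes ANALYZE gather statistics
-- on length(bar) itself, so the estimate becomes sane even if the index
-- is never actually used in the plan.
CREATE INDEX foo_len_idx ON foo ((length(bar)));
ANALYZE foo;
EXPLAIN SELECT * FROM foo
 WHERE length(bar) <= 1000 AND length(bar) >= 2;
```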


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:57:16
Message-ID: CAOeZVieVE85=NQh_mZkvoVMb0Wg0y8N5isdcm4Ns1-5_QWy2TA@mail.gmail.com

> There's a big difference between saying to the planner, "Use plan X"
> vs "Here's some information describing the data supporting choosing
> plan X intelligently". The latter allows for better plans in the face
> of varied/changing data, integrates with the planner in a natural way,
> and encourages users to understand how the planner works.

+1

I was thinking of varying the 'weight' of a user-defined plan by a fixed,
experimentally determined factor, to tell the planner to give higher/lower
preference to this plan. But after your idea above, I think Stephen's point
about introducing a GUC for the factor is the only way possible, and I agree
with him that eventually the user will figure out a way to force usage of
his plan using the GUC.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 17:58:46
Message-ID: 20140317175846.GM26394@tamriel.snowman.net

* Merlin Moncure (mmoncure(at)gmail(dot)com) wrote:
> Yeah -- the most common case I see is outlier culling, where several
> repeated low-selectivity, non-deterministic quals stack, reducing the row
> count estimate to 1. For example:
> SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2;

This is exactly the issue that I've seen also - where we end up picking a
Nested Loop because we think only one row is going to be returned, and
instead we end up getting a bunch and it takes forever.

There was also some speculation on trying to change plans mid-stream to
address a situation like that, once we realize what's happening. Not sure
that's really practical, but it would be nice to find some solution.

Thanks,

Stephen


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 18:20:47
Message-ID: CAHyXU0x=JSoUkJqqLAajEuv6A9wc-wW0OuuMjDiG7VObGKwu0g@mail.gmail.com

On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
>
>> There's a big difference between saying to the planner, "Use plan X"
>> vs "Here's some information describing the data supporting choosing
>> plan X intelligently". The latter allows for better plans in the face
>> of varied/changing data, integrates with the planner in natural way,
>> and encourages users to understand how the planner works.
>
> +1
>
> I was thinking of varying the 'weight' of a user-defined plan by a fixed,
> experimentally determined factor, to tell the planner to give higher/lower
> preference to this plan. But after your idea above, I think Stephen's
> point about introducing a GUC for the factor is the only way possible, and
> I agree with him that eventually the user will figure out a way to force
> usage of his plan using the GUC.

A GUC is not the answer beyond the "broad brush", mostly debugging-level
features they already support. What do you do if your plan simultaneously
needs and does not need nestloops?

A query plan is a complicated thing that is the result of detailed analysis
of the data. I bet there are fewer than 100 users on the planet with the
architectural knowledge of the planner to submit a 'plan'. What users do
have is knowledge of the data that the database can't effectively gather for
some reason. Looking at my query above, what it would need (assuming the
planner could not be made to look through length()) would be something like:

SELECT * FROM foo WHERE
length(bar) <= 1000 WITH SELECTIVITY 0.999
AND length(bar) >= 2 WITH SELECTIVITY 0.999;

Note, that's a trivial treatment of the syntax challenges. Ultimately
it'd probably look different and/or be hooked in a different way (say,
via the function call).

merlin


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 18:35:38
Message-ID: CAOeZViegi5CTW=3RH0i6CJa_dqH6jHhoFRR38zTDFu40fOrqjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
> >
> >> There's a big difference between saying to the planner, "Use plan X"
> >> vs "Here's some information describing the data supporting choosing
> >> plan X intelligently". The latter allows for better plans in the face
> >> of varied/changing data, integrates with the planner in natural way,
> >> and encourages users to understand how the planner works.
> >
> > +1
> >
> > I was thinking of varying the 'weight' of a user-defined plan by a fixed
> > experimental factor to tell the planner to give higher/lower preference
> to
> > this plan, but after your idea above, I think Stephen's point of
> introducing
> > a GUC for the factor is the only way possible and I agree with him on the
> > point that eventually the user will figure out a way to force usage of
> his
> > plan using the GUC.
>
> GUC is not the answer beyond the "broad brush" mostly debugging level
> features they already support. What do you do if your plan
> simultaneously needs and does not need nestloops?
>
> A query plan is a complicated thing that is the result of detail
> analysis of the data. I bet there are less than 100 users on the
> planet with the architectural knowledge of the planner to submit a
> 'plan'. What users do have is knowledge of the data that the database
> can't effectively gather for some reason. Looking at my query above,
> what it would need (assuming the planner could not be made to look
> through length()) would be something like:
>
> SELECT * FROM foo WHERE
> length(bar) <= 1000 WITH SELECTIVITY 0.999
> AND length(bar) >= 2 WITH SELECTIVITY 0.999;
>
>

Won't this have scaling issues, and issues over time as the data in the
table changes?

Suppose I make a view with the above query. With time, as the data in the
table changes, the selectivity values won't be good for planning. This may
potentially lead to a lot of changes in the view definition and other places
where this query was used.

In general, I think I take back my point that specifying the selectivity is
a bad idea.

Could this also work (for the time being) for cross-column statistics?

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 18:45:37
Message-ID: CAFj8pRBpAyP50Rw18s6pmPXSrU0cdtrrfaaPB+oEHp7DOD+KdQ@mail.gmail.com

2014-03-17 19:35 GMT+01:00 Atri Sharma <atri(dot)jiit(at)gmail(dot)com>:

>
>
>
> On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure <mmoncure(at)gmail(dot)com>wrote:
>
>> On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
>> wrote:
>> >
>> >> There's a big difference between saying to the planner, "Use plan X"
>> >> vs "Here's some information describing the data supporting choosing
>> >> plan X intelligently". The latter allows for better plans in the face
>> >> of varied/changing data, integrates with the planner in natural way,
>> >> and encourages users to understand how the planner works.
>> >
>> > +1
>> >
>> > I was thinking of varying the 'weight' of a user-defined plan by a
>> fixed
>> > experimental factor to tell the planner to give higher/lower preference
>> to
>> > this plan, but after your idea above, I think Stephen's point of
>> introducing
>> > a GUC for the factor is the only way possible and I agree with him on
>> the
>> > point that eventually the user will figure out a way to force usage of
>> his
>> > plan using the GUC.
>>
>> GUC is not the answer beyond the "broad brush" mostly debugging level
>> features they already support. What do you do if your plan
>> simultaneously needs and does not need nestloops?
>>
>> A query plan is a complicated thing that is the result of detail
>> analysis of the data. I bet there are less than 100 users on the
>> planet with the architectural knowledge of the planner to submit a
>> 'plan'. What users do have is knowledge of the data that the database
>> can't effectively gather for some reason. Looking at my query above,
>> what it would need (assuming the planner could not be made to look
>> through length()) would be something like:
>>
>> SELECT * FROM foo WHERE
>> length(bar) <= 1000 WITH SELECTIVITY 0.999
>> AND length(bar) >= 2 WITH SELECTIVITY 0.999;
>>
>>
>
> Won't this have scaling issues, and issues over time as the data in the
> table changes?
>
> Suppose I make a view with the above query. With time, as the data in the
> table changes, the selectivity values won't be good for planning. This may
> potentially lead to a lot of changes in the view definition and other
> places where this query was used.
>
>
>
> In general, I think I take back my point that specifying the selectivity
> is a bad idea.
>
> Could this also work (for the time being) for cross-column statistics?
>
>
It is another issue.

I don't believe SELECTIVITY can work well either. Slow queries are usually
related to some strange points in the data. I think a good concept should be
based on the validity of estimations. Some plans are based on totally wrong
estimations but are still fast, because they are less sensitive to bad
estimations. So a good concept is to penalize risky plans - or to use brute
force, like a COLUMN store engine does. Its plans are usually simple and
tolerant of bad estimations.

Pavel

> Regards,
>
> Atri
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>


From: Jim Nasby <jim(at)nasby(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 19:02:05
Message-ID: 532746AD.9040202@nasby.net

On 3/17/14, 12:58 PM, Stephen Frost wrote:
> * Merlin Moncure (mmoncure(at)gmail(dot)com) wrote:
>> Yeah -- the most common case I see is outlier culling where several
>> repeated low non-deterministic selectivity quals stack reducing the
>> row count estimate to 1. For example:
>> SELECT * FROM foo WHERE length(bar) <= 1000 AND length(bar) >= 2;
>
> This is exactly the issue that I've seen also- where we end up picking a
> Nested Loop because we think only one row is going to be returned and
> instead we end up getting a bunch and it takes forever.

FWIW, I've also seen problems with merge and hash joins at work, but I don't have any concrete examples handy. :(

> There was also some speculation on trying to change plans mid-stream to
> address a situation like that, once we realize what's happening. Not
> sure that's really practical but it would be nice to find some solution.

Just being able to detect that something has possibly gone wrong would be useful. We could log that to alert the DBA/user of a potential bad plan. We could even format it in such a fashion that it's suitable for emailing to the community: the query, the plan, the stats, etc. That might make it easier for us to fix the planner (although at this point it seems like we're hitting statistics gathering problems that we simply don't know how to solve).

There is another aspect of this though: plan stability. There are lots of cases where users couldn't care less about getting an optimal plan, but they care *greatly* about not getting a brain-dead plan.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 19:04:28
Message-ID: 25938.1395083068@sss.pgh.pa.us
Lists: pgsql-hackers

Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> Wont this have scaling issues and issues over time as the data in the
> table changes?

It can't possibly have worse problems of that sort than explicitly
specifying a plan does.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 19:16:20
Message-ID: CAHyXU0z5+8TMcCqzihh9862y8CpB4CzhH6gcTUb5M+xuHdFRPw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I don't believe so SELECTIVITY can work well too. Slow queries are usually
> related to some strange points in data. I am thinking so well concept should
> be based on validity of estimations. Some plans are based on totally wrong
> estimation, but should be fast due less sensitivity to bad estimations. So
> well concept is penalization some risk plans - or use brute force - like
> COLUMN store engine does. Their plan is usually simply and tolerant to bad
> estimations.

Disagree. There is a special case of slow query where the problem is not
with the data but with the expression over the data; something in the
query defeats sampled selectivity. Common culprits are:

*) CASE expressions
*) COALESCE
*) casts
*) simple transformational expressions
*) predicate string concatenation

When using those expressions, you often end up with default
selectivity assumptions and if they are way off -- watch out.
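When statistics can't be used, the planner falls back on hard-wired constants and multiplies them as if the quals were independent. A minimal numeric sketch of how two such defaults stack (the 0.3333 figure mirrors PostgreSQL's default inequality selectivity, DEFAULT_INEQ_SEL in selfuncs.h; treat it as illustrative for your version):

```python
# Stacked default selectivities shrink a row estimate geometrically.
DEFAULT_INEQ_SEL = 0.3333  # assumed default for unestimatable inequalities

def estimate_rows(total_rows, num_default_quals):
    """Row estimate after applying the default selectivity once per qual,
    multiplied as if the quals were independent."""
    est = float(total_rows)
    for _ in range(num_default_quals):
        est *= DEFAULT_INEQ_SEL
    return est

total = 1_000_000
# length(bar) <= 1000 AND length(bar) >= 2: two unestimatable inequalities
est = estimate_rows(total, 2)
actual = total * 0.999 * 0.999  # nearly every row really passes both
print(f"estimated: {est:,.0f}  actual: {actual:,.0f}")
# estimated: 111,089  actual: 998,001
```

With the estimate nearly an order of magnitude low, plans that only make sense for small row counts suddenly look attractive, which is exactly the failure mode described above.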

Plan risk analysis solves a different problem: small changes in the
data mean big changes in the execution runtime. It probably wouldn't
even help cases where the server thinks there is one row and you
actually have thousands or millions unless you want to implement a
selectivity range with perhaps a risk coefficient. This was also
suggested sometime back and was also met with some skepticism (but
it'd be interesting to see!).

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 20:32:34
Message-ID: CAHyXU0xzUxeKtLE8VJND=Chr4WZu377pB1hsTg83C63CmPjYpw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> Just being able to detect that something has possibly gone wrong would be
> useful. We could log that to alert the DBA/user of a potential bad plan. We
> could even format this in such a fashion that it's suitable for emailing the
> community with; the query, the plan, the stats, etc. That might make it
> easier for us to fix the planner (although at this point it seems like we're
> hitting statistics gathering problems that we simply don't know how to
> solve).

Again, that's not the case here. The problem is that the server is
using hard wired assumptions (like, 10% selective) *instead* of
statistics -- at least in the case discussed above. That being said,
I think you're on to something: EXPLAIN ANALYZE rowcounts don't
indicate if the row count was generated from data based assumptions or
SWAGs. So maybe you could decorate the plan description with an
indicator that suggests when default selectivity rules were hit.

> There is another aspect of this though: plan stability. There are lots of
> cases where users couldn't care less about getting an optimal plan, but they
> care *greatly* about not getting a brain-dead plan.

Except for cases I noted above, I don't understand how you could flag
'sub-optimal' or 'brain-dead' plans. The server always picks the
best plan it can. The trick is to (in a very simple and
cpu-unintensive way) indicate when there isn't a lot of confidence in
the plan -- but that's not the same thing.

merlin


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 20:57:37
Message-ID: 20140317205737.GC16900@svana.org
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote:
> A query plan is a complicated thing that is the result of detail
> analysis of the data. I bet there are less than 100 users on the
> planet with the architectural knowledge of the planner to submit a
> 'plan'. What users do have is knowledge of the data that the database
> can't effectively gather for some reason. Looking at my query above,
> what it would need (assuming the planner could not be made to look
> through length()) would be something like:
>
> SELECT * FROM foo WHERE
> length(bar) <= 1000 WITH SELECTIVITY 0.999
> AND length(bar) >= 2 WITH SELECTIVITY 0.999;

A small issue with selectivity is that the selectivity is probably not
what users expect anyway, since many will relate to
conditional selectivities. PostgreSQL is pretty good at single-column
statistics; it just sometimes screws up on cross-column correlations.
This ties in with alerting about a bad plan: if the EXPLAIN output
could list for each condition what the actual selectivity was, it might
give users a way of understanding the problem.

So the example given might lead to output like:

clause                              selectivity  estimated
length(bar)>2                       0.50         0.50
length(bar)<1000 | length(bar)>2    0.50         0.25

The execution engine can only output conditional selectivities because
of the order of execution. But this would at least give users a handle
on the problem.
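A minimal sketch of that derivation, assuming the executor could expose per-clause surviving row counts (it does not today):

```python
def conditional_selectivities(rows_in, survivors_per_clause):
    """Actual selectivity of each clause conditional on all earlier
    clauses having passed, derived from rows surviving each filter step.

    survivors_per_clause: rows remaining after each clause, in
    execution order."""
    sels = []
    prev = rows_in
    for surviving in survivors_per_clause:
        sels.append(surviving / prev if prev else 0.0)
        prev = surviving
    return sels

# 1000 rows in; 500 pass length(bar)>2; 250 of those pass length(bar)<1000
print(conditional_selectivities(1000, [500, 250]))  # [0.5, 0.5]
```

Comparing these actuals against the planner's per-clause estimates (0.50 and 0.25 in the table) would point straight at the clause whose estimate went wrong.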

Note that a first cut of the problem might simply be something like
likely()/unlikely() as in gcc.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer


From: Jim Nasby <jim(at)nasby(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:01:14
Message-ID: 532770AA.8040606@nasby.net
Lists: pgsql-hackers

On 3/17/14, 3:32 PM, Merlin Moncure wrote:
> On Mon, Mar 17, 2014 at 2:02 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> Just being able to detect that something has possibly gone wrong would be
>> useful. We could log that to alert the DBA/user of a potential bad plan. We
>> could even format this in such a fashion that it's suitable for emailing the
>> community with; the query, the plan, the stats, etc. That might make it
>> easier for us to fix the planner (although at this point it seems like we're
>> hitting statistics gathering problems that we simply don't know how to
>> solve).
>
> Again, that's not the case here. The problem is that the server is
> using hard wired assumptions (like, 10% selective) *instead* of
> statistics -- at least in the case discussed above. That being said,
> I think you're on to something: EXPLAIN ANALYZE rowcounts don't
> indicate if the row count was generated from data based assumptions or
> SWAGs. So maybe you could decorate the plan description with an
> indicator that suggests when default selectivity rules were hit.

It occurs to me... it should be cheap for us to track actual rowcounts compared to the estimate... perhaps it's worth doing that and flagging plans when the estimates are off by more than X percent. Though... I suspect that will just tell us what we already know. :(
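A sketch of that comparison, checking each plan node's estimate against its actual row count (node structure and the threshold are illustrative, not planner internals):

```python
def misestimated_nodes(plan_nodes, factor=10.0):
    """plan_nodes: iterable of (node_name, estimated_rows, actual_rows).
    Flags nodes whose actual/estimated ratio exceeds `factor` in either
    direction."""
    flagged = []
    for name, est, act in plan_nodes:
        est = max(est, 1)  # the planner never estimates below one row
        act = max(act, 1)
        ratio = max(est / act, act / est)
        if ratio > factor:
            flagged.append((name, ratio))
    return flagged

nodes = [("Seq Scan on foo", 1, 25000), ("Hash Join", 100, 120)]
print(misestimated_nodes(nodes))  # flags only the seq scan, 25000x off
```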

>> There is another aspect of this though: plan stability. There are lots of
>> cases where users couldn't care less about getting an optimal plan, but they
>> care *greatly* about not getting a brain-dead plan.
>
> Except for cases I noted above, I don't understand how you could flag
> 'sub-optimal' or 'brain-dead' plans. The server always picks the
> best plan it can. The trick is to (in a very simple and
> cpu-unintensive way) indicate when there isn't a lot of confidence in
> the plan -- but that's not the same thing.

Keep in mind that the use case here is critical queries that MUST perform "fast enough". They do NOT need to be optimal, but they definitely can not degrade into something stupid. It's often way better to have a query that's 50-100% slower than optimal as opposed to one that suddenly becomes 100+% slower than it normally is (regardless of if normal is optimal or not).

You could possibly do an "anti-hint": Never use this plan, because we know it sucks.

Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong.

Another option would be tracking normal execution times (and the plans used) for queries (not completely unreasonable now with pg_stat_statements); if we try to run a query and it takes noticeably longer than normal and it's a different plan then assume the plan is bad, mark it as such, and try again with one of the "known good" plans.
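That fallback logic might look roughly like this, assuming per-plan timing history of the kind pg_stat_statements could provide (the plan names and the 3x threshold are invented for illustration):

```python
def plan_to_use(history, current_plan, current_ms, slowdown=3.0):
    """history: {plan_id: typical_ms} for one query. Returns the plan to
    prefer next time: the current one if it performed normally, else the
    fastest known-good plan."""
    typical = history.get(current_plan)
    if typical is None or current_ms <= slowdown * typical:
        history[current_plan] = current_ms  # accept / refresh the baseline
        return current_plan
    return min(history, key=history.get)    # fall back to a known-good plan

hist = {"plan_a": 12.0, "plan_b": 15.0}
print(plan_to_use(hist, "plan_b", 400.0))  # plan_b regressed: use plan_a
```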

Worst case would be allowing a means to store an acceptable plan and force/strongly suggest that the planner use it.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:07:31
Message-ID: CAGTBQpaHMyyYCqX8U=X2CBCOipYX27qbbmg9Fyj8AhEN6Cq5Dg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:

> Even better would be if the planner could estimate how bad a plan will
> become if we made assumptions that turn out to be wrong.
>

That's precisely what risk estimation was about.

Something like

SELECT * FROM wherever WHERE id > something LIMIT COST 10000;

Would forbid a sequential scan *if* the table is big enough to suspect the
plan might take that much, or a nested loop *if* the planner cannot *prove*
it will be faster than that.

I don't believe the limit unit is obscure at all (page fetches being a nice
measuring stick); what is obscure is what to do when no plan fits the
limit.
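The selection rule being proposed can be sketched as follows ("LIMIT COST" is proposed syntax, not something PostgreSQL implements; the plan names and costs are illustrative):

```python
def choose_plan(candidates, cost_limit):
    """candidates: (plan_name, worst_case_cost) pairs, cost in page
    fetches. Returns the cheapest plan within the limit, or None when
    no plan fits -- the open question above."""
    within = [(cost, name) for name, cost in candidates if cost <= cost_limit]
    if not within:
        return None
    return min(within)[1]

plans = [("seq scan", 50000), ("index scan", 4000), ("nested loop", 12000)]
print(choose_plan(plans, 10000))  # index scan
print(choose_plan(plans, 1000))   # None: no plan fits the budget
```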


From: Jim Nasby <jim(at)nasby(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:09:21
Message-ID: 53277291.2090908@nasby.net
Lists: pgsql-hackers

On 3/17/14, 2:16 PM, Merlin Moncure wrote:
> On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)com> wrote:
>> >I don't believe so SELECTIVITY can work well too. Slow queries are usually
>> >related to some strange points in data. I am thinking so well concept should
>> >be based on validity of estimations. Some plans are based on totally wrong
>> >estimation, but should be fast due less sensitivity to bad estimations. So
>> >well concept is penalization some risk plans - or use brute force - like
>> >COLUMN store engine does. Their plan is usually simply and tolerant to bad
>> >estimations.
> Disagree. There is a special case of slow query where problem is not
> with the data but with the expression over the data; something in the
> query defeats sampled selectivity. Common culprits are:
>
> *) CASE expressions
> *) COALESCE
> *) casts
> *) simple transformational expressions
> *) predicate string concatenation

*) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = '2014-1-1'

Though, in this case it's probably much better to teach the parser how to turn that into a range expression.
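For the quarter case the transformation is mechanical; a sketch of the range it would produce (illustrative only -- the real rewrite would have to live inside the planner and know each function's truncation unit):

```python
from datetime import date

def quarter_range(start):
    """Half-open [start, next_quarter) range equivalent to
    date_trunc('quarter', ts) = start, i.e. ts >= start AND ts < end."""
    next_month = start.month + 3
    if next_month > 12:
        end = date(start.year + 1, next_month - 12, 1)
    else:
        end = date(start.year, next_month, 1)
    return start, end

lo, hi = quarter_range(date(2014, 1, 1))
print(lo, hi)  # 2014-01-01 2014-04-01
```

The rewritten form is sargable, so one ordinary btree index on the timestamp column serves every truncation unit, instead of one expression index per unit.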
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:12:04
Message-ID: CAGTBQpaModPwpyc5iGNvWBaBohZAnkOn-1aPKokbOm_UxT9fvQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:

> On 3/17/14, 2:16 PM, Merlin Moncure wrote:
>
>> On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)com>
>> wrote:
>>
>>> >I don't believe so SELECTIVITY can work well too. Slow queries are
>>> usually
>>> >related to some strange points in data. I am thinking so well concept
>>> should
>>> >be based on validity of estimations. Some plans are based on totally
>>> wrong
>>> >estimation, but should be fast due less sensitivity to bad estimations.
>>> So
>>> >well concept is penalization some risk plans - or use brute force - like
>>> >COLUMN store engine does. Their plan is usually simply and tolerant to
>>> bad
>>> >estimations.
>>>
>> Disagree. There is a special case of slow query where problem is not
>> with the data but with the expression over the data; something in the
>> query defeats sampled selectivity. Common culprits are:
>>
>> *) CASE expressions
>> *) COALESCE
>> *) casts
>> *) simple transformational expressions
>> *) predicate string concatenation
>>
>
> *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) =
> '2014-1-1'
>
> Though, in this case it's probably much better to teach the parser how to
> turn that into a range expression.

Maybe, maybe not.

An index over the truncated time can potentially be much more efficient.


From: Jim Nasby <jim(at)nasby(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:12:57
Message-ID: 53277369.3060405@nasby.net
Lists: pgsql-hackers

On 3/17/14, 5:07 PM, Claudio Freire wrote:
>
> On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net <mailto:jim(at)nasby(dot)net>> wrote:
>
> Even better would be if the planner could estimate how bad a plan will become if we made assumptions that turn out to be wrong.
>
>
> That's precisely what risk estimation was about.
>
> Something like
>
> SELECT * FROM wherever WHERE id > something LIMIT COST 10000;
>
> Would forbid a sequential scan *if* the table is big enough to suspect the plan might take that much, or a nested loop *if* the planner cannot *prove* it will be faster than that.
>
> I don't believe the limit unit is obscure at all (page fetches being a nice measuring stick), but what is, is what do you do when no plan fits the limits.

I don't think that's the same thing... what you're describing is a way to not begin a query if a low-enough cost plan can't be found.

What I'm talking about is when the planner picks one low-cost plan over another and it turns out the estimate of the one that was picked was WAY off. I've actually seen cases where plan estimates that were off by just 100 units produce wildly different results.

In that scenario, LIMIT COST won't help at all.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Jim Nasby <jim(at)nasby(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:14:44
Message-ID: 532773D4.6080504@nasby.net
Lists: pgsql-hackers

On 3/17/14, 5:12 PM, Claudio Freire wrote:
>
> On Mon, Mar 17, 2014 at 7:09 PM, Jim Nasby <jim(at)nasby(dot)net <mailto:jim(at)nasby(dot)net>> wrote:
>
> On 3/17/14, 2:16 PM, Merlin Moncure wrote:
>
> On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)__com <mailto:pavel(dot)stehule(at)gmail(dot)com>> wrote:
>
> >I don't believe so SELECTIVITY can work well too. Slow queries are usually
> >related to some strange points in data. I am thinking so well concept should
> >be based on validity of estimations. Some plans are based on totally wrong
> >estimation, but should be fast due less sensitivity to bad estimations. So
> >well concept is penalization some risk plans - or use brute force - like
> >COLUMN store engine does. Their plan is usually simply and tolerant to bad
> >estimations.
>
> Disagree. There is a special case of slow query where problem is not
> with the data but with the expression over the data; something in the
> query defeats sampled selectivity. Common culprits are:
>
> *) CASE expressions
> *) COALESCE
> *) casts
> *) simple transformational expressions
> *) predicate string concatenation
>
>
> *) time/date functions, ie WHERE date_trunc( 'quarter', some_timestamp ) = '2014-1-1'
>
> Though, in this case it's probably much better to teach the parser how to turn that into a range expression.
>
>
>
> Maybe, maybe not.
>
> An index over the truncated time can potentially be much more efficient.

More efficient than a range index? Maybe, but I'm doubtful. Even if that's true, in a warehouse you're going to want to limit by weeks, months, quarters, years, etc. So now you're stuck building tons of special indexes.

(Granted, most warehouses build a separate date dimension because of these kinds of problems... I'm hoping that we could do something better.)
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:15:06
Message-ID: 532773EA.4000308@agliodbs.com
Lists: pgsql-hackers

On 03/17/2014 01:57 PM, Martijn van Oosterhout wrote:
> On Mon, Mar 17, 2014 at 01:20:47PM -0500, Merlin Moncure wrote:
>> A query plan is a complicated thing that is the result of detail
>> analysis of the data. I bet there are less than 100 users on the
>> planet with the architectural knowledge of the planner to submit a
>> 'plan'. What users do have is knowledge of the data that the database
>> can't effectively gather for some reason. Looking at my query above,
>> what it would need (assuming the planner could not be made to look
>> through length()) would be something like:
>>
>> SELECT * FROM foo WHERE
>> length(bar) <= 1000 WITH SELECTIVITY 0.999
>> AND length(bar) >= 2 WITH SELECTIVITY 0.999;

So, if we're going to support query decorators, we might as well go all
the way and just offer Oracle-style "use this index". Speaking as
someone who is often called on to fix performance issues in other
people's databases, I find major issues with query decorators:

1. they are impossible to maintain since they're scattered all over the
application code.

2. they eventually become a barrier to upgrading, once the performance
of the DB engine changes in a way that makes older query decorators
crippling and/or erroneous. Because they are scattered all around the
code, it then becomes a major refactoring effort to fix them.

3. There's no obvious way to collect cumulative query hints in order to
supply data for database-level tuning, or for improving the postgresql
query planner.

4. There's no obvious way to use query decorators with ORMs, making them
useless to 95% of our users.

5. Application developers will add them without telling the DBAs, and
vice-versa. Hilarity ensues.

Given that, I would strongly prefer a different mechanism for ad-hoc
query plan adjustment. Possible other mechanisms would include:

a) ability to set selectivity for database objects, possibly including
per-column selectivity (i.e. selectivity for columns a & b & f is 0.01),
and save it.

b) ability to execute a query using a presupplied plan. This would
include ability to edit the plan using some intermediate format, like
JSON or XML.

c) ability to finely adjust costs of specific query operations (e.g.
bitmapscan_cost = 0.02), possibly for specific database objects.

d) ability to save selectivity estimates for specific expressions (e.g.
"selectivity on log15 ( session_id =, log_time BETWEEN ) = 0.03").

Personally, (b) is my favorite version of this feature, because it
allows me to test the query executor itself, and it raises the bar for
clobbering the query planner to people who are willing to spend a little
time on it. It would also make a fantastic learning tool for learning
about database planning and optimization, and might open the door for
more people hacking on our planner.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:18:53
Message-ID: 532774CD.5060909@dalibo.com
Lists: pgsql-hackers

On 03/17/2014 06:28 PM, Stephen Frost wrote:
> a value passed in
> as part of the query, with a high likelihood of users figuring out how
> to use it to say "use my plan forever and always"..

Last time this came up, I said on irc that if we ever do implement
hints, I'd like them to be tied to a major version somehow so that later
versions can issue warnings that the planner might have become smarter
about long-forgotten queries and they need to be re-tested to see if the
hint is still warranted.

--
Vik


From: Vik Fearing <vik(dot)fearing(at)dalibo(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:27:14
Message-ID: 532776C2.4050905@dalibo.com
Lists: pgsql-hackers

On 03/17/2014 11:15 PM, Josh Berkus wrote:
> 2. they eventually become a barrier to upgrading, once the performance
> of the DB engine changes in a way that makes older query decorators
> crippling and/or erroneous. Because they are scattered all around the
> code, it then becomes a major refactoring effort to fix them.

My idea of associating them with major versions and issuing warnings
would help with this.

> 5. Application developers will add them without telling the DBAs, and
> vice-versa. Hilarity ensues.

And this would be unmasked at next major upgrade when the logs fill up.

--
Vik


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:40:37
Message-ID: CAGTBQpbqpNpYxvN9s+UkqrRzj6rBoNXW+vROzR3RyzXAeMyzwg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 7:12 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:

> On 3/17/14, 5:07 PM, Claudio Freire wrote:
>
>
>> On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net <mailto:
>> jim(at)nasby(dot)net>> wrote:
>>
>> Even better would be if the planner could estimate how bad a plan
>> will become if we made assumptions that turn out to be wrong.
>>
>>
>> That's precisely what risk estimation was about.
>>
>> Something like
>>
>> SELECT * FROM wherever WHERE id > something LIMIT COST 10000;
>>
>> Would forbid a sequential scan *if* the table is big enough to suspect
>> the plan might take that much, or a nested loop *if* the planner cannot
>> *prove* it will be faster than that.
>>
>> I don't believe the limit unit is obscure at all (page fetches being a
>> nice measuring stick), but what is, is what do you do when no plan fits the
>> limits.
>>
>
> I don't think that's the same thing... what you're describing is a way to
> not begin a query if a low-enough cost plan can't be found.
>
> What I'm talking about is when the planner picks one low-cost plan over
> another and it turns out the estimate of the one that was picked was WAY
> off. I've actually seen cases where plan estimates that were off by just
> 100 units produce wildly different results.
>
> In that scenario, LIMIT COST won't help at all.
>

The case you describe is different. It's when a plan is *actually* more
expensive than estimated, but the planner could not know that.

That's what was mentioned about switching plans mid-way through them, which
is IMO quite ill-defined as such (it needs a lot of love in order to get a
workable spec out of that).


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 22:43:38
Message-ID: CAHyXU0zkk6uTzqztg9ZixozyE-1Nw0YbO3LQjjAwBgX-=LMu=w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 5:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> So, if we're going to support query decorators, we might as well go all
> the way and just offer Oracle-style "use this index". Speaking as
> someone who is often called on to fix performance issues in other
> people's databases, I find major issues with query decorators:

Supplying selectivity estimates in places where the database can't or
won't do them properly itself is a completely different thing from
Oracle style hints. For example, they are much more 'future proof' --
both to schema changes and postgres enhancements -- in the absolute
case the database can peek into your expression in some future version
and generate a better estimate than you can. Aside from that, you're
in no way locked out of future innovations. This could be done in a
relatively clean way: for example, by putting complex quals in an
inlineable function that is decorated with analog of COST/ROWS clause
-- perhaps allowing for a user defined expression to base selectivity
from the input arguments.

OTOH, hints disable the planner and they are much more complex,
particularly for large queries, since you have to consider how
components of the plan relate to one another. Force-feeding a plan to
the database is the wrong way to go.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-17 23:47:27
Message-ID: 11603.1395100047@sss.pgh.pa.us
Lists: pgsql-hackers

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> Even better would be if the planner could estimate how bad a plan will
>> become if we made assumptions that turn out to be wrong.

> That's precisely what risk estimation was about.

Yeah. I would like to see the planner's cost estimates extended to
include some sort of uncertainty estimate, whereupon risk-averse people
could ask it to prefer low-uncertainty plans over high-uncertainty ones
(the plans we typically choose for ORDER BY ... LIMIT queries being great
examples of the latter). But it's a long way from wishing that to making
it so. Right now it's not even clear (to me anyway) how we'd measure or
model such uncertainty.
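If such an uncertainty number existed, the risk-averse preference could be as simple as penalizing it in the comparison (everything here is hypothetical -- the planner tracks no cost variance today, and the numbers are invented):

```python
def pick_plan(candidates, risk_aversion=0.0):
    """candidates: (name, expected_cost, cost_stddev) triples. With
    risk_aversion = 0 this is today's behavior (cheapest expected
    cost); larger values prefer low-uncertainty plans."""
    return min(candidates, key=lambda c: c[1] + risk_aversion * c[2])[0]

plans = [
    ("nested loop", 1000, 50000),  # cheap if the estimate holds, awful if not
    ("hash join",   1500, 200),    # slightly dearer but stable
]
print(pick_plan(plans))                     # nested loop
print(pick_plan(plans, risk_aversion=1.0))  # hash join
```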

regards, tom lane


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 05:17:59
Message-ID: CAOeZVieh0ETTR=ezgpT0Mg=_qWGReAPcGR5QU49+yxDHb=swKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 18, 2014 at 12:46 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Mar 17, 2014 at 1:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> > I don't believe so SELECTIVITY can work well too. Slow queries are
> usually
> > related to some strange points in data. I am thinking so well concept
> should
> > be based on validity of estimations. Some plans are based on totally
> wrong
> > estimation, but should be fast due less sensitivity to bad estimations.
> So
> > well concept is penalization some risk plans - or use brute force - like
> > COLUMN store engine does. Their plan is usually simply and tolerant to
> bad
> > estimations.
>
> Disagree. There is a special case of slow query where problem is not
> with the data but with the expression over the data; something in the
> query defeats sampled selectivity. Common culprits are:
>
> *) CASE expressions
> *) COALESCE
> *) casts
> *) simple tranformational expressions
> *) predicate string concatenation
>
> When using those expressions, you often end up with default
> selectivity assumptions and if they are way off -- watch out.
>
> Plan risk analysis solves a different problem: small changes in the
> data mean big changes in the execution runtime. It probably wouldn't
> even help cases where the server thinks there is one row and you
> actually have thousands or millions unless you want to implement a
> selectivity range with perhaps a risk coefficient. This was also
> suggested sometime back and was also met with some skepticism (but
> it'd be interesting to see!).
>
>

Another case is with prepared statements, when things like array size are
not know to the planner and the planner makes a hard coded estimate for it,
leading to selection of a customized plan which is worse than the generic
plan.

This would be even more useful for prepared statements since they need some
support from the user in terms of the selectivity and the user should be
allowed to tell more about the data, since he already has given us some
indications about the type of query plans he requires using prepared
statements.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Johnston <polobo(at)yahoo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 06:04:23
Message-ID: CAMkU=1zfeabmNmGBL6zuT73zo4VstRVbzTKYVkQQq87KNGCKPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday, March 17, 2014, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:

>
>
>
> On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost <sfrost(at)snowman(dot)net<javascript:_e(%7B%7D,'cvml','sfrost(at)snowman(dot)net');>
> > wrote:
>
>> * Atri Sharma (atri(dot)jiit(at)gmail(dot)com<javascript:_e(%7B%7D,'cvml','atri(dot)jiit(at)gmail(dot)com');>)
>> wrote:
>> > Isnt using a user given value for selectivity a pretty risky situation
>> as
>> > it can horribly screw up the plan selection?
>> >
>> > Why not allow the user to specify an alternate plan and have the planner
>>
>> Uh, you're worried about the user given us a garbage selectivity, but
>> they're going to get a full-blown plan perfect?
>>
>>
>>
> I never said that the user plan would be perfect. The entire point of
> planner hints is based on the assumption that the user knows more about the
> data than the planner does hence the user's ideas about the plan should be
> given a preference. Garbage selectivity can screw up the cost estimation
> of *all* our possible plans and we could end up preferring a sequential
> scan over an index only scan for e.g. I am trying to think of ways that
> give some preference to a user plan but do not interfere with the cost
> estimation of our other potential plans.
>

I'm not opposed to planner hints (or plan mandates), but also not
optimistic they will ever get implemented, much less accepted. But if they
were, I don't see a use for such fudge factors. By mandating a plan, I am
already asserting I know more than the optimizer does. Maybe I am right,
maybe I am wrong, but either way I have taken it out of the optimizer's
hands and would not welcome it snatching control back.

If it is too deranged for me to trust, why would it not become somewhat
more deranged and so decide to ignore my hints? The only setting for such
a factor I would ever see myself using was the minimum, or the maximum.

The feature I would like in such hints, if they are to exist, is to set a
version to which they apply. Often a fix is made very quickly after the
problem is pointed out, but it could take well over a year for the fix to
see production if it is not backpatched and it lands at the wrong part of
the release cycle.

Cheers,

Jeff

>


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 06:04:24
Message-ID: CAMkU=1yAEB6oFFyWRObSkDKw_539QOdSYgfcLxY3RTdEg_BGRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday, March 17, 2014, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Claudio Freire <klaussfreire(at)gmail(dot)com <javascript:;>> writes:
> > On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net <javascript:;>>
> wrote:
> >> Even better would be if the planner could estimate how bad a plan will
> >> become if we made assumptions that turn out to be wrong.
>
> > That's precisely what risk estimation was about.
>
> Yeah. I would like to see the planner's cost estimates extended to
> include some sort of uncertainty estimate, whereupon risk-averse people
> could ask it to prefer low-uncertainty plans over high-uncertainty ones
> (the plans we typically choose for ORDER BY ... LIMIT queries being great
> examples of the latter). But it's a long way from wishing that to making
> it so. Right now it's not even clear (to me anyway) how we'd measure or
> model such uncertainty.
>

Most of the cases where I've run into horrible estimates, it seemed like
the same level of knowledge/reasoning that could allow us to know it was
risky, would allow us to just do a better job in the first place.

The exception I can think of is in an antijoin between two huge rels. It
is like subtracting two large measurements to get a much smaller result.
We should know the uncertainty will be large.

Cheers,

Jeff


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 16:53:53
Message-ID: CAGTBQpZ13kbHV3mJppvk0wEJyfUDPSOKhJEG7iaSgYQVnT7xig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> > On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> >> Even better would be if the planner could estimate how bad a plan will
> >> become if we made assumptions that turn out to be wrong.
>
> > That's precisely what risk estimation was about.
>
> Yeah. I would like to see the planner's cost estimates extended to
> include some sort of uncertainty estimate, whereupon risk-averse people
> could ask it to prefer low-uncertainty plans over high-uncertainty ones
> (the plans we typically choose for ORDER BY ... LIMIT queries being great
> examples of the latter). But it's a long way from wishing that to making
> it so. Right now it's not even clear (to me anyway) how we'd measure or
> model such uncertainty.
>

Well, currently, selectivity estimates based on MCV should be pretty
low-uncertainty, whereas certainty of other estimates could be modeled as a
random variable if ANALYZE gathered a few statistical moments (for
variables that are prone to that kind of statistical analysis).

That alone could improve things considerably, and statistical info could be
propagated along expressions to make it possible to model uncertainty in
complex expressions as well.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-18 18:13:22
Message-ID: 53288CC2.8090803@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah. I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter). But it's a long way from wishing that to making
>> it so. Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.

This is not a model, but here's some starting thoughts:

A "high risk" plan has two components:

a) our statistical data is out-of-date or inadequate

b) the potential execution time if our estimates of selectivity are
wrong is high

c) the cost ratio of certain operations is wrong.

Factor (a) can be modeled two ways:

1. If last_analyze is a long time ago, we have increased the risk.
(Ideally, we'd have some idea of the change rate on the table vs.
the last analyze time; right now we don't have those stats)

2. Certain patterns, such as multi-column selectivity and GIN/GiST
selectivity are known to have poor estimates, and be higher risk.
Certainly selectivity functions which have been programmed with a flat
coefficient (like default 0.05 selectivity for gist_ops) could also
return a risk factor which is fairly high.

Factor (b) can be modeled simply by estimating the cost of a plan where
all row estimates are changed by 10X, or even better by a calculation on
the risk factor calculated in (a). This would then give us the "failure
cost" of the bad plan. Note that we need to estimate in both
directions, both for higher estimates and lower ones; "abort early"
plans fail because the rows returned are lower than expected, for example.

(b) estimation would be expensive if we did every combination of the
entire plan with wrong estimates, so I'm wondering if it would be
adequate to just estimate the node selectivity being off on a per-node
basis.

(c) we can't realistically estimate for at all (i.e. if we knew the cost
factor was wrong, we'd fix it) so I suggest ignoring it for risk estimation.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-18 18:21:47
Message-ID: CAOeZVieVa+V-6NYzLLofG6uXfs9Y7N4cUzeoYrYKaQg23nohuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 18, 2014 at 11:43 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>
> > On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Yeah. I would like to see the planner's cost estimates extended to
> >> include some sort of uncertainty estimate, whereupon risk-averse people
> >> could ask it to prefer low-uncertainty plans over high-uncertainty ones
> >> (the plans we typically choose for ORDER BY ... LIMIT queries being
> great
> >> examples of the latter). But it's a long way from wishing that to
> making
> >> it so. Right now it's not even clear (to me anyway) how we'd measure or
> >> model such uncertainty.
>
>

I have been thinking of some ways to have a risk estimate of each
selectivity that our planner gives. I think a way to do it is as follows:

One of the factors that leads to bad estimates is that the histogram of the
values of a column maintained by the planner gets old by time and the data
in the column changes. So, the histogram is no longer a quite accurate view
of the data and it leads to bad selectivity.

One thing we can try to do is to add a factor of error that we feel the
selectivity given can have. This allows us to factor in the probability
that the data changed and the estimate of the difference of the current
histogram and the histogram of the actual data currently present in the
column in the table.

We can use Central Limit Theorem (
http://en.wikipedia.org/wiki/Central_limit_theorem). Essentially, what the
theorem says is that given a distribution that has finite variance and
finite mean, we can take random independent samples from the data and
calculate the standard deviation and the mean of the sample. If we have
large enough number of samples and if we plot the mean and SD, they would
follow a normal distribution.

What is interesting is that this can allow us to predict the SD of a given
dataset from the curve and the SD should be directly proportional to the
deviation it has from the given planner histogram.

I am no mathematician hence its hard for me to explain. I think this link
[1] will be more helpful.

So, we can have a probability value for the random variable and that shall
model the confidence we have in our estimate.

I may be wrong in some parts but I hope I have been able to convey the
general idea.

If this idea develops, I shall be happy to work on this but my hands are
full in ROLLUPS right now, so for my part it shall take time. I just want
to float the idea and get a general feel about the idea right now.

Please let me know your comments and feedback on this.

Regards,

Atri

[1]: http://www.theriac.org/DeskReference/viewDocument.php?id=177
--
Regards,

Atri
*l'apprenant*


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-18 18:41:59
Message-ID: 23908.1395168119@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> One of the factors that leads to bad estimates is that the histogram of the
> values of a column maintained by the planner gets old by time and the data
> in the column changes. So, the histogram is no longer a quite accurate view
> of the data and it leads to bad selectivity.

TBH, this is so far down the list of problems that it'll be a long time
before we need to worry about it. It's certainly not the number one
priority for any project to model risk in the planner.

The thing that I think is probably the number one problem is estimates
that depend on an assumption of uniform distribution of sought-after rows
among those encountered by a scan. This is usually where bad plans for
LIMIT queries are coming from. We could certainly add some sort of fudge
factor to those costs, but I'd like to have a more-or-less principled
framework for doing so.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 19:48:26
Message-ID: CAHyXU0wuzc8mnd=+DWaP088-zTp=5qSv2jsiZ2gbeF3OA7ojCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 18, 2014 at 11:53 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>
> On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>> > On Mon, Mar 17, 2014 at 7:01 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> >> Even better would be if the planner could estimate how bad a plan will
>> >> become if we made assumptions that turn out to be wrong.
>>
>> > That's precisely what risk estimation was about.
>>
>> Yeah. I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter). But it's a long way from wishing that to making
>> it so. Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.
>
> Well, currently, selectivity estimates based on MCV should be pretty
> low-uncertainty, whereas certainty of other estimates could be modeled as a
> random variable if ANALYZE gathered a few statistical moments (for variables
> that are prone to that kind of statistical analysis).

Sure, plus as noted you have cases where the planer makes SWAGs. Each
of those SWAGs can introduce say (in the worst case) an order of
magnitude of error in the row count estimate.

> That alone could improve things considerably, and statistical info could be
> propagated along expressions to make it possible to model uncertainty in
> complex expressions as well.

But how would that work? I see no solution adumbrated there :-).
Let's say you change the rowcount estimate to low/bestguess/high *and*
you only engage extra searches when there is enough disparity between
those values you still get exponentially more searches. (my thinking
is that if bestguess estimated execution time is some user definable
amount faster then low/high at any node), a more skeptical plan is
introduced. All that could end up being pessimal to the general case
though.

merlin


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-18 20:08:39
Message-ID: CAGTBQpbdqT=1NuMPUcM6RZLhWVhr9H1c8QFZiy9240n9OG8Srw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 18, 2014 at 4:48 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> > That alone could improve things considerably, and statistical info could
> be
> > propagated along expressions to make it possible to model uncertainty in
> > complex expressions as well.
>
> But how would that work? I see no solution adumbrated there :-).
>

I would have to tipify the SQL expression grammar for this, but I don't
think it would be impossible. Most non-function expression nodes seem
rather trivial. Even CASE, as long as you have a distribution for the
conditional, you can derive a distribution for the whole. User defined
functions would be another game, though. Correlation would have to be
measured, and that can be troublesome and a weak spot of risk computation
as much as it is of planning, but it could be fuzzed arbitrarily until
properly computed - after all, dependency on correlation or non-correlation
is a known source of risk, and accounting for it in any way is better than
not.

> Let's say you change the rowcount estimate to low/bestguess/high *and*
> you only engage extra searches when there is enough disparity between
> those values you still get exponentially more searches.

I was under the impression the planner already did an exhaustive search for
some queries. So it's just a matter of picking the best plan among those
(ie: estimating cost). The case of GEQO isn't any different, except perhaps
introducing a risk-decreasing transformation would be needed, unless I'm
missing something.

> (my thinking
> is that if bestguess estimated execution time is some user definable
> amount faster then low/high at any node), a more skeptical plan is
> introduced. All that could end up being pessimal to the general case
> though.

I think the cost estimate would be replaced by a distribution (simplified
perhaps into an array of moments, or whatever is easily manipulated in the
face of complex expressions). What the user would pick, is a sampling
method of said distribution. Then, plans get measured by the user's stick
(say: arithmetic mean, median, 90th percentile, etc). The arithmetic mean
would I guess be the default, and that ought to be roughly equivalent to
the planner's current behavior.


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Johnston <polobo(at)yahoo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner hints in Postgresql
Date: 2014-03-19 04:20:33
Message-ID: CAOeZVifuFwnR67EjHsTxKnejcBF=HcAvY8oGFfgxo+y_hVgptQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
>> > That's precisely what risk estimation was about.
>>
>> Yeah. I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter). But it's a long way from wishing that to making
>> it so. Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.
>>
>
> Well, currently, selectivity estimates based on MCV should be pretty
> low-uncertainty, whereas certainty of other estimates could be modeled as a
> random variable if ANALYZE gathered a few statistical moments (for
> variables that are prone to that kind of statistical analysis).
>
> That alone could improve things considerably, and statistical info could
> be propagated along expressions to make it possible to model uncertainty in
> complex expressions as well.
>
>

That is a sort of solution that I proposed yesterday on the mailing list.
The solution essentially takes lots of samples of the data and then plots
the mean and standard deviation of the independent samples to get the
probability of the histogram selectivity estimate.

The problem is multi faceted (outdated stats, bad guess at distribution of
data, cases Merlin mentioned before (CASE statements, COALESCE statements
etc.). Finding a general solution to this problem shall require a lot of
research and time.

I agree with Tom, we should focus on some of the main problems we have in
that domain and then try to solve them first.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 14:40:43
Message-ID: CA+TgmoYWM4xOW4V9-bHMiBdrmhC7ArBEED8jsUAR55u5Tzn84g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 18, 2014 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
>> One of the factors that leads to bad estimates is that the histogram of the
>> values of a column maintained by the planner gets old by time and the data
>> in the column changes. So, the histogram is no longer a quite accurate view
>> of the data and it leads to bad selectivity.
>
> TBH, this is so far down the list of problems that it'll be a long time
> before we need to worry about it. It's certainly not the number one
> priority for any project to model risk in the planner.
>
> The thing that I think is probably the number one problem is estimates
> that depend on an assumption of uniform distribution of sought-after rows
> among those encountered by a scan. This is usually where bad plans for
> LIMIT queries are coming from. We could certainly add some sort of fudge
> factor to those costs, but I'd like to have a more-or-less principled
> framework for doing so.

I think the problem is, in some sense, more basic than that. I think
the kind of query we're talking about here is:

SELECT * FROM foo WHERE unlikely ORDER BY indexed_column LIMIT 1

Assume for the sake of argument that there are 100 rows that would be
returned in the absence of the limit. Let SC and TC be the startup
cost and total cost of the index scan. As a matter of general policy,
we're going to say that the cost of this is SC + 0.01 * (TC - SC).
What makes this path look appealing to the planner is that SC is small
relative to TC. If we knew, for example, that we weren't going to
find the first match until 90% of the way through the index scan, then
we could set SC = 90% * TC and, all else being equal, the planner
would make the right decision.

So you might think that the problem here is that we're assuming
uniform density. Let's say there are a million rows in the table, and
there are 100 that match our criteria, so the first one is going to
happen 1/10,000'th of the way through the table. Thus we set SC =
0.0001 * TC, and that turns out to be an underestimate if the
distribution isn't as favorable as we're hoping. However, that is NOT
what we are doing. What we are doing is setting SC = 0. I mean, not
quite 0, but yeah, effectively 0. Essentially we're assuming that no
matter how selective the filter condition may be, we assume that it
will match *the very first row*.

So we're not assuming the average case and getting hosed when things
come out worse than average. We're assuming the *best* case. So
unless things happen to really swing in our favor, we got hosed.

Now it might be that a fudge factor of 2 or 1.5 or 10 or 3 or 17 is
appropriate, so that we actually assume we're going to have to scan a
little more of the index than we expect. That can perhaps be
justified by the possibility that there may actually be NO rows
matching the filter condition, and we'll have to try scanning the
entire index to get off the ground. We could also try to come up with
a mathematical model for that. But that fudge factor would presumably
be a multiplier on the effort of finding the first tuple. And right
now we assume that finding the first tuple will be trivial. So I
think we should fix THAT problem first, and then if that turns out to
be insufficient, we can worry about what further fudging is required.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 14:45:28
Message-ID: 3610.1395326728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So you might think that the problem here is that we're assuming
> uniform density. Let's say there are a million rows in the table, and
> there are 100 that match our criteria, so the first one is going to
> happen 1/10,000'th of the way through the table. Thus we set SC =
> 0.0001 * TC, and that turns out to be an underestimate if the
> distribution isn't as favorable as we're hoping. However, that is NOT
> what we are doing. What we are doing is setting SC = 0. I mean, not
> quite 0, but yeah, effectively 0. Essentially we're assuming that no
> matter how selective the filter condition may be, we assume that it
> will match *the very first row*.

I think this is wrong. Yeah, the SC may be 0 or near it, but the time to
fetch the first tuple is estimated as SC + (TC-SC)/N.

regards, tom lane


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 15:07:11
Message-ID: CAOeZVifb5TEKiKjxJDsb_LZTk=ufwjuVK99x4qetx5XyLfSVcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 20, 2014 at 8:10 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Mar 18, 2014 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> >> One of the factors that leads to bad estimates is that the histogram of
> the
> >> values of a column maintained by the planner gets old by time and the
> data
> >> in the column changes. So, the histogram is no longer a quite accurate
> view
> >> of the data and it leads to bad selectivity.
> >
> > TBH, this is so far down the list of problems that it'll be a long time
> > before we need to worry about it. It's certainly not the number one
> > priority for any project to model risk in the planner.
> >
> > The thing that I think is probably the number one problem is estimates
> > that depend on an assumption of uniform distribution of sought-after rows
> > among those encountered by a scan. This is usually where bad plans for
> > LIMIT queries are coming from. We could certainly add some sort of fudge
> > factor to those costs, but I'd like to have a more-or-less principled
> > framework for doing so.
>
> I think the problem is, in some sense, more basic than that. I think
> the kind of query we're talking about here is:
>
> SELECT * FROM foo WHERE unlikely ORDER BY indexed_column LIMIT 1
>
> Assume for the sake of argument that there are 100 rows that would be
> returned in the absence of the limit. Let SC and TC be the startup
> cost and total cost of the index scan. As a matter of general policy,
> we're going to say that the cost of this is SC + 0.01 * (TC - SC).
> What makes this path look appealing to the planner is that SC is small
> relative to TC. If we knew, for example, that we weren't going to
> find the first match until 90% of the way through the index scan, then
> we could set SC = 90% * TC and, all else being equal, the planner
> would make the right decision.
>
> So you might think that the problem here is that we're assuming
> uniform density. Let's say there are a million rows in the table, and
> there are 100 that match our criteria, so the first one is going to
> happen 1/10,000'th of the way through the table. Thus we set SC =
> 0.0001 * TC, and that turns out to be an underestimate if the
> distribution isn't as favorable as we're hoping. However, that is NOT
> what we are doing. What we are doing is setting SC = 0. I mean, not
> quite 0, but yeah, effectively 0. Essentially we're assuming that no
> matter how selective the filter condition may be, we assume that it
> will match *the very first row*.
>
>
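To make the arithmetic in the quoted message concrete, here is a toy sketch of the cost model Robert describes. SC, TC, and the fetched fraction come from the message itself; the helper name and the concrete numbers are invented for illustration and are not the actual cost_limit code:

```python
def limit_path_cost(sc, tc, fraction_fetched):
    """Planner-style cost charged to a LIMIT over a path:
    startup cost plus the fetched fraction of the run cost."""
    return sc + fraction_fetched * (tc - sc)

SC, TC = 1.0, 10000.0      # invented startup/total cost of the index scan
rows_returned = 100        # rows the scan would return without the LIMIT

# LIMIT 1 over 100 expected rows fetches 1/100 of the output,
# i.e. the SC + 0.01 * (TC - SC) from the message.
cheap = limit_path_cost(SC, TC, 1 / rows_returned)

# If we somehow knew the first match sits 90% of the way through the
# scan, an honest startup cost would be 90% of TC instead.
honest = limit_path_cost(0.9 * TC, TC, 1 / rows_returned)

print(cheap)   # ~101: looks very attractive to the planner
print(honest)  # 9010: reflects the work done before the first row
```

With a near-zero SC the plan looks almost free, which is exactly why the planner finds it appealing.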

Cannot we reuse the same histogram we have in the planner right now for
this? I mean, AFAIK, the heuristic we have is that we divide the data into
equal-size histogram buckets, find the bucket in which our predicate value
lies, and then take some part of that bucket plus all of the buckets before
it, right?

So, suppose the query is SELECT * FROM table WHERE a > 10; we would find
the bucket that 10 lies in, right?

Now, why cannot we take the estimate of all the buckets behind the bucket
in which our value is present? Will that estimate not give us the fraction
of tuples that are expected to be before the first matching row?
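As a toy sketch of the estimate being proposed: an equi-depth histogram with invented boundaries, summing the whole buckets below the predicate value plus a linearly interpolated share of the containing bucket. This is an illustration of the idea, not PostgreSQL's actual histogram code:

```python
import bisect

# Toy equi-depth histogram: boundaries chosen so each of the 4 buckets
# holds ~25% of the rows (values invented for illustration).
boundaries = [0, 5, 20, 60, 100]   # bucket i spans [boundaries[i], boundaries[i+1])
nbuckets = len(boundaries) - 1

def frac_less_than(value):
    """Estimated fraction of rows with column < value, assuming a
    uniform distribution inside the containing bucket."""
    if value <= boundaries[0]:
        return 0.0
    if value >= boundaries[-1]:
        return 1.0
    i = bisect.bisect_right(boundaries, value) - 1   # containing bucket
    lo, hi = boundaries[i], boundaries[i + 1]
    within = (value - lo) / (hi - lo)                # partial bucket
    return (i + within) / nbuckets                   # whole buckets + part

# WHERE a > 10: 10 falls in bucket [5, 20), so the estimated fraction
# of rows below the first match would be frac_less_than(10).
print(frac_less_than(10))   # 1/4 + (5/15)/4 = 1/3
```

Note that this only estimates how *many* rows fall below the value, not *where* the matching rows physically sit in the table.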

It's a pretty wild idea, but I wanted to know whether my understanding of
this scenario is correct or not.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 15:21:10
Message-ID: 4472.1395328870@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> Now, why cannot we take the estimate of all the buckets behind the bucket
> in which our value is present? Will that estimate not give us the fraction
> of tuples that are expected to be before the first matching row?

Uh, no, not unless you assume that the table happens to be perfectly
sorted by the column's value.

regards, tom lane


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 15:30:49
Message-ID: CAOeZVifc88H512xq31Dj3ccsn1ecArr8hStK8ZT-qbt92eJB9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 20, 2014 at 8:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Atri Sharma <atri(dot)jiit(at)gmail(dot)com> writes:
> > Now, why cannot we take the estimate of all the buckets behind the bucket
> > in which our value is present? Will that estimate not give us the
> fraction
> > of tuples that are expected to be before the first matching row?
>
> Uh, no, not unless you assume that the table happens to be perfectly
> sorted by the column's value.
>
>
>

Yes, that is true. So, if an attribute has an index present, can we do this
somehow?

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date: 2014-03-20 16:29:51
Message-ID: CA+TgmobayUB_7yu95fQVuMqZEP_5cOKkc5KJo1O9kY36K5Nbig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 20, 2014 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> So you might think that the problem here is that we're assuming
>> uniform density. Let's say there are a million rows in the table, and
>> there are 100 that match our criteria, so the first one is going to
>> happen 1/10,000'th of the way through the table. Thus we set SC =
>> 0.0001 * TC, and that turns out to be an underestimate if the
>> distribution isn't as favorable as we're hoping. However, that is NOT
>> what we are doing. What we are doing is setting SC = 0. I mean, not
>> quite 0, but yeah, effectively 0. Essentially we're assuming that no
>> matter how selective the filter condition may be, we assume that it
>> will match *the very first row*.
>
> I think this is wrong. Yeah, the SC may be 0 or near it, but the time to
> fetch the first tuple is estimated as SC + (TC-SC)/N.

Hmm, you're right, and experimentation confirms that the total cost of
the limit comes out to about TC/selectivity. So scratch that theory.
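Plugging invented numbers into the formula Tom quotes, SC + (TC - SC)/N, shows why the first-tuple estimate is not effectively zero even when SC itself is. This is a back-of-the-envelope sketch, not the planner's actual code:

```python
def first_tuple_cost(sc, tc, nrows):
    """Estimated cost to fetch the first of nrows expected output tuples."""
    return sc + (tc - sc) / nrows

SC, TC = 0.0, 10000.0   # a scan whose startup cost is effectively zero

many = first_tuple_cost(SC, TC, 1_000_000)  # unselective filter
few  = first_tuple_cost(SC, TC, 100)        # selective filter

print(many)  # 0.01 -> LIMIT 1 still looks cheap, as it should
print(few)   # 100.0 -> first tuple is charged 1/100 of the whole scan
```

The fewer rows the filter is expected to pass, the larger the share of TC charged to the first tuple, which matches the observation that the limit's cost scales with the selectivity.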

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company