Re: Manual vs automatic functionality

Lists: pgsql-hackers
From: pgsql(at)mohawksoft(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Manual vs automatic functionality
Date: 2005-03-05 16:05:32
Message-ID: 16744.24.91.171.78.1110038732.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom recently said, when talking about allowing the user (in this case me)
from passing a hash table size to "create index:"

"but that doesn't mean I want to make the user deal with it."

I started thinking about this and, maybe I'm old fashioned, but I would
like the ability to deal with it. So much software these days does things
in an automatic fashion, and too often you are left saying "stop, no do it
that way, damn!" tossing hands up in frustration. Come on, be honest, we
all fight this.

I think the holy grail of completly automatic tuning/functionality is a
lofty goal and a good one to seek, but the harsh reality is that there are
many times when the statistics aren't sufficient and the very broad
audience to which PostgreSQL is targeted clouds various specific use
cases.

I have been on the end of these problems numerous times in the almost 10
years of using PostgreSQL. While I still believe that PostgreSQL is, by
far, one of the best and most usable databases out there, there are times
I just get frustrated.

Being able to assign "hints" to queries may be able to allow DBAs to tune
queries in tables who's characteristics are misrepresented by the
statistics in ANALYZE.

Being able to pass a hash table size to a hash CREATE INDEX statement,
may make hash table faster.

Whould a "hinting" syntax help this out? I don't know, but I do know that
just about every non-trivial project for which I have used PostgreSQL, I
have run into issues where I've had to manually alter statistics source
code, or enable/disable scan types to work around situations where PG just
didn't understand the nature of the problem.

Also, isn't "SET enable_seqscan=FALSE" just another more clumsy way of
issuing a hint to the planner?

CREATE INDEX mytablehash ON mytable USING hash /* +HASH_SIZE(10000000) */

SELECT * from table1, table2 where table1.realm = table2.realm and
table1.name = 'foo' /* +USE_INDEX(tabel1.realm) +USE_HASH(table1.realm,
table2.realm) */


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: pgsql(at)mohawksoft(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Manual vs automatic functionality
Date: 2005-03-05 17:05:27
Message-ID: 20050305170526.GA70196@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 05, 2005 at 11:05:32AM -0500, pgsql(at)mohawksoft(dot)com wrote:

> Being able to assign "hints" to queries may be able to allow DBAs to tune
> queries in tables who's characteristics are misrepresented by the
> statistics in ANALYZE.

I wouldn't mind being able to provide hints to the planner. For
example, I have some set-returning functions that typically return
10-100 rows and I usually have a good idea of how many rows a
particular set of inputs will generate, and sometimes I know other
characteristics about those rows as well. But the planner, not
knowing any better, always assumes the functions will return 1000
rows, which presumably affects the plan, sometimes for the worse.
Here's an example of a problem I've had:

http://archives.postgresql.org/pgsql-performance/2004-08/msg00236.php

I've wondered if the ability to provide hints about the expected
output from set-returning functions would be useful.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql(at)mohawksoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Manual vs automatic functionality
Date: 2005-03-06 03:26:55
Message-ID: 422A787F.7060306@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I wouldn't mind being able to provide hints to the planner. For
> example, I have some set-returning functions that typically return
> 10-100 rows and I usually have a good idea of how many rows a
> particular set of inputs will generate, and sometimes I know other
> characteristics about those rows as well. But the planner, not
> knowing any better, always assumes the functions will return 1000
> rows, which presumably affects the plan, sometimes for the worse.
> Here's an example of a problem I've had:
>
> http://archives.postgresql.org/pgsql-performance/2004-08/msg00236.php
>
> I've wondered if the ability to provide hints about the expected
> output from set-returning functions would be useful.

Hmmm, you could jank that by making a new GUC:

SET expected_srf_rows TO 152;
SELECT * FROM myfunction();

Chris