Re: Simple join optimized badly?

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple join optimized badly?
Date: 2006-10-09 01:07:22
Message-ID: 4529A0CA.8000904@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> ... and add 100 other problems. Hints are used because the DBA thinks that
> they are smarter than the optimizer; 99% of the time, they are wrong.
> Just try manually optimizing a complex query, you'll see -- with three
> join types, several scan types, aggregates, bitmaps, internal and external
> sorts, and the ability to collapse subqueries it's significantly more than
> a human can figure out accurately.

Sorry, this is just wrong, wrong, wrong.

I've heard this from several PG developers every time hints have come up in my roughly eighteen months as a PG application developer. And in between every assertion that "the application programmers aren't as smart as the optimizer", there are a dozen or two examples where posters to this list are told to increase this setting, decrease that one, adjust these other two, and the end result is to get the plan that the application programmer -- AND the PG professionals -- knew was the right plan to start with.

People are smarter than computers. Period.

Now I'll agree that the majority, perhaps the great majority, of questions to this group should NOT be solved with hints. You're absolutely right that in most cases hints are a really bad idea. People will resort to hints when they should be learning better ways to craft SQL, and when they should have read the configuration guides.

But that doesn't alter the fact that many, perhaps most, complicated application will, sooner or later, run into a showstopper case where PG just optimizes wrong, and there's not a damned thing the app programmer can do about it.

My example, discussed previously in this forum, is a classic. I have a VERY expensive function (it's in the class of NP-complete problems, so there is no faster way to do it). There is no circumstance when my function should be used as a filter, and no circumstance when it should be done before a join. But PG has no way of knowing the cost of a function, and so the optimizer assigns the same cost to every function. Big disaster.

The result? I can't use my function in any WHERE clause that involves any other conditions or joins. Only by itself. PG will occasionally decide to use my function as a filter instead of doing the join or the other WHERE conditions first, and I'm dead.

The interesting thing is that PG works pretty well for me on big tables -- it does the join first, then applies my expensive functions. But with a SMALL (like 50K rows) table, it applies my function first, then does the join. A search that completes in 1 second on a 5,000,000 row database can take a minute or more on a 50,000 row database.

Instead, I have to separate the WHERE terms into two SQL statements, and do the join myself. I do the first half of my query, suck it all into memory, do the second half, suck it into memory, build a hash table and join the two lists in memory, then take the joined results and apply my function to it.

This is not how a relational database should work. It shouldn't fall over dead just when a table's size SHRINKS beyond some threshold that causes the planner to switch to a poor plan.

Since these tables are all in the same database, adjusting configuration parameters doesn't help me. And I suppose I could use SET to disable various plans, but how is that any different from a HINT feature?

Now you might argue that function-cost needs to be added to the optimizer's arsenal of tricks. And I'd agree with you: That WOULD be a better solution than hints. But I need my problem solved TODAY, not next year. Hints can help solve problems NOW that can be brought to the PG team's attention later, and in the mean time let me get my application to work.

Sorry if I seem particularly hot under the collar on this one. I think you PG designers have created a wonderful product. It's not the lack of hints that bothers me, it's the "You app developers are dumber than we are" attitude. We're not. Some of us know what we're doing, and we need hints.

If it is just a matter of resources, that's fine. I understand that these things take time. But please don't keep dismissing the repeated and serious requests for this feature. It's important.

Thanks for listening.
Craig

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2006-10-09 02:12:12 Re: Simple join optimized badly?
Previous Message Tom Lane 2006-10-08 23:16:37 Re: Simple join optimized badly?