Re: *_collapse_limit, geqo_threshold - example schema

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Subject: Re: *_collapse_limit, geqo_threshold - example schema
Date: 2009-07-09 15:00:42
Message-ID: 200907091700.43411.andres@anarazel.de
Lists: pgsql-hackers

On Tuesday 07 July 2009 17:40:50 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I cannot reasonably plan some queries with join_collapse_limit set to 20.
> > At least not without setting the geqo limit very low and a geqo_effort to
> > a low value.
> > So I would definitely not agree that removing j_c_l is a good idea.
> Can you show some specific examples? All of this discussion seems like
> speculation in a vacuum ...
As similar requests have come up multiple times now, I started to create a schema
I can present which is sufficiently similar to the real one to show the same effects.

I had to cut down the complexity of the schema considerably - both for easier
understanding and easier writing of the demo schema.

I also have a moderately complex demo query similar to the ones actually in use.
Autogenerated (GUI) queries do not use views the way my example does, but it
seemed easier to play around with the query size this way.
Also, the real queries often have far more conditions than the one I present
here.
I have not "tuned" the queries here in any way; the join order is not optimized
(unlike in the real application), but I don't think that matters for the purpose
of this discussion.
The queries themselves only sketch what they are intended for and query many
fictional datapoints, but again I don't think this is a problem.

Is it helpful this way?

Some numbers about the query_2.sql are attached. Short overview:
- a low from_collapse_limit is deadly
- a high from_collapse_limit is not costly here
- geqo_effort basically changes nothing
- geqo changes basically nothing
- with a higher join_collapse_limit (12), geqo=on costs quite a bit (a factor of
20!). I double-checked. At other times I get 'failed to make a valid plan'

The numbers are all from 8.5 as of today.
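For reference, the planner knobs were simply varied per session before running
the query; a minimal sketch of such a run (the specific values are just
illustrative examples, not the exact settings behind every number in
numbers.txt):

```sql
-- Hypothetical psql session sketching how the GUCs were varied per run;
-- values shown are examples, not the exact benchmarked combinations.
SET from_collapse_limit = 12;   -- a low value was deadly here, a high one cheap
SET join_collapse_limit = 12;   -- higher values interact badly with geqo=on
SET geqo = off;                 -- off forces the exhaustive planner
SET geqo_effort = 5;            -- the default; changing it made no difference
\timing on
\i query_2.sql                  -- the attached demo query
```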

Some explanations about the schema:
- It uses surrogate keys everywhere, as the real schema employs some form of
row-level, label-based access checking (covert channel issues)
- The real schema uses partitions - I don't think they would be interesting
here?
- It's definitely not the most beautiful schema I have seen, but I have to admit
that I cannot think of a much nicer one which serves the different purposes as
well:
- somewhat complex queries
- new "information_set"s and "information"s are added frequently
- automated and manual data entry has to work with such additions
- the GUI query tool needs to work in the face of such changes
- I have seen similar schemas multiple times now.
- The original schema employs materialized views in parts (both for execution
and planning speed)
- The queries are crazy, but people definitely create/use them.

Andres

Attachment Content-Type Size
query_2.sql text/x-sql 2.1 KB
schema.sql text/x-sql 6.3 KB
numbers.txt text/plain 1.5 KB
views.sql text/x-sql 55.2 KB
