Hypothetical suggestions for planner, indexing improvement

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Hypothetical suggestions for planner, indexing improvement
Date: 2003-05-05 19:19:58
Message-ID: 200305051219.58955.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Folks,

An area in which postgresql planner & indexing could be improved have occurred
to me over the last week. I'd like to share this ideas with you in case it
is worthy of the todo list.

Please excuse me if this issue is already dealt with in CVS; I've been unable
to keep up completely on HACKERS lately. Please also excuse me if this
issue has been discussed and was tabled due to some theoretical limitation,
such as x^n scaling problems

THE IDEA: The planner should keep statistics on the correlation of foreign
keys and apply them to the expected row counts for EXISTS clause limitations,
and possibly for other query types as well.

To illustrate:
Database "calendar" has two tables, events and event_days.
Event_days has FK on column event_id to parent table Events.
There is at lease one record in event_days for each record in events, and the
average parent-child relationship is 1 event -> 1.15 event_days records.

This query:
SELECT events.* FROM events
WHERE EXISTS (SELECT event_id FROM event_days
WHERE event_day BETWEEN '2003-04-08' AND '2003-05-18');

Currently, (in 7.2.4 and 7.3.1) the planner makes the assumption that the
above EXISTS restriction will only filter events by 50% and makes other join
and execution plans accordingly. In fact, it filters events by 96% and the
ideal execution plan should be quite different.

It would be really keen if planner statistics could be expanded to include
correlation on foriegn keys in order to make more intelligent planner
decisions on the above type of query possible.

Thanks for your attention!

--
-Josh Berkus
Aglio Database Solutions
San Francisco

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2003-05-05 19:23:21 7.4 features list
Previous Message Stephan Szabo 2003-05-05 17:58:42 Re: Why are triggers semi-deferred?

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-05-05 19:27:25 Re: Suggestions wanted for 7.2.4 query
Previous Message Achilleus Mantzios 2003-05-05 18:38:43 Re: [SQL] Indices are not used by the optimizer