Re: cross column correlation revisted

From: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: cross column correlation revisted
Date: 2010-07-14 14:41:01
Message-ID: A28B0AA5-34DA-4D5A-A78F-76E4F78FA310@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hello ...

look at the syntax i posted in more detail:

>> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>

it says X and Y ...
the selectivity of joins are what i am most interested in. cross correlation of columns within the same table are just a byproduct.
the core thing is: how can i estimate the number of rows returned from a join?

an example would be: you have a email accounts + messages. you know that each row will match in a join as you can assume that every account will have a message.
what we need is a syntax which covers the join case and the case where columns inside the same table correlate.
and the fact that an index cannot cover two tables leads me to the conclusion that stats on an index are not the solution to the join problem.

many thanks,

hans

On Jul 14, 2010, at 4:22 PM, Tom Lane wrote:

> =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= <postgres(at)cybertec(dot)at> writes:
>> i think that having stats on an index is a problem by itself for 2 reasons - for cross column correlation at least:
>
>> a.) joins cannot be covered by an index on two tables - we would fix "inside a table correlation problems" but not joins.
>
> Your proposed command didn't cover the two-table case either, and anyway
> what the heck do you mean by cross-correlation across tables?
> Cross-correlation is about the correlation between values in the same
> row.
>
>> b.) who says that there is actually an index in place?
>
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it. For
> that matter, have you considered the idea of examining the index
> contents to derive the statistics? Might work better than trying to get
> numbers via ANALYZE.
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-14 14:48:28 standard_conforming_strings
Previous Message Robert Haas 2010-07-14 14:39:50 Re: five-key syscaches