Re: proposal : cross-column stats

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-13 02:00:37
Message-ID: AANLkTimWyk6qso=_BethgFSYN4FBZa-syAVjtUxjssuo@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 12, 2010 at 8:46 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 13.12.2010 01:05, Robert Haas wrote:
>> This is a good idea, but I guess the question is what you do next.  If
>> you know that the "applicability" is 100%, you can disregard the
>> restriction clause on the implied column.  And if it has no
>> implicatory power, then you just do what we do now.  But what if it
>> has some intermediate degree of implicability?
>
> Well, I think you've missed the e-mail from Florian Pflug - he actually
> pointed out that the 'implicativeness' Heikki mentioned is called
> conditional probability. And conditional probability can be used to
> express the "AND" probability we are looking for (selectiveness).
>
> For two columns, this is actually pretty straightforward - as Florian
> wrote, the equation is
>
>   P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)

Well, the question is what data you are actually storing.  It's
appealing to store a measure of the extent to which a constraint on
column X constrains column Y, because you'd only need to store
O(ncolumns^2) values, which would be reasonably compact and would
potentially handle the zip code problem - a classic "hard case" -
rather neatly.  But that wouldn't be sufficient to use the above
equation, because there A and B need to be events like "column X has
value x", and it's not going to be practical to store a complete set
of MCVs for column X for each possible value that could appear in
column Y.
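
To make the distinction concrete, here is a minimal sketch in Python (not
planner code). The toy rows and the function names selectivity_estimates
and conditional_entries_needed are invented for illustration. It compares
the independence estimate P(A) * P(B) with the conditional form
P(A|B) * P(B) for one specific value pair, and counts how many per-value
entries a full conditional table would need, as opposed to a single number
per column pair.

```python
# Hypothetical toy table of (zip_code, city) rows; the values are made up.
ROWS = [
    ("10001", "New York"),
    ("10001", "New York"),
    ("10002", "New York"),
    ("94105", "San Francisco"),
    ("94105", "San Francisco"),
    ("94110", "San Francisco"),
    ("60601", "Chicago"),
    ("60601", "Chicago"),
]


def selectivity_estimates(rows, zip_code, city):
    """Return (independent, conditional, actual) selectivity for
    zip = zip_code AND city = city."""
    n = len(rows)
    zip_count = sum(1 for z, _ in rows if z == zip_code)
    city_count = sum(1 for _, c in rows if c == city)
    both_count = sum(1 for z, c in rows if z == zip_code and c == city)

    p_zip = zip_count / n
    p_city = city_count / n
    # P(zip = z | city = c): needs knowledge about this specific value pair.
    p_zip_given_city = both_count / city_count if city_count else 0.0

    independent = p_zip * p_city             # assumes the columns are independent
    conditional = p_zip_given_city * p_city  # P(A and B) = P(A|B) * P(B)
    actual = both_count / n
    return independent, conditional, actual


def conditional_entries_needed(rows):
    """Number of distinct (zip, city) pairs, i.e. how many P(zip | city)
    entries an exhaustive per-value table would have to hold."""
    return len(set(rows))


if __name__ == "__main__":
    ind, cond, act = selectivity_estimates(ROWS, "10001", "New York")
    print("independent:", ind)
    print("conditional:", cond)
    print("actual:     ", act)
    print("per-value entries needed:", conditional_entries_needed(ROWS))
```

On this toy data the conditional form reproduces the true selectivity while
the independence estimate undershoots it, but only because the sketch has
the per-value quantity P(zip = z | city = c) available. A single
"implicativeness" figure per column pair does not supply that, and the
count of per-value entries grows with the number of distinct value
combinations rather than with the number of columns.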

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
