Re: proposal : cross-column stats

From: tv(at)fuzzy(dot)cz
To: "Yeb Havinga" <yebhavinga(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-13 10:41:04
Message-ID: 2d932c3085e7eeacd50324bf3752e8ef.squirrel@sq.gransy.com
Lists: pgsql-hackers

> On 2010-12-13 03:28, Robert Haas wrote:
>> Well, I'm not real familiar with contingency tables, but it seems like
>> you could end up needing to store a huge amount of data to get any
>> benefit out of it, in some cases. For example, in the United States,
>> there are over 40,000 postal codes, and some even larger number of
>> city names, and doesn't the number of entries go as O(m*n)? Now maybe
>> this is useful enough anyway that we should Just Do It, but it'd be a
>> lot cooler if we could find a way to give the planner a meaningful
>> clue out of some more compact representation.
> A sparse matrix that holds only 'implicative' (P(A|B) <> P(A*B)?)
> combinations? Also, some information might be deduced from others. For
> Heikki's city/region example, for each city it would be known that it is
> 100% in one region. In that case it suffices to store only that
> information, since 0% in all other regions can be deduced. I wouldn't be
> surprised if storing implicatures like this would reduce the size to O(n).

OK, but I'll leave this for the future. My plan is to build a small PoC,
just to see whether the contingency-table + probability-estimates approach
works for the failure case mentioned by Heikki. I'd like to finish this by
the end of this week, if possible.
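
As a rough illustration of what the contingency-table approach buys over the
independence assumption, consider the following sketch. The sample rows, the
column names and both helper functions are made up for this example; it is
not the PoC itself and has nothing to do with the actual planner code.

```python
from collections import Counter

# Hypothetical sample of (city, zip_code) rows; every ZIP code belongs to
# exactly one city, so the two columns are strongly correlated.
rows = (
    [("Prague", "11000")] * 40
    + [("Prague", "12000")] * 10
    + [("Brno", "60200")] * 30
    + [("Ostrava", "70200")] * 20
)

total = len(rows)
city_counts = Counter(city for city, _ in rows)
zip_counts = Counter(zip_code for _, zip_code in rows)

# Sparse contingency table: only the combinations that actually occur are
# stored, so its size is bounded by the number of distinct pairs, not m*n.
pair_counts = Counter(rows)


def independent_estimate(city, zip_code):
    """Selectivity of (city = X AND zip = Y), assuming independent columns."""
    return (city_counts[city] / total) * (zip_counts[zip_code] / total)


def contingency_estimate(city, zip_code):
    """Selectivity taken from the joint counts; absent pairs count as zero."""
    return pair_counts[(city, zip_code)] / total


if __name__ == "__main__":
    for pair in [("Prague", "11000"), ("Brno", "11000")]:
        print(pair, independent_estimate(*pair), contingency_estimate(*pair))
```

With this sample the independence assumption underestimates the matching
pair and overestimates the impossible one, while the sparse table keeps 4
entries instead of the 3 * 4 = 12 cells of a dense one.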

I'll read the articles mentioned by Nathan Boley (thanks for those links;
if you have more of them, just let me know).

Once we have a solution that solves (or significantly improves) these
failure cases, we can make further plans (how to actually do that in the
code etc.).

BTW thanks for all the comments!

regards
Tomas
