Re: proposal : cross-column stats

From:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tomas Vondra <tv(at)fuzzy(dot)cz>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: proposal : cross-column stats
Date:	2010-12-13 09:56:26
Message-ID:	4D05EDCA.9070402@gmail.com
Lists:	pgsql-hackers

On 2010-12-13 03:28, Robert Haas wrote:
> Well, I'm not real familiar with contingency tables, but it seems like
> you could end up needing to store a huge amount of data to get any
> benefit out of it, in some cases. For example, in the United States,
> there are over 40,000 postal codes, and some even larger number of
> city names, and doesn't the number of entries go as O(m*n)? Now maybe
> this is useful enough anyway that we should Just Do It, but it'd be a
> lot cooler if we could find a way to give the planner a meaningful
> clue out of some more compact representation.
A sparse matrix that holds only 'implicative' (P(A|B) <> P(A)?)
combinations? Also, some information might be deduced from others. For
Heikki's city/region example, for each city it would be known that it is
100% in one region. In that case it suffices to store only that
information, since 0% in all other regions can be deduced. I wouldn't be
surprised if storing implicatures like this would reduce the size to O(n).
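
To make the size argument concrete, here is a minimal sketch in Python. It is not PostgreSQL code: the function names and the sample rows are hypothetical and chosen only for illustration. It keeps only the (city, region) pairs that actually occur and treats every missing pair as 0%, which is the deduction described above.

```python
from collections import defaultdict

def dense_cell_count(rows):
    """Cells in a full city x region contingency table: O(m*n)."""
    cities = {city for city, _ in rows}
    regions = {region for _, region in rows}
    return len(cities) * len(regions)

def sparse_contingency(rows):
    """Keep only observed (city, region) pairs, storing P(region | city)."""
    pair_counts = defaultdict(int)
    city_counts = defaultdict(int)
    for city, region in rows:
        pair_counts[(city, region)] += 1
        city_counts[city] += 1
    return {
        (city, region): n / city_counts[city]
        for (city, region), n in pair_counts.items()
    }

def conditional(sparse, city, region):
    """A pair that is not stored is taken to have probability 0.
    Note that this also returns 0 for a city that was never sampled."""
    return sparse.get((city, region), 0.0)

# Hypothetical sample in which every city lies in exactly one region.
rows = (
    [("Amsterdam", "Noord-Holland")] * 3
    + [("Haarlem", "Noord-Holland")] * 2
    + [("Utrecht", "Utrecht")] * 4
    + [("Groningen", "Groningen")]
)

sparse = sparse_contingency(rows)
print(dense_cell_count(rows), len(sparse))
print(conditional(sparse, "Amsterdam", "Noord-Holland"))
print(conditional(sparse, "Amsterdam", "Utrecht"))
```

With 4 cities and 3 regions in this sample, the dense table has 12 cells while the sparse form stores 4 entries, one per city. When each city maps to exactly one region, the stored size grows with the number of cities rather than with cities times regions.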

regards,
Yeb Havinga

In response to

Re: proposal : cross-column stats at 2010-12-13 02:28:48 from Robert Haas

Responses

Re: proposal : cross-column stats at 2010-12-13 10:41:04 from tv
