Re: Cross-column statistics revisited

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua Tolley" <eggyknap(at)gmail(dot)com>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: Cross-column statistics revisited
Date: 2008-10-17 00:32:38
Message-ID: 4073.1224203558@sss.pgh.pa.us
Lists: pgsql-hackers

"Joshua Tolley" <eggyknap(at)gmail(dot)com> writes:
> Most of the comments on this thread have centered around the questions
> of "what we'd store" and "how we'd use it", which might be better
> phrased as, "The database assumes columns are independent, but we know
> that's not always true. Does this cause enough problems to make it
> worth fixing? How might we fix it?" I have to admit an inability to
> show that it causes problems,

Any small amount of trolling in our archives will turn up plenty of
examples.

It appears to me that a lot of people in this thread are confusing
correlation in the sense of statistical correlation between two
variables with correlation in the sense of how well a column's values
follow the physical row order. (The latter is actually the same kind of animal, but
always taking one of the two variables to be physical position.)
A bad estimate for physical-position correlation has only limited
impact, as Josh B said upthread; but the other case leads to very
bad rowcount estimates which have *huge* impact on plan choices.
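
To illustrate the second case, here is a minimal sketch of how an independence assumption can misestimate a row count when two columns are statistically correlated. The table, its columns, and the predicates are hypothetical and invented for this example; this is not the planner's actual code, only the arithmetic of multiplying per-column selectivities.

```python
# Hypothetical table of 10,000 rows with two columns (a, b),
# where b is fully determined by a (b = a // 10).
rows = [(i % 100, (i % 100) // 10) for i in range(10_000)]

def selectivity(pred):
    """Fraction of rows satisfying a single-column predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

# Per-column selectivities, as single-column statistics would provide.
sel_a = selectivity(lambda r: r[0] == 42)
sel_b = selectivity(lambda r: r[1] == 4)

# Estimate under the independence assumption: multiply the selectivities.
estimated_rows = sel_a * sel_b * len(rows)

# True number of rows matching both predicates.
actual_rows = sum(1 for r in rows if r[0] == 42 and r[1] == 4)

print(f"estimated: {estimated_rows:.0f}, actual: {actual_rows}")
```

In this made-up data every row with a = 42 also has b = 4, so the second predicate filters nothing, yet the multiplied estimate comes out ten times too low. An error of that size is the kind that can push a planner toward the wrong join or scan method.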

regards, tom lane
