Re: proposal : cross-column stats

From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Nathan Boley <npboley(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-13 17:59:01
Message-ID: 4d065f00.ce05ec0a.3fd2.4143@mx.google.com
Lists: pgsql-hackers

On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote:
> Another quick note: I think that storing the full contingency table is
> wasteful since the marginals are already stored in the single column
> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
> looking at this a couple years back ).

Josh Tolley still looks at it occasionally, though time hasn't permitted any
sort of significant work for quite some time. The multicolstat branch on my
git.postgresql.org repository will create an empirical copula for each
multi-column index and stick it in pg_statistic. It doesn't yet do anything
useful with that information, nor am I convinced it's remotely bug-free. In a
brief PGCon discussion with Tom a while back, it was suggested that a good place
for the planner to use these stats would be clausesel.c, which is responsible
for estimating the selectivity of clause combinations such as
"...WHERE foo > 4 AND foo > 5".
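To illustrate the idea (this is NOT the code in the multicolstat branch, whose
storage format isn't described here), a minimal Python sketch of an empirical
copula might look like the following. All function names are hypothetical. The
copula is built from the rank-transformed (x, y) pairs, so it captures only the
dependence between the columns; the marginals come from ordinary per-column
statistics, which is Nathan's point about not storing a full contingency table.
By Sklar's theorem, P(x <= a AND y <= b) = C(F_x(a), F_y(b)).

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Empirical marginal CDF: fraction of values <= t."""
    s = sorted(values)
    n = len(s)
    return lambda t: bisect_right(s, t) / n


def empirical_copula(xs: Sequence[float], ys: Sequence[float]):
    """Return C(u, v): fraction of rows whose rank-transformed pair
    (F_x(x), F_y(y)) lies at or below (u, v)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty columns of equal length")
    fx, fy = ecdf(xs), ecdf(ys)
    # Pseudo-observations: each row mapped into [0,1]^2 via its marginal ranks.
    pts = [(fx(x), fy(y)) for x, y in zip(xs, ys)]
    n = len(pts)

    def C(u: float, v: float) -> float:
        return sum(1 for pu, pv in pts if pu <= u and pv <= v) / n

    return C


def joint_selectivity(xs, ys, a, b) -> float:
    """Copula-based estimate of P(x <= a AND y <= b) = C(F_x(a), F_y(b))."""
    C = empirical_copula(xs, ys)
    return C(ecdf(xs)(a), ecdf(ys)(b))


def independent_selectivity(xs, ys, a, b) -> float:
    """Estimate under an independence assumption: P(x <= a) * P(y <= b)."""
    return ecdf(xs)(a) * ecdf(ys)(b)


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [1, 2, 3, 4]  # perfectly correlated columns
    print(joint_selectivity(xs, ys, 2, 2))        # 0.5
    print(independent_selectivity(xs, ys, 2, 2))  # 0.25
```

For perfectly correlated columns the independence assumption underestimates
the combined selectivity by half, while the copula recovers the true fraction.
A real planner would presumably keep a compact summary of the copula (say, a
coarse grid) rather than every row; that is an assumption, not a description
of the branch.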

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
