Re: multivariate statistics (v19)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, david(at)pgmasters(dot)net, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, petr(at)2ndquadrant(dot)com, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2016-08-10 18:07:23
Message-ID: 58b46a5b-b3b5-f7c7-39e2-a0d062da9bf8@2ndquadrant.com
Lists: pgsql-hackers

On 08/10/2016 03:29 PM, Ants Aasma wrote:
> On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> 2) combining multiple statistics
>>
>> I think the ability to combine multivariate statistics (covering different
>> subsets of conditions) is important and useful, but I'm starting to think
>> that the current implementation may not be the correct one (which is why I
>> haven't written the SGML docs about this part of the patch series yet).
>
> While researching this topic a few years ago I came across a paper on
> this exact topic called "Consistently Estimating the Selectivity of
> Conjuncts of Predicates" [1]. While effective, it seems to be quite
> heavyweight, so it would probably need support for tiered optimization.
>
> [1] https://courses.cs.washington.edu/courses/cse544/11wi/papers/markl-vldb-2005.pdf
>

I think I read that paper some time ago, and IIRC it solves the
same problem but in a very different way - instead of combining the
statistics directly, it relies on the "partial" selectivities and then
estimates the total selectivity using the maximum-entropy principle.
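To make the maximum-entropy idea a bit more concrete, here is a minimal
Python sketch. It assumes predicates are numbered 0..n-1, that the known
"partial" selectivities are for conjunctions (subsets of predicates) with
values strictly between 0 and 1, and it uses iterative proportional
fitting over all 2^n truth-value combinations, which is one standard way
to compute a maximum-entropy distribution. The function name
max_entropy_selectivity and its parameters are hypothetical; this is not
code from the patch, and the paper's own algorithm may differ in details.

```python
from itertools import product


def max_entropy_selectivity(n, known, query, iterations=1000):
    """Hypothetical helper: estimate the selectivity of the conjunction
    `query` (a frozenset of predicate indexes 0..n-1) from the known
    selectivities in `known` (frozenset -> selectivity), using the
    maximum-entropy distribution over all 2^n truth assignments."""
    for preds, sel in known.items():
        if not preds or not 0.0 < sel < 1.0:
            raise ValueError("need non-empty predicate sets, 0 < sel < 1")

    # One "atom" per combination of predicate outcomes (exponential in n,
    # which is part of why this approach is heavyweight).
    atoms = list(product((False, True), repeat=n))
    # Start from the uniform distribution, which has maximum entropy.
    p = {a: 1.0 / len(atoms) for a in atoms}

    def satisfies(atom, preds):
        # True if every predicate in `preds` is true in this atom.
        return all(atom[i] for i in preds)

    # Iterative proportional fitting: rescale the atoms inside and outside
    # each constraint so that the constraint holds exactly, and repeat.
    for _ in range(iterations):
        for preds, sel in known.items():
            inside = sum(p[a] for a in atoms if satisfies(a, preds))
            outside = sum(p[a] for a in atoms if not satisfies(a, preds))
            for a in atoms:
                if satisfies(a, preds):
                    p[a] *= sel / inside
                else:
                    p[a] *= (1.0 - sel) / outside

    # Estimated selectivity of the query conjunction.
    return sum(p[a] for a in atoms if satisfies(a, query))


# Only single-predicate selectivities are known here.
print(max_entropy_selectivity(2, {frozenset({0}): 0.1, frozenset({1}): 0.2},
                              frozenset({0, 1})))
```

With only single-predicate selectivities known, the estimate reduces to
the independence assumption (0.1 * 0.2 = 0.02 above); once a joint
selectivity such as that of (p0 AND p1) is also known, the estimate for a
larger conjunction moves away from independence accordingly.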

I think it's a nice idea and it probably works fine in many cases, but
it kinda throws away part of the information (that we could get by
matching the statistics against each other directly). But I'll keep that
paper in mind, and we can revisit this solution later.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
