From: | Dimitri Fontaine <dim(at)tapoueh(dot)org> |
To: | eggyknap(at)gmail(dot)com |
Cc: | postgres(at)cybertec(dot)at, tgl(at)sss(dot)pgh(dot)pa(dot)us, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org, zb(at)cybertec(dot)at |
Subject: | Re: cross column correlation revisted |
Date: | 2010-07-14 23:33:54 |
Message-ID: | 49F55693-2ED4-4E39-8067-6876B496DCE2@tapoueh.org |
Lists: | pgsql-hackers |
Joshua Tolley <eggyknap(at)gmail(dot)com> writes:
>> >> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>> >
>> it says X and Y ... the selectivity of joins are what i am most
>> interested in. cross correlation of columns within the same table are
>> just a byproduct. the core thing is: how can i estimate the number
>> of rows returned from a join?
>
> All the discussion of this topic that I've seen has been limited to the single
> table case. The hard problem in that case is coming up with something you can
> precalculate that will actually be useful during query planning, without
> taking too much disk, memory, CPU, or something else. Expanding the discussion
> to include join relations certainly still has valid use cases, but is even
> harder, because you've also got to keep track of precisely how the underlying
> relations are joined, so you know in what context the statistics remain valid.
Well I've been proposing to handle the correlation problem in another
way in some past mails here, and I've been trying to write it down too:
http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
http://tapoueh.org/char10.html#sec13
What I propose is to extend ANALYZE to be able to work on a VIEW too,
rather than just a table. The hard parts seem to be:
a. what stats to record, exploiting the view definition the best we can
b. how to match a user query against the view definitions we have in
order to actually use the stats
If you have answers or good ideas, please share :)
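Point b above, matching a user query against the stored view definitions, could be sketched roughly like this (all names and structures hypothetical, nothing here is actual planner code): record an estimate keyed by a normalized form of the view's join condition, then look it up when a query's join clause normalizes to the same key.

```python
# Stats gathered by a hypothetical "ANALYZE someview": keyed by a
# normalized representation of the join condition the view is built on.
view_stats = {
    ("join", frozenset({("x", "id"), ("y", "id")})): 1200,  # estimated rows
}

def normalize(join_cond):
    """Normalize a clause like 'x.id = y.id' so that operand order
    does not matter ('y.id = x.id' maps to the same key)."""
    left, right = (side.strip() for side in join_cond.split("="))
    cols = frozenset(tuple(col.split(".")) for col in (left, right))
    return ("join", cols)

def estimate_rows(join_cond, fallback):
    """Use the view-level estimate when the query's join condition
    matches a known view definition; otherwise fall back to the
    ordinary per-column estimate."""
    return view_stats.get(normalize(join_cond), fallback)

print(estimate_rows("y.id = x.id", fallback=10000))  # matches -> 1200
print(estimate_rows("x.id = z.id", fallback=10000))  # no match -> 10000
```

A real matcher would of course have to work on parse trees rather than strings, and handle filters, multi-clause conditions, and partial matches; the sketch only shows the lookup idea.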
Regards,
--
dim
From: | Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> |
To: | Dimitri Fontaine <dim(at)tapoueh(dot)org> |
Cc: | eggyknap(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org, zb(at)cybertec(dot)at |
Subject: | Re: cross column correlation revisted |
Date: | 2010-07-15 10:04:21 |
Message-ID: | 7EA01A14-3D58-4E9F-89CF-AB83044A2F40@cybertec.at |
Lists: | pgsql-hackers |
hello ...
a view is already nice but i think it is still too narrow.
the problem is: you don't want a view for every potential join.
in addition to that - ideally there is not much left of a view by the time it comes to checking costs.
so, i think, this is not the kind of approach leading to total success here.
one side question: does anybody happen to know how this is done in Oracle or DB2?
many thanks,
hans
On Jul 15, 2010, at 1:33 AM, Dimitri Fontaine wrote:
> Joshua Tolley <eggyknap(at)gmail(dot)com> writes:
>>>>> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>>>>
>>> it says X and Y ... the selectivity of joins are what i am most
>>> interested in. cross correlation of columns within the same table are
>>> just a byproduct. the core thing is: how can i estimate the number
>>> of rows returned from a join?
>>
>> All the discussion of this topic that I've seen has been limited to the single
>> table case. The hard problem in that case is coming up with something you can
>> precalculate that will actually be useful during query planning, without
>> taking too much disk, memory, CPU, or something else. Expanding the discussion
>> to include join relations certainly still has valid use cases, but is even
>> harder, because you've also got to keep track of precisely how the underlying
>> relations are joined, so you know in what context the statistics remain valid.
>
> Well I've been proposing to handle the correlation problem in another
> way in some past mails here, and I've been trying to write it down too:
>
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
> http://tapoueh.org/char10.html#sec13
>
> What I propose is to extend ANALYZE to be able to work on a VIEW too,
> rather than just a table. The hard parts seem to be:
>
> a. what stats to record, exploiting the view definition the best we can
> b. how to match a user query against the view definitions we have in
> order to actually use the stats
>
> If you have answers or good ideas, please share :)
>
> Regards,
> --
> dim
>
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
From: | Joshua Tolley <eggyknap(at)gmail(dot)com> |
To: | Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> |
Cc: | Dimitri Fontaine <dim(at)tapoueh(dot)org>, tgl(at)sss(dot)pgh(dot)pa(dot)us, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org, zb(at)cybertec(dot)at |
Subject: | Re: cross column correlation revisted |
Date: | 2010-07-15 13:39:07 |
Message-ID: | 4c3f0f7e.9f3ae70a.0ae9.2699@mx.google.com |
Lists: | pgsql-hackers |
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote:
> hello ...
>
> a view is already nice but i think it is still too narrow.
> the problem is: you don't want a view for every potential join.
> in addition to that - ideally there is not much left of a view by the time it comes to checking costs.
> so, i think, this is not the kind of approach leading to total success here.
The problem is a very big one, and it's helpful to try solving it one piece at
a time, so the single-table and view-based cases are probably good starting
points.
> one side question: does anybody happen to know how this is done in Oracle or DB2?
Neither appears to handle multi-column statistics in any form [1][2].
[1] http://download.oracle.com/docs/cd/B13789_01/appdev.101/b10802/d_stats.htm
[2] http://www.ibm.com/developerworks/data/library/techarticle/dm-0606fechner/index.html
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
From: | David Fetter <david(at)fetter(dot)org> |
To: | Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> |
Cc: | Dimitri Fontaine <dim(at)tapoueh(dot)org>, eggyknap(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org, zb(at)cybertec(dot)at |
Subject: | Re: cross column correlation revisted |
Date: | 2010-07-15 14:14:44 |
Message-ID: | 20100715141444.GB22026@fetter.org |
Lists: | pgsql-hackers |
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote:
> hello ...
>
> a view is already nice but i think it is still too narrow.
One sure way to fail is to take on a problem in chunks too large. If
we get even one of the cross-column issues solved by statistics, we'll
be ahead of all our competition, both free and proprietary.
Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate