Quick Links

Re: WIP: cross column correlation ...

Lists:	pgsql-hackers

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	WIP: cross column correlation ...
Date:	2011-02-22 19:48:47
Message-ID:	95E9B329-CE40-495D-B954-B1FB9B6ADDA4@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hello everbody,

we have spent some time in finally attacking cross column correlation. as this is an issue which keeps bugging us for a couple of applications (some years). this is a WIP patch which can do:

special cross column correlation specific syntax:

CREATE CROSS COLUMN STATISTICS ON tablename (field, ...);
DROP CROSS COLUMN STATISTICS ON tablename (field, ...);

we use specific syntax because we simply cannot keep track of all possible correlations in the DB so the admi can take care of things explicitly. some distant day somebody might want to write a mechanism to derive the desired stats automatically but this is beyond the scope of our project for now.

as far as the patch is concerned:
it is patched nicely into clauselist_selectivity(), but has some rough edges, even when a cross-col stat is found, the single col selectivities are still counted ( = lovering the selectivity even more), this is a TODO.
this patch adds the grammar and the start of planner integration with a static selectivity value for now, the previous discussion about cross-column statistics can be continued and perhaps comes to fruition soon.

how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise.
the current approach can be extended to work with expressions and well as "straight" conditions.

goal: to make cross column correlation work for 9.2 ...

the purpose of this mail is mostly to get the race for a patch going and to see if the approach as such is reasonable / feasible.

many thanks,

hans

Attachment	Content-Type	Size
cross-column-v5.patch	application/octet-stream	38.2 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 01:58:00
Message-ID:	AANLkTikYR1KJf-AyyejD3E+SjAYLCi2cEj8SCYRY7QJE@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/22 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise.
> the current approach can be extended to work with expressions and well as "straight" conditions.

/me prepares to go down in flames.

Personally, I think the first thing we ought to do is add a real, bona
fide planner hint to override the selectivity calculation manually,
maybe something like this:

WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);

Then, having provided a method for the DBA to extinguish the raging
flames of searing agony which are consuming them while a crocodile
chews off their leg and their boss asks them why they didn't use
Oracle, we can continue bikeshedding about the best way of fixing this
problem in a more user-transparent fashion.

As to the approach you've proposed here, I'm not sure I understand
what this is actually doing. Selectivity estimates aren't made
directly for predicates; they're made based on MCV and histogram
information for predicates.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 02:43:59
Message-ID:	11201.1298429039@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> /me prepares to go down in flames.

> Personally, I think the first thing we ought to do is add a real, bona
> fide planner hint to override the selectivity calculation manually,
> maybe something like this:

> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);

One of the criteria we've always had for a suitable hint-or-whatever-
you-call-it design is that it *not* involve decorating the queries.
There are a number of reasons for that, some of the killer ones being

(1) People frequently *can't* adjust their queries that way, because
they're coming out of some broken query generator or other. (Crappy
query generators are of course one of the prime reasons for
poor-performing queries in the first place, so you can't write this off
as not being a key use case.)

(2) Anything we do like that, we'd be locked into supporting forever,
even after we think of better solutions.

(3) People don't like decorating their queries with nonstandard stuff;
it smells of vendor lock-in. Especially if it's actually SQL syntax
and not comments. Once you put something into the DML it's just too
hard to fix applications to get rid of it (the inverse case of point
#1).

I haven't looked at Hans' patch in any detail, and don't intend to
do so while the CF is still running; but at least he got this point
right.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 02:56:37
Message-ID:	AANLkTimqd_H2reQZSakcB4i1MiFbFnAo56GV6LU3u_od@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> /me prepares to go down in flames.
>
>> Personally, I think the first thing we ought to do is add a real, bona
>> fide planner hint to override the selectivity calculation manually,
>> maybe something like this:
>
>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>
> One of the criteria we've always had for a suitable hint-or-whatever-
> you-call-it design is that it *not* involve decorating the queries.
> There are a number of reasons for that, some of the killer ones being
>
> (1) People frequently *can't* adjust their queries that way, because
> they're coming out of some broken query generator or other. (Crappy
> query generators are of course one of the prime reasons for
> poor-performing queries in the first place, so you can't write this off
> as not being a key use case.)
>
> (2) Anything we do like that, we'd be locked into supporting forever,
> even after we think of better solutions.
>
> (3) People don't like decorating their queries with nonstandard stuff;
> it smells of vendor lock-in. Especially if it's actually SQL syntax
> and not comments. Once you put something into the DML it's just too
> hard to fix applications to get rid of it (the inverse case of point
> #1).

Those are real problems, but I still want it. The last time I hit
this problem I spent two days redesigning my schema and adding
triggers all over the place to make things work. If I had been
dealing with a 30TB database instead of a 300MB database I would have
been royally up a creek.

To put that another way, it's true that some people can't adjust their
queries, but also some people can. It's true that nonstandard stuff
sucks, but queries that don't work suck, too. And as for better
solutions, how many major release cycles do we expect people to wait
for them? Even one major release cycle is an eternity when you're
trying to get the application working before your company runs out of
money, and this particular problem has had a lot of cycles expended on
it without producing anything very tangible (proposed patch, which
like you I can't spare a lot of cycles to look at just now, possibly
excepted).

I agree that if we can get something that actually works that doesn't
involve decorating the queries, that is better. But I would surely
rather decorate the queries than rewrite the entire application around
the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 03:04:12
Message-ID:	11628.1298430252@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> One of the criteria we've always had for a suitable hint-or-whatever-
>> you-call-it design is that it *not* involve decorating the queries.

> [ snip ]
> To put that another way, it's true that some people can't adjust their
> queries, but also some people can. It's true that nonstandard stuff
> sucks, but queries that don't work suck, too. And as for better
> solutions, how many major release cycles do we expect people to wait
> for them?

Well, a decorating-the-queries solution that isn't utter crap is not
going to be a small amount of work, either.

regards, tom lane

From:	Nathan Boley <npboley(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 05:50:50
Message-ID:	AANLkTimLFwakAxeSvjBwBo_sdzGrbeBP9a=BrDkDLn6O@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

If you're going to go that far, why not just collect statistics on
that specific predicate?

ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1);

Then it won't fall subject to all of the pitfalls that Tom outlines below.

Selectivities are easy to estimate if we know the predicate. They only
become hard when they have to work for every possible predicate.

Best,
Nathan

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 08:02:28
Message-ID:	BA2B781B-F048-4654-960B-D15FAA902400@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Feb 23, 2011, at 2:58 AM, Robert Haas wrote:

> 2011/2/22 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>> how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more precise.
>> the current approach can be extended to work with expressions and well as "straight" conditions.
>
> /me prepares to go down in flames.
>
> Personally, I think the first thing we ought to do is add a real, bona
> fide planner hint to override the selectivity calculation manually,
> maybe something like this:
>
> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);

i thought there was an agreement that we don't want planner hints?
as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design.
personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). in general - hibernate and stuff like that is a no-go.

personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea.
it does not change the query and it still gives a lot of room to toy around. it looks like a compromise.

however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 08:15:39
Message-ID:	16061BD9-C58F-47DB-AFCC-D16F0CBBA438@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Those are real problems, but I still want it. The last time I hit
> this problem I spent two days redesigning my schema and adding
> triggers all over the place to make things work. If I had been
> dealing with a 30TB database instead of a 300MB database I would have
> been royally up a creek.
>
> To put that another way, it's true that some people can't adjust their
> queries, but also some people can. It's true that nonstandard stuff
> sucks, but queries that don't work suck, too. And as for better
> solutions, how many major release cycles do we expect people to wait
> for them? Even one major release cycle is an eternity when you're
> trying to get the application working before your company runs out of
> money, and this particular problem has had a lot of cycles expended on
> it without producing anything very tangible (proposed patch, which
> like you I can't spare a lot of cycles to look at just now, possibly
> excepted).

cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly.
use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ...
however, fast and reliable way to get around it.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Nathan Boley <npboley(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 14:46:34
Message-ID:	AANLkTind81SDsTqrsSnvoeMJeKbNafAtJXdW4=U1NkRa@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley <npboley(at)gmail(dot)com> wrote:
>> Personally, I think the first thing we ought to do is add a real, bona
>> fide planner hint to override the selectivity calculation manually,
>> maybe something like this:
>>
>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>
> If you're going to go that far, why not just collect statistics on
> that specific predicate?
>
> ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1);
>
> Then it won't fall subject to all of the pitfalls that Tom outlines below.
>
> Selectivities are easy to estimate if we know the predicate. They only
> become hard when they have to work for every possible predicate.

Fair point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 14:48:13
Message-ID:	AANLkTi=eW5GOA6jB7bsRBtvf5be_aW07Q7jf_+zczG1Y@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/23 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
>> Those are real problems, but I still want it. The last time I hit
>> this problem I spent two days redesigning my schema and adding
>> triggers all over the place to make things work. If I had been
>> dealing with a 30TB database instead of a 300MB database I would have
>> been royally up a creek.
>>
>> To put that another way, it's true that some people can't adjust their
>> queries, but also some people can. It's true that nonstandard stuff
>> sucks, but queries that don't work suck, too. And as for better
>> solutions, how many major release cycles do we expect people to wait
>> for them? Even one major release cycle is an eternity when you're
>> trying to get the application working before your company runs out of
>> money, and this particular problem has had a lot of cycles expended on
>> it without producing anything very tangible (proposed patch, which
>> like you I can't spare a lot of cycles to look at just now, possibly
>> excepted).
>
> cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly.
> use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ...
> however, fast and reliable way to get around it.

We couldn't possibly design a hint mechanism that would be uglier or
less future-proof than this workaround (which, by the way, I'll keep
in mind for the next time I get bitten by this).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 14:54:11
Message-ID:	1CFB88AD-B7E4-4CED-A048-2292E5888B05@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>>
>>
>> cheapest and easiest solution if you run into this: add "fake" functions which the planner cannot estimate properly.
>> use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly solution that it does not even deserve to be called "ugly" ...
>> however, fast and reliable way to get around it.
>
> We couldn't possibly design a hint mechanism that would be uglier or
> less future-proof than this workaround (which, by the way, I'll keep
> in mind for the next time I get bitten by this).
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

i think the main issue is: what we do is ugly because of despair and a lack of alternative ... what you proposed is ugly by design ;).
overall: the workaround will win the ugliness contest, however ;).

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 14:56:59
Message-ID:	B987D673-4905-4154-B2D5-C1624368E7E3@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Feb 23, 2011, at 3:46 PM, Robert Haas wrote:

> On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley <npboley(at)gmail(dot)com> wrote:
>>> Personally, I think the first thing we ought to do is add a real, bona
>>> fide planner hint to override the selectivity calculation manually,
>>> maybe something like this:
>>>
>>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>>
>> If you're going to go that far, why not just collect statistics on
>> that specific predicate?
>>
>> ie, ANALYZE SELECTIVITY ON tablename (x, y) WHERE (x < 5 AND y = 1);
>>
>> Then it won't fall subject to all of the pitfalls that Tom outlines below.
>>
>> Selectivities are easy to estimate if we know the predicate. They only
>> become hard when they have to work for every possible predicate.
>
> Fair point.
>
> --
> Robert Haas

basically we got the idea of allowing "expressions" in cross column stuff. i think this can be very useful. it would fix the problem of a query like that:

SELECT * FROM table WHERE cos(field) = some_number;

this takes a constant fraction of the table which is usually plain wrong as well (and the error tends to multiply inside the plan).
i am just not sure if i have understood all corner cases of that already.
ultimate goal: get it right for join estimates (this is why a syntax extension is definitely needed - you cannot track all of them automatically).

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 15:10:07
Message-ID:	AANLkTim9agk0dgdVBtjoWs4=HqJvD08p9N_B9e+20bcH@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/23 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> i thought there was an agreement that we don't want planner hints?

Well, I want them. I think some other people do, too. Whether those
people are more numerous than than the people who don't want them, and
how much that matters either way, is another question. I don't want
to have to use them very often, but I like to have an out when I get
desperate.

> as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design.
> personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). in general - hibernate and stuff like that is a no-go.
>
> personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea.
> it does not change the query and it still gives a lot of room to toy around. it looks like a compromise.
>
> however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity.

IIRC, what Teodor and Oleg did was a contrib module that excluded a
certain index from consideration based on a GUC. That to me is a
little more hacky than just wiring the selectivity estimate. You're
going to need to set that just before each query that needs it, and
reset it afterwards, so it's actually worse than just decorating the
queries, IMHO. Also, I haven't run into any actual problems in the
field that would be solved by this approach, though I am sure others
have. IME, most bad query plans are caused by either incorrect
estimates of selectivity, or wrongheaded notions about what's likely
to be cached. If we could find a way, automated or manual, of
providing the planner some better information about the facts of life
in those areas, I think we'd be way better off. I'm open to ideas
about what the best way to do that is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 22:37:39
Message-ID:	201102232237.p1NMbdb19397@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

PostgreSQL - Hans-Jrgen Schnig wrote:
> > Those are real problems, but I still want it. The last time I hit
> > this problem I spent two days redesigning my schema and adding
> > triggers all over the place to make things work. If I had been
> > dealing with a 30TB database instead of a 300MB database I would have
> > been royally up a creek.
> >
> > To put that another way, it's true that some people can't adjust their
> > queries, but also some people can. It's true that nonstandard stuff
> > sucks, but queries that don't work suck, too. And as for better
> > solutions, how many major release cycles do we expect people to wait
> > for them? Even one major release cycle is an eternity when you're
> > trying to get the application working before your company runs out of
> > money, and this particular problem has had a lot of cycles expended on
> > it without producing anything very tangible (proposed patch, which
> > like you I can't spare a lot of cycles to look at just now, possibly
> > excepted).
>
>
>
> cheapest and easiest solution if you run into this: add "fake" functions
> which the planner cannot estimate properly. use OR to artificially
> prop up estimates or use AND to artificially lower them. there is
> actually no need to redesign the schema to get around it but it is such
> an ugly solution that it does not even deserve to be called "ugly" ...
> however, fast and reliable way to get around it.

I agree that is super-ugly and we do need to address the cross-column
statistics better. I personally like the 2-D histogram idea:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00913.php

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-23 22:40:00
Message-ID:	201102232240.p1NMe0x19604@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> 2011/2/23 PostgreSQL - Hans-J?rgen Sch?nig <postgres(at)cybertec(dot)at>:
> > i thought there was an agreement that we don't want planner hints?
>
> Well, I want them. I think some other people do, too. Whether those
> people are more numerous than than the people who don't want them, and
> how much that matters either way, is another question. I don't want
> to have to use them very often, but I like to have an out when I get
> desperate.
>
> > as tom pointed out - many broken queries come out of some query generator where even the design to make the design is broken by design.
> > personally i like query generators as long as other people use them ... telling people that this is the wrong way to go is actually financing my holiday next week ... ;). ?in general - hibernate and stuff like that is a no-go.
> >
> > personally i like the type of planner hints oleg and teodor came up with - i think we should do more of those hooks they are using but hiding it in some syntax is not a good idea.
> > it does not change the query and it still gives a lot of room to toy around. it looks like a compromise.
> >
> > however, oleg's contrib module does not fix the core problem of cross column statistics because a hint is usually static but you want flexible selectivity.
>
> IIRC, what Teodor and Oleg did was a contrib module that excluded a
> certain index from consideration based on a GUC. That to me is a
> little more hacky than just wiring the selectivity estimate. You're
> going to need to set that just before each query that needs it, and
> reset it afterwards, so it's actually worse than just decorating the
> queries, IMHO. Also, I haven't run into any actual problems in the
> field that would be solved by this approach, though I am sure others
> have. IME, most bad query plans are caused by either incorrect
> estimates of selectivity, or wrongheaded notions about what's likely
> to be cached. If we could find a way, automated or manual, of
> providing the planner some better information about the facts of life
> in those areas, I think we'd be way better off. I'm open to ideas
> about what the best way to do that is.

For me the key is finding a way to get that information to the planner
so all queries can benefit, not just the queries we decorate.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 01:09:10
Message-ID:	4D65AFB6.1000203@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Personally, I think the first thing we ought to do is add a real, bona
> fide planner hint to override the selectivity calculation manually,
> maybe something like this:
>
> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>
> Then, having provided a method for the DBA to extinguish the raging
> flames of searing agony which are consuming them while a crocodile
> chews off their leg and their boss asks them why they didn't use
> Oracle, we can continue bikeshedding about the best way of fixing this
> problem in a more user-transparent fashion.

Is there some way we can do that without adding the selectivity hint to
the query itself? That's the biggest issue with hints.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 03:11:11
Message-ID:	AANLkTi=ZH5=E0gV8YPEEeTMRzcP+U_Th8ASwX9LJis7u@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 23, 2011 at 8:09 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Personally, I think the first thing we ought to do is add a real, bona
>> fide planner hint to override the selectivity calculation manually,
>> maybe something like this:
>>
>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>>
>> Then, having provided a method for the DBA to extinguish the raging
>> flames of searing agony which are consuming them while a crocodile
>> chews off their leg and their boss asks them why they didn't use
>> Oracle, we can continue bikeshedding about the best way of fixing this
>> problem in a more user-transparent fashion.
>
> Is there some way we can do that without adding the selectivity hint to
> the query itself? That's the biggest issue with hints.

I've been mulling this issue over a bit more - Nathan Boley raised a
similar point upthread. I think it's useful to consider some concrete
cases which can occur.

1. Default estimate. The planner tends to estimate that the
selectivity of <something> = <something> is 0.005, and that the
selectivity of <something> != <something> is 0.995, when it doesn't
know any better. This estimate often sucks. Sometimes it sucks
because it's too high, other times because it's too low, and of course
sometimes it is close enough for government work.

2. One special customer. Suppose we have a database that contains
lots and lots of people and associates different attributes to those
people, including customer_id. We put all of our employees in the
table too, and assign them customer_id = 1, since the record with
customer.id = 1 represents us. I've built this kind of system for
several different employers over the years. Turns out, the subset of
the person table with customer_id = 1 looks very different, in terms
of the MCVs on the remaining columns and the distribution of the
values otherwise, than the records with customer_id != 1. I'm sure
this problem comes up in different forms in other domains; this is
just where I've seen it the most.

3. The mostly-redundant condition. Something like creation_date >
'some timestamp' AND active. Turns out, most of the not active stuff
is also... old. A variant of this is creation_date > 'some timestamp'
AND customer_id = 1, which overlaps #2. For extra fun the creation
date and customer_id may be in different tables, with some
intermediate join muddying the waters.

4. The condition that's redundant except when it isn't. The classic
example here is WHERE zipcode = <constant> AND state = <constant>.
Most of the time, the selectivity of the two clauses together is much
higher than the product of their individually selectivities; you might
as well ignore the second part altogether. But if some numbskull user
enters a state that doesn't match the zipcode, then suddenly it
matters a lot - the selectivity drops to zero when the second part is
added.

5. The bitfield. Conditions like (x & 64) != 0. I know disk is
cheap, but people keep doing this.

There are probably some others I'm missing, too. That's just off the
top of my head. Now here are some possible approaches to fixing it:

A. Decorate the query. This would often be useful for case #1, and
some instances of #3 and #5. It's useless for #2 and #4.

B. Specify a particular predicate and the selectivity thereof. Like,
whenever you see (x & 64) = 0, assume the selectivity is 0.5. Upon
reflection, this seems pretty terrible in every respect. Unless you
only ever issue an extremely limited range of queries, you're going to
be hardwiring a lot of selectivities. I think this really only
handles case #5 well, and maybe some instances of case #1.

C. Specify an expression and gather statistics on it as if it were a
column: i.e. ALTER TABLE tab ADD VIRTUAL STATISTICS COLUMN x & 64.
This is pretty good. It is pretty much ideal for #2 and also handles
#5 and some cases of #3 and #1 well. You could even make it handle
some instances of #4 if you made the virtual column ROW(state,
zipcode) and rewrote the query as a row comparison.

D. N x N implicativeness matrix. Record for each pair of attributes
the extent to which a given value for A implies a value for B, and
derate the selectivity multipliers based on this information. This is
an idea of Heikki's. It seemed good to me when he proposed it, and I
think he proposed it in regards to #4, but I'm not sure we really ever
figured out how to make it work.

E. Given a set of columns (A1, .., An), collect MCVs and make a
histogram for ROW(A1, ..., An), and then use it to handle cases like
#4. This is similar to C and is intended to handle the zipcode
problem, but it's not as flexible (because you are only specifying
columns, not expressions). However, it's intended to work without
rewriting the state/zipcode comparisons as a rowcompare.

If you want to take the above as in any way an exhaustive survey of
the landscape (which it isn't), C seems like a standout, maybe
augmented by the making the planner able to notice that A1 = x1 AND A2
= x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite
queries as much.

I don't really know how to handle the join selectivity problem. I am
not convinced that there is a better solution to that than decorating
the query. After all the join selectivity depends not only on the
join clause itself, but also on what you've filtered out of each table
in the meantime.

Note that I am not sure whether any of this is similar to what the WIP
patch already implements, so apologies for possibly rampaging off in a
different direction and/or reinventing your ideas.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 03:30:08
Message-ID:	201102240330.p1O3U8Y22434@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> If you want to take the above as in any way an exhaustive survey of
> the landscape (which it isn't), C seems like a standout, maybe
> augmented by the making the planner able to notice that A1 = x1 AND A2
> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite
> queries as much.
>
> I don't really know how to handle the join selectivity problem. I am
> not convinced that there is a better solution to that than decorating
> the query. After all the join selectivity depends not only on the
> join clause itself, but also on what you've filtered out of each table
> in the meantime.

Thinking some more, I think another downside to the "decorate the query"
idea is that many queries use constants that are supplied only at
runtime, so there would be no way to hard-code a selectivity value into
a query when you don't know the value. Could a selectivity function
handle that?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 03:58:08
Message-ID:	AANLkTikVJXfORX3KkmP5Hj-_91n8CAtnzozdkq7A5RSK@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> If you want to take the above as in any way an exhaustive survey of
>> the landscape (which it isn't), C seems like a standout, maybe
>> augmented by the making the planner able to notice that A1 = x1 AND A2
>> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite
>> queries as much.
>>
>> I don't really know how to handle the join selectivity problem. I am
>> not convinced that there is a better solution to that than decorating
>> the query. After all the join selectivity depends not only on the
>> join clause itself, but also on what you've filtered out of each table
>> in the meantime.
>
> Thinking some more, I think another downside to the "decorate the query"
> idea is that many queries use constants that are supplied only at
> runtime, so there would be no way to hard-code a selectivity value into
> a query when you don't know the value. Could a selectivity function
> handle that?

Beats me. What do you have in mind?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 06:13:36
Message-ID:	2CAFE07A-9514-414D-9F44-BE85720041E8@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote:

>
>> Personally, I think the first thing we ought to do is add a real, bona
>> fide planner hint to override the selectivity calculation manually,
>> maybe something like this:
>>
>> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1);
>>
>> Then, having provided a method for the DBA to extinguish the raging
>> flames of searing agony which are consuming them while a crocodile
>> chews off their leg and their boss asks them why they didn't use
>> Oracle, we can continue bikeshedding about the best way of fixing this
>> problem in a more user-transparent fashion.
>
> Is there some way we can do that without adding the selectivity hint to
> the query itself? That's the biggest issue with hints.
>

well, you could hide this hint in the system table - say; instead of decorating the query you could store the decoration in some system relation ... but, if you get it right, you call this decoration histogram ;).
i think the patch with a multi-dim histogram is good (i have seen something similar for PostGIS).
what is still needed in our patch is a.) multi-dim sampling (no idea how to get it right) and b.) investigating how to deal with joins and expressions (e.g. cos(id) ).
hints into the right direction are highly welcome.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 15:07:35
Message-ID:	201102241507.p1OF7ZU25175@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> If you want to take the above as in any way an exhaustive survey of
> >> the landscape (which it isn't), C seems like a standout, maybe
> >> augmented by the making the planner able to notice that A1 = x1 AND A2
> >> = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite
> >> queries as much.
> >>
> >> I don't really know how to handle the join selectivity problem. ?I am
> >> not convinced that there is a better solution to that than decorating
> >> the query. ?After all the join selectivity depends not only on the
> >> join clause itself, but also on what you've filtered out of each table
> >> in the meantime.
> >
> > Thinking some more, I think another downside to the "decorate the query"
> > idea is that many queries use constants that are supplied only at
> > runtime, so there would be no way to hard-code a selectivity value into
> > a query when you don't know the value. ?Could a selectivity function
> > handle that?
>
> Beats me. What do you have in mind?

My point is just that many queries have constants who's values are not
known at the time the query is written, so any system should have a way
to handle that somehow. This is why query decoration is usually not a
good solution, and why something more flexible that is stored as part of
the column is preferred.

Perhaps a selectivity function that has easy access to the computed
selectivity of the constant involved might be a win. For example, for
the zip code/state code case we could have something like:

function mysel(zip, state) { pgsel(zip)}

meaning we would still use the selectivities found in the optimizer
statistics (pgsel), but modify them in some way. In the case above, the
selectivity only comes from the zip code. You could also do things like:

function mysel(x, y) { pgsel(x) * pgsel(y) * 0.001}

Such functions have a higher probability of working for all queries
involving that column.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-24 20:42:39
Message-ID:	4D66C2BF.30606@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2/23/11 7:10 AM, Robert Haas wrote:
> IME, most bad query plans are caused by either incorrect
> estimates of selectivity, or wrongheaded notions about what's likely
> to be cached. If we could find a way, automated or manual, of
> providing the planner some better information about the facts of life
> in those areas, I think we'd be way better off. I'm open to ideas
> about what the best way to do that is.

As previously discussed, I'm fine with approaches which involve
modifying database objects. These are auditable and centrally managable
and aren't devastating to upgrades.

So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way
that decorating queries would not.

Similiarly, I would love to be able to set "cache %" on a per-relation
basis, and override the whole dubious calculation involving
random_page_cost for scans of that table.

The great thing about object decorations is that we could then collect
data on which ones worked and which didn't through the performance list
and then use those to improve the query planner. I doubt such would
work with query decorations.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 06:33:59
Message-ID:	201102250633.p1P6Xxf08903@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus wrote:
> On 2/23/11 7:10 AM, Robert Haas wrote:
> > IME, most bad query plans are caused by either incorrect
> > estimates of selectivity, or wrongheaded notions about what's likely
> > to be cached. If we could find a way, automated or manual, of
> > providing the planner some better information about the facts of life
> > in those areas, I think we'd be way better off. I'm open to ideas
> > about what the best way to do that is.
>
> As previously discussed, I'm fine with approaches which involve
> modifying database objects. These are auditable and centrally managable
> and aren't devastating to upgrades.
>
> So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way
> that decorating queries would not.
>
> Similiarly, I would love to be able to set "cache %" on a per-relation
> basis, and override the whole dubious calculation involving
> random_page_cost for scans of that table.

We should just fine a way of checking what percentage of a table is
already in the shared buffers. That doesn't help us with the kernel
cache, but it would be a good start and something that doesn't require
user tuning.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 13:18:40
Message-ID:	AANLkTi=ggrfv6fZJXf7iy-BJyU=nPMhDTqxQ-SBEa9xu@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Josh Berkus wrote:
>> On 2/23/11 7:10 AM, Robert Haas wrote:
>> > IME, most bad query plans are caused by either incorrect
>> > estimates of selectivity, or wrongheaded notions about what's likely
>> > to be cached. If we could find a way, automated or manual, of
>> > providing the planner some better information about the facts of life
>> > in those areas, I think we'd be way better off. I'm open to ideas
>> > about what the best way to do that is.
>>
>> As previously discussed, I'm fine with approaches which involve
>> modifying database objects. These are auditable and centrally managable
>> and aren't devastating to upgrades.
>>
>> So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way
>> that decorating queries would not.
>>
>> Similiarly, I would love to be able to set "cache %" on a per-relation
>> basis, and override the whole dubious calculation involving
>> random_page_cost for scans of that table.
>
> We should just fine a way of checking what percentage of a table is
> already in the shared buffers. That doesn't help us with the kernel
> cache, but it would be a good start and something that doesn't require
> user tuning.

You're reinventing a wheel that's already been discarded multiple
times. There are at least four separate problems:

1. The percentage of the table which is cached in shared_buffers at
plan time need not match the percentage that is cached at execution
time. A delay of even a few seconds between planning and execution
could make the numbers totally different, and plans can be cached for
much longer than that.

2. Because shared_buffers can turn over quite quickly, planning the
statement multiple times in relatively quick succession could give
different results each time. Previous discussions on this topic have
concluded that DBAs hate plan instability, and hate GEQO because it
causes plan instability, and this would inject plan instabiilty into
the main planner.

3. The percentage of the table which is cached in shared_buffers is
not necessarily representative of the percentage which is cached in
general. On a large machine, shared_buffers may be less than 10% of
the total cache. It would be unwise to make guesses about what is and
is not cached based on a small percentage of the cache.

4. Even if we could accurately estimate the percentage of the table
that is cached, what then? For example, suppose that a user issues a
query which retrieves 1% of a table, and we know that 1% of that table
is cached. How much of the data that the user asked for is cache?
Hard to say, right? It could be none of it or all of it. The second
scenario is easy to imagine - just suppose the query's been executed
twice. The first scenario isn't hard to imagine either.

One idea Tom and I kicked around previously is to set an assumed
caching percentage for each table based on its size relative to
effective_cache_size - in other words, assume that the smaller a table
is, the more of it will be cached. Consider a system with 8GB of RAM,
and a table which is 64kB. It is probably unwise to make any plan
based on the assumption that that table is less than fully cached. If
it isn't before the query executes, it soon will be. Going to any
amount of work elsewhere in the plan to avoid the work of reading that
table in from disk is probably a dumb idea. Of course, one downside
of this approach is that it doesn't know which tables are hot and
which tables are cold, but it would probably still be an improvement
over the status quo.

All that having been said, I think that while Josh is thinking fuzzily
about the mathematics of his proposal, the basic idea is pretty
sensible. It is not easy - likely not possible - for the system to
have a good idea which things will be in some kind of cache at the
time the query is executed; it could even change mid-query. The
execution of one part of the query could evict from the cache data
which some other part of the plan assumed would be cached. But DBAs
frequently have a very good idea of which stuff is in cache - they can
make observations over a period of time and then adjust settings and
then observe some more and adjust some more.

PostgreSQL is extremely easy to administer compared with some of its
competitors, and it's frequently necessary to change very little. But
there's a difference between what you absolutely have to change to
make it work and what you have the option to change when necessary.
We need to decrease the amount of stuff in the first category (as we
recently did with wal_buffers) and increase the amount of stuff in the
second category. People coming from Oracle are not favorably
impressed either by the amount of monitoring data PostgreSQL can
gather or by the number of knobs that are available to fix problems
when they occur. We don't need to have as many knobs as Oracle and we
probably don't want to, and for that matter we probably couldn't if we
did want to for lack of manpower, but that doesn't mean we should have
none.

Maybe sometime during my life someone will invent a self-driving car
where I can just get in and say "take me to Bruce's house" and an hour
later it'll parallel park at the end of his driveway. That will be
great. But I think that the first generation of self-driving cars
will still have a steering wheel, and a brake pedal, and a little
switch that turns self-driving mode OFF. It is one thing to say that
you have a system which is really good and does not need much manual
adjustment, and we have such a system. It is another thing altogether
to systematically remove, or refuse to add, any controls that might
permit adjustment in cases where it is necessary. That can be the
right thing to do if the system is of such high quality that such
manual adjustment is so unlikely to be necessary as not to be worth
worrying about; but we are not at that point. And frankly I think if
we don't add some knobs to let this stuff be tuned manually, we will
never get the experience we need to write good auto-tuning algorithms.
Greg Smith would not have known what algorithm to propose for tuning
the wal_buffers option if he had not had a bunch of experience setting
it by hand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 16:45:43
Message-ID:	AANLkTim1rr9RNhHwJAw_MX7zCH=SaP2cWb9f7UK38Wgv@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Josh Berkus wrote:
>>> On 2/23/11 7:10 AM, Robert Haas wrote:
>>> > IME, most bad query plans are caused by either incorrect
>>> > estimates of selectivity, or wrongheaded notions about what's likely
>>> > to be cached. If we could find a way, automated or manual, of
>>> > providing the planner some better information about the facts of life
>>> > in those areas, I think we'd be way better off. I'm open to ideas
>>> > about what the best way to do that is.
>>>
>>> As previously discussed, I'm fine with approaches which involve
>>> modifying database objects. These are auditable and centrally managable
>>> and aren't devastating to upgrades.
>>>
>>> So thinks like the proposed "CREATE SELECTIVITY" would be OK in a way
>>> that decorating queries would not.
>>>
>>> Similiarly, I would love to be able to set "cache %" on a per-relation
>>> basis, and override the whole dubious calculation involving
>>> random_page_cost for scans of that table.
>>
>> We should just fine a way of checking what percentage of a table is
>> already in the shared buffers. That doesn't help us with the kernel
>> cache, but it would be a good start and something that doesn't require
>> user tuning.
>
> You're reinventing a wheel that's already been discarded multiple
> times. There are at least four separate problems:
>
> 1. The percentage of the table which is cached in shared_buffers at
> plan time need not match the percentage that is cached at execution
> time. A delay of even a few seconds between planning and execution
> could make the numbers totally different, and plans can be cached for
> much longer than that.
>
> 2. Because shared_buffers can turn over quite quickly, planning the
> statement multiple times in relatively quick succession could give
> different results each time. Previous discussions on this topic have
> concluded that DBAs hate plan instability, and hate GEQO because it
> causes plan instability, and this would inject plan instabiilty into
> the main planner.
>
> 3. The percentage of the table which is cached in shared_buffers is
> not necessarily representative of the percentage which is cached in
> general. On a large machine, shared_buffers may be less than 10% of
> the total cache. It would be unwise to make guesses about what is and
> is not cached based on a small percentage of the cache.
>
> 4. Even if we could accurately estimate the percentage of the table
> that is cached, what then? For example, suppose that a user issues a
> query which retrieves 1% of a table, and we know that 1% of that table
> is cached. How much of the data that the user asked for is cache?
> Hard to say, right? It could be none of it or all of it. The second
> scenario is easy to imagine - just suppose the query's been executed
> twice. The first scenario isn't hard to imagine either.
>
> One idea Tom and I kicked around previously is to set an assumed
> caching percentage for each table based on its size relative to
> effective_cache_size - in other words, assume that the smaller a table
> is, the more of it will be cached. Consider a system with 8GB of RAM,
> and a table which is 64kB. It is probably unwise to make any plan
> based on the assumption that that table is less than fully cached. If
> it isn't before the query executes, it soon will be. Going to any
> amount of work elsewhere in the plan to avoid the work of reading that
> table in from disk is probably a dumb idea. Of course, one downside
> of this approach is that it doesn't know which tables are hot and
> which tables are cold, but it would probably still be an improvement
> over the status quo.

Yes, good idea.

>
> All that having been said, I think that while Josh is thinking fuzzily
> about the mathematics of his proposal, the basic idea is pretty
> sensible. It is not easy - likely not possible - for the system to
> have a good idea which things will be in some kind of cache at the
> time the query is executed; it could even change mid-query. The
> execution of one part of the query could evict from the cache data
> which some other part of the plan assumed would be cached. But DBAs
> frequently have a very good idea of which stuff is in cache - they can
> make observations over a period of time and then adjust settings and
> then observe some more and adjust some more.

I believe we can maintain a small map of area of a relation which are
in the OS buffer cache (shared buffers move more), or at least a
percentage of the relation in OS cache. Getting autovacuum daemon
being able to update those maps/counters might be enought and easy to
do, it is really near what auto-analyze do. My observation is that
the percentage in cache is stable on a production workload after some
tens of minutes needed to warm the server.

What should really help here is to have hooks in the cost functions to
test those ideas without the need to patch postgresql-core a lot. Will
it be ok to have hooks or will it add to much CPU consumption in a
sensible part of the code ?

>
> PostgreSQL is extremely easy to administer compared with some of its
> competitors, and it's frequently necessary to change very little. But
> there's a difference between what you absolutely have to change to
> make it work and what you have the option to change when necessary.
> We need to decrease the amount of stuff in the first category (as we
> recently did with wal_buffers) and increase the amount of stuff in the
> second category. People coming from Oracle are not favorably
> impressed either by the amount of monitoring data PostgreSQL can
> gather or by the number of knobs that are available to fix problems
> when they occur. We don't need to have as many knobs as Oracle and we
> probably don't want to, and for that matter we probably couldn't if we
> did want to for lack of manpower, but that doesn't mean we should have
> none.
>
> Maybe sometime during my life someone will invent a self-driving car
> where I can just get in and say "take me to Bruce's house" and an hour
> later it'll parallel park at the end of his driveway. That will be
> great. But I think that the first generation of self-driving cars
> will still have a steering wheel, and a brake pedal, and a little
> switch that turns self-driving mode OFF. It is one thing to say that
> you have a system which is really good and does not need much manual
> adjustment, and we have such a system. It is another thing altogether
> to systematically remove, or refuse to add, any controls that might
> permit adjustment in cases where it is necessary. That can be the
> right thing to do if the system is of such high quality that such
> manual adjustment is so unlikely to be necessary as not to be worth
> worrying about; but we are not at that point. And frankly I think if
> we don't add some knobs to let this stuff be tuned manually, we will
> never get the experience we need to write good auto-tuning algorithms.
> Greg Smith would not have known what algorithm to propose for tuning
> the wal_buffers option if he had not had a bunch of experience setting
> it by hand.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

From:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 17:03:58
Message-ID:	AANLkTi=5F5nyLE+Ox9j2C1AvzG-1qboZzpuZfGSukdZO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> 4. Even if we could accurately estimate the percentage of the table
> that is cached, what then? For example, suppose that a user issues a
> query which retrieves 1% of a table, and we know that 1% of that table
> is cached. How much of the data that the user asked for is cache?
> Hard to say, right? It could be none of it or all of it. The second
> scenario is easy to imagine - just suppose the query's been executed
> twice. The first scenario isn't hard to imagine either.
>
>
I have a set of slow disks which can impact performance nearly as much as in
cached in memory versus the fast disks.

How practical would it be for analyze to keep a record of response times for
given sections of a table as it randomly accesses them and generate some
kind of a map for expected response times for the pieces of data it is
analysing?

It may well discover, on it's own, that recent data (1 month old or less)
has a random read response time of N, older data (1 year old) in a different
section of the relation tends to have a response time of 1000N, and really
old data (5 year old) tends to have a response time of 3000N.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 17:50:51
Message-ID:	AANLkTikN7hidgDjdVKWR=u2Pxy5m-Hmoo_m-pj6baLnO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/25 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
>> All that having been said, I think that while Josh is thinking fuzzily
>> about the mathematics of his proposal, the basic idea is pretty
>> sensible. It is not easy - likely not possible - for the system to
>> have a good idea which things will be in some kind of cache at the
>> time the query is executed; it could even change mid-query. The
>> execution of one part of the query could evict from the cache data
>> which some other part of the plan assumed would be cached. But DBAs
>> frequently have a very good idea of which stuff is in cache - they can
>> make observations over a period of time and then adjust settings and
>> then observe some more and adjust some more.
>
> I believe we can maintain a small map of area of a relation which are
> in the OS buffer cache (shared buffers move more), or at least a
> percentage of the relation in OS cache. Getting autovacuum daemon
> being able to update those maps/counters might be enought and easy to
> do, it is really near what auto-analyze do. My observation is that
> the percentage in cache is stable on a production workload after some
> tens of minutes needed to warm the server.

I don't think we can assume that will be true in all workloads.
Imagine a server doing batch processing. People submit large batches
of work that take, say, an hour to complete. Not all batches use the
same set of tables - maybe they even run in different databases.
After a big batch process finishes crunching numbers in database A,
very little of database B will be cached. But it's not necessarily
right to assume that when we start queries for a new batch in database
B, although it's more likely to be right for large tables (which will
take a long time to get cached meaningfully, if they ever do) than
small ones. Also, it could lead to strange issues where batches run
much faster or slower depending on which batch immediately proceeded
them. If we're going to do something a lot of times, it'd be better
to bite the bullet and read it all in rather than going to more work
elsewhere, but if we're only going to touch it once, then not so much.

You might also have this issue on systems that run OLTP workloads all
day and then do some batch processing at night to get ready for the
next business day. Kevin Grittner wrote previously about those jobs
needing some different settings in his environment (I'm not
remembering which settings at the moment). Suppose that the batch
process is going to issue a query that can be planned in one of two
possible ways. One way involves reading 10% of a relation, and the
other way involves reading the whole thing. The first plan takes 200
s to execute if the relation is not cached, and 180 s if the relevant
portion is cached. The second plan takes 300 s to execute if the
relation is not cached, and 100 s if it is cached. At the start of
the batch run, the relation won't be cached, because it's used *only*
by the overnight job and not by the daily OLTP traffic. Which way
should we execute the query?

The answer is that if the batch job only needs to execute that query
*once*, we should do it the first way. But if it needs to execute it
three or more times, the second way is better, but only if we use the
second plan every time. If we start out with the first plan, we're
always better off sticking with it *unless* we know that we're going
to repeat the query at least twice more after the iteration we're
currently planning. To make the right decision, the query planner
needs a crystal ball. Or, a little help from the DBA.

> What should really help here is to have hooks in the cost functions to
> test those ideas without the need to patch postgresql-core a lot. Will
> it be ok to have hooks or will it add to much CPU consumption in a
> sensible part of the code ?

Depends on where you put them, I guess. Hooks are pretty cheap, but
they're also pretty hard to use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 19:26:10
Message-ID:	1298661927-sup-4013@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:

> How practical would it be for analyze to keep a record of response times for
> given sections of a table as it randomly accesses them and generate some
> kind of a map for expected response times for the pieces of data it is
> analysing?

I think what you want is random_page_cost that can be tailored per
tablespace.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 22:35:45
Message-ID:	AANLkTimb9AjbgGZePLx43FjAvM_GL+PL+htJKifn0M51@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> 2011/2/25 Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>:
>>> All that having been said, I think that while Josh is thinking fuzzily
>>> about the mathematics of his proposal, the basic idea is pretty
>>> sensible. It is not easy - likely not possible - for the system to
>>> have a good idea which things will be in some kind of cache at the
>>> time the query is executed; it could even change mid-query. The
>>> execution of one part of the query could evict from the cache data
>>> which some other part of the plan assumed would be cached. But DBAs
>>> frequently have a very good idea of which stuff is in cache - they can
>>> make observations over a period of time and then adjust settings and
>>> then observe some more and adjust some more.
>>
>> I believe we can maintain a small map of area of a relation which are
>> in the OS buffer cache (shared buffers move more), or at least a
>> percentage of the relation in OS cache. Getting autovacuum daemon
>> being able to update those maps/counters might be enought and easy to
>> do, it is really near what auto-analyze do. My observation is that
>> the percentage in cache is stable on a production workload after some
>> tens of minutes needed to warm the server.
>
> I don't think we can assume that will be true in all workloads.
> Imagine a server doing batch processing. People submit large batches
> of work that take, say, an hour to complete. Not all batches use the
> same set of tables - maybe they even run in different databases.
> After a big batch process finishes crunching numbers in database A,
> very little of database B will be cached. But it's not necessarily
> right to assume that when we start queries for a new batch in database
> B, although it's more likely to be right for large tables (which will
> take a long time to get cached meaningfully, if they ever do) than
> small ones. Also, it could lead to strange issues where batches run
> much faster or slower depending on which batch immediately proceeded
> them. If we're going to do something a lot of times, it'd be better
> to bite the bullet and read it all in rather than going to more work
> elsewhere, but if we're only going to touch it once, then not so much.
>
> You might also have this issue on systems that run OLTP workloads all
> day and then do some batch processing at night to get ready for the
> next business day. Kevin Grittner wrote previously about those jobs
> needing some different settings in his environment (I'm not
> remembering which settings at the moment). Suppose that the batch
> process is going to issue a query that can be planned in one of two
> possible ways. One way involves reading 10% of a relation, and the
> other way involves reading the whole thing. The first plan takes 200
> s to execute if the relation is not cached, and 180 s if the relevant
> portion is cached. The second plan takes 300 s to execute if the
> relation is not cached, and 100 s if it is cached. At the start of
> the batch run, the relation won't be cached, because it's used *only*
> by the overnight job and not by the daily OLTP traffic. Which way
> should we execute the query?
>
> The answer is that if the batch job only needs to execute that query
> *once*, we should do it the first way. But if it needs to execute it
> three or more times, the second way is better, but only if we use the
> second plan every time. If we start out with the first plan, we're
> always better off sticking with it *unless* we know that we're going
> to repeat the query at least twice more after the iteration we're
> currently planning. To make the right decision, the query planner
> needs a crystal ball. Or, a little help from the DBA.

Yes, we are talking of improving some part of the model.
Some workloads are dramatic and need special customization. This is true.

Still there is a path of improvement, and probably it will remain a
path of improvement after the current model is updated.

I am not proposing something to solve all the issues, but way more
interesting IMHO than just letting the dba say : 'this table is in
cache at XX%'.

Btw, pgfincore already do solve the usecase you provide by helping the
DBA to prepare its batch processing, so in some sense I am familiar
with what you describe (take the second plan, pgfincore will preload
in the background, and your query will be done in 100s from the
first).

>
>> What should really help here is to have hooks in the cost functions to
>> test those ideas without the need to patch postgresql-core a lot. Will
>> it be ok to have hooks or will it add to much CPU consumption in a
>> sensible part of the code ?
>
> Depends on where you put them, I guess. Hooks are pretty cheap, but
> they're also pretty hard to use.

Yes, it will be easier to make an extension, have people testing it
and validate or not the 'new' model

--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-25 23:41:09
Message-ID:	4D683E15.4090409@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

FWIW, for a manual override setting, I was thinking that the % would
convert to a probability. In that way, it wouldn't be different from
the existing RPC calculation; we're just estimating how *likely* it is
that the data the user wants is cached.

> One idea Tom and I kicked around previously is to set an assumed
> caching percentage for each table based on its size relative to
> effective_cache_size - in other words, assume that the smaller a table
> is, the more of it will be cached. Consider a system with 8GB of RAM,
> and a table which is 64kB. It is probably unwise to make any plan
> based on the assumption that that table is less than fully cached. If
> it isn't before the query executes, it soon will be. Going to any
> amount of work elsewhere in the plan to avoid the work of reading that
> table in from disk is probably a dumb idea. Of course, one downside
> of this approach is that it doesn't know which tables are hot and
> which tables are cold, but it would probably still be an improvement
> over the status quo.

Actually, we *do* have some idea which tables are hot. Or at least, we
could. Currently, pg_stats for tables are "timeless"; they just
accumulate from the last reset, which has always been a problem in
general for monitoring. If we could make top-level table and index
stats time-based, even in some crude way, we would know which tables
were currently hot. That would also have the benefit of making server
performance analysis and autotuning easier.

> But DBAs
> frequently have a very good idea of which stuff is in cache - they can
> make observations over a period of time and then adjust settings and
> then observe some more and adjust some more.

Agreed.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 05:24:26
Message-ID:	AANLkTik_QsFRYRvPpJhmeu3Jw4+1SKCM1PuN+D7YVh7m@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
>
>> How practical would it be for analyze to keep a record of response times for
>> given sections of a table as it randomly accesses them and generate some
>> kind of a map for expected response times for the pieces of data it is
>> analysing?
>
> I think what you want is random_page_cost that can be tailored per
> tablespace.

We have that.

But it's not the same as tracking *sections of a table*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 05:29:14
Message-ID:	AANLkTi=jr9nJcTS25avs=Mwm=jE796zw1pjJCKwvnDCo@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> One idea Tom and I kicked around previously is to set an assumed
>> caching percentage for each table based on its size relative to
>> effective_cache_size - in other words, assume that the smaller a table
>> is, the more of it will be cached. Consider a system with 8GB of RAM,
>> and a table which is 64kB. It is probably unwise to make any plan
>> based on the assumption that that table is less than fully cached. If
>> it isn't before the query executes, it soon will be. Going to any
>> amount of work elsewhere in the plan to avoid the work of reading that
>> table in from disk is probably a dumb idea. Of course, one downside
>> of this approach is that it doesn't know which tables are hot and
>> which tables are cold, but it would probably still be an improvement
>> over the status quo.
>
> Actually, we *do* have some idea which tables are hot. Or at least, we
> could. Currently, pg_stats for tables are "timeless"; they just
> accumulate from the last reset, which has always been a problem in
> general for monitoring. If we could make top-level table and index
> stats time-based, even in some crude way, we would know which tables
> were currently hot. That would also have the benefit of making server
> performance analysis and autotuning easier.

I think there would be value in giving the DBA an easier way to see
which tables are hot, but I am really leery about the idea of trying
to feed that directly into the query planner. I think this is one of
those cases where we let people tune it manually for starters, and
then wait for feedback. Eventually someone will say "oh, I never tune
that by hand any more, ever since I wrote this script which does the
following computation... and I just run it out cron". And then we
will get out the party hats. But we will never get the experience we
need to say what that auto-tuning algorithm will be unless we first
provide the knob for someone to fiddle with manually.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 06:57:54
Message-ID:	201102260657.p1Q6vtL01661@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> > Actually, we *do* have some idea which tables are hot. ?Or at least, we
> > could. ? Currently, pg_stats for tables are "timeless"; they just
> > accumulate from the last reset, which has always been a problem in
> > general for monitoring. ?If we could make top-level table and index
> > stats time-based, even in some crude way, we would know which tables
> > were currently hot. ?That would also have the benefit of making server
> > performance analysis and autotuning easier.
>
> I think there would be value in giving the DBA an easier way to see
> which tables are hot, but I am really leery about the idea of trying
> to feed that directly into the query planner. I think this is one of
> those cases where we let people tune it manually for starters, and
> then wait for feedback. Eventually someone will say "oh, I never tune
> that by hand any more, ever since I wrote this script which does the
> following computation... and I just run it out cron". And then we
> will get out the party hats. But we will never get the experience we
> need to say what that auto-tuning algorithm will be unless we first
> provide the knob for someone to fiddle with manually.

It is also possible we will implement a manual way and never get around
to automating it. :-(

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 09:33:14
Message-ID:	D549CEA4-1659-48AA-A674-77C80D24DEB0@pointblue.com.pl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25 Feb 2011, at 13:18, Robert Haas wrote:

> People coming from Oracle are not favorably
> impressed either by the amount of monitoring data PostgreSQL can
> gather or by the number of knobs that are available to fix problems
> when they occur. We don't need to have as many knobs as Oracle and we
> probably don't want to, and for that matter we probably couldn't if we
> did want to for lack of manpower, but that doesn't mean we should have
> none.

Still, having more data a user can probe would be nice.

I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their
sql server performed much better than a team of over-sweating oracle dba's.

In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.

From:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 13:38:31
Message-ID:	AANLkTin5u=L7o7Kbx8wttcJdiLJBqYjuYzHT697ANQ8f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera <alvherre(at)commandprompt(dot)com>wrote:

> Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
>
> > How practical would it be for analyze to keep a record of response times
> for
> > given sections of a table as it randomly accesses them and generate some
> > kind of a map for expected response times for the pieces of data it is
> > analysing?
>
> I think what you want is random_page_cost that can be tailored per
> tablespace.
>
>
Yes, that can certainly help but does nothing to help with finding typical
hot-spots or cached sections of the table and sending that information to
the planner.

Between Analyze random sampling and perhaps some metric during actual IO of
random of queries we should be able to determine and record which pieces of
data tend to be hot/in cache, or readily available and what data tends not
to be.

If the planner knew that the value "1" tends to have a much lower cost to
fetch than any other value in the table (it is cached or otherwise readily
available), it can choose a plan better suited toward that.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 14:45:24
Message-ID:	AANLkTimkRCjUSgpmFK1QTLE0xe5u_OpDRxjGRiXb9aCK@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz
<gj(at)pointblue(dot)com(dot)pl> wrote:
>
> On 25 Feb 2011, at 13:18, Robert Haas wrote:
>
>> People coming from Oracle are not favorably
>> impressed either by the amount of monitoring data PostgreSQL can
>> gather or by the number of knobs that are available to fix problems
>> when they occur. We don't need to have as many knobs as Oracle and we
>> probably don't want to, and for that matter we probably couldn't if we
>> did want to for lack of manpower, but that doesn't mean we should have
>> none.
>
> Still, having more data a user can probe would be nice.
>
> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their
> sql server performed much better than a team of over-sweating oracle dba's.

I don't think *anyone* is avoiding that approach. There is almost
universal consensus here that auto-tuning is better than manual
tuning, even to the extent of being unwilling to add knobs to allow
manual tuning of settings we have no idea how to auto-tune and no
plans to auto-tune.

> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.

What do you have in mind?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 14:46:55
Message-ID:	AANLkTimnAD6BY6PQVesuih3KLOM1MF3wQ=kEoHZLa8p6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> > Actually, we *do* have some idea which tables are hot. ?Or at least, we
>> > could. ? Currently, pg_stats for tables are "timeless"; they just
>> > accumulate from the last reset, which has always been a problem in
>> > general for monitoring. ?If we could make top-level table and index
>> > stats time-based, even in some crude way, we would know which tables
>> > were currently hot. ?That would also have the benefit of making server
>> > performance analysis and autotuning easier.
>>
>> I think there would be value in giving the DBA an easier way to see
>> which tables are hot, but I am really leery about the idea of trying
>> to feed that directly into the query planner. I think this is one of
>> those cases where we let people tune it manually for starters, and
>> then wait for feedback. Eventually someone will say "oh, I never tune
>> that by hand any more, ever since I wrote this script which does the
>> following computation... and I just run it out cron". And then we
>> will get out the party hats. But we will never get the experience we
>> need to say what that auto-tuning algorithm will be unless we first
>> provide the knob for someone to fiddle with manually.
>
> It is also possible we will implement a manual way and never get around
> to automating it. :-(

You make it sound as if we know how but are just too lazy to right the
code. That is not one of the weaknesses that this community has.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 18:12:42
Message-ID:	45B07768-8DD5-4FC9-A82D-54E3CD40EFA6@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>>
>>
>> Still, having more data a user can probe would be nice.
>>
>> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
>> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their
>> sql server performed much better than a team of over-sweating oracle dba's.
>
> I don't think *anyone* is avoiding that approach. There is almost
> universal consensus here that auto-tuning is better than manual
> tuning, even to the extent of being unwilling to add knobs to allow
> manual tuning of settings we have no idea how to auto-tune and no
> plans to auto-tune.
>
>> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
>> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
>> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
>> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.
>
> What do you have in mind?
>

what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
the planner code then goes through its restrictions in the query and finds the best / longest combination it can find and which has some statistics defined.
it seems we can also do this for join selectivity and expressions. the planner code for "raw column correlation" without expression ( cos(id) or so) and joins is there (WIP, no ANALYZE support and so on so far).

i think auto tuning is a good thing to have and the door to actually do it is wide open with our approach.
all it takes is a mechanism to see which "conditions" are used how often and somebody could write a job which automatically tells the system which stats to collect / sample.
i think for an "average" user this is the most simplistic thing then. but, to get there we have to get the bloody sampling and the rest of the planner code right in the first place.
auto tuning in this area is still something which is far in the future - but at least the road to it is clear.

some people suggested some approach dealing with effective_cache_size and so on ... there are many good approaches here but they don't address the actual problem of wrong size-estimates.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 18:44:52
Message-ID:	AANLkTinD_vPt_d5tzGKXfJURYzq5=5mCa8K8_G1Sr8+O@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
>

The problem is that we haven't figured out how to usefully store a
histogram for <a,b>. Consider the oft-quoted example of a
<city,postal-code> -- or <city,zip code> for Americans. A histogram
of the tuple is just the same as a histogram on the city. It doesn't
tell you how much extra selectivity the postal code or zip code gives
you. And if you happen to store a histogram of <postal code, city> by
mistake then it doesn't tell you anything at all.

We need a data structure that lets us answer the bayesian question
"given a city of New York how selective is zip-code = 02139". I don't
know what that data structure would be.

Heikki and I had a wacky hand-crafted 2D histogram data structure that
I suspect doesn't actually work. And someone else did some research on
list and came up with a fancy sounding name of a statistics concept
that might be what we want.

--
greg

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 18:58:20
Message-ID:	20110226185819.GA22407@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 26, 2011 at 06:44:52PM +0000, Greg Stark wrote:
> 2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> > what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
>
> The problem is that we haven't figured out how to usefully store a
> histogram for <a,b>. Consider the oft-quoted example of a
> <city,postal-code> -- or <city,zip code> for Americans. A histogram
> of the tuple is just the same as a histogram on the city.

But there are cases where it can work. Frankly the example you mention
is odd because for we can't even build useful 1D histograms for <city>
and <zip code>, so the fact that 2D is hard is not surprising.

The histograms we do build work fine from > and <, just equality. The
2D will handle the same.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle

From:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-26 20:10:46
Message-ID:	6E349644-663B-45C9-A4CA-5E541DA5FF0E@pointblue.com.pl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26 Feb 2011, at 14:45, Robert Haas wrote:

> On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz
>>
>
> I don't think *anyone* is avoiding that approach. There is almost
> universal consensus here that auto-tuning is better than manual
> tuning, even to the extent of being unwilling to add knobs to allow
> manual tuning of settings we have no idea how to auto-tune and no
> plans to auto-tune.
>
Perhaps one step further is required. To change some settings so that it can be auto-tuned better. There are some even more drastic steps that would have to be taken
and I believe that Microsoft engineers had to take them. Steps back. For instance, if there is an issue with inability to find out how much of a table is in the cache, perhaps postgresql should
have an option to turn off cached reads/writes completely and thus allow DBA to regulate that using the shared_buffers setting. It doesn't sound great, but if you think about it
I'm sure there are people willing to use it, if that adds a bit more auto-tunning to the server. I would even go a step further, and say that I believe that some people will
embrace it on the basis that they can constraint the amount of memory PostgreSQL uses on their server as a whole, and that includes caches.

>> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
>> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
>> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
>> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.
>
> What do you have in mind?

All I'm trying to say, that whilst you guys focus mostly on single database server installations PostgreSQL has also a great user base that use it as part of a product that is deployed on different sized machines,
and with same model but different data variation. We don't sell the product to the people and let them take care of it, but rather sell the service - you would say. But we also don't have a DBA per customer that would look solely
at the knob tweaking side of things. So my argument here is, that there isn't always a person who would know tables and databases by their characteristics and thus be able to tweak settings manually.
That probably is just a one of many examples where it makes sense, and probably their primary property is that there's no DBA overlooking whole database and thus being able to tune it.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 07:59:44
Message-ID:	201102270759.p1R7xiK27849@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Rod Taylor wrote:
> On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera <alvherre(at)commandprompt(dot)com>wrote:
>
> > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
> >
> > > How practical would it be for analyze to keep a record of response times
> > for
> > > given sections of a table as it randomly accesses them and generate some
> > > kind of a map for expected response times for the pieces of data it is
> > > analysing?
> >
> > I think what you want is random_page_cost that can be tailored per
> > tablespace.
> >
> >
> Yes, that can certainly help but does nothing to help with finding typical
> hot-spots or cached sections of the table and sending that information to
> the planner.
>
> Between Analyze random sampling and perhaps some metric during actual IO of
> random of queries we should be able to determine and record which pieces of
> data tend to be hot/in cache, or readily available and what data tends not
> to be.
>
>
> If the planner knew that the value "1" tends to have a much lower cost to
> fetch than any other value in the table (it is cached or otherwise readily
> available), it can choose a plan better suited toward that.

Well, one idea I have always had is feeding things the executor finds
back to the optimizer for use in planning future queries. One argument
against that is that a planned query might run with different data
behavior than seen by the executor in the past, but we know if the
optimizer is planning something for immediate execution or later
execution, so we could use executor stats only when planning for
immediate execution.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 08:01:18
Message-ID:	201102270801.p1R81IS28108@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Grzegorz Jaskiewicz wrote:
>
> On 25 Feb 2011, at 13:18, Robert Haas wrote:
>
> > People coming from Oracle are not favorably
> > impressed either by the amount of monitoring data PostgreSQL can
> > gather or by the number of knobs that are available to fix problems
> > when they occur. We don't need to have as many knobs as Oracle and we
> > probably don't want to, and for that matter we probably couldn't if we
> > did want to for lack of manpower, but that doesn't mean we should have
> > none.
>
> Still, having more data a user can probe would be nice.
>
> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their
> sql server performed much better than a team of over-sweating oracle dba's.
>
> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.

I totally agree. If we add a tuning parameter that does 10x better than
automatic, but only 1% of our users use it, we would be better off,
overall, with the automatic tuning. See my blog post which talks about
the same tradeoff when adding configuration variables:

http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 08:03:10
Message-ID:	201102270803.p1R83Af28368@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> > Actually, we *do* have some idea which tables are hot. ?Or at least, we
> >> > could. ? Currently, pg_stats for tables are "timeless"; they just
> >> > accumulate from the last reset, which has always been a problem in
> >> > general for monitoring. ?If we could make top-level table and index
> >> > stats time-based, even in some crude way, we would know which tables
> >> > were currently hot. ?That would also have the benefit of making server
> >> > performance analysis and autotuning easier.
> >>
> >> I think there would be value in giving the DBA an easier way to see
> >> which tables are hot, but I am really leery about the idea of trying
> >> to feed that directly into the query planner. ?I think this is one of
> >> those cases where we let people tune it manually for starters, and
> >> then wait for feedback. ?Eventually someone will say "oh, I never tune
> >> that by hand any more, ever since I wrote this script which does the
> >> following computation... and I just run it out cron". ?And then we
> >> will get out the party hats. ?But we will never get the experience we
> >> need to say what that auto-tuning algorithm will be unless we first
> >> provide the knob for someone to fiddle with manually.
> >
> > It is also possible we will implement a manual way and never get around
> > to automating it. ? :-(
>
> You make it sound as if we know how but are just too lazy to right the
> code. That is not one of the weaknesses that this community has.

Well, several automatic idea have been floated, but rejected because
they don't work well for queries that are planned and executed later.
Perhaps we should consider auto-tuning of queries that are planned for
immediate execution. I just posed that idea in an email to this thread.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 17:24:43
Message-ID:	AANLkTi=eGda=VkGQ0dMw8C2xZexDqsPnYwNruOe8eHsm@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 27, 2011 at 3:01 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Grzegorz Jaskiewicz wrote:
>>
>> On 25 Feb 2011, at 13:18, Robert Haas wrote:
>>
>> > People coming from Oracle are not favorably
>> > impressed either by the amount of monitoring data PostgreSQL can
>> > gather or by the number of knobs that are available to fix problems
>> > when they occur. We don't need to have as many knobs as Oracle and we
>> > probably don't want to, and for that matter we probably couldn't if we
>> > did want to for lack of manpower, but that doesn't mean we should have
>> > none.
>>
>> Still, having more data a user can probe would be nice.
>>
>> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
>> And tests we did a while ago, involving asking team from Microsoft and a team from oracle to optimise set of queries for the same set of data (bookies data, loads of it) showed that the auto-tuning Microsoft has in their
>> sql server performed much better than a team of over-sweating oracle dba's.
>>
>> In my current work place/camp we have many deployments of the same system, over different types of machines, each with different customer data that vary so much that queries need to be rather generic.
>> Postgresql shows its strength with planner doing a good job for different variants of data, however we do a very little tweaking to the configuration parameters. Just because it is just too hard to overlook all of them.
>> I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware.
>> If there was even a tiny amount of automation provided in the postgresql, I would welcome it with open arms.
>
> I totally agree. If we add a tuning parameter that does 10x better than
> automatic, but only 1% of our users use it, we would be better off,
> overall, with the automatic tuning.

It's not an either/or proposition. There is no reason why we can't
let things be tuned automatically, but provide overrides for cases
where the automatic tuning does not work well, of which there will
always be some.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 17:44:35
Message-ID:	AANLkTinLW+m86Q=xnqpYZMbF79n7JDnDXmH6vWnVj1yn@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 27, 2011 at 3:03 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> You make it sound as if we know how but are just too lazy to right the
>> code. That is not one of the weaknesses that this community has.
>
> Well, several automatic idea have been floated, but rejected because
> they don't work well for queries that are planned and executed later.
> Perhaps we should consider auto-tuning of queries that are planned for
> immediate execution. I just posed that idea in an email to this thread.

Which ideas were rejected for that reason? If we're talking about the
idea of using the current contents of the buffer cache and perhaps the
OS cache to plan queries, I think that's not likely to work well even
if we do restrict it to queries that we're going to execute
immediately. Upthread I listed four problems with the idea of
planning queries based on the current contents of shared_buffers, and
this certainly doesn't address all four.

http://archives.postgresql.org/pgsql-hackers/2011-02/msg02206.php

To reiterate my basic theme here one more time, we have a very good
query planner, but it can fall on its face very badly when it is
unable to correctly estimate selectivity, or due to caching effects,
and we have very little to recommend to people who run afoul of those
problems right now. The problems are real, significant, and affect a
large number of users, some of whom give up on PostgreSQL as a direct
result. I am glad that we are committed to having a system that is
auto-tuning to the greatest degree possible, but I think it is very
short-sighted of us not to provide workarounds for the cases where
they are legitimately needed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-27 22:17:49
Message-ID:	4D6ACD8D.2080205@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I think there would be value in giving the DBA an easier way to see
> which tables are hot, but I am really leery about the idea of trying
> to feed that directly into the query planner. I think this is one of
> those cases where we let people tune it manually for starters, and
> then wait for feedback. Eventually someone will say "oh, I never tune
> that by hand any more, ever since I wrote this script which does the
> following computation... and I just run it out cron". And then we
> will get out the party hats. But we will never get the experience we
> need to say what that auto-tuning algorithm will be unless we first
> provide the knob for someone to fiddle with manually.

I'm not disagreeing with that. I'm saying "first, we give DBAs a way to
see which tables are currently hot". Such a feature has multiple
benefits, making it worth the overhead and/or coding effort.

Whether we're shooting for autotuning or manual tuning, it starts with
having the data.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-28 15:13:39
Message-ID:	AANLkTin7mV6_myCUugF5z6G17ZvmQBZDY+tWG6OTkK83@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 27, 2011 at 5:17 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> I think there would be value in giving the DBA an easier way to see
>> which tables are hot, but I am really leery about the idea of trying
>> to feed that directly into the query planner. I think this is one of
>> those cases where we let people tune it manually for starters, and
>> then wait for feedback. Eventually someone will say "oh, I never tune
>> that by hand any more, ever since I wrote this script which does the
>> following computation... and I just run it out cron". And then we
>> will get out the party hats. But we will never get the experience we
>> need to say what that auto-tuning algorithm will be unless we first
>> provide the knob for someone to fiddle with manually.
>
> I'm not disagreeing with that. I'm saying "first, we give DBAs a way to
> see which tables are currently hot". Such a feature has multiple
> benefits, making it worth the overhead and/or coding effort.
>
> Whether we're shooting for autotuning or manual tuning, it starts with
> having the data.

Well, what we have now is a bunch of counters in pg_stat_all_tables
and pg_statio_all_tables. Making that easier for the DBA almost seems
like more of a job for a third-party tool that, say, graphs it, than a
job for PG itself. But if you have an idea I'm ears.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:04:54
Message-ID:	4D6BE3C6.1000609@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Well, what we have now is a bunch of counters in pg_stat_all_tables
> and pg_statio_all_tables.

Right. What I'm saying is those aren't good enough, and have never
been good enough. Counters without a time basis are pretty much useless
for performance monitoring/management (Baron Schwartz has a blog post
talking about this, but I can't find it right now).

Take, for example, a problem I was recently grappling with for Nagios.
I'd like to do a check as to whether or not tables are getting
autoanalyzed often enough. After all, autovac can fall behind, and we'd
want to be alerted of that.

The problem is, in order to measure whether or not autoanalyze is
behind, you need to count how many inserts,updates,deletes have happened
since the last autoanalyze. pg_stat_user_tables just gives us the
counters since the last reset ... and the reset time isn't even stored
in PostgreSQL.

This means that, without adding external tools like pg_statsinfo, we
can't autotune autoanalyze at all.

There are quite a few other examples where the counters could contribute
to autotuning and DBA performance monitoring if only they were
time-based. As it is, they're useful for finding unused indexes and
that's about it.

One possibility, of course, would be to take pg_statsinfo and make it
part of core. There's a couple disadvantages of that; (1) is the
storage and extra objects required, which would then require us to add
extra management routines as well. (2) is that pg_statsinfo only stores
top-level view history, meaning that it wouldn't be very adaptable to
improvements we make in system views in the future.

On the other hand, anything which increases the size of pg_statistic
would be a nightmare.

One possible compromise solution might be to implement code for the
stats collector to automatically reset the stats at a given clock
interval. If we combined this with keeping the reset time, and keeping
a snapshot of the stats from the last clock tick (and their reset time)
that would be "good enough" for most monitoring.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:19:46
Message-ID:	20110228181945.GA26061@svana.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 10:04:54AM -0800, Josh Berkus wrote:
> Take, for example, a problem I was recently grappling with for Nagios.
> I'd like to do a check as to whether or not tables are getting
> autoanalyzed often enough. After all, autovac can fall behind, and we'd
> want to be alerted of that.
>
> The problem is, in order to measure whether or not autoanalyze is
> behind, you need to count how many inserts,updates,deletes have happened
> since the last autoanalyze. pg_stat_user_tables just gives us the
> counters since the last reset ... and the reset time isn't even stored
> in PostgreSQL.

The solution I use for that in to use munin to monitor everything and
let it generate alerts based on the levels. It's not great, but better
than nothing.

The problem, as you say, is that you want to now the rates rather than
the absolute values. The problem with rates is that you can get wildly
different results depending on the time interval you're looking at.

For the concrete example above, autoanalyse has to be able to
determine if there is work to do so the information must be somehwere.
I'm guessing it's not easily available? If you had a function
is_autovacuumcandidate you'd be done ofcourse.

But there's ofcourse lots of stats people want, it's just not clear how
to get them. What you really need is to store the stats every few
minutes, but that's what munin does. I doubt it's worth building RRD
like capabilities into postgres.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:24:13
Message-ID:	AANLkTinOD6-F1_u=PTiR62yT0eNchezRLLGv=PtvES0p@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On the other hand, anything which increases the size of pg_statistic
> would be a nightmare.

Hmm?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:31:29
Message-ID:	4D6BEA01.1040705@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2/28/11 10:24 AM, Robert Haas wrote:
> On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> On the other hand, anything which increases the size of pg_statistic
>> would be a nightmare.
>
> Hmm?

Like replacing each statistic with a series of time-based buckets, which
would then increase the size of the table by 5X to 10X. That was the
first solution I thought of, and rejected.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-02-28 18:44:08
Message-ID:	1298918489-sup-1228@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of sáb feb 26 02:24:26 -0300 2011:
> On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
> >
> >> How practical would it be for analyze to keep a record of response times for
> >> given sections of a table as it randomly accesses them and generate some
> >> kind of a map for expected response times for the pieces of data it is
> >> analysing?
> >
> > I think what you want is random_page_cost that can be tailored per
> > tablespace.
>
> We have that.

Oh, right.

> But it's not the same as tracking *sections of a table*.

I dunno. I imagine if you have a "section" of a table in different
storage than other sections, you created a tablespace and moved the
partition holding that section there. Otherwise, how do you prevent the
tuples from moving to other "sections"? (We don't really have a concept
of "sections" of a table.)

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:50:03
Message-ID:	27252.1298919003@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 2/28/11 10:24 AM, Robert Haas wrote:
>> On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> On the other hand, anything which increases the size of pg_statistic
>>> would be a nightmare.

>> Hmm?

> Like replacing each statistic with a series of time-based buckets, which
> would then increase the size of the table by 5X to 10X. That was the
> first solution I thought of, and rejected.

I think Josh is thinking of the stats collector's dump file, not
pg_statistic.

Ultimately we need to think of a reporting mechanism that's a bit
smarter than "rewrite the whole file for any update" ...

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 18:54:23
Message-ID:	AANLkTim==1mXdmXFveiz-OHD5JfHgzoqzp2zVKSF6j7W@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 2/28/11 10:24 AM, Robert Haas wrote:
>>> On Mon, Feb 28, 2011 at 1:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>>> On the other hand, anything which increases the size of pg_statistic
>>>> would be a nightmare.
>
>>> Hmm?
>
>> Like replacing each statistic with a series of time-based buckets, which
>> would then increase the size of the table by 5X to 10X. That was the
>> first solution I thought of, and rejected.
>
> I think Josh is thinking of the stats collector's dump file, not
> pg_statistic.

Yeah.

> Ultimately we need to think of a reporting mechanism that's a bit
> smarter than "rewrite the whole file for any update" ...

Well, we have these things called "tables". Any chance of using those?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 19:31:53
Message-ID:	28234.1298921513@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Ultimately we need to think of a reporting mechanism that's a bit
>> smarter than "rewrite the whole file for any update" ...

> Well, we have these things called "tables". Any chance of using those?

Having the stats collector write tables would violate the classical form
of the heisenberg principle (thou shalt avoid having thy measurement
tools affect that which is measured), not to mention assorted practical
problems like not wanting the stats collector to take locks or run
transactions.

The ideal solution would likely be for the stats collector to expose its
data structures as shared memory, but I don't think we get to do that
under SysV shmem --- it doesn't like variable-size shmem much. Maybe
that's another argument for looking harder into mmap or POSIX shmem,
although it's not clear to me how well either of those fixes that.

regards, tom lane

From:	Michael Glaesemann <grzm(at)seespotcode(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 19:37:23
Message-ID:	9F3F19CE-5C8E-4B1E-8D32-E8CA593452EC@seespotcode.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Feb 28, 2011, at 14:31, Tom Lane wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Ultimately we need to think of a reporting mechanism that's a bit
>>> smarter than "rewrite the whole file for any update" ...
>
>> Well, we have these things called "tables". Any chance of using those?
>
> Having the stats collector write tables would violate the classical form
> of the heisenberg principle (thou shalt avoid having thy measurement
> tools affect that which is measured), not to mention assorted practical
> problems like not wanting the stats collector to take locks or run
> transactions.
>
> The ideal solution would likely be for the stats collector to expose its
> data structures as shared memory, but I don't think we get to do that
> under SysV shmem --- it doesn't like variable-size shmem much. Maybe
> that's another argument for looking harder into mmap or POSIX shmem,
> although it's not clear to me how well either of those fixes that.

Spitballing here, but could sqlite be an intermediate, compromise solution?

Michael Glaesemann
grzm seespotcode net

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Michael Glaesemann <grzm(at)seespotcode(dot)net>
Cc:	pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 19:39:19
Message-ID:	4D6BF9E7.3060907@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Spitballing here, but could sqlite be an intermediate, compromise solution?

For a core PostgreSQL component ?!?!?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 19:47:22
Message-ID:	4D6BFBCA.8040000@timbira.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Em 28-02-2011 15:50, Tom Lane escreveu:
> Ultimately we need to think of a reporting mechanism that's a bit
> smarter than "rewrite the whole file for any update" ...
>
What about splitting statistic file per database?

--
Euler Taveira de Oliveira
http://www.timbira.com/

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Michael Glaesemann <grzm(at)seespotcode(dot)net>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 19:47:23
Message-ID:	1298922443.25659.10.camel@jd-desktop
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2011-02-28 at 11:39 -0800, Josh Berkus wrote:
> > Spitballing here, but could sqlite be an intermediate, compromise solution?
>
> For a core PostgreSQL component ?!?!?

Sure, why not? It is ACID compliant, has the right kind of license, has
a standard API that we are all used to. It seems like a pretty decent
solution in consideration. We don't need MVCC for this problem.

>
> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	jd(at)commandprompt(dot)com
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Michael Glaesemann <grzm(at)seespotcode(dot)net>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 20:02:30
Message-ID:	28931.1298923350@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> On Mon, 2011-02-28 at 11:39 -0800, Josh Berkus wrote:
> Spitballing here, but could sqlite be an intermediate, compromise solution?
>>
>> For a core PostgreSQL component ?!?!?

> Sure, why not?

Because it's fifty times more mechanism than we need here? We don't
want a SQL interface (not even a lightweight one) and it's unclear that
we ever want the data to go to disk at all.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 20:03:33
Message-ID:	28962.1298923413@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> Em 28-02-2011 15:50, Tom Lane escreveu:
>> Ultimately we need to think of a reporting mechanism that's a bit
>> smarter than "rewrite the whole file for any update" ...

> What about splitting statistic file per database?

That would improve matters for some usage patterns, but I'm afraid
only a minority.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 20:24:41
Message-ID:	AANLkTi=7MeBzoB2ouu-wfKGTa67uJAraSonKs6NzL2r_@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 2:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Feb 28, 2011 at 1:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Ultimately we need to think of a reporting mechanism that's a bit
>>> smarter than "rewrite the whole file for any update" ...
>
>> Well, we have these things called "tables". Any chance of using those?
>
> Having the stats collector write tables would violate the classical form
> of the heisenberg principle (thou shalt avoid having thy measurement
> tools affect that which is measured), not to mention assorted practical
> problems like not wanting the stats collector to take locks or run
> transactions.
>
> The ideal solution would likely be for the stats collector to expose its
> data structures as shared memory, but I don't think we get to do that
> under SysV shmem --- it doesn't like variable-size shmem much. Maybe
> that's another argument for looking harder into mmap or POSIX shmem,
> although it's not clear to me how well either of those fixes that.

Well, certainly, you could make it work with mmap() - you could
arrange a mechanism whereby anyone who tries to reference off the end
of the portion they've mapped calls stat() on the file and remaps it
at its now-increased size. But you'd need to think carefully about
locking and free-space management, which is where it starts to sound
an awful lot like you're reinventing the idea of a heap. Maybe
there's a way to design some kind of lighter weight mechanism, but the
complexity of the problem is not obviously a lot less than the general
problem of storing frequently updated tabular data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 21:13:47
Message-ID:	AANLkTinvPJoqwn-d4Fd+eJ+d83TO9Si-S+x3Da2+uXM2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 6:31 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Like replacing each statistic with a series of time-based buckets, which
> would then increase the size of the table by 5X to 10X. That was the
> first solution I thought of, and rejected.
>

I don't understand what you're talking about at all here. I think
there are a lot of unsolved problems in monitoring but the one thing I
think everyone is pretty clear on is that the right way to export
metrics like these is to export a counter and then have some external
component periodically copy the counter into some history table and
calculate the derivative, second derivative, running average of the
first derivative, etc.

What's needed here is for someone to write a good mrtg/rrd/whatever
replacement using postgres as its data store. If you're monitoring
something sensitive then you would store the data in a *different*
postgres server to avoid Tom's complaint. There may be aspects of the
job that Postgres does poorly but we can focus on improving those
parts of Postgres rather than looking for another database. And
frankly Postgres isn't that bad a tool for it -- when I did some
performance analysis recently I actually ended up loading the data
into Postgres so I could do some of the aggregations using window
functions anyways.

--
greg

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 22:05:08
Message-ID:	4D6C1C14.2010909@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I don't understand what you're talking about at all here. I think
> there are a lot of unsolved problems in monitoring but the one thing I
> think everyone is pretty clear on is that the right way to export
> metrics like these is to export a counter and then have some external
> component periodically copy the counter into some history table and
> calculate the derivative, second derivative, running average of the
> first derivative, etc.

You missed the original point of the discussion, which was to have stats
we could use for auto-tuning internally. Not to export them.

For example, there are optimizations we could make with the query
planner if we knew which tables and indexes were "hot" in general.
That's how we started this discussion, and it's not solved by storing
the stats history on another server.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Treat <rob(at)xzilla(dot)net>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 22:35:28
Message-ID:	AANLkTik57s6pAfRaMT85yW4Jsrjv+rxJ_mBF-+K=m-2i@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 28, 2011 at 4:13 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Mon, Feb 28, 2011 at 6:31 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> What's needed here is for someone to write a good mrtg/rrd/whatever
> replacement using postgres as its data store. If you're monitoring
> something sensitive then you would store the data in a *different*
> postgres server to avoid Tom's complaint. There may be aspects of the
> job that Postgres does poorly but we can focus on improving those
> parts of Postgres rather than looking for another database. And
> frankly Postgres isn't that bad a tool for it -- when I did some
> performance analysis recently I actually ended up loading the data
> into Postgres so I could do some of the aggregations using window
> functions anyways.
>

Greg, see https://labs.omniti.com/labs/reconnoiter, but also see
Josh's nearby email about how he's trying to solve this internal to
the database.

Robert Treat
play: xzilla.net
work: omniti.com
hiring: l42.org/Lg

From:	Chris Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-02-28 22:52:00
Message-ID:	87vd03bwmn.fsf@cbbrowne.afilias-int.info
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

josh(at)agliodbs(dot)com (Josh Berkus) writes:
>> I don't understand what you're talking about at all here. I think
>> there are a lot of unsolved problems in monitoring but the one thing
>> I think everyone is pretty clear on is that the right way to export
>> metrics like these is to export a counter and then have some external
>> component periodically copy the counter into some history table and
>> calculate the derivative, second derivative, running average of the
>> first derivative, etc.
>
> You missed the original point of the discussion, which was to have
> stats we could use for auto-tuning internally. Not to export them.
>
> For example, there are optimizations we could make with the query
> planner if we knew which tables and indexes were "hot" in general.
> That's how we started this discussion, and it's not solved by storing
> the stats history on another server.

There's value to both, and there's no dearth of monitoring frameworks
that people keep on replacing with successors, so there's certainly room
for both ;-).

Recent stuff about such...
<https://lopsa.org/content/philosophy-monitoring>
<https://labs.omniti.com/labs/reconnoiter>

I'm not quite sure what ought to be in PostgreSQL as a "built-in;" I
suspect that what's eventually needed is to be able to correlate things
across database instances, so that when Tom says, "I need to know what
data the planner's working on," the answer can be "OK, got that..."

This data is surely useful to get out of the system, so I'd bias towards
something sorta like what Greg suggests. And the closed-ended answer may
prevent us from asking more sophisticated questions, also not a notably
good thing...
--
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
"If tautologies do not convey information, mathematicians would not be
surprised by them."
-- Mark Miller

From:	Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: WIP: cross column correlation ...
Date:	2011-03-01 00:18:43
Message-ID:	AANLkTinKiVHRzBpp3JMthXUUQ58jM3REZ5P7PSREEwY-@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> > But it's not the same as tracking *sections of a table*.
>
> I dunno. I imagine if you have a "section" of a table in different
> storage than other sections, you created a tablespace and moved the
> partition holding that section there. Otherwise, how do you prevent the
> tuples from moving to other "sections"? (We don't really have a concept
> of "sections" of a table.)
>
>
Section could be as simple as being on the inner or outer part of a single
disk, or as complicated as being on the SSD cache of a spinning disk, or in
the multi-gigabyte cache on the raid card or SAN due to being consistently
accessed.

Section is the wrong word. If primary key values under 10 million are
consistently accessed, they will be cached even if they do get moved through
the structure. Values over 10M may be fast if on the same page as the other
value but probably aren't.

This is very evident when dealing with time based data in what can be a very
large structure. 1% may be very hot and in memory while 99% is not.

Partitioning only helps if you can predict what will be hot in the future.
Sometimes an outside source (world events) impacts what section of the
structure is hot.

regards,

Rod

From:	Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-03-01 08:50:25
Message-ID:	m2k4gj9qcu.fsf@2ndQuadrant.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> The ideal solution would likely be for the stats collector to expose its
> data structures as shared memory, but I don't think we get to do that
> under SysV shmem --- it doesn't like variable-size shmem much. Maybe
> that's another argument for looking harder into mmap or POSIX shmem,
> although it's not clear to me how well either of those fixes that.

We could certainly use message passing style atop pgpipe.c here, right?

After all we already have a protocol and know how to represent complex
data structure in there, and all components of PostgreSQL should be able
to alleviate this, I'd think. Or this fever ain't really gone yet :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

From:	Bernd Helmle <mailings(at)oopsware(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd(at)commandprompt(dot)com
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Michael Glaesemann <grzm(at)seespotcode(dot)net>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why our counters need to be time-based WAS: WIP: cross column correlation ...
Date:	2011-03-02 12:29:59
Message-ID:	0AE678865F4AF306819AF4E4@[172.26.14.62]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

--On 28. Februar 2011 15:02:30 -0500 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Because it's fifty times more mechanism than we need here? We don't
> want a SQL interface (not even a lightweight one) and it's unclear that
> we ever want the data to go to disk at all.

I wonder wether a library like librrd would be a solution for this.

--
Thanks

Bernd