Quick Links

Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan

Lists:	pgsql-hackers

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 11:38:08
Message-ID:	399FC320.AFA7A99B@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

It seems that optimiser is unaware that currval('seq') can be treated as
a constant within
an expression and thus produces suboptimal plans for WHERE clauses that
use currval
thus using a seq scan instead of index scan.

Is it possible (planned) to mark functions as returning a constant when
given a constant
argument and start using it _as a constant_ (pre-evaluated) in queries

The current behaviour is not very intuitive.

------------
Hannu

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 17:25:29
Message-ID:	10393.966792329@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> It seems that optimiser is unaware that currval('seq') can be treated
> as a constant within an expression and thus produces suboptimal plans
> for WHERE clauses that use currval thus using a seq scan instead of
> index scan.

currval() does not qualify to be marked cachable, since it does not
always return the same result given the same arguments.

There are a few functions that are not cachable but could be treated
as constants within a single transaction, now() being the most obvious
example. Currently there is no intermediate function type between
"cachable" and "noncachable" but I have toyed with the idea of inventing
one. Getting the semantics right could be tricky however.

However, even if we had a concept of "constant within a transaction/
scan/whatever", currval() would not qualify --- what if there is a
nextval() being invoked somewhere else in the query, possibly inside a
user-defined function where the optimizer has no chance of seeing it?

In short, there is no way of optimizing currval() in the way you want
without risking breakage.

For interactive queries you could fake the behavior you want by creating
a user-defined function that just calls currval(), and then marking this
function cachable. Don't try calling such a function inside a SQL or
plpgsql function however, or you will be burnt by premature constant-
folding. Basically, this technique leaves it in your hands to determine
whether the optimization is safe.

regards, tom lane

From:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 17:26:03
Message-ID:	Pine.LNX.4.21.0008201449250.22955-100000@eros.si.fct.unl.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Sun, 20 Aug 2000, Hannu Krosing wrote:

> It seems that optimiser is unaware that currval('seq') can be treated as
> a constant within
> an expression and thus produces suboptimal plans for WHERE clauses that
> use currval
> thus using a seq scan instead of index scan.
>
> Is it possible (planned) to mark functions as returning a constant when
> given a constant
> argument and start using it _as a constant_ (pre-evaluated) in queries

Just one question regrarding this:

Suppose you have
select ... where x in (select currval('seq')) and y in (select
nextval('seq'))....

What's the precise semantics of this? Should there be any precise
semantics? Whats the order of execution? currval before or after
nextval? It seems to me that the declarative nature of SQL makes that no
order whatsoever should be assumed...

In the case of uncorrelated queries, there is the option of
materializing (which I think - after looking at the code - that pg does
not use) the subqueries results as there is no need to recompute them. In
this case materializing vs re-executing seems to cause a semantinc
difference because in mater there is only one execution of nextval and in
reexecution nextval is executed unknown number of times.

If all this as pre-evaluated this last problem would disapear.

Side-effects, side-effects, ...

Best regards,
Tiago
PS - I'm starting the thesis part of a MSc which will be about query
optimization in pg. Here the thesis part of the MSc takes arround one
year, so at least for the next year I'll try to work hard on pg.

From:	Don Baccus <dhogaza(at)pacifier(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 17:48:44
Message-ID:	3.0.1.32.20000820104844.014ceb40@mail.pacifier.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At 01:25 PM 8/20/00 -0400, Tom Lane wrote:

>However, even if we had a concept of "constant within a transaction/
>scan/whatever", currval() would not qualify --- what if there is a
>nextval() being invoked somewhere else in the query, possibly inside a
>user-defined function where the optimizer has no chance of seeing it?
>
>In short, there is no way of optimizing currval() in the way you want
>without risking breakage.

Does Postgres guarantee order of execution of functions? Many languages
don't other than to follow precedence rules, which is why functions with
side-effects (such as nextval) can yield "implementation defined" or
whatever results.

My point is that such queries may not yield predictable results. Perhaps
today due to left-right execution order or the like, but do you want to
guarantee a defined order of execution in the future?

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 18:03:15
Message-ID:	10706.966794595@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> PS - I'm starting the thesis part of a MSc which will be about query
> optimization in pg. Here the thesis part of the MSc takes arround one
> year, so at least for the next year I'll try to work hard on pg.

Cool! Please keep us posted on what you're doing or thinking about
doing, so that there's not duplicate or wasted effort.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Don Baccus <dhogaza(at)pacifier(dot)com>
Cc:	Hannu Krosing <hannu(at)tm(dot)ee>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 18:18:39
Message-ID:	10775.966795519@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Don Baccus <dhogaza(at)pacifier(dot)com> writes:
> Does Postgres guarantee order of execution of functions?

No, and I don't recall having seen anything about it in the SQL spec
either. If you were doing something like

select foo, nextval('seq') from tab where bar < currval('seq')

then there's no issue of "order of evaluation" per se: nextval will be
evaluated at just those rows where the WHERE clause has already
succeeded. However, the results would still depend on the order in
which tuples are scanned, an order which is most definitely not
guaranteed by the spec nor by our implementation. (Also, in a
pipelined implementation it's conceivable that the WHERE clause would
get evaluated for additional tuples before nextval has been evaluated
at a matching tuple.)

However, that just shows that some patterns of usage of the function
will yield unpredictable results. I don't think that translates to an
argument that the optimizer is allowed to make semantics-altering
transformations...

regards, tom lane

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Don Baccus <dhogaza(at)pacifier(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 21:22:11
Message-ID:	39A04C03.3CEF6385@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>
> Don Baccus <dhogaza(at)pacifier(dot)com> writes:
> > Does Postgres guarantee order of execution of functions?
>
> No, and I don't recall having seen anything about it in the SQL spec
> either. If you were doing something like
>
> select foo, nextval('seq') from tab where bar < currval('seq')
>
> then there's no issue of "order of evaluation" per se: nextval will be
> evaluated at just those rows where the WHERE clause has already
> succeeded. However, the results would still depend on the order in
> which tuples are scanned, an order which is most definitely not
> guaranteed by the spec nor by our implementation. (Also, in a
> pipelined implementation it's conceivable that the WHERE clause would
> get evaluated for additional tuples before nextval has been evaluated
> at a matching tuple.)
>
> However, that just shows that some patterns of usage of the function
> will yield unpredictable results. I don't think that translates to an
> argument that the optimizer is allowed to make semantics-altering
> transformations...

IMHO, if semantics in undefined then altering it should be OK, no?

What I mean is that there is no safe use of nextval and currval in the
same sql sentence, even if it is used automatically, as in "DEFAULT
NEXTVAL('S')"
and thus marking it as constant is as correct as not marking it, only
more predictable.

And predictability is GOOD ;)

I would even suggest that PG would warn about or even refuse to run
queries
that have both nextval and curval of the same sequence inside them
(and pre-evaluate nextval) as only that case has _any_ predictability.

------------
Hannu

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Don Baccus <dhogaza(at)pacifier(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seqscan, constant-->index scan
Date:	2000-08-20 23:27:54
Message-ID:	39A0697A.F5C6C61F@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Don Baccus wrote:
>
> At 02:18 PM 8/20/00 -0400, Tom Lane wrote:
>
> >However, that just shows that some patterns of usage of the function
> >will yield unpredictable results. I don't think that translates to an
> >argument that the optimizer is allowed to make semantics-altering
> >transformations...
>
> Very much depends on the language spec, if such usage is "implementation
> defined" you can do pretty much whatever you want. Agressive optimizers
> in traditional compilers take advantage of this.
>
> In the case given, though, there's no particular reason why an application
> can't grab "currval()" and then use the value returned in the subsequent
> query.
>
> On the other hand, heuristics like "if there's no nextval() in the
> query, then currval() can be treated as a constant" are very common in
> the traditional compiler world, too ...

And it seems to me that even if there are both nextval and curval we can
crab "curval()" and use the value returned.

It will fail if we have no preceeding nextval inside the session, but so
will
any other plan that tries to evaluate currval before nextval.

So I don't see that we would be violating the spec more by marking
currval as const than by not doing so.

And we do get faster queries, even for the weird queres with undefined
behaviour ;)

---------------
Hannu

From:	Don Baccus <dhogaza(at)pacifier(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)tm(dot)ee>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-20 23:49:57
Message-ID:	3.0.1.32.20000820164957.014d2d60@mail.pacifier.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At 02:18 PM 8/20/00 -0400, Tom Lane wrote:

>However, that just shows that some patterns of usage of the function
>will yield unpredictable results. I don't think that translates to an
>argument that the optimizer is allowed to make semantics-altering
>transformations...

Very much depends on the language spec, if such usage is "implementation
defined" you can do pretty much whatever you want. Agressive optimizers
in traditional compilers take advantage of this.

In the case given, though, there's no particular reason why an application
can't grab "currval()" and then use the value returned in the subsequent
query.

On the other hand, heuristics like "if there's no nextval() in the
query, then currval() can be treated as a constant" are very common in
the traditional compiler world, too ...

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

From:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 12:18:38
Message-ID:	Pine.LNX.4.21.0008211252300.24628-100000@eros.si.fct.unl.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Sun, 20 Aug 2000, Tom Lane wrote:

> Cool! Please keep us posted on what you're doing or thinking about
> doing, so that there's not duplicate or wasted effort.

I'm starting to look at pg code, so some comments that follow can be
completly dumb or useless :-).
One thing it might be interesting (please tell me if you think
otherwise) would be to improve pg with better statistical information, by
using, for example, histograms. With this probably better estimations
could be done. I think I'd like to do this (I've not looked the code
thoughly at this stage, so it could be really a bad idea...)...
BTW, I'm open to suggestions, if you think there is something that would
be good to be on the pg optimizer, please tell me.

I'd like also to comment on a matter that is on
http://www.postgresql.org/docs/pgsql/doc/TODO.detail/optimizer, regarding
vacuum and analyze beeing together: I think it is a good idea for them to
be together because if there is no vacuum then optimizer should also model
the "clusterization" of tables, a seqscan on a highly unclustered table
could be very expensive. There is a good article regarding this:
http://www.db2mag.com/summer00/programmer.shtml section "use the index any
way that works best or use it the normal way?"

Best regards,
Tiago

From:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To:	Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 12:34:49
Message-ID:	Pine.LNX.4.21.0008211319340.24628-100000@eros.si.fct.unl.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Mon, 21 Aug 2000, Hannu Krosing wrote:

> And predictability is GOOD ;)
>
> I would even suggest that PG would warn about or even refuse to run
> queries
> that have both nextval and curval of the same sequence inside them
> (and pre-evaluate nextval) as only that case has _any_ predictability.

Isn't the problem more general than just nextval? Any user defined
function with side-effects would be a problematic one... plus a user
defined function might not be constant:
select ... from ... where x in (select side_effects(x) ...)
On correlated subqueries there is no guarantee of being constant.

In Prolog, which is a declarative language with some similarities to
relational algebra the ideia is: "if you use predicates with side effects,
then you're on your own".

Tiago
PS - Apologies for any irrelevant comment, I'm just starting to look to pg
code.

From:	Don Baccus <dhogaza(at)pacifier(dot)com>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 14:14:38
Message-ID:	3.0.1.32.20000821071438.014e07c0@mail.pacifier.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At 01:34 PM 8/21/00 +0100, Tiago Antão wrote:
>Hi!
>
>On Mon, 21 Aug 2000, Hannu Krosing wrote:
>
>> And predictability is GOOD ;)
>>
>> I would even suggest that PG would warn about or even refuse to run
>> queries
>> that have both nextval and curval of the same sequence inside them
>> (and pre-evaluate nextval) as only that case has _any_ predictability.
>
>
> Isn't the problem more general than just nextval?

Yes, it is.

- Don Baccus, Portland OR <dhogaza(at)pacifier(dot)com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 14:37:29
Message-ID:	1731.966868649@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> One thing it might be interesting (please tell me if you think
> otherwise) would be to improve pg with better statistical information, by
> using, for example, histograms.

Yes, that's been on the todo list for a while.

> There is a good article regarding this:
> http://www.db2mag.com/summer00/programmer.shtml

Interesting article. We do most of what she talks about, but we don't
have anything like the ClusterRatio statistic. We need it --- that was
just being discussed a few days ago in another thread. Do you have any
reference on exactly how DB2 defines that stat?

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	Hannu Krosing <hannu(at)tm(dot)ee>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 14:50:39
Message-ID:	1823.966869439@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> Isn't the problem more general than just nextval?

Yes it is, and that's why I'm not very excited about the idea of
adding special-case logic for nextval/currval into the optimizer.

It's fairly easy to get around this problem in plpgsql, eg

declare x int;
begin
x := currval('seq');
return f1 from foo where seqfld = x;

so I really am going to resist suggestions that the optimizer should
make invalid assumptions about currval by itself ...

regards, tom lane

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 15:32:24
Message-ID:	39A14B88.30C1ACA8@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>
> =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> > Isn't the problem more general than just nextval?
>
> Yes it is, and that's why I'm not very excited about the idea of
> adding special-case logic for nextval/currval into the optimizer.
>
> It's fairly easy to get around this problem in plpgsql,

it is, once you know that psql implements volatile currval ;)

> eg
>
> declare x int;
> begin
> x := currval('seq');
> return f1 from foo where seqfld = x;
>
> so I really am going to resist suggestions that the optimizer should
> make invalid assumptions about currval by itself ...

Why is assuming a constant currval any more "invalid" than not doing so ?

As the execution order of functions is undefined, can't we safely state that
all
currval's are evaluated first, before any other functions that could change
its return value ?

currval is not like random which changes its value without any external
reason.

Afaik, assuming it to return a constant within a single query is at least as
correct as not doing so, only more predictable.

----------------
Hannu

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 15:37:38
Message-ID:	39A14CC2.7F932537@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tiago Antão wrote:
>
> Hi!
>
> On Mon, 21 Aug 2000, Hannu Krosing wrote:
>
> > And predictability is GOOD ;)
> >
> > I would even suggest that PG would warn about or even refuse to run
> > queries
> > that have both nextval and curval of the same sequence inside them
> > (and pre-evaluate nextval) as only that case has _any_ predictability.
>
> Isn't the problem more general than just nextval? Any user defined
> function with side-effects would be a problematic one... plus a user
> defined function might not be constant:
> select ... from ... where x in (select side_effects(x) ...)
> On correlated subqueries there is no guarantee of being constant.
>
> In Prolog, which is a declarative language with some similarities to
> relational algebra the ideia is: "if you use predicates with side effects,
> then you're on your own".

And you are probably even worse off in SQL where the query plan changes as
tables are filled up, so you can't even find out what will happen by testing.

with currval marked as constant I would at least know about what currval
will return.

---------------
Hannu

From:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 15:48:08
Message-ID:	Pine.LNX.4.21.0008211626250.25226-100000@eros.si.fct.unl.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 21 Aug 2000, Tom Lane wrote:

> > One thing it might be interesting (please tell me if you think
> > otherwise) would be to improve pg with better statistical information, by
> > using, for example, histograms.
>
> Yes, that's been on the todo list for a while.

If it's ok and nobody is working on that, I'll look on that subject.
I'll start by looking at the analize portion of vacuum. I'm thinking in
using arrays for the histogram (I've never used the array data type of
postgres).
Should I use 7.0.2 or the cvs version?

> Interesting article. We do most of what she talks about, but we don't
> have anything like the ClusterRatio statistic. We need it --- that was
> just being discussed a few days ago in another thread. Do you have any
> reference on exactly how DB2 defines that stat?

I don't remember seeing that information spefically. From what I've
read I can speculate:

1. They have clusterratios for both indexes and the relation itself.
2. They might use an index even if there is no "order by" if the table
has a low clusterratio: just to get the RIDs, then sort the RIDs and
fetch.
3. One possible way to calculate this ratio:
a) for tables
SeqScan
if tuple points to a next tuple on the same page then its
"good"
ratio = # good tuples / # all tuples
b) for indexes (high speculation ratio here)
foreach pointed RID in index
if RID is in same page of next RID in index than mark as
"good"

I suspect that if a tuple size is big (relative to page size) than the
cluster ratio is always low.

A tuple might also be "good" if it pointed to the next page.

Tiago

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 19:07:32
Message-ID:	5020.966884852@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> Why is assuming a constant currval any more "invalid" than not doing so ?

Because it's wrong: it changes the behavior from what happens if the
optimizer does not do anything special with the function.

The fact that some cases involving currval+nextval (but not all) yield
unpredictable results is not an adequate argument for causing the
behavior of other cases to change. Especially not when there's a
perfectly good way for you to make it do what you want...

regards, tom lane

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 20:31:58
Message-ID:	39A191BE.158B857D@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > Why is assuming a constant currval any more "invalid" than not doing so ?
>
> Because it's wrong: it changes the behavior from what happens if the
> optimizer does not do anything special with the function.

Optimizer already does "something special" regarding the function - it
decides the
order of execution of rows, and when both currval and nextval are
present it changes
the end result by doing so. If only currval is present currval is
constant.

But the case when "optimiser does not do anything with the function" is
completely
unpredictable in face of optimiser changing the order of things getting
scanned,
columns getting scanned and functions getting evaluated.

And I'm somewhat suspicious that we have any regression tests that are
dependent
of left-to-right or top-to-bottom execution of functions.

> The fact that some cases involving currval+nextval (but not all)

Could you give me a good example of currval+nextval that has a
SQL[92/99]-defined
result, or even a predictable result?

> yield unpredictable results is not an adequate argument for causing the
> behavior of other cases to change.

Are not all the other cases returning "undefined" (by the standard)
results ?

I mean that the fact that a seasoned pg coder has a feel for what will
happen
for some combination of nextval/currval for some combinations of indexes
and table
sizes does not make even his assumptions always right or future-proof.

> Especially not when there's a perfectly good way for you to make it do what you want...

You mean marking it const in my personal copy of pgsql ? ;)

I did

update pg_proc set proiscachable='t' where proname = 'currval';

And now it seems to do the right thing -

amphora2=# explain
amphora2-# select * from item where item_id = currval('item_id_seq');
NOTICE: QUERY PLAN:

Index Scan using item_pkey on item (cost=0.00..2.03 rows=1 width=140)

- Thanks.

Do you know of any circumstances where I would get _wrong_ answers by
doing the above ?
By wrong I mean really wrong, not just different from the case where
proiscachable='f'.

Can I now trust the optimiser to always pre-evalueate the currval() or
are there some
circumstances where the behaviour is still unpredictable ?

PS. I would not call plpgsql or temporary tables a perfectly good way ?
Plpgsql is not even installed by default (on linux at least).

-------------
Hannu

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-21 22:30:31
Message-ID:	27803.966897031@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> Tom Lane wrote:
>> The fact that some cases involving currval+nextval (but not all)

> Could you give me a good example of currval+nextval that has a
> SQL[92/99]-defined result, or even a predictable result?

currval & nextval aren't in the SQL standard, so asking for a standard-
defined result is rather pointless. However, it's certainly possible to
imagine cases where the result is predictable. For example,

UPDATE table SET dataval = foo, seqval = nextval('seq')
WHERE seqval = currval('seq')

is predictable if the seqval column is unique. Admittedly in that case
it wouldn't matter whether we pre-evaluated currval or not. But you'd
have to be very careful about what you mean by "pre-evaluation". For
example, the above could be executed many times within one interactive
query --- say, it could be executed inside a trigger function that's
fired multiple times by an interactive SELECT. Then the results will
change depending on just when you pre-evaluate currval. That's why I'd
rather leave it to the user to evaluate currval separately if he wants
pre-evaluation. That way the user can control what happens. If we
hard-wire an overly-optimistic pre-evaluation policy into the optimizer
then that policy will be wrong for some applications.

>> Especially not when there's a perfectly good way for you to make it do what you want...

> You mean marking it const in my personal copy of pgsql ? ;)

No, I meant putting a pre-evaluation into a plpgsql function, as I
illustrated earlier in this thread.

> Do you know of any circumstances where I would get _wrong_ answers by
> doing the above ?

I already told you earlier in this thread: it will fail inside sql or
plpgsql functions, because the optimizer will freeze the value of the
allegedly constant function sooner than you want, ie during first
execution of the sql/plpgsql function (assuming the input argument looks
like a constant, of course).

regards, tom lane

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-22 13:57:05
Message-ID:	39A286B1.4653C17D@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > Tom Lane wrote:
> >> The fact that some cases involving currval+nextval (but not all)
>
> > Could you give me a good example of currval+nextval that has a
> > SQL[92/99]-defined result, or even a predictable result?
>
> currval & nextval aren't in the SQL standard,

Are sequences in SQL standard at all ?

If they are, how are they used ?

> so asking for a standard-defined result is rather pointless.
> However, it's certainly possible to
> imagine cases where the result is predictable. For example,
>
> UPDATE table SET dataval = foo, seqval = nextval('seq')
> WHERE seqval = currval('seq')
>
> is predictable if the seqval column is unique.

And if no triggers/rules use nextval('seq') ...

And it is also dependent on optimiser decisions, like order of scanning
the tuples - for seq being at 10 and sequval in 10,11,12,13,14
it can either update 1 or 5 tuples depending on the order of scanning the
tuples.

What I'm trying to say is that using currval/nextval in the same query is
inherently
undefined if we assume that currval means anything else than the value of
sequence
at the start of query

> Admittedly in that case
> it wouldn't matter whether we pre-evaluated currval or not. But you'd
> have to be very careful about what you mean by "pre-evaluation".

What I would want is currval always return the value of sequence at the
start of current transaction.

If I need anything more complex I'd use pgplsql and save the value of
nextval()

I _don't_ want to use plpgsql for the simple case.

> For example, the above could be executed many times within one interactive
> query --- say, it could be executed inside a trigger function that's
> fired multiple times by an interactive SELECT. Then the results will
> change depending on just when you pre-evaluate currval. That's why I'd
> rather leave it to the user to evaluate currval separately if he wants
> pre-evaluation. That way the user can control what happens. If we
> hard-wire an overly-optimistic pre-evaluation policy into the optimizer
> then that policy will be wrong for some applications.
>
> >> Especially not when there's a perfectly good way for you to make it do what you want...
>
> > You mean marking it const in my personal copy of pgsql ? ;)
>
> No, I meant putting a pre-evaluation into a plpgsql function, as I
> illustrated earlier in this thread.

That implies that I have to install plpgsql and probably also need to be
in transaction and also to use a function instead of query which is somewhat
painful to do interactively

> > Do you know of any circumstances where I would get _wrong_ answers by
> > doing the above ?
>
> I already told you earlier in this thread: it will fail inside sql or
> plpgsql functions, because the optimizer will freeze the value of the
> allegedly constant function sooner than you want, ie during first
> execution of the sql/plpgsql function (assuming the input argument looks
> like a constant, of course).

I want curval to freeze the value at the beginning of query ;)

Other people may want it to do something else.

Could we add an additional function with strictly defined behaviour of
returning the value of a sequence at the beginning of current query, perhaps
called ccurval()

Would defining an additional function and marking it cacheable do the trick or
can such a function also return wrong data under some circumstances.

--------------------
Hannu

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hannu Krosing <hannu(at)tm(dot)ee>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-23 04:46:13
Message-ID:	26387.967005973@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> Could we add an additional function with strictly defined behaviour of
> returning the value of a sequence at the beginning of current query, perhaps
> called ccurval()

Not unless you want the system to run around and read the current value
of *every* sequence object at the start of *every* transaction, as
insurance against the possibility that some bit of code might ask for
the value of ccurval('foo') at some point in the transaction.

This state-saving could doubtless be optimized away to some extent,
but quite frankly I don't feel a strong need to work on it. You haven't
yet presented any compelling reason why we should care deeply about the
performance of WHERE bar = currval('foo') --- how many people want to do
that? Even more to the point, why is this so important that we should
care about making it fast with absolutely no help from the user? I have
a hard time accepting an "I won't use plpgsql" argument. There are many
more pressing performance problems on my to-do list, most of them with
no such easy workaround.

regards, tom lane

From:	Hannu Krosing <hannu(at)tm(dot)ee>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-23 06:03:43
Message-ID:	39A3693F.A0A53575@tm.ee
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > Could we add an additional function with strictly defined behaviour of
> > returning the value of a sequence at the beginning of current query, perhaps
> > called ccurval()
>
> Not unless you want the system to run around and read the current value
> of *every* sequence object at the start of *every* transaction, as
> insurance against the possibility that some bit of code might ask for
> the value of ccurval('foo') at some point in the transaction.
>
> This state-saving could doubtless be optimized away to some extent,
> but quite frankly I don't feel a strong need to work on it. You haven't
> yet presented any compelling reason why we should care deeply about the
> performance of WHERE bar = currval('foo') --- how many people want to do
> that?

Probably not many. It just happened that I had to optimise some code
that used it
a lot and it took me some time to figure out why it does a sequential
scan when index
scan would be orders of magnitude faster.

> Even more to the point, why is this so important that we should
> care about making it fast with absolutely no help from the user?

Because it would be very easy to do by marking curval as cacheable.

As I demonstrated to you earlier, using nextval and currval in the same
query is
inherently unsafe and anyone doing it deserves the consequences ;)

Thus making curval cacheable just replaces almost completely
undeterministic
behaviour with another, more predictable and arguably more "correct"
behaviour and
also makes life easier for people programming in pure SQL.

> I have a hard time accepting an "I won't use plpgsql" argument.

I probably will at some point, but I'd much more like the simple case to
be
fast by default.

BTW, did the fmgr update mend the problem with pl functions taking only
8/16 arguments ?

> There are many more pressing performance problems on my to-do list,
> most of them with no such easy workaround.

Sure. This one would just be soo easy to fix ;)

----------
Hannu

From:	Jules Bean <jules(at)jellybean(dot)co(dot)uk>
To:	Tiago Ant?o <tra(at)fct(dot)unl(dot)pt>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-23 12:34:19
Message-ID:	20000823133418.F17510@grommit.office.vi.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 21, 2000 at 04:48:08PM +0100, Tiago Ant?o wrote:
> On Mon, 21 Aug 2000, Tom Lane wrote:
>
> > > One thing it might be interesting (please tell me if you think
> > > otherwise) would be to improve pg with better statistical information, by
> > > using, for example, histograms.
> >
> > Yes, that's been on the todo list for a while.
>
> If it's ok and nobody is working on that, I'll look on that subject.
> I'll start by looking at the analize portion of vacuum. I'm thinking in
> using arrays for the histogram (I've never used the array data type of
> postgres).

Apologies if this is naive; I don't understand the details of the
optimisation you are discussing. However, I have an optimisation of
my own in mind which might be related.

I have in a table a 'category' column which takes a small number of
(basically fixed) values. Here by 'small', I mean ~1000, while the
table itself has ~10 000 000 rows. Some categories have many, many
more rows than others. In particular, there's one category which hits
over half the rows. Because of this (AIUI) postgresql assumes
that the query

select ... from thistable where category='something'

is best served by a seqscan, even though there is an index on
category. I assume this is because it calculates the 'average' number
of rows per category, and it's too high for an index to be useful.

In fact, for lots of values of 'something' in the query above, and
index scan would be /much/ faster. Many categories have (obviously,
since there's ~1000 of them) less that 0.1% of the rows, and an index
scan would be much faster. [I checked this with set
enable_seqscan=off, FWIW].

I don't quite know what statistics should be collected here, but
something would be useful...

Jules

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jules Bean <jules(at)jellybean(dot)co(dot)uk>
Cc:	Tiago Ant?o <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-23 14:30:30
Message-ID:	27971.967041030@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jules Bean <jules(at)jellybean(dot)co(dot)uk> writes:
> I have in a table a 'category' column which takes a small number of
> (basically fixed) values. Here by 'small', I mean ~1000, while the
> table itself has ~10 000 000 rows. Some categories have many, many
> more rows than others. In particular, there's one category which hits
> over half the rows. Because of this (AIUI) postgresql assumes
> that the query
> select ... from thistable where category='something'
> is best served by a seqscan, even though there is an index on
> category.

Yes, we know about that one. We have stats about the most common value
in a column, but no information about how the less-common values are
distributed. We definitely need stats about several top values not just
one, because this phenomenon of a badly skewed distribution is pretty
common.

BTW, if your highly-popular value is actually a dummy value ('UNKNOWN'
or something like that), a fairly effective workaround is to replace the
dummy entries with NULL. The system does account for NULLs separately
from real values, so you'd then get stats based on the most common
non-dummy value.

regards, tom lane

From:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jules Bean <jules(at)jellybean(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-23 15:03:42
Message-ID:	Pine.LNX.4.21.0008231543340.4273-100000@eros.si.fct.unl.pt
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi!

On Wed, 23 Aug 2000, Tom Lane wrote:

> Yes, we know about that one. We have stats about the most common value
> in a column, but no information about how the less-common values are
> distributed. We definitely need stats about several top values not just
> one, because this phenomenon of a badly skewed distribution is pretty
> common.

An end-biased histogram has stats on top values and also on the least
frequent values. So if a there is a selection on a value that is well
bellow average, the selectivity estimation will be more acurate. On some
research papers I've read, it's refered that this is a better approach
than equi-width histograms (which are said to be the "industry" standard).

I not sure whether to use a table or a array attribute on pg_stat for
the histogram, the problem is what could be expected from the size of the
attribute (being a text). I'm very affraid of the cost of going through
several tuples on a table (pg_histogram?) during the optimization phase.

One other idea would be to only have better statistics for special
attributes requested by the user... something like "analyze special
table(column)".

Best Regards,
Tiago

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	Jules Bean <jules(at)jellybean(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-24 03:56:35
Message-ID:	20970.967089395@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

=?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> One other idea would be to only have better statistics for special
> attributes requested by the user... something like "analyze special
> table(column)".

This might actually fall out "for free" from the cheapest way of
implementing the stats. We've talked before about scanning btree
indexes directly to obtain data values in sorted order, which makes
it very easy to find the most common values. If you do that, you
get good stats for exactly those columns that the user has created
indexes on. A tad indirect but I bet it'd be effective...

regards, tom lane

From:	Jules Bean <jules(at)jellybean(dot)co(dot)uk>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tiago Ant?o <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-08-24 09:11:14
Message-ID:	20000824101113.N17510@grommit.office.vi.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 23, 2000 at 10:30:30AM -0400, Tom Lane wrote:
> Jules Bean <jules(at)jellybean(dot)co(dot)uk> writes:
> > I have in a table a 'category' column which takes a small number of
> > (basically fixed) values. Here by 'small', I mean ~1000, while the
> > table itself has ~10 000 000 rows. Some categories have many, many
> > more rows than others. In particular, there's one category which hits
> > over half the rows. Because of this (AIUI) postgresql assumes
> > that the query
> > select ... from thistable where category='something'
> > is best served by a seqscan, even though there is an index on
> > category.
>
> Yes, we know about that one. We have stats about the most common value
> in a column, but no information about how the less-common values are
> distributed. We definitely need stats about several top values not just
> one, because this phenomenon of a badly skewed distribution is pretty
> common.

ISTM that that might be enough, in fact.

If you have stats telling you that the most popular value is 'xyz',
and that it constitutes 50% of the rows (i.e. 5 000 000) then you can
conclude that, on average, other entries constitute a mere 5 000
000/999 ~~ 5000 entries, and it would be definitely be enough.
(That's assuming you store the number of distinct values somewhere).

> BTW, if your highly-popular value is actually a dummy value ('UNKNOWN'
> or something like that), a fairly effective workaround is to replace the
> dummy entries with NULL. The system does account for NULLs separately
> from real values, so you'd then get stats based on the most common
> non-dummy value.

I can't really do that. Even if I could, the distribution is very
skewed -- so the next most common makes up a very high proportion of
what's left. I forget the figures exactly.

Jules

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-10-14 04:19:19
Message-ID:	200010140419.AAA22701@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> =?iso-8859-1?Q?Tiago_Ant=E3o?= <tra(at)fct(dot)unl(dot)pt> writes:
> > One thing it might be interesting (please tell me if you think
> > otherwise) would be to improve pg with better statistical information, by
> > using, for example, histograms.
>
> Yes, that's been on the todo list for a while.
>
> > There is a good article regarding this:
> > http://www.db2mag.com/summer00/programmer.shtml
>
> Interesting article. We do most of what she talks about, but we don't
> have anything like the ClusterRatio statistic. We need it --- that was
> just being discussed a few days ago in another thread. Do you have any
> reference on exactly how DB2 defines that stat?
>

Added to TODO:

* Keep statistics about clustering of table rows

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-10-14 04:20:48
Message-ID:	200010140420.AAA22768@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Mon, 21 Aug 2000, Tom Lane wrote:
>
> > > One thing it might be interesting (please tell me if you think
> > > otherwise) would be to improve pg with better statistical information, by
> > > using, for example, histograms.
> >
> > Yes, that's been on the todo list for a while.
>
> If it's ok and nobody is working on that, I'll look on that subject.
> I'll start by looking at the analize portion of vacuum. I'm thinking in
> using arrays for the histogram (I've never used the array data type of
> postgres).
> Should I use 7.0.2 or the cvs version?

If you don't like the analyze part, you can blame me. It is
non-optimal, but lean.

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jules Bean <jules(at)jellybean(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-10-14 04:25:17
Message-ID:	200010140425.AAA23064@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Hi!
>
> On Wed, 23 Aug 2000, Tom Lane wrote:
>
> > Yes, we know about that one. We have stats about the most common value
> > in a column, but no information about how the less-common values are
> > distributed. We definitely need stats about several top values not just
> > one, because this phenomenon of a badly skewed distribution is pretty
> > common.
>
>
> An end-biased histogram has stats on top values and also on the least
> frequent values. So if a there is a selection on a value that is well
> bellow average, the selectivity estimation will be more acurate. On some
> research papers I've read, it's refered that this is a better approach
> than equi-width histograms (which are said to be the "industry" standard).

I like this. I never liked the equal-size histograms. The lookup time
was too slow, and used too much disk space.

From:	The Hermit Hacker <scrappy(at)hub(dot)org>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Tiago Antão <tra(at)fct(dot)unl(dot)pt>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimisation deficiency: currval('seq')-->seq scan, constant-->index scan
Date:	2000-10-14 04:37:28
Message-ID:	Pine.BSF.4.21.0010140137050.637-100000@thelab.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 14 Oct 2000, Bruce Momjian wrote:

> > On Mon, 21 Aug 2000, Tom Lane wrote:
> >
> > > > One thing it might be interesting (please tell me if you think
> > > > otherwise) would be to improve pg with better statistical information, by
> > > > using, for example, histograms.
> > >
> > > Yes, that's been on the todo list for a while.
> >
> > If it's ok and nobody is working on that, I'll look on that subject.
> > I'll start by looking at the analize portion of vacuum. I'm thinking in
> > using arrays for the histogram (I've never used the array data type of
> > postgres).
> > Should I use 7.0.2 or the cvs version?

and, to answer what appears to have been missed ... use the CVS version
for anything like this *nod* *grin*