Re: Wrong stats for empty tables

Lists: pgsql-hackers
From: "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org>
Subject: Wrong stats for empty tables
Date: 2009-05-05 16:27:55
Message-ID: 43826FCDC252204EA7823B2E7CF3CCEC06CBE567@Pandora.AsterData.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Here is an example showing the problem:

Welcome to psql 8.3.6, the PostgreSQL interactive terminal.
manu=# create table foo (x int);
CREATE TABLE
manu=# explain select * from foo;
QUERY PLAN
-------------------------------------------------------
Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;
QUERY PLAN
-------------------------------------------------------
Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# insert into foo values (1);
INSERT 0 1
manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;
QUERY PLAN
---------------------------------------------------
Seq Scan on foo (cost=0.00..1.01 rows=1 width=4)
(1 row)

Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be interpreted correctly):

manu=# create table bar(x int);
CREATE TABLE
manu=# explain select * from bar;
QUERY PLAN
-------------------------------------------------------
Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4)
(1 row)

manu=# select relpages from pg_class where relname='bar';
relpages
----------
0
(1 row)
manu=# update pg_class set relpages=1 where relname='bar';
UPDATE 1
manu=# explain select * from bar;
QUERY PLAN
---------------------------------------------------
Seq Scan on bar (cost=0.00..0.00 rows=1 width=4)
(1 row)

This is a real problem if you have a lot of empty child tables. Postgres will not optimize correctly queries in the presence of empty child tables.

Is this a bug?

Thanks for your help,
Emmanuel
--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org>
Subject: Re: Wrong stats for empty tables
Date: 2009-05-05 16:52:23
Message-ID: 19504.1241542343@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
> Is this a bug?

No, it's intentional.

regards, tom lane


From: "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org>
Subject: Re: Wrong stats for empty tables
Date: 2009-05-05 18:03:18
Message-ID: 43826FCDC252204EA7823B2E7CF3CCEC06CBE56A@Pandora.AsterData.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: Tom Lane [tgl(at)sss(dot)pgh(dot)pa(dot)us]
Subject: Re: [HACKERS] Wrong stats for empty tables

"Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
> Is this a bug?

No, it's intentional.

So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example:

manu=# create table father (id int, val int, tex varchar(100), primary key(id));
manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
insert some data
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
QUERY PLAN
------------------------------------------------------------------------
Sort (cost=37.81..37.82 rows=5 width=230)
Sort Key: father.id
-> Hash Join (cost=23.44..37.75 rows=5 width=230)
Hash Cond: (father.id = other.id1)
-> Seq Scan on father (cost=0.00..13.10 rows=310 width=226)
-> Hash (cost=23.38..23.38 rows=5 width=8)
-> Seq Scan on other (cost=0.00..23.38 rows=5 width=8)
Filter: (id2 = 2)
(8 rows)

manu=# create table child1() inherits(father);
manu=# create table child2() inherits(father);
manu=# create table child3() inherits(father);
manu=# create table child4() inherits(father);
manu=# create table child5() inherits(father);
manu=# create table child6() inherits(father);
manu=# create table child7() inherits(father);
manu=# create index i1 on child1(id);
manu=# create index i2 on child2(id);
manu=# create index i3 on child3(id);
manu=# create index i4 on child4(id);
manu=# create index i5 on child5(id);
manu=# create index i6 on child6(id);
manu=# create index i7 on child7(id);
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
QUERY PLAN
------------------------------------------------------------------------------------
Sort (cost=140.00..140.16 rows=62 width=230)
Sort Key: public.father.id
-> Hash Join (cost=23.44..138.16 rows=62 width=230)
Hash Cond: (public.father.id = other.id1)
-> Append (cost=0.00..104.80 rows=2480 width=226)
-> Seq Scan on father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child1 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child2 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child3 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child4 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child5 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child6 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child7 father (cost=0.00..13.10 rows=310 width=226)
-> Hash (cost=23.38..23.38 rows=5 width=8)
-> Seq Scan on other (cost=0.00..23.38 rows=5 width=8)
Filter: (id2 = 2)
(16 rows)

I must admit that I did not see what the original intention was to get this behavior.
Emmanuel


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org>
Subject: Re: Wrong stats for empty tables
Date: 2009-05-05 19:30:18
Message-ID: 603c8f070905051230w55c4c6cbwcb8eea7d5201f5f4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 5, 2009 at 2:03 PM, Emmanuel Cecchet
<Emmanuel(dot)Cecchet(at)asterdata(dot)com> wrote:
> So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example:
>
> manu=# create table father (id int, val int, tex varchar(100), primary key(id));
> manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
> insert some data
> manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;

Just because the table was empty at the time statistics were most
recently gathered doesn't mean it's still empty at the time the query
is executed.

ANALYZE;
PREPARE foo AS SELECT ...;
INSERT INTO ...some previously empty child table...
EXECUTE foo;

In order to rely on this for query planning, you'd need some way to
invalidate any cached plans when inserting into an empty table.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, manu(at)frogthinker(dot)org
Subject: Re: Wrong stats for empty tables
Date: 2009-05-05 22:10:06
Message-ID: 4A00B93E.8090804@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/5/09 9:52 AM, Tom Lane wrote:
> "Emmanuel Cecchet"<Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
>> Is this a bug?
>
> No, it's intentional.

Huh? Why would we want wrong stats?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, manu(at)frogthinker(dot)org
Subject: Re: Wrong stats for empty tables
Date: 2009-05-05 22:27:12
Message-ID: 13505.1241562432@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 5/5/09 9:52 AM, Tom Lane wrote:
>> No, it's intentional.

> Huh? Why would we want wrong stats?

Tables rarely stay empty; and a plan generated on the assumption that a
table is empty is likely to suck much more when the table stops being
empty than a plan generated on the assumption that the table contains
some data will suck when it really doesn't. Neither case is really
attractive, but the downside of a size underestimate tends to be a
lot worse than that of an overestimate.

This decision was made before we had autovacuum/autoanalyze support
or the ability to replan automatically after a stats update, but I think
it's still good even now that we do. You can add a hundred or so tuples
to an empty table before autovac will deign to pay attention, and that's
more than enough to blow a nestloop plan out of the water. Also, the
most common case for this type of issue is a temp table, which autovac
can't help with at all.

regards, tom lane