Lists: | pgsql-hackers |
---|
From: | "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org> |
Subject: | Wrong stats for empty tables |
Date: | 2009-05-05 16:27:55 |
Message-ID: | 43826FCDC252204EA7823B2E7CF3CCEC06CBE567@Pandora.AsterData.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Here is an example showing the problem:
Welcome to psql 8.3.6, the PostgreSQL interactive terminal.
manu=# create table foo (x int);
CREATE TABLE
manu=# explain select * from foo;
QUERY PLAN
-------------------------------------------------------
Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4)
(1 row)
manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;
QUERY PLAN
-------------------------------------------------------
Seq Scan on foo (cost=0.00..34.00 rows=2400 width=4)
(1 row)
manu=# insert into foo values (1);
INSERT 0 1
manu=# analyze foo;
ANALYZE
manu=# explain select * from foo;
QUERY PLAN
---------------------------------------------------
Seq Scan on foo (cost=0.00..1.01 rows=1 width=4)
(1 row)
Now a possible cause for this might be the relpages attribute in pg_class (the default value 0 does not seem to be interpreted correctly):
manu=# create table bar(x int);
CREATE TABLE
manu=# explain select * from bar;
QUERY PLAN
-------------------------------------------------------
Seq Scan on bar (cost=0.00..34.00 rows=2400 width=4)
(1 row)
manu=# select relpages from pg_class where relname='bar';
relpages
----------
0
(1 row)
manu=# update pg_class set relpages=1 where relname='bar';
UPDATE 1
manu=# explain select * from bar;
QUERY PLAN
---------------------------------------------------
Seq Scan on bar (cost=0.00..0.00 rows=1 width=4)
(1 row)
This is a real problem if you have a lot of empty child tables. Postgres will not optimize correctly queries in the presence of empty child tables.
Is this a bug?
Thanks for your help,
Emmanuel
--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org> |
Subject: | Re: Wrong stats for empty tables |
Date: | 2009-05-05 16:52:23 |
Message-ID: | 19504.1241542343@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
"Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
> Is this a bug?
No, it's intentional.
regards, tom lane
From: | "Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org> |
Subject: | Re: Wrong stats for empty tables |
Date: | 2009-05-05 18:03:18 |
Message-ID: | 43826FCDC252204EA7823B2E7CF3CCEC06CBE56A@Pandora.AsterData.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
From: Tom Lane [tgl(at)sss(dot)pgh(dot)pa(dot)us]
Subject: Re: [HACKERS] Wrong stats for empty tables
"Emmanuel Cecchet" <Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
> Is this a bug?
No, it's intentional.
So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example:
manu=# create table father (id int, val int, tex varchar(100), primary key(id));
manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
insert some data
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
QUERY PLAN
------------------------------------------------------------------------
Sort (cost=37.81..37.82 rows=5 width=230)
Sort Key: father.id
-> Hash Join (cost=23.44..37.75 rows=5 width=230)
Hash Cond: (father.id = other.id1)
-> Seq Scan on father (cost=0.00..13.10 rows=310 width=226)
-> Hash (cost=23.38..23.38 rows=5 width=8)
-> Seq Scan on other (cost=0.00..23.38 rows=5 width=8)
Filter: (id2 = 2)
(8 rows)
manu=# create table child1() inherits(father);
manu=# create table child2() inherits(father);
manu=# create table child3() inherits(father);
manu=# create table child4() inherits(father);
manu=# create table child5() inherits(father);
manu=# create table child6() inherits(father);
manu=# create table child7() inherits(father);
manu=# create index i1 on child1(id);
manu=# create index i2 on child2(id);
manu=# create index i3 on child3(id);
manu=# create index i4 on child4(id);
manu=# create index i5 on child5(id);
manu=# create index i6 on child6(id);
manu=# create index i7 on child7(id);
manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
QUERY PLAN
------------------------------------------------------------------------------------
Sort (cost=140.00..140.16 rows=62 width=230)
Sort Key: public.father.id
-> Hash Join (cost=23.44..138.16 rows=62 width=230)
Hash Cond: (public.father.id = other.id1)
-> Append (cost=0.00..104.80 rows=2480 width=226)
-> Seq Scan on father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child1 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child2 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child3 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child4 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child5 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child6 father (cost=0.00..13.10 rows=310 width=226)
-> Seq Scan on child7 father (cost=0.00..13.10 rows=310 width=226)
-> Hash (cost=23.38..23.38 rows=5 width=8)
-> Seq Scan on other (cost=0.00..23.38 rows=5 width=8)
Filter: (id2 = 2)
(16 rows)
I must admit that I did not see what the original intention was to get this behavior.
Emmanuel
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "manu(at)frogthinker(dot)org" <manu(at)frogthinker(dot)org> |
Subject: | Re: Wrong stats for empty tables |
Date: | 2009-05-05 19:30:18 |
Message-ID: | 603c8f070905051230w55c4c6cbwcb8eea7d5201f5f4@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, May 5, 2009 at 2:03 PM, Emmanuel Cecchet
<Emmanuel(dot)Cecchet(at)asterdata(dot)com> wrote:
> So what is the rationale behind not being able to use indexes and optimizing empty tables as in the following example:
>
> manu=# create table father (id int, val int, tex varchar(100), primary key(id));
> manu=# create table other (id1 int, id2 int, data varchar(10), primary key(id1,id2));
> insert some data
> manu=# explain select father.*,id2 from father left join other on father.id=other.id1 where id2=2 order by id;
Just because the table was empty at the time statistics were most
recently gathered doesn't mean it's still empty at the time the query
is executed.
ANALYZE;
PREPARE foo AS SELECT ...;
INSERT INTO ...some previously empty child table...
EXECUTE foo;
In order to rely on this for query planning, you'd need some way to
invalidate any cached plans when inserting into an empty table.
...Robert
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, manu(at)frogthinker(dot)org |
Subject: | Re: Wrong stats for empty tables |
Date: | 2009-05-05 22:10:06 |
Message-ID: | 4A00B93E.8090804@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 5/5/09 9:52 AM, Tom Lane wrote:
> "Emmanuel Cecchet"<Emmanuel(dot)Cecchet(at)asterdata(dot)com> writes:
>> Is this a bug?
>
> No, it's intentional.
Huh? Why would we want wrong stats?
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, pgsql-hackers(at)postgresql(dot)org, manu(at)frogthinker(dot)org |
Subject: | Re: Wrong stats for empty tables |
Date: | 2009-05-05 22:27:12 |
Message-ID: | 13505.1241562432@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 5/5/09 9:52 AM, Tom Lane wrote:
>> No, it's intentional.
> Huh? Why would we want wrong stats?
Tables rarely stay empty; and a plan generated on the assumption that a
table is empty is likely to suck much more when the table stops being
empty than a plan generated on the assumption that the table contains
some data will suck when it really doesn't. Neither case is really
attractive, but the downside of a size underestimate tends to be a
lot worse than that of an overestimate.
This decision was made before we had autovacuum/autoanalyze support
or the ability to replan automatically after a stats update, but I think
it's still good even now that we do. You can add a hundred or so tuples
to an empty table before autovac will deign to pay attention, and that's
more than enough to blow a nestloop plan out of the water. Also, the
most common case for this type of issue is a temp table, which autovac
can't help with at all.
regards, tom lane