Re: performance of IN (subquery)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paul Tillotson <pntil(at)shentel(dot)net>
Cc: Kevin Murphy <murphy(at)genome(dot)chop(dot)edu>, pgsql-general(at)postgresql(dot)org
Subject: Re: performance of IN (subquery)
Date: 2004-08-27 03:49:46
Message-ID: 5862.1093578586@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Paul Tillotson <pntil(at)shentel(dot)net> writes:
> The planner thinks that the sequential scan on elements will return 1000
> rows, but it actually returned 185000. Did you ANALYZE this table recently?

Or either of the other ones? All those scan costs look like defaults
:-(

> Afterthought: It would be nice if the database was smart enough to
> analyze a table of its own accord when a sequential scan returns more
> than, say, 20 times what it was supposed to.

I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning. In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.

The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
every planning operation. I'm not sure this would be significant,
but I'm not sure it wouldn't be, either.

2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2004-08-27 05:00:21 Re: owner orphaned databases
Previous Message Greg Stark 2004-08-27 03:33:35 Re: performance of IN (subquery)