Re: question about executing JOINs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Burdick <jburdick(at)gradient(dot)cis(dot)upenn(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: question about executing JOINs
Date: 2002-10-13 00:30:48
Message-ID: 24975.1034469048@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Josh Burdick <jburdick(at)gradient(dot)cis(dot)upenn(dot)edu> writes:
> Nested Loop (cost=0.00..198183643.06 rows=85230539 width=51)
> -> Seq Scan on ucsc_ref_seq_ali_hg12 (cost=0.00..817.65 rows=15165
> width=29)
> -> Index Scan using ucsc_snp_tsc_hg12_chrom_start on
> ucsc_snp_tsc_hg12 (cost=0.00..12962.39 rows=4713 width=22)

What's the actual numbers of rows involved? (EXPLAIN ANALYZE output
would be far more useful than plain EXPLAIN.)

Do you have a feeling for the number of rows that would be produced by
just the JOIN/ON condition (no constraint on snp_start)? How does that
compare to EXPLAIN's estimate of that number of rows?

> The planner is assuming that it's a cross join: that we need all
> pairs of records. It's not taking into account the WHERE clause which
> restricts to a tiny fraction of the records.

No it isn't, and yes it is, but it evidently is making a bad estimate of
the fraction of rows eliminated by those clauses. I'd like to find out
just what its estimate of that fraction is and what the correct value
would be.

> Perhaps the planner should assume that a nested loop over an index
> scan only looks at 1% of its records?

Arbitrary assumptions designed to fix one example tend to break other
examples ...

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2002-10-13 02:54:52 Re: Removing {"="} privledges
Previous Message Tom Lane 2002-10-12 20:36:44 \copy needs work (was Re: Changing Column Order)