Quick Links

Re: 8.4 optimization regression?

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: 8.4 optimization regression?
Date:	2011-09-01 23:54:23
Message-ID:	4E601B2F.30107@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 02/09/11 11:18, Mark Kirkwood wrote:
> On 02/09/11 11:13, Tom Lane wrote:
>> I wrote:
>>> Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
>>>> [ assorted examples showing that commit
>>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>>> ...
>>> So, not only are you correct that we should revert the changes to
>>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
>> I've retested these examples with the patches I committed yesterday.
>> Six of the eight examples are estimated pretty nearly dead on, while the
>> other two are estimated about 50% too high (still a lot better than
>> before). AFAICT there's no easy way to improve those estimates further;
>> eqjoinsel_semi just plain hasn't got enough information to know how many
>> matches there will be.
>>
>>
>
> Just noticed your two commits this morning and ran them through the
> examples too - results look really good! Not only are the plain join
> queries looking way better but that last semi join that was way off is
> now being estimated pretty close. Should be interesting to see how
> much this improves more complex queries!
>
>

While this is still fresh in your mind, a couple of additional anti join
queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..851597.05 rows=1 width=0) (actual
time=29956.971..50933.702 rows=5914 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29345.238..29345.238 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..821806.96 rows=1 width=0) (actual
time=46497.231..49196.057 rows=566 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29901.178..29901.178 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

In response to

Re: 8.4 optimization regression? at 2011-09-01 23:18:58 from Mark Kirkwood

Responses

Re: 8.4 optimization regression? at 2011-09-04 22:14:59 from Tom Lane

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jesper Krogh	2011-09-02 04:14:50	Re: Summaries on SSD usage?
Previous Message	Mark Kirkwood	2011-09-01 23:18:58	Re: 8.4 optimization regression?