Re: 8.4 optimization regression?

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.4 optimization regression?
Date: 2011-09-01 23:54:23
Message-ID: 4E601B2F.30107@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 02/09/11 11:18, Mark Kirkwood wrote:
> On 02/09/11 11:13, Tom Lane wrote:
>> I wrote:
>>> Mark Kirkwood<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
>>>> [ assorted examples showing that commit
>>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ]
>>> ...
>>> So, not only are you correct that we should revert the changes to
>>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too.
>> I've retested these examples with the patches I committed yesterday.
>> Six of the eight examples are estimated pretty nearly dead on, while the
>> other two are estimated about 50% too high (still a lot better than
>> before). AFAICT there's no easy way to improve those estimates further;
>> eqjoinsel_semi just plain hasn't got enough information to know how many
>> matches there will be.
>>
>>
>
> Just noticed your two commits this morning and ran them through the
> examples too - results look really good! Not only are the plain join
> queries looking way better but that last semi join that was way off is
> now being estimated pretty close. Should be interesting to see how
> much this improves more complex queries!
>
>

While this is still fresh in your mind, a couple of additional anti join
queries are still managing to sneak past estimation:

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..851597.05 rows=1 width=0) (actual
time=29956.971..50933.702 rows=5914 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560
width=4) (actual time=13.352..13765.749 rows=1999780 loops=1)
Filter: (keywordid < 100000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29345.238..29345.238 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.010..22731.316 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000
AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND
n.updated > '2011-01-01'::timestamp );
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=501666.88..821806.96 rows=1 width=0) (actual
time=46497.231..49196.057 rows=566 loops=1)
Hash Cond: (nk.nodeid = n.nodeid)
-> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921
width=4) (actual time=19.916..16250.224 rows=199616 loops=1)
Filter: (keywordid < 10000)
-> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual
time=29901.178..29901.178 rows=4985269 loops=1)
Buckets: 4096 Batches: 256 Memory Usage: 699kB
-> Seq Scan on node n (cost=0.00..419643.00 rows=4999510
width=4) (actual time=0.008..23207.964 rows=4985269 loops=1)
Filter: (updated > '2011-01-01 00:00:00'::timestamp
without time zone)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jesper Krogh 2011-09-02 04:14:50 Re: Summaries on SSD usage?
Previous Message Mark Kirkwood 2011-09-01 23:18:58 Re: 8.4 optimization regression?