Re: Need help with one query

Lists: pgsql-performance
From: Anne Rosset <arosset(at)collab(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Need help with one query
Date: 2009-03-19 20:35:01
Message-ID: 49C2AC75.5060601@collab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,
We have the following 2 tables:
\d audit_change
Table "public.audit_change"
Column | Type | Modifiers
----------------+------------------------+-----------
id | character varying(32) | not null
audit_entry_id | character varying(32) |
...
Indexes:
"audit_change_pk" primary key, btree (id)
"audit_change_entry" btree (audit_entry_id)

and
\d audit_entry;
Table "public.audit_entry"
Column | Type | Modifiers
----------------+--------------------------+-----------
id | character varying(32) | not null
object_id | character varying(32) | not null
...
Indexes:
"audit_entry_pk" primary key, btree (id)
"audit_entry_object" btree (object_id)

We do the following query:
EXPLAIN ANALYZE
SELECT audit_change.id AS id,
audit_change.audit_entry_id AS auditEntryId,
audit_entry.object_id AS objectId,
audit_change.property_name AS propertyName,
audit_change.property_type AS propertyType,
audit_change.old_value AS
oldValue, audit_change.new_value AS newValue,
audit_change.flexfield AS flexField
FROM audit_entry audit_entry, audit_change audit_change
WHERE audit_change.audit_entry_id=audit_entry.id
AND audit_entry.object_id='artf414029';

QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual
time=4612.674..6683.158 rows=4 loops=1)
Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
-> Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852
width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
-> Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
rows=4 loops=1)
-> Index Scan using audit_entry_object on audit_entry
(cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
Index Cond: ((object_id)::text = 'artf414029'::text)
Total runtime: 6683.220 ms
(7 rows)

Why does the query not use the index on audit_entry_id and do a seq scan
(as you see the table has many rows)?

If we split the query into 2 queries, it only takes less than 0.3 ms
EXPLAIN ANALYZE select * from audit_entry WHERE
audit_entry.object_id='artf414029';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using audit_entry_object on audit_entry (cost=0.00..8.75
rows=3 width=111) (actual time=0.037..0.044 rows=4 loops=1)
Index Cond: ((object_id)::text = 'artf414029'::text)
Total runtime: 0.073 ms
(3 rows)

EXPLAIN ANALYZE select * from audit_change WHERE audit_entry_id in
('adte1DDFEA5B011C8988C3928752', 'adte5DDFEA5B011D441230BD20CC',
'adte5DDFEA5B011E40601E8DA10F', 'adte5DDFEA5B011E8CC26071627C') ORDER BY
property_name ASC;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=30.25..30.27 rows=10 width=123) (actual time=0.190..0.192
rows=4 loops=1)
Sort Key: property_name
-> Bitmap Heap Scan on audit_change (cost=9.99..30.08 rows=10
width=123) (actual time=0.173..0.177 rows=4 loops=1)
Recheck Cond: ((audit_entry_id)::text = ANY
(('{adte1DDFEA5B011C8988C3928752,adte5DDFEA5B011D441230BD20CC,adte5DDFEA5B011E40601E8DA10F,adte5DDFEA5B011E8CC26071627C}'::character
varying[])::text[]))
-> Bitmap Index Scan on audit_change_entry (cost=0.00..9.99
rows=10 width=0) (actual time=0.167..0.167 rows=4 loops=1)
Index Cond: ((audit_entry_id)::text = ANY
(('{adte1DDFEA5B011C8988C3928752,adte5DDFEA5B011D441230BD20CC,adte5DDFEA5B011E40601E8DA10F,adte5DDFEA5B011E8CC26071627C}'::character
varying[])::text[]))
Total runtime: 0.219 ms
(7 rows)

Thanks for your help,
Anne


From: Richard Huxton <dev(at)archonet(dot)com>
To: Anne Rosset <arosset(at)collab(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 09:21:18
Message-ID: 49C3600E.20504@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Anne Rosset wrote:
> EXPLAIN ANALYZE
> SELECT
> audit_change.id AS id,
> audit_change.audit_entry_id AS auditEntryId,
> audit_entry.object_id AS objectId,
> audit_change.property_name AS propertyName,
> audit_change.property_type AS propertyType,
> audit_change.old_value AS oldValue,
> audit_change.new_value AS newValue,
> audit_change.flexfield AS flexField
> FROM
> audit_entry audit_entry, audit_change audit_change
> WHERE
> audit_change.audit_entry_id = audit_entry.id
> AND audit_entry.object_id = 'artf414029';
[query reformatted to make it more readable]

Not quite clear why you are aliasing the tables to their own names...

> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual
> time=4612.674..6683.158 rows=4 loops=1)
> Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
> -> Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852
> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
> -> Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
> rows=4 loops=1)
> -> Index Scan using audit_entry_object on audit_entry
> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
> Index Cond: ((object_id)::text = 'artf414029'::text)
> Total runtime: 6683.220 ms

Very odd. It knows the table is large and that the seq-scan is going to
be expensive.

Try issuing "set enable_seqscan = off" and run the explain analyse
again. That should show the cost of using the indexes.

--
Richard Huxton
Archonet Ltd


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Anne Rosset <arosset(at)collab(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 15:27:57
Message-ID: 13823.1237562877@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Richard Huxton <dev(at)archonet(dot)com> writes:
>> Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual
>> time=4612.674..6683.158 rows=4 loops=1)
>> Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>> -> Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852
>> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>> -> Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>> rows=4 loops=1)
>> -> Index Scan using audit_entry_object on audit_entry
>> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>> Index Cond: ((object_id)::text = 'artf414029'::text)
>> Total runtime: 6683.220 ms

> Very odd. It knows the table is large and that the seq-scan is going to
> be expensive.

Yeah, *very* odd. A nestloop with inner indexscan should have an
estimated cost far lower than this plan. What Postgres version is
this exactly? Do you have any nondefault planner parameter settings?

regards, tom lane


From: Anne Rosset <arosset(at)collab(dot)net>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 17:16:02
Message-ID: 49C3CF52.9090602@collab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Richard Huxton wrote:

>Anne Rosset wrote:
>
>
>>EXPLAIN ANALYZE
>>SELECT
>> audit_change.id AS id,
>> audit_change.audit_entry_id AS auditEntryId,
>> audit_entry.object_id AS objectId,
>> audit_change.property_name AS propertyName,
>> audit_change.property_type AS propertyType,
>> audit_change.old_value AS oldValue,
>> audit_change.new_value AS newValue,
>> audit_change.flexfield AS flexField
>>FROM
>> audit_entry audit_entry, audit_change audit_change
>>WHERE
>> audit_change.audit_entry_id = audit_entry.id
>> AND audit_entry.object_id = 'artf414029';
>>
>>
>[query reformatted to make it more readable]
>
>Not quite clear why you are aliasing the tables to their own names...
>
>
>
>>---------------------------------------------------------------------------------------------------------------------------------------------
>>
>>Hash Join (cost=8.79..253664.55 rows=4 width=136) (actual
>>time=4612.674..6683.158 rows=4 loops=1)
>> Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>> -> Seq Scan on audit_change (cost=0.00..225212.52 rows=7584852
>>width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>> -> Hash (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>>rows=4 loops=1)
>> -> Index Scan using audit_entry_object on audit_entry
>>(cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>> Index Cond: ((object_id)::text = 'artf414029'::text)
>>Total runtime: 6683.220 ms
>>
>>
>
>Very odd. It knows the table is large and that the seq-scan is going to
>be expensive.
>
>Try issuing "set enable_seqscan = off" and run the explain analyse
>again. That should show the cost of using the indexes.
>
>
>

With "set enable_seqscan = off":

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual
time=46.074..49.742 rows=7 loops=1)
-> Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95
rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
Filter: ((object_id)::text = 'artf1024'::text)
-> Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59
width=777) (actual time=0.086..0.088 rows=4 loops=2)
Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
-> Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59
width=0) (actual time=0.076..0.076 rows=4 loops=2)
Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
Total runtime: 49.801 ms

The db version is 8.2.4

We are wondering if it is because of our audit_entry_id's format (like
'adte1DDFEA5B011C8988C3928752'). Any inputs?
Thanks,
Anne


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Anne Rosset <arosset(at)collab(dot)net>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 19:04:57
Message-ID: 603c8f070903201204s53f2a3cvbf9afc8649b2201a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
> Richard Huxton wrote:
>> Anne Rosset wrote:
>>> EXPLAIN ANALYZE
>>> SELECT
>>>  audit_change.id             AS id,
>>>  audit_change.audit_entry_id AS auditEntryId,
>>>  audit_entry.object_id       AS objectId,
>>>  audit_change.property_name  AS propertyName,
>>>  audit_change.property_type  AS propertyType,
>>>  audit_change.old_value      AS oldValue,
>>>  audit_change.new_value      AS newValue,
>>>  audit_change.flexfield      AS flexField
>>> FROM
>>>  audit_entry audit_entry, audit_change audit_change
>>> WHERE
>>>  audit_change.audit_entry_id = audit_entry.id
>>>  AND audit_entry.object_id = 'artf414029';
>>>
>>
>> [query reformatted to make it more readable]
>>
>> Not quite clear why you are aliasing the tables to their own names...
>>
>>
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
>>> time=4612.674..6683.158 rows=4 loops=1)
>>>  Hash Cond: ((audit_change.audit_entry_id)::text =
>>> (audit_entry.id)::text)
>>>  ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
>>> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>>  ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>>> rows=4 loops=1)
>>>       ->  Index Scan using audit_entry_object on audit_entry
>>> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>>             Index Cond: ((object_id)::text = 'artf414029'::text)
>>> Total runtime: 6683.220 ms
>>>
>>
>> Very odd. It knows the table is large and that the seq-scan is going to
>> be expensive.
>>
>> Try issuing "set enable_seqscan = off" and run the explain analyse
>> again. That should show the cost of using the indexes.
>>
>>
>
> With "set enable_seqscan = off":
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual
> time=46.074..49.742 rows=7 loops=1)
> -> Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95
> rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
> Filter: ((object_id)::text = 'artf1024'::text)
> -> Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59 width=777)
> (actual time=0.086..0.088 rows=4 loops=2)
> Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
> -> Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59
> width=0) (actual time=0.076..0.076 rows=4 loops=2)
> Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
> Total runtime: 49.801 ms
>
>
> The db version is 8.2.4
>
> We are wondering if it is because of our audit_entry_id's format (like
> 'adte1DDFEA5B011C8988C3928752').  Any inputs?
> Thanks,
> Anne

Something is wrong here. How can setting enable_seqscan to off result
in a plan with a far lower estimated cost than the original plan? If
the planner thought the non-seq-scan plan is cheaper, it would have
picked that one to begin with.

...Robert


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anne Rosset <arosset(at)collab(dot)net>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 19:14:53
Message-ID: 20090320191453.GM8313@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Robert Haas escribió:

> Something is wrong here. How can setting enable_seqscan to off result
> in a plan with a far lower estimated cost than the original plan? If
> the planner thought the non-seq-scan plan is cheaper, it would have
> picked that one to begin with.

GEQO? Anne, what's geqo_threshold set to?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Anne Rosset <arosset(at)collab(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-20 20:29:02
Message-ID: 49C3FC8E.9070004@collab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Alvaro Herrera wrote:

>Robert Haas escribió:
>
>
>
>>Something is wrong here. How can setting enable_seqscan to off result
>>in a plan with a far lower estimated cost than the original plan? If
>>the planner thought the non-seq-scan plan is cheaper, it would have
>>picked that one to begin with.
>>
>>
>
>GEQO? Anne, what's geqo_threshold set to?
>
>
>
Hi Alvaro:
It is turned off
geqo | off | Enables genetic query optimization.
Thanks,
Anne


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anne Rosset <arosset(at)collab(dot)net>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-21 00:37:20
Message-ID: 12521.1237595840@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
>> The db version is 8.2.4

> Something is wrong here. How can setting enable_seqscan to off result
> in a plan with a far lower estimated cost than the original plan?

Planner bug no doubt ... given how old the PG release is, I'm not
particularly interested in probing into it now. If Anne can still
reproduce this behavior on 8.2.something-recent, we should look closer.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Anne Rosset <arosset(at)collab(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-21 00:37:52
Message-ID: 603c8f070903201737u7543a26w62f265d2352318f5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Mar 20, 2009 at 4:29 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
> Alvaro Herrera wrote:
>> Robert Haas escribió:
>>> Something is wrong here.  How can setting enable_seqscan to off result
>>> in a plan with a far lower estimated cost than the original plan?  If
>>> the planner thought the non-seq-scan plan is cheaper, it would have
>>> picked that one to begin with.
>>>
>>
>> GEQO?  Anne, what's geqo_threshold set to?
> Hi Alvaro:
> It is turned off
> geqo | off | Enables genetic query optimization.
> Thanks,
> Anne

Can you please send ALL your non-commented postgresql.conf lines?

...Robert


From: Anne Rosset <arosset(at)collab(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-23 17:08:59
Message-ID: 49C7C22B.9050003@collab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:

>Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>
>>On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
>>
>>
>>>The db version is 8.2.4
>>>
>>>
>
>
>
>>Something is wrong here. How can setting enable_seqscan to off result
>>in a plan with a far lower estimated cost than the original plan?
>>
>>
>
>Planner bug no doubt ... given how old the PG release is, I'm not
>particularly interested in probing into it now. If Anne can still
>reproduce this behavior on 8.2.something-recent, we should look closer.
>
> regards, tom lane
>
>
Thanks Tom, Richard.
Here are our postgres conf :

shared_buffers = 2000MB
sort_mem = 150000
vacuum_mem = 100000
work_mem = 20MB # min 64kB
maintenance_work_mem = 256MB # min 1MB
max_fsm_pages = 204800
full_page_writes = off # recover from partial page writes
wal_buffers = 1MB # min 32kB
# (change requires restart)
commit_delay = 20000 # range 0-100000, in microseconds
commit_siblings = 3 # range 1-1000
checkpoint_segments = 128
checkpoint_warning = 240s
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = off
random_page_cost = 2.0
effective_cache_size = 2500MB
geqo = off
default_statistics_target = 750
stats_command_string = on
update_process_title = on

stats_start_collector = on
stats_row_level = on

autovacuum = on
autovacuum_vacuum_threshold = 500 # min # of tuple updates before
# vacuum
autovacuum_analyze_threshold = 250 # min # of tuple updates before
# analyze
autovacuum_vacuum_scale_factor = 0.2 # fraction of rel size before
# vacuum
autovacuum_analyze_scale_factor = 0.1

Anne


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Anne Rosset <arosset(at)collab(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-23 17:15:20
Message-ID: 603c8f070903231015h417f759cqd4519d24aefa6b5d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Mar 23, 2009 at 1:08 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
> enable_nestloop = off

That may be the source of your problem. Generally setting enable_* to
off is a debugging tool, not something you ever want to do in
production.

...Robert


From: Anne Rosset <arosset(at)collab(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need help with one query
Date: 2009-03-23 19:56:11
Message-ID: 49C7E95B.3050502@collab.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Robert Haas wrote:

>On Mon, Mar 23, 2009 at 1:08 PM, Anne Rosset <arosset(at)collab(dot)net> wrote:
>
>
>>enable_nestloop = off
>>
>>
>
>That may be the source of your problem. Generally setting enable_* to
>off is a debugging tool, not something you ever want to do in
>production.
>
>...Robert
>
>
Thanks Robert. It seems to have solved the problem. Thanks a lot,

Anne