Re: Slow Query on Postgres 8.2

Lists: pgsql-performance
From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Slow Query on Postgres 8.2
Date: 2007-01-04 23:31:43
Message-ID: 006801c73058$7eb7d2a0$8300a8c0@tridecap.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hello,

I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which
runs a lot slower. Here is the query:

select type, currency_id, instrument_id, sum(amount) as total_amount from
om_transaction
where
strategy_id in
('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','BASK
ET8','BASKET9','BASKET10','BASKET11')
and owner_trader_id in ('dave','sam','bob','tad',
'tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milhouse
','disco stu')
and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29)
and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05'
group by type, currency_id, instrument_id;

I changed the values in the in statements to fake ones, but it still takes
over three seconds on 8.2, where 8.1 only takes 26 milliseconds. When I
increase the number of valules in the IN clauses, the query rapidly gets
worse. I tried increasing my stats target to 1000 and analyzing, but that
didn't help so I put that back to 10. While the query is running the CPU is
at 100%. Is there a more efficient way to write a query like this? I've
attached the output from EXPLAIN ANALYZE in a file because it is somewhat
large.

Thanks,

Dave Dutcher
Telluride Asset Management
952.653.6411


Attachment Content-Type Size
plans.txt text/plain 10.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dave Dutcher" <dave(at)tridecap(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-05 01:12:49
Message-ID: 7598.1167959569@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Dave Dutcher" <dave(at)tridecap(dot)com> writes:
> I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which
> runs a lot slower.

Um ... what indexes has this table got exactly? It's very unclear what
alternatives the planner is being faced with.

regards, tom lane


From: "Adam Rich" <adam(dot)r(at)sbcglobal(dot)net>
To: "'Dave Dutcher'" <dave(at)tridecap(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-05 01:19:25
Message-ID: 00f501c73067$89f2bd60$6400a8c0@dualcore
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Dave,
Is it me or are the two examples you attached returning different row
counts?
That means either the source data is different, or your queries are.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Dave
Dutcher
Sent: Thursday, January 04, 2007 5:32 PM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Slow Query on Postgres 8.2

Hello,

I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query
which runs a lot slower. Here is the query:

select type, currency_id, instrument_id, sum(amount) as total_amount
from om_transaction
where
strategy_id in
('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','
BASKET8','BASKET9','BASKET10','BASKET11')
and owner_trader_id in ('dave','sam','bob','tad',
'tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milh
ouse','disco stu')
and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29)
and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05'
group by type, currency_id, instrument_id;

I changed the values in the in statements to fake ones, but it still
takes over three seconds on 8.2, where 8.1 only takes 26 milliseconds.
When I increase the number of valules in the IN clauses, the query
rapidly gets worse. I tried increasing my stats target to 1000 and
analyzing, but that didn't help so I put that back to 10. While the
query is running the CPU is at 100%. Is there a more efficient way to
write a query like this? I've attached the output from EXPLAIN ANALYZE
in a file because it is somewhat large.

Thanks,

Dave Dutcher
Telluride Asset Management
952.653.6411


From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-05 01:51:16
Message-ID: 007901c7306b$fd1d2c90$8300a8c0@tridecap.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
>
> Um ... what indexes has this table got exactly? It's very
> unclear what
> alternatives the planner is being faced with.
>

Here is the table definition. Thanks.

Table "public.om_transaction"
Column | Type | Modifiers
-----------------+------------------------+---------------------------------
--------
transaction_id | character varying(20) | not null default '0'::character
varying
type | character varying(20) | not null default ''::character
varying
fund_id | character varying(10) | not null default ''::character
varying
owner_trader_id | character varying(10) | not null default ''::character
varying
strategy_id | character varying(30) | not null default ''::character
varying
instrument_id | integer | default 0
cf_account_id | integer | not null default 0
as_of_date | date | not null default
'0001-01-01'::date
insert_date | date | not null default
'0001-01-01'::date
amount | numeric(22,9) | not null default 0.000000000
currency_id | integer | not null default 0
process_state | integer | not null
comment | character varying(256) | default ''::character varying
Indexes:
"om_transaction_pkey" PRIMARY KEY, btree (transaction_id)
"cf_account_id_om_transaction_index" btree (cf_account_id)
"currency_id_om_transaction_index" btree (currency_id)
"fund_id_om_transaction_index" btree (fund_id)
"instrument_id_om_transaction_index" btree (instrument_id)
"om_transaction_om_transaction_index" btree (as_of_date, fund_id,
strategy_id, owner_trader_id, cf_account_id, instrument_id, "type")
"om_transaction_partial_process_state_index" btree (process_state) WHERE
process_state = 0
"owner_trader_id_om_transaction_index" btree (owner_trader_id)
"strategy_id_om_transaction_index" btree (strategy_id)
Foreign-key constraints:
"$1" FOREIGN KEY (owner_trader_id) REFERENCES om_trader(trader_id)
"$2" FOREIGN KEY (fund_id) REFERENCES om_fund(fund_id)
"$3" FOREIGN KEY (strategy_id) REFERENCES om_strategy(strategy_id)
"$4" FOREIGN KEY (cf_account_id) REFERENCES om_cf_account(id)
"$5" FOREIGN KEY (instrument_id) REFERENCES om_instrument(id)
"$6" FOREIGN KEY (currency_id) REFERENCES om_instrument(id)


From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: "'Adam Rich'" <adam(dot)r(at)sbcglobal(dot)net>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-05 02:01:12
Message-ID: 007e01c7306d$60ccc6f0$8300a8c0@tridecap.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

The source data is a little different. The fast query was on our production
8.1 server, and the other was a test 8.2 server with day old data. The
production server has like 3.84 million rows vs 3.83 million rows in test,
so the statistics might be a little different, but I would figure the
compairison is still valid.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Adam Rich
Sent: Thursday, January 04, 2007 7:19 PM
To: 'Dave Dutcher'; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Slow Query on Postgres 8.2

Dave,
Is it me or are the two examples you attached returning different row
counts?
That means either the source data is different, or your queries are.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Dave Dutcher
Sent: Thursday, January 04, 2007 5:32 PM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Slow Query on Postgres 8.2

Hello,

I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which
runs a lot slower. Here is the query:

select type, currency_id, instrument_id, sum(amount) as total_amount from
om_transaction
where
strategy_id in
('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','BASK
ET8','BASKET9','BASKET10','BASKET11')
and owner_trader_id in ('dave','sam','bob','tad',
'tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milhouse
','disco stu')
and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29)
and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05'
group by type, currency_id, instrument_id;

I changed the values in the in statements to fake ones, but it still takes
over three seconds on 8.2, where 8.1 only takes 26 milliseconds. When I
increase the number of valules in the IN clauses, the query rapidly gets
worse. I tried increasing my stats target to 1000 and analyzing, but that
didn't help so I put that back to 10. While the query is running the CPU is
at 100%. Is there a more efficient way to write a query like this? I've
attached the output from EXPLAIN ANALYZE in a file because it is somewhat
large.

Thanks,

Dave Dutcher
Telluride Asset Management
952.653.6411


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dave Dutcher" <dave(at)tridecap(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-05 03:04:05
Message-ID: 8870.1167966245@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Dave Dutcher" <dave(at)tridecap(dot)com> writes:
> Here is the table definition. Thanks.

[ fools around with it for awhile... ] I think this is already fixed
for 8.2.1. Note the costs of the two related index scans:

8.2.0:
-> Bitmap Index Scan on om_transaction_om_transaction_index (cost=0.00..7421.67 rows=488 width=0) (actual time=3411.227..3411.227 rows=0 loops=1)
Index Cond: ((as_of_date > '2006-12-04'::date) AND (as_of_date <= '2006-12-05'::date) AND ((strategy_id)::text = ANY (('{BASKET1,BASKET2,BASKET3,BASKET4,BASKET5,BASKET6,BASKET7,BASKET8,BASKET9,BASKET10,BASKET11}'::character varying[])::text[])) AND ((owner_trader_id)::text = ANY (('{dave,sam,bob,tad,tim,harry,frank,bart,lisa,homer,marge,maggie,apu,milhouse,"disco stu"}'::character varying[])::text[])) AND (cf_account_id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29}'::integer[])))

8.1.2:
-> Bitmap Index Scan on om_transaction_om_transaction_index (cost=0.00..101.69 rows=5949 width=0) (actual time=3.419..3.419 rows=7967 loops=1)
Index Cond: ((as_of_date > '2006-12-04'::date) AND (as_of_date <= '2006-12-05'::date))

8.1.2 returns a lot more rows but spends a lot less time doing it.
The reason is that using all those =ANY clauses as index quals is
*expensive* --- they actually trigger multiple scans of the index.
8.2.0 is underestimating their cost. We fixed that a couple weeks ago
(after some reports from Arjen van der Meijden) and I can't actually get
8.2 branch tip to produce a plan like what you show.

Please try it again when 8.2.1 comes out (Monday) and we'll see if
there's any more tweaking needed.

BTW, it's interesting to note that the plan 8.1.2 produces is pretty
obviously bogus in itself ... why do only the first two arms of the
BitmapOr use as_of_date conditions? We fixed some sillinesses in the
bitmap scan planning later in the 8.1 series, so I think you'd find
that 8.1.latest does this differently.

regards, tom lane


From: "Dave Dutcher" <dave(at)tridecap(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow Query on Postgres 8.2
Date: 2007-01-08 18:59:14
Message-ID: 012601c73357$1736ff00$8300a8c0@tridecap.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
>
> [ fools around with it for awhile... ] I think this is already fixed
> for 8.2.1. Note the costs of the two related index scans:

I installed 8.2.1 this morning and it works much better. The query that was
taking 3411.429ms on 8.2.0 now takes 9.3ms. Thanks for your help.