From: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: query plan wierdness?
Date: 2004-07-07 21:27:27
Message-ID: 7B3E33EF2A10A84185E3667F6B9A1B781A0679@ECIEXCHANGE.eldocomp.com
Lists: pgsql-performance


Can someone explain what I'm missing here? This query does what I
expect--it uses the "foo" index on the openeddatetime, callstatus,
calltype, callkey fields:

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by openeddatetime desc, callstatus desc, calltype
desc, callkey desc limit 26;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..103.76 rows=26 width=297) (actual time=0.07..0.58 rows=26 loops=1)
   ->  Index Scan Backward using foo on call  (cost=0.00..1882805.77 rows=471781 width=297) (actual time=0.06..0.54 rows=27 loops=1)
         Index Cond: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
         Filter: (aspid = '123C'::bpchar)
 Total runtime: 0.66 msec
(5 rows)

However, this query performs a sequential scan on the table, ignoring the
call_idx13 index (the only difference is the addition of the aspid field
in the order by clause):

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
calltype desc, callkey desc limit 26;

                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
   ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
         Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
         ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
               Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 39353.86 msec
(6 rows)

Here's the structure of the table in question:

Table "public.call"
Column | Type | Modifiers
------------------+--------------------------+-----------
aspid | character(4) |
lastmodifiedtime | timestamp with time zone |
moduser | character(13) |
callkey | character(13) |
calltype | text |
callqueueid | text |
openeddatetime | timestamp with time zone |
assigneddatetime | timestamp with time zone |
closeddatetime | timestamp with time zone |
reopeneddatetime | timestamp with time zone |
openedby | text |
callstatus | character(1) |
callpriority | text |
callreasontext | text |
keyword1 | text |
keyword2 | text |
callername | text |
custfirstname | text |
custlastname | text |
custssntin | character(9) |
custssnseq | text |
custdbccode | character(9) |
custlongname | text |
custtypecode | character(2) |
custphone | text |
custid | character(9) |
assigneduserid | character varying(30) |
historyitemcount | integer |
callertype | text |
callerphoneext | text |
followupdate | text |
hpjobnumber | character(11) |
Indexes: call_idx1 unique btree (aspid, callkey),
         call_aspid btree (aspid),
         call_aspid_opendedatetime btree (aspid, openeddatetime),
         call_idx10 btree (aspid, keyword1, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
         call_idx11 btree (aspid, keyword2, openeddatetime, callstatus, calltype, custtypecode, custid, callkey),
         call_idx12 btree (aspid, custtypecode, custid, openeddatetime, callstatus, calltype, callkey),
         call_idx13 btree (aspid, openeddatetime, callstatus, calltype, callkey),
         call_idx14 btree (aspid, callqueueid, callstatus, callkey),
         call_idx2 btree (aspid, callqueueid, openeddatetime, custtypecode, callstatus, callkey),
         call_idx3 btree (aspid, assigneduserid, openeddatetime, custtypecode, callstatus, callkey),
         call_idx4 btree (aspid, custid, custtypecode, callkey, callstatus),
         call_idx7 btree (aspid, calltype, custtypecode, custid, callstatus, callkey),
         call_idx9 btree (aspid, assigneduserid, callstatus, followupdate),
         foo btree (openeddatetime, callstatus, calltype, callkey)

TIA,

-Joel



From: Guido Barosio <gbarosio(at)sinectis(dot)com(dot)ar>
To: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: query plan wierdness?
Date: 2004-07-07 21:45:42
Message-ID: 20040707184542.515a73f5.gbarosio@sinectis.com.ar
Lists: pgsql-performance

The limit is tricking you.
I guess a sequential scan is cheaper than an index scan with the limit 26 found there.

Am I wrong?

Greets

--
-------------------------------------------
Guido Barosio
Buenos Aires, Argentina
-------------------------------------------


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan wierdness?
Date: 2004-07-07 22:50:31
Message-ID: 20040707154849.J63760@megazone.bigpanda.com
Lists: pgsql-performance

On Wed, 7 Jul 2004, Joel McGraw wrote:

> However, this query performs a sequential scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):
>
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual time=32943.52..32943.61 rows=26 loops=1)
>    ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual time=32943.52..32943.56 rows=27 loops=1)
>          Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
>          ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781 width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
>                Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
>  Total runtime: 39353.86 msec
> (6 rows)

Hmm, what does it say after a set enable_seqscan=off?

Also, what does it say if you use aspid desc rather than just aspid in the
order by?
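
Something along these lines, for instance (just a sketch of the two
experiments -- enable_seqscan only needs to be off for the test):

  set enable_seqscan = off;
  explain analyse select * from call where aspid='123C' and
    OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
    order by aspid, openeddatetime desc, callstatus desc, calltype desc,
    callkey desc limit 26;
  set enable_seqscan = on;

  explain analyse select * from call where aspid='123C' and
    OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999'
    order by aspid desc, openeddatetime desc, callstatus desc, calltype desc,
    callkey desc limit 26;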


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan wierdness?
Date: 2004-07-08 16:50:21
Message-ID: 1089305420.5999.208.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


> However, this query performs a sequential scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):

You do not have an index which matches the ORDER BY, so PostgreSQL
cannot simply scan the index for the data you want. Thus it needs to
find all matching rows, sort them, etc.

> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;

aspid ASC, openeddatetime DESC, callstatus DESC, calltype DESC, callkey DESC

> call_idx13 btree (aspid, openeddatetime, callstatus, calltype, callkey),

This index is: aspid ASC, openeddatetime ASC, callstatus ASC, calltype
ASC, callkey ASC

A reverse scan would of course be DESC, DESC, DESC, DESC, DESC -- and
neither of those matches your requested ORDER BY, so the index cannot
help reduce the rows examined to 26.
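
That is, a plain forward or backward scan of call_idx13 can only directly
produce orderings like these:

  -- forward scan of call_idx13
  order by aspid, openeddatetime, callstatus, calltype, callkey

  -- backward scan of call_idx13
  order by aspid desc, openeddatetime desc, callstatus desc, calltype desc,
           callkey desc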

This leaves your WHERE clause to restrict the dataset and it doesn't do
a very good job of it. There are more than 450000 rows matching the
where clause, which means the sequential scan was probably the right
choice (unless you have over 10 million entries in the table).

Since your WHERE clause pins aspid to a single value, a possible
improvement to the PostgreSQL optimizer would be to ignore that field in
the ORDER BY, since its order no longer matters when only one value is
possible. If it did ignore aspid, it would use a plan similar to the
first one you provided.

You can accomplish the same thing by leaving aspid out of the ORDER BY,
or by changing it to aspid DESC. Leaving it out entirely will be
slightly faster, but using DESC will let PostgreSQL use index
"call_idx13".


From: Litao Wu <litaowu(at)yahoo(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: vacuum_mem
Date: 2004-07-08 18:03:43
Message-ID: 20040708180343.88032.qmail@web13121.mail.yahoo.com
Lists: pgsql-performance

Hi,

I tested the vacuum_mem setting on a 4-CPU, 4GB-RAM machine. I am the
only person on that machine.

The table:
 tablename | size_kb | reltuples
-----------+---------+-------------
 big_t     | 2048392 | 7.51515e+06

Case 1:
1. vacuum full big_t;
2. begin;
update big_t set email = lpad('a', 255, 'b');
rollback;
3. set vacuum_mem=655360; -- 640M
4. vacuum big_t;
It takes 1,415,375 ms.
Also, from top, the max SIZE is 615M while SHARE is always 566M:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 5914 postgres  16   0  615M 615M  566M D     7.5 15.8  21:21 postgres: postgres mydb xxx.xxx.xxx.xxx:34361 VACUUM

Case 2:
1. vacuum full big_t;
2. begin;
update big_t set email = lpad('a', 255, 'b');
rollback;
3. set vacuum_mem=65536; -- 64M
4. vacuum big_t;
It takes 1,297,798 ms.
Also, from top, the max SIZE is 615M while SHARE is always 566M:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 3613 postgres  15   0  615M 615M  566M D    17.1 15.8   9:04 postgres: postgres mydb xxx.xxx.xxx.xxx:34365 VACUUM

It seems vacuum_mem does not have any performance effect at all.

In reality, we vacuum nightly, and I want to find out which vacuum_mem
value is best for shortening the vacuum time.

Any thoughts?

Thanks,




From: Rod Taylor <pg(at)rbt(dot)ca>
To: Litao Wu <litaowu(at)yahoo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: vacuum_mem
Date: 2004-07-08 18:25:30
Message-ID: 1089311129.5999.219.camel@jester
Lists: pgsql-performance

> It seems vacuum_mem does not have any performance effect at all.

Wrong conclusion. It implies that your test case takes less than 64M of
memory to track your removed tuples. I think it takes 8 bytes to track a
tuple for vacuuming an index, so 64M (about 67 million bytes) should be
able to track roughly 8 million deletions. Since your demonstration had
about 7.5 million rows for removal, it's under the limit.

Try your test again with 32MB; it should make a single sequential pass
on the table, and 2 passes on each index for that table.

Either that, or do a few more aborted updates.
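
For example, something along these lines; vacuum verbose will report how
many passes it made over each index:

  set vacuum_mem = 32768;   -- 32MB
  vacuum verbose big_t;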