Re: query plan wierdness?

Lists: pgsql-performance
From: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan wierdness?
Date: 2004-07-12 23:54:07
Message-ID: 7B3E33EF2A10A84185E3667F6B9A1B781A068A@ECIEXCHANGE.eldocomp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

>
> Considering you're pulling out 450k rows in 8 seconds, I'd also guess
> the data is mostly in memory. Is that normal? Or is this a result of
> having run several test queries against the same data multiple times?
>

Ah yes, that would have been the result of running the query several
times...

Oddly enough, I put the same database on a different machine, and the
query now behaves as I hoped all along. Notice that I'm using the
"real" query, with the aspid in asc and the other fields in desc order,
yet the query does use the call_idx13 index:

csitech=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
calltype desc, callkey desc;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------------------
Sort (cost=60.01..60.05 rows=14 width=696) (actual
time=42393.56..43381.85 rows=510705 loops=1)
Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
-> Index Scan using call_idx13 on call (cost=0.00..59.74 rows=14
width=696) (actual time=0.33..19679.01 rows=510705 loops=1)
Index Cond: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
Total runtime: 43602.05 msec

FWIW, this is different hardware (Solaris 9/Sparc), but the same version
of Postgres (7.3.4). The data is a superset of the data in the other
database (they are both snapshots taken from production).

I dropped and recreated the index on the other (Linux) machine, ran
vacuum analyse, then tried the query again. It still performs a
sequence scan on the call table. :(

>
> Any chance you could put together a test case demonstrating the above
> behaviour? Everything from CREATE TABLE, through dataload to the
EXPLAIN
> ANALYZE.

Forgive me for being thick: what exactly would be involved? Due to
HIPAA regulations, I cannot "expose" any of the data.

<background>
I hesitated to bring this up because I wanted to focus on the technical
issues rather than have this degenerate into a religious war. The chief
developer in charge of the project brought this query to my attention.
He has a fair amount of political sway in the company, and is now
lobbying to switch to MySQL because he maintains that PostgreSQL is
broken and/or too slow for our needs. He has apparently benchmarked the
same query using MySQL and gotten much more favorable results (I have
been unable to corroborate this yet).
</background>

-Joel

-- CONFIDENTIALITY NOTICE --

This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the intended addressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use, copy, disclose or distribute to anyone the message or any information contained in the message. If you have received this message in error, please immediately advise the sender by reply email, and delete the message. Thank you.


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Joel McGraw <jmcgraw(at)eldocomp(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan wierdness?
Date: 2004-07-13 02:06:42
Message-ID: 1089682228.44991.20.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> Oddly enough, I put the same database on a different machine, and the
> query now behaves as I hoped all along. Notice that I'm using the
> "real" query, with the aspid in asc and the other fields in desc order,
> yet the query does use the call_idx13 index:

Notice that while it only takes 19 seconds to pull the data out of the
table, it is spending 30 seconds sorting it -- so the index scan isn't
buying you very much.

Try it again with ORDER BY ascid DESC and you should get the query down
to 20 seconds in total on that Sparc; so I wouldn't call it exactly what
you wanted.

he decision about whether to use an index or not, is borderline. And as
you've shown they take approximately the same amount of time. Use of an
index will not necessarily be faster than a sequential scan -- but the
penalty for accidentally selecting one when it shouldn't have is much
higher.

> > Any chance you could put together a test case demonstrating the above
> > behaviour? Everything from CREATE TABLE, through dataload to the
> EXPLAIN
> > ANALYZE.
>
>
> Forgive me for being thick: what exactly would be involved? Due to
> HIPAA regulations, I cannot "expose" any of the data.

Of course. But that doesn't mean you couldn't create table different
name and muck around with the values. But you're getting what you want,
so it isn't a problem anymore.

> <background>
> I hesitated to bring this up because I wanted to focus on the technical
> issues rather than have this degenerate into a religious war. The chief
> developer in charge of the project brought this query to my attention.
> He has a fair amount of political sway in the company, and is now
> lobbying to switch to MySQL because he maintains that PostgreSQL is
> broken and/or too slow for our needs. He has apparently benchmarked the
> same query using MySQL and gotten much more favorable results (I have
> been unable to corroborate this yet).
> </background>

I wouldn't be surprised if MySQL did run this single query faster with
nothing else going on during that time. MySQL was designed primarily
with a single user in mind, but it is unlikely this will be your
production situation so the benchmark is next to useless.

Connect 50 clients to the databases running this (and a mixture of other
selects) while another 20 clients are firing off updates, inserts,
deletes on these and other structures -- or whatever matches your full
production load.

This is what PostgreSQL (and a number of other DBs) are designed for,
typical production loads.