Re: specific query (not all) on Pg8 MUCH slower than Pg7

Lists: pgsql-performance
From: Susan Russo <russo(at)morgan(dot)harvard(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Cc: harvsys(at)morgan(dot)harvard(dot)edu, russo(at)morgan(dot)harvard(dot)edu
Subject: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 14:18:34
Message-ID: 200705081418.l48EIY8V026360@larrys.harvard.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Hi,

Despite numerous efforts, we're unable to solve a severe performance limitation between Pg 7.3.2
and Pg 8.1.4.

The query and 'explain analyze' plan below, runs in
26.20 msec on Pg 7.3.2, and
2463.968 ms on Pg 8.1.4,

and the Pg7.3.2 is on older hardware and OS.

Multiply this time difference by >82K, and a 10 minute procedure (which includes
this query), now runs in 10 *hours*.....not good....

In general, however, we're pleased with performance of this very same Pg8.1.4 server
as compared to the Pg7.3.2 server (loading/dumping, and other queries are much faster).

QUERY:

SELECT dx.db_id, dx.accession, f.uniquename, f.name, cvt.name as ntype,
fd.is_current
from feature f, feature_dbxref fd, dbxref dx, cvterm cvt
where fd.dbxref_id = dx.dbxref_id
and fd.feature_id = f.feature_id
and f.type_id = cvt.cvterm_id
and accession like 'AY851043%'
and cvt.name not in ('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')
;

explain analyze output on Pg7.3.2:

-----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..23.45 rows=1 width=120) (actual time=25.59..25.59 rows=0 loops=1)
-> Nested Loop (cost=0.00..17.49 rows=1 width=82) (actual time=25.58..25.58 rows=0 loops=1)
-> Nested Loop (cost=0.00..11.93 rows=1 width=30) (actual time=25.58..25.58 rows=0 loops=1)
-> Index Scan using dbxref_idx2 on dbxref dx (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
Filter: (accession ~~ 'AY851043%'::text)
-> Index Scan using feature_dbxref_idx2 on feature_dbxref fd (cost=0.00..6.05 rows=5 width=9) (never executed)
Index Cond: (fd.dbxref_id = "outer".dbxref_id)
-> Index Scan using feature_pkey on feature f (cost=0.00..5.54 rows=1 width=52) (never executed)
Index Cond: ("outer".feature_id = f.feature_id)
-> Index Scan using cvterm_pkey on cvterm cvt (cost=0.00..5.94 rows=1 width=38) (never executed)
Index Cond: ("outer".type_id = cvt.cvterm_id)
Filter: ((name <> 'gene'::character varying) AND (name <> 'protein'::character varying) AND (name <> 'natural_transposable_element'::character varying) AND (name <> 'chromosome_structure_variation'::character varying) AND (name <> 'chromosome_arm'::character varying) AND (name <> 'repeat_region'::character varying))
Total runtime: 26.20 msec
(14 rows)

========

explain analyze output on Pg8.1.4:

-----------------------------------------------------------------
Nested Loop (cost=0.00..47939.87 rows=1 width=108) (actual time=2463.654..2463.654 rows=0 loops=1)
-> Nested Loop (cost=0.00..47933.92 rows=1 width=73) (actual time=2463.651..2463.651 rows=0 loops=1)
-> Nested Loop (cost=0.00..47929.86 rows=1 width=22) (actual time=2463.649..2463.649 rows=0 loops=1)
-> Seq Scan on dbxref dx (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
Filter: ((accession)::text ~~ 'AY851043%'::text)
-> Index Scan using feature_dbxref_idx2 on feature_dbxref fd (cost=0.00..5.90 rows=4 width=9) (never executed)
Index Cond: (fd.dbxref_id = "outer".dbxref_id)
-> Index Scan using feature_pkey on feature f (cost=0.00..4.05 rows=1 width=59) (never executed)
Index Cond: ("outer".feature_id = f.feature_id)
-> Index Scan using cvterm_pkey on cvterm cvt (cost=0.00..5.94 rows=1 width=43) (never executed)
Index Cond: ("outer".type_id = cvt.cvterm_id)
Filter: (((name)::text <> 'gene'::text) AND ((name)::text <> 'protein'::text) AND ((name)::text <> 'natural_transposable_element'::text) AND ((name)::text <> 'chromosome_structure_variation'::text) AND ((name)::text <> 'chromosome_arm'::text) AND ((name)::text <> 'repeat_region'::text))
Total runtime: 2463.968 ms
(13 rows)

=======

I tried tuning configs, including shutting off enable seqscan, forcing use of index (set shared_buffers high
with random_page_cost set low). A colleague who gets 1697ms on Pg8.1.4 with this query provided his
postgresql.conf -- didn't help....

We use standard dump/load commands between these servers:
pg_dump -O fb_2007_01_05 | compress > fb_2007_01_05.Z
uncompress -c fb_2007_01_05.Z | psql fb_2007_01_05

Hardware/OS specs:
- Pg7.3.2: SunFire 280R, 900mHz SPARC processor, 3gb total RAM, 10Krpm SCSI internal disks, Solaris 2.8
- Pg8.1.4: v240 - dual Ultra-SPARC IIIi 1500MHz SPARC processor, 8GB total RAM, Solaris 2.10
(used both Sun-supplied postgres binaries, and compiled postgres from source)

Thanks for your help,
Susan Russo


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Susan Russo <russo(at)morgan(dot)harvard(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org, harvsys(at)morgan(dot)harvard(dot)edu
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 14:48:25
Message-ID: 1160.1178635705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Susan Russo <russo(at)morgan(dot)harvard(dot)edu> writes:
> Despite numerous efforts, we're unable to solve a severe performance limitation between Pg 7.3.2
> and Pg 8.1.4.

> The query and 'explain analyze' plan below, runs in
> 26.20 msec on Pg 7.3.2, and
> 2463.968 ms on Pg 8.1.4,

You're not getting the indexscan optimization of the LIKE clause, which
is most likely due to having initdb'd the 8.1 installation in something
other than C locale. You can either redo the initdb in C locale (which
might be a good move to fix other inconsistencies from the 7.3 behavior
you're used to) or create a varchar_pattern_ops index on the column(s)
you're using LIKE with.

regards, tom lane


From: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>
To: Susan Russo <russo(at)morgan(dot)harvard(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org, harvsys(at)morgan(dot)harvard(dot)edu
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 14:48:34
Message-ID: 20070508144834.GA5678@uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, May 08, 2007 at 10:18:34AM -0400, Susan Russo wrote:
> explain analyze output on Pg7.3.2:
>
> -> Index Scan using dbxref_idx2 on dbxref dx (cost=0.00..5.83 rows=1 width=21) (actual time=25.58..25.58 rows=0 loops=1)
> Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character varying))
> Filter: (accession ~~ 'AY851043%'::text)
>
> explain analyze output on Pg8.1.4:
>
> -> Seq Scan on dbxref dx (cost=0.00..47923.91 rows=1 width=21) (actual time=2463.646..2463.646 rows=0 loops=1)
> Filter: ((accession)::text ~~ 'AY851043%'::text)

This is almost all of your cost. Did you perchance initdb the 8.1.4 cluster
in a non-C locale? You could always try

CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);

which would create an index that might be more useful for your LIKE query,
even in a non-C locale.

/* Steinar */
--
Homepage: http://www.sesse.net/


From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: Susan Russo <russo(at)morgan(dot)harvard(dot)edu>, pgsql-performance(at)postgresql(dot)org
Cc: harvsys(at)morgan(dot)harvard(dot)edu, russo(at)morgan(dot)harvard(dot)edu
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 14:54:48
Message-ID: 601821.86400.qm@web31815.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


--- Susan Russo <russo(at)morgan(dot)harvard(dot)edu> wrote:
> and accession like 'AY851043%'

I don't know if you've tried refactoring your query, but you could try:

AND accession BETWEEN 'AY8510430' AND 'AY8510439' -- where the last digit is
^ ^ -- lowest AND highest expected value

Regards,
Richard Broersma Jr.


From: "Alexander Staubo" <alex(at)purefiction(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Susan Russo" <russo(at)morgan(dot)harvard(dot)edu>, pgsql-performance(at)postgresql(dot)org, harvsys(at)morgan(dot)harvard(dot)edu
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 14:59:30
Message-ID: 88daf38c0705080759o2e5771c3qdf1b34baa581fc53@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 5/8/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> You're not getting the indexscan optimization of the LIKE clause, which
> is most likely due to having initdb'd the 8.1 installation in something
> other than C locale. You can either redo the initdb in C locale (which
> might be a good move to fix other inconsistencies from the 7.3 behavior
> you're used to) or create a varchar_pattern_ops index on the column(s)
> you're using LIKE with.

Given the performance implications of setting the wrong locale, and
the high probability of accidentally doing this (I run my shells with
LANG=en_US.UTF-8, so all my databases have inherited this locale), why
is there no support for changing the database locale after the fact?

# alter database test set lc_collate = 'C';
ERROR: parameter "lc_collate" cannot be changed

Alexander.


From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: "Alexander Staubo" <alex(at)purefiction(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 15:04:27
Message-ID: 20070508110427.c714d52f.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

In response to "Alexander Staubo" <alex(at)purefiction(dot)net>:

> On 5/8/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > You're not getting the indexscan optimization of the LIKE clause, which
> > is most likely due to having initdb'd the 8.1 installation in something
> > other than C locale. You can either redo the initdb in C locale (which
> > might be a good move to fix other inconsistencies from the 7.3 behavior
> > you're used to) or create a varchar_pattern_ops index on the column(s)
> > you're using LIKE with.
>
> Given the performance implications of setting the wrong locale, and
> the high probability of accidentally doing this (I run my shells with
> LANG=en_US.UTF-8, so all my databases have inherited this locale), why
> is there no support for changing the database locale after the fact?
>
> # alter database test set lc_collate = 'C';
> ERROR: parameter "lc_collate" cannot be changed

How would that command handle UTF data that could not be converted to C?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Alexander Staubo" <alex(at)purefiction(dot)net>
Cc: "Susan Russo" <russo(at)morgan(dot)harvard(dot)edu>, pgsql-performance(at)postgresql(dot)org, harvsys(at)morgan(dot)harvard(dot)edu
Subject: Re: specific query (not all) on Pg8 MUCH slower than Pg7
Date: 2007-05-08 15:05:48
Message-ID: 1406.1178636748@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Alexander Staubo" <alex(at)purefiction(dot)net> writes:
> why is there no support for changing the database locale after the fact?

It'd corrupt all your indexes (or all the ones on textual columns anyway).

There are some TODO entries related to this, but don't hold your breath
waiting for a fix ...

regards, tom lane