Re: 100x slowdown for nearly identical tables

From: Craig James <cjames(at)emolecules(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 100x slowdown for nearly identical tables
Date: 2013-05-02 00:45:11
Message-ID: CAFwQ8rchn1J9Jic=FaHDsHG2cTT+s_pFr2nuNy7PTG3_LCEYnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 1, 2013 at 5:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Craig James <cjames(at)emolecules(dot)com> writes:
> > I have two tables that are nearly identical, yet the same query runs 100x
> > slower on the newer one. ...
>
> > db=> explain analyze select id, 1 from str_conntab
> > where (id >= 12009977 and id <= 12509976) order by id;
>
> > Index Scan using new_str_conntab_pkey_3217 on str_conntab
> > (cost=0.00..230431.33 rows=87827 width=4)
> > (actual time=65.771..51341.899 rows=48613 loops=1)
> > Index Cond: ((id >= 12009977) AND (id <= 12509976))
> > Total runtime: 51350.556 ms
>
> > db=> explain analyze select id, 1 from old_str_conntab
> > where (id >= 12009977 and id <= 12509976) order by id;
>
> > Index Scan using str_conntab_pkey on old_str_conntab
> > (cost=0.00..82262.56 rows=78505 width=4)
> > (actual time=38.327..581.235 rows=48725 loops=1)
> > Index Cond: ((id >= 12009977) AND (id <= 12509976))
> > Total runtime: 586.071 ms
>
> It looks like old_str_conntab is more or less clustered by "id",
> and str_conntab not so much. You could try EXPLAIN (ANALYZE, BUFFERS)
> (on newer PG versions) to verify how many distinct pages are getting
> touched during the indexscan.
>

Yeah, now that you say it, it's obvious. The original table was built with
ID from a sequence, so it's going to be naturally clustered by ID. The new
table was built by reloading the data in alphabetical order by supplier
name, so it would have scattered the IDs all over the place.

I guess I could actually cluster the new table, but since that one table
holds about 90% of the total data in the database, that would be a chore.
Probably better to find a more efficient way to do the query.

Thanks,
Craig

>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2013-05-02 00:49:17 Re: In progress INSERT wrecks plans on table
Previous Message Tom Lane 2013-05-02 00:18:32 Re: 100x slowdown for nearly identical tables