Re: The need for clustered indexes to boost TPC-V performance

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Reza Taheri <rtaheri(at)vmware(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: The need for clustered indexes to boost TPC-V performance
Date: 2012-07-06 00:46:11
Message-ID: 4FF63553.4080300@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out
as a final release soon enough, and index only scans may make a /big/
difference for the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth
noting that I don't work with hardware at that scale, and I'm more used
to tuning I/O bottlenecked systems with onboard storage rather than
CPU-bottlenecked ones on big SANs. Hopefully now that you've posted your
configuration and setup there might be interest from others.

If you're able to post an EXPLAIN ANALYZE or two for a query you feel is
slow that certainly won't hurt. Using http://explain.depesz.com/ saves
you the hassle of dealing with word-wrapping when posting them, btw.

As for your config:

I notice that your autovacuum settings are at their defaults. With heavy
UPDATE / DELETE load this'll tend to lead to table and index bloat, so
the DB has to scan more useless data to get what it needs. It also means
table stats won't be maintained as well, potentially leading to poor
planner decisions. The following fairly scary query can help identify
bloat, as the database server doesn't currently have anything much built
in to help you spot such issues:

http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and
effective_io_concurrency so Pg has more idea of the scale of your
hardware. The defaults are very conservative - it's supposed to be easy
for people to use for simple things without melting their systems, and
it's expected that anyone doing bigger work will tune the database.

http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I
lack the big hardware experience to know if they're appropriate, but
they're not the extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a
SAN with RAM and SSD cache in front of fast disks. Their defaults are
for local uncached spinning HDD media where seeks are expensive. The
typical advice on such hardware is to set them to something more like
seq_page_cost = 0.1 random_page_cost = 0.15 - ie cheaper relative to
the cpu cost, and with random I/O only a little more expensive than
sequential I/O. What's right for your situation varies a bit based on DB
size vs hardware size, etc; Greg discusses this more in his book.

What isolation level do your transactions use? This is significant
because of the move to true serializable isolation with predicate
locking in 9.0; it made serializable transactions a bit slower in some
circumstances in exchange for much stronger correctness guarantees. The
READ COMMITTED default was unchanged.

It also looks like you might not have seen the second part of my earlier
reply:

while PostgreSQL doesn't support covering indexes or clustered indexes
at this point, 9.2 has added support for index-only scans, which are a
half-way point of sorts. See:

http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this
PostgreSQL 9.2 enhancement. It should make a big difference, and if it
doesn't it's important to know why.

(CC'd Robert Haas)

As for the "invest" side - that's really a matter for EnterpriseDB,
Command Prompt, Red Hat, and the other backers who're employing people
to work on the DB. Consider asking on pgsql-hackers, too; if nothing
else you'll get a good explanation of the current state and progress
toward clustered indexes.

Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
Surprisingly, covering/clustered indexes aren't on there or at least
aren't easily found. It's certainly a much-desired feature despite its
apparent absence from the TODO.

http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2012-07-06 00:57:40 Re: The need for clustered indexes to boost TPC-V performance
Previous Message Andy Colson 2012-07-06 00:41:40 Re: The need for clustered indexes to boost TPC-V performance

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2012-07-06 00:51:33 SSDs again, LSI Warpdrive 2 anyone?
Previous Message Andy Colson 2012-07-06 00:41:40 Re: The need for clustered indexes to boost TPC-V performance