Re: The need for clustered indexes to boost TPC-V performance

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Reza Taheri <rtaheri(at)vmware(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: The need for clustered indexes to boost TPC-V performance
Date: 2012-07-04 05:43:43
Message-ID: 4FF3D80F.9040408@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On 07/04/2012 07:13 AM, Reza Taheri wrote:
>
> Following the earlier email introducing the TPC-V benchmark, and that
> we are developing an industry standard benchmarking kit for TPC-V
> using PostgreSQL, here is a specific performance issue we have run into.
>

Which version of PostgreSQL are you using?

How has it been tuned beyond the defaults - autovacuum settings,
shared_buffers, effective_cache_size, WAL settings, etc?

How much RAM is on the blade? What OS and version are on the blade?

> Comparing the table sizes, we are close to 2X larger (more on this in
> a later note). But the index size is what stands out. Our overall
> index usage (again, after accounting for different numbers of rows) is
> 4.8X times larger. 35% of our I/Os are to the index space. I am
> guessing that the 4.8X ballooning has something to do with this, and
> that in itself explains a lot about our high I/O rate, as well as
> higher CPU/tran cycles compared to MS SQL (we are 2.5-3 times slower).
>
This is making me wonder about bloat issues and whether proper vacuuming
is being done. If the visibility map and free space map aren't
maintained by proper vaccum operation everything gets messy, fast.

> Well, MS SQL used a "clustered index" for CT, i.e., the data is held
> in the leaf pages of the index B-Tree. The data and index are in one
> data structure. Once you lookup the index, you also have the data at
> zero additional cost.
>
> [snip]
>
> Is the PGSQL community willing to invest in a feature that a) has been
> requested by many others already; and b) can make a huge difference in
> a benchmark that can lend substantial credibility to PGSQL performance?
>

while PostgreSQL doesn't support covering indexes or clustered indexes
at this point, 9.2 has added support for index-only scans, which are a
half-way point of sorts. See:

http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this
PostgreSQL 9.2 enhancement. It should make a big difference, and if it
doesn't it's important to know why.

(CC'd Robert Haas)

I'm not sure what the best option for getting a 9.2 beta build for
Windows is.

As for the "invest" side - that's really a matter for EnterpriseDB,
Command Prompt, Red Hat, and the other backers who're employing people
to work on the DB. Consider asking on pgsql-hackers, too; if nothing
else you'll get a good explanation of the current state and progress
toward clustered indexes.

Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
Surprisingly, covering/clustered indexes aren't on there or at least
aren't easily found. It's certainly a much-desired feature despite its
apparent absence from the TODO.

http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

--
Craig Ringer

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2012-07-04 07:57:30 Re: The need for clustered indexes to boost TPC-V performance
Previous Message Andy Chambers 2012-07-04 00:38:10 Cancel a pg_ctl stop

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Kellerer 2012-07-04 07:57:30 Re: The need for clustered indexes to boost TPC-V performance
Previous Message Craig Ringer 2012-07-04 05:19:14 Re: Introducing the TPC-V benchmark, and its relationship to PostgreSQL