Quick Links

Re: : Performance Improvement Strategy

Lists:	pgsql-performance

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	: Performance Improvement Strategy
Date:	2011-09-20 16:22:42
Message-ID:	CAFrxt0j5UysgEU8hQxZ_mGbdex8Ev-bGdGsdpOGYYSO-jO24Fw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hello Everyone,

I had posted a query in "GENERAL" category, not sure if that was the
correct category to post.

Please help me understand how to calculate free space in Tables and Indexes
even after vacuuming and analyzing is performed.

What i understand is that, even if we perform VACUUM ANALYZE regularly, the
free space generated is not filled up.

I see lot of free spaces or free pages in Tables and Indexes. But, I need to
give an exact calculation on how much space will be reclaimed after VACUUM
FULL and RE-INDEXING.

Is there any standard procedure or process to calculate the same ?

Please help !

Thanks
Venkat

From:	Marcin Mirosław <marcin(at)mejor(dot)pl>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-20 17:51:36
Message-ID:	4E78D2A8.9090709@mejor.pl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

W dniu 2011-09-20 18:22, Venkat Balaji pisze:
> Hello Everyone,
>
> I had posted a query in "GENERAL" category, not sure if that was the
> correct category to post.
>
> Please help me understand how to calculate free space in Tables and
> Indexes even after vacuuming and analyzing is performed.
>
> What i understand is that, even if we perform VACUUM ANALYZE regularly,
> the free space generated is not filled up.
>
> I see lot of free spaces or free pages in Tables and Indexes. But, I
> need to give an exact calculation on how much space will be reclaimed
> after VACUUM FULL and RE-INDEXING.
>
> Is there any standard procedure or process to calculate the same ?

Hello!
I hope this link will be usefull for you :
http://wiki.postgresql.org/wiki/Show_database_bloat
Regards

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-20 18:09:48
Message-ID:	4E78D6EC.30205@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Venkat,

> I see lot of free spaces or free pages in Tables and Indexes. But, I need to
> give an exact calculation on how much space will be reclaimed after VACUUM
> FULL and RE-INDEXING.

At present, there is no way to calculate this precisely. You can only
estimate, and estimates have significant error factors. The query which
Marcin linked for you, for example, can be as much as 500% off (although
usually only 50% off).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-20 22:05:07
Message-ID:	4E790E13.8060308@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 21/09/11 06:09, Josh Berkus wrote:
> Venkat,
>
>> I see lot of free spaces or free pages in Tables and Indexes. But, I need to
>> give an exact calculation on how much space will be reclaimed after VACUUM
>> FULL and RE-INDEXING.
> At present, there is no way to calculate this precisely. You can only
> estimate, and estimates have significant error factors. The query which
> Marcin linked for you, for example, can be as much as 500% off (although
> usually only 50% off).
>

If you have autovacuum on (which should be typical thee days), then
using the freespacemap contrib module should give very accurate results:

SELECT oid::regclass,
pg_relation_size(oid)/(1024*1024) AS mb,
sum(free)/(1024*1024) AS free_mb
FROM
(SELECT oid, (pg_freespace(oid)).avail AS free
FROM pg_class) AS a
GROUP BY a.oid ORDER BY free_mb DESC;

regards

Mark

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-20 22:10:49
Message-ID:	4E790F69.9060809@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 21/09/11 10:05, Mark Kirkwood wrote:
>
> ...then using the freespacemap contrib module should give very
> accurate results:
>

Sorry, should have said - for 8.4 and later!

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 05:38:13
Message-ID:	CAFrxt0jixkb6v4EFByZMdByyrMw8CQKvPc3jjk9p4GZaPqHADQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Thank Everyone for your inputs !

Mark,

We are using 9.0, so, i should be able to make use of this "freespacemap"
contrib module and would get back to you with the results.

I was using below query (which i got it by googling)..

But, was not sure, if its picking up the correct information. I want to
avoid mis-prediction cost after whole production has been scheduled for
downtime for maintenance.

SELECT
current_database(), schemaname, tablename, /*reltuples::bigint,
relpages::bigint, otta,*/
ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END
AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta --
very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND
s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23
END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname =
rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

Thanks
Venkat

On Wed, Sep 21, 2011 at 3:40 AM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 21/09/11 10:05, Mark Kirkwood wrote:
>
>>
>> ...then using the freespacemap contrib module should give very accurate
>> results:
>>
>>
> Sorry, should have said - for 8.4 and later!
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)**
> org <pgsql-performance(at)postgresql(dot)org>)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-performance<http://www.postgresql.org/mailpref/pgsql-performance>
>

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 05:59:23
Message-ID:	CAFrxt0jEuKXnqtUnZ79bsipizQegXd9MvqmHfb2Dm2JT1x7L-Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Can you please help me understand what "blkno" column refers to ?

Thanks
Venkat

On Wed, Sep 21, 2011 at 11:08 AM, Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:

> Thank Everyone for your inputs !
>
> Mark,
>
> We are using 9.0, so, i should be able to make use of this "freespacemap"
> contrib module and would get back to you with the results.
>
> I was using below query (which i got it by googling)..
>
> But, was not sure, if its picking up the correct information. I want to
> avoid mis-prediction cost after whole production has been scheduled for
> downtime for maintenance.
>
> SELECT
> current_database(), schemaname, tablename, /*reltuples::bigint,
> relpages::bigint, otta,*/
> ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
> tbloat,
> CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END
> AS wastedbytes,
> iname, /*ituples::bigint, ipages::bigint, iotta,*/
> ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
> END,1) AS ibloat,
> CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS
> wastedibytes
> FROM (
> SELECT
> schemaname, tablename, cc.reltuples, cc.relpages, bs,
> CEIL((cc.reltuples*((datahdr+ma-
> (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
> END))+nullhdr2+4))/(bs-20::float)) AS otta,
> COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
> COALESCE(c2.relpages,0) AS ipages,
> COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta
> -- very rough approximation, assumes all cols
> FROM (
> SELECT
> ma,bs,schemaname,tablename,
> (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
> END)))::numeric AS datahdr,
> (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
> nullhdr%ma END))) AS nullhdr2
> FROM (
> SELECT
> schemaname, tablename, hdr, ma, bs,
> SUM((1-null_frac)*avg_width) AS datawidth,
> MAX(null_frac) AS maxfracsum,
> hdr+(
> SELECT 1+count(*)/8
> FROM pg_stats s2
> WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND
> s2.tablename = s.tablename
> ) AS nullhdr
> FROM pg_stats s, (
> SELECT
> (SELECT current_setting('block_size')::numeric) AS bs,
> CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE
> 23 END AS hdr,
> CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
> FROM (SELECT version() AS v) AS foo
> ) AS constants
> GROUP BY 1,2,3,4,5
> ) AS foo
> ) AS rs
> JOIN pg_class cc ON cc.relname = rs.tablename
> JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname =
> rs.schemaname AND nn.nspname <> 'information_schema'
> LEFT JOIN pg_index i ON indrelid = cc.oid
> LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
> ) AS sml
> ORDER BY wastedbytes DESC
>
> Thanks
> Venkat
>
>
> On Wed, Sep 21, 2011 at 3:40 AM, Mark Kirkwood <
> mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
>
>> On 21/09/11 10:05, Mark Kirkwood wrote:
>>
>>>
>>> ...then using the freespacemap contrib module should give very accurate
>>> results:
>>>
>>>
>> Sorry, should have said - for 8.4 and later!
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)**
>> org <pgsql-performance(at)postgresql(dot)org>)
>> To make changes to your subscription:
>> http://www.postgresql.org/**mailpref/pgsql-performance<http://www.postgresql.org/mailpref/pgsql-performance>
>>
>
>

From:	Shaun Thomas <sthomas(at)peak6(dot)com>
To:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
Cc:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 13:03:13
Message-ID:	4E79E091.2030401@peak6.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 09/20/2011 11:22 AM, Venkat Balaji wrote:

> Please help me understand how to calculate free space in Tables and
> Indexes even after vacuuming and analyzing is performed.

Besides the query Mark gave you using freespacemap, there's also the
pgstattuple contrib module. You'd use it like this:

SELECT pg_size_pretty(free_space) AS mb_free
FROM pgstattuple('some_table');

Query must be run as a super-user, and I wouldn't recommend running it
on huge tables, since it scans the actual data files to get its
information. There's a lot of other useful information in that function,
such as the number of dead rows.

> What i understand is that, even if we perform VACUUM ANALYZE
> regularly, the free space generated is not filled up.

VACUUM does not actually generate free space. It locates and marks
reusable tuples. Any future updates or inserts on that table will be put
in those newly reclaimed spots, instead of being bolted onto the end of
the table.

> I see lot of free spaces or free pages in Tables and Indexes. But, I
> need to give an exact calculation on how much space will be reclaimed
> after VACUUM FULL and RE-INDEXING.

Why? If your database is so desperate for space, VACUUM and REINDEX
won't really help you. A properly maintained database will still have a
certain amount of "bloat" equal to the number of rows that change
between maintenance intervals. One way or another, that space is going
to be used by *something*.

It sounds more like you need to tweak your autovacuum settings to be
more aggressive if you're seeing significant enough turnover that your
tables are bloating significantly. One of our tables, for instance, gets
vacuumed more than once per hour because it experiences 1,000% turnover
daily.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to this email

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Venkat Balaji" <venkat(dot)balaji(at)verse(dot)in>
Cc:	<sthomas(at)peak6(dot)com>, "PGSQL Performance" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 13:57:35
Message-ID:	4E79A6FF0200002500041506@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
> Venkat Balaji wrote:

>> I see lot of free spaces or free pages in Tables and Indexes.
>> But, I need to give an exact calculation on how much space will
>> be reclaimed after VACUUM FULL and RE-INDEXING.
>
> Why?

I've been wondering that, too. And talking about the space being
"reclaimed" seems to be at odds with your subject line. The space
is given up by the database engine to the file system free space,
where reuse by the database will be much more expensive. For good
performance you want some free space in the tables and indexes,
where it can be allocated to new tuples without going out through OS
calls to the file system.

Clearly, if free space gets higher than necessary to support
creation of new tuples, it can start to harm performance, and you
may need to take aggressive action (such as CLUSTER) to reclaim it;
but any time you find it necessary to do *that* you should be
investigating what went wrong to put you in such a spot. Either
your autovacuum is (as Shaun suggested) not aggressive enough, or
you have some long running transaction (possibly "idle in
transaction") which is preventing vacuums from doing their work
effectively. Investigating that is going to help more than
calculating just how much space the database is going to give up to
file system free space.

-Kevin

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	sthomas(at)peak6(dot)com, Kevin(dot)Grittner(at)wicourts(dot)gov
Cc:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 16:13:45
Message-ID:	CAFrxt0hikCw7=O=v-yyQxvHog=kGZeTTTsgAauY8dktEDzOf0A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Thank you very much for your detailed explanation !

I will be working on our existing "auto-vacuuming" strategy to see
if that's optimal. But, we do have VACUUM VERBOSE ANALYZE running at the
cluster level every day and auto-vacuum is aggressive for highly active
tables.

Today, we have vacuumed a 10GB table and the table size decreased to 5 GB.

I understand that, it would very expensive for the table to reclaim the
space back from the filesystem. We have decided to do the maintenance after
a thorough analysis and our databases were not subjected to any kind of
maintenance activity since 2 yrs (with downtime).

I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
ensure that IO performance and Indexing performance would be good and the PG
optimizer would pick up the optimal plan. As said earlier, our databases
have never been part of any re-organization since 2 years and are highly
transactional databases. I believe that, performing VACUUM FULL and
RE-INDEXING would have tightly packed rows (in every page) would ensure good
IOs.

I might have not put across the explanation in an understandable manner.

Please help me know the following -

1. When would pg_stat_user_tables will be updated and what would the
information show ?
2. Will the information about dead-rows and live-rows vanish after VACUUM or
ANALYZE or VACUUM FULL ?

I am just preparing a monitoring system which would help us know the rate of
bloats and data generation on daily basis.

Sorry for the long email !

Looking forward for your help !

Thanks
Venkat

On Wed, Sep 21, 2011 at 7:27 PM, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov
> wrote:

> Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
> > Venkat Balaji wrote:
>
> >> I see lot of free spaces or free pages in Tables and Indexes.
> >> But, I need to give an exact calculation on how much space will
> >> be reclaimed after VACUUM FULL and RE-INDEXING.
> >
> > Why?
>
> I've been wondering that, too. And talking about the space being
> "reclaimed" seems to be at odds with your subject line. The space
> is given up by the database engine to the file system free space,
> where reuse by the database will be much more expensive. For good
> performance you want some free space in the tables and indexes,
> where it can be allocated to new tuples without going out through OS
> calls to the file system.
>
> Clearly, if free space gets higher than necessary to support
> creation of new tuples, it can start to harm performance, and you
> may need to take aggressive action (such as CLUSTER) to reclaim it;
> but any time you find it necessary to do *that* you should be
> investigating what went wrong to put you in such a spot. Either
> your autovacuum is (as Shaun suggested) not aggressive enough, or
> you have some long running transaction (possibly "idle in
> transaction") which is preventing vacuums from doing their work
> effectively. Investigating that is going to help more than
> calculating just how much space the database is going to give up to
> file system free space.
>
> -Kevin
>

From:	Greg Smith <greg(at)2ndQuadrant(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 17:57:51
Message-ID:	4E7A259F.9060401@2ndQuadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 09/21/2011 12:13 PM, Venkat Balaji wrote:
> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE
> to ensure that IO performance and Indexing performance would be good

Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM
FULL. You probably don't want to do that. A multi-gigabyte table can
easily be unavailable for several hours if you execute VACUUM FULL
against it. CLUSTER is almost always faster.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 18:18:39
Message-ID:	CAFrxt0i6TjkuUa-xB5wsySnPkgjCvq7MZkeRy981gc-oAbnUKg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Thanks Greg !

If i got it correct, CLUSTER would do the same what VACUUM FULL does (except
being fast).

CLUSTER is recommended only because it is faster ? As per the link, the
table would be unavailable (for shorter period compared to VACUUM FULL) when
CLUSTER is executed as well. Hope i got it correct !

Thanks
Venkat

On Wed, Sep 21, 2011 at 11:27 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> On 09/21/2011 12:13 PM, Venkat Balaji wrote:
>
>> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
>> ensure that IO performance and Indexing performance would be good
>>
>
>
> Read http://wiki.postgresql.org/**wiki/VACUUM_FULL<http://wiki.postgresql.org/wiki/VACUUM_FULL>before you run VACUUM FULL. You probably don't want to do that. A
> multi-gigabyte table can easily be unavailable for several hours if you
> execute VACUUM FULL against it. CLUSTER is almost always faster.
>
> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)**
> org <pgsql-performance(at)postgresql(dot)org>)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-performance<http://www.postgresql.org/mailpref/pgsql-performance>
>

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Smith" <greg(at)2ndquadrant(dot)com>, "Venkat Balaji" <venkat(dot)balaji(at)verse(dot)in>
Cc:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-21 18:41:17
Message-ID:	4E79E97D0200002500041570@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:

> If i got it correct, CLUSTER would do the same what VACUUM FULL
> does (except being fast)

CLUSTER copies the table (in the sequence of the specified index) to
a new set of files, builds fresh indexes, and then replaces the
original set of files with the new ones. So you do need room on
disk for a second copy of the table, but it tends to be much faster
then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in
9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
the table data rather than using an index.) REINDEX is not needed
when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM
FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
generally a good idea.

When choosing an index for CLUSTER, pick one on which you often
search for a *range* of rows, if possible. Like a name column if
you do a lot of name searches.

-Kevin

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-27 12:29:06
Message-ID:	CAFrxt0hft-BcE9-gBOce9QEkX_9Mme+=oCSrcXDq4SAjMWbciQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

We had performed VACUUM FULL on our production and performance has improved
a lot !

I started using pg_stattuple and pg_freespacemap for tracking freespace in
the tables and Indexes and is helping us a lot.

Thanks for all your inputs and help !

Regards,
VB

On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner <
Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:
>
> > If i got it correct, CLUSTER would do the same what VACUUM FULL
> > does (except being fast)
>
> CLUSTER copies the table (in the sequence of the specified index) to
> a new set of files, builds fresh indexes, and then replaces the
> original set of files with the new ones. So you do need room on
> disk for a second copy of the table, but it tends to be much faster
> then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in
> 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
> the table data rather than using an index.) REINDEX is not needed
> when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM
> FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
> generally a good idea.
>
> When choosing an index for CLUSTER, pick one on which you often
> search for a *range* of rows, if possible. Like a name column if
> you do a lot of name searches.
>
> -Kevin
>

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-09-27 12:31:01
Message-ID:	CAFrxt0hwFeBF66NqQMqjG5ZtcvqiFPJcjvYoj83W9XAZ77Ay8g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Forgot to mention -

Kevin,

CLUSTER seems to be an very interesting concept to me.

I am thinking to test the CLUSTER TABLE on our production according to the
Index usage on the table.

Will let you know once i get the results.

Regards,
VB

On Tue, Sep 27, 2011 at 5:59 PM, Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:

> We had performed VACUUM FULL on our production and performance has improved
> a lot !
>
> I started using pg_stattuple and pg_freespacemap for tracking freespace in
> the tables and Indexes and is helping us a lot.
>
> Thanks for all your inputs and help !
>
> Regards,
> VB
>
>
> On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner <
> Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:
>>
>> > If i got it correct, CLUSTER would do the same what VACUUM FULL
>> > does (except being fast)
>>
>> CLUSTER copies the table (in the sequence of the specified index) to
>> a new set of files, builds fresh indexes, and then replaces the
>> original set of files with the new ones. So you do need room on
>> disk for a second copy of the table, but it tends to be much faster
>> then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in
>> 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
>> the table data rather than using an index.) REINDEX is not needed
>> when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM
>> FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
>> generally a good idea.
>>
>> When choosing an index for CLUSTER, pick one on which you often
>> search for a *range* of rows, if possible. Like a name column if
>> you do a lot of name searches.
>>
>> -Kevin
>>
>
>

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-03 10:59:42
Message-ID:	CAFrxt0hcTgRzqOv82QvHHyBvFEa_i80nSAjakBe9zNw7_bdAoQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hello,

Thanks for your suggestions !

We CLUSTERED a table using mostly used Index. Application is performing
better now.

Thanks
VB

On Tue, Sep 27, 2011 at 6:01 PM, Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:

> Forgot to mention -
>
> Kevin,
>
> CLUSTER seems to be an very interesting concept to me.
>
> I am thinking to test the CLUSTER TABLE on our production according to the
> Index usage on the table.
>
> Will let you know once i get the results.
>
> Regards,
> VB
>
> On Tue, Sep 27, 2011 at 5:59 PM, Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:
>
>> We had performed VACUUM FULL on our production and performance has
>> improved a lot !
>>
>> I started using pg_stattuple and pg_freespacemap for tracking freespace in
>> the tables and Indexes and is helping us a lot.
>>
>> Thanks for all your inputs and help !
>>
>> Regards,
>> VB
>>
>>
>> On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner <
>> Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>
>>> Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:
>>>
>>> > If i got it correct, CLUSTER would do the same what VACUUM FULL
>>> > does (except being fast)
>>>
>>> CLUSTER copies the table (in the sequence of the specified index) to
>>> a new set of files, builds fresh indexes, and then replaces the
>>> original set of files with the new ones. So you do need room on
>>> disk for a second copy of the table, but it tends to be much faster
>>> then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in
>>> 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans
>>> the table data rather than using an index.) REINDEX is not needed
>>> when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM
>>> FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was
>>> generally a good idea.
>>>
>>> When choosing an index for CLUSTER, pick one on which you often
>>> search for a *range* of rows, if possible. Like a name column if
>>> you do a lot of name searches.
>>>
>>> -Kevin
>>>
>>
>>
>

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Venkat Balaji" <venkat(dot)balaji(at)verse(dot)in>
Cc:	"Greg Smith" <greg(at)2ndquadrant(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-03 16:15:17
Message-ID:	4E89994502000025000419B6@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:

> We CLUSTERED a table using mostly used Index. Application is
> performing better now.

CLUSTER can help in at least four ways:

(1) It eliminates bloat in the table heap.

(2) It eliminates bloat in the indexes.

(3) It can correct fragmentation in the underlying disk files.

(4) It can put tuples which are accessed by the same query into
adjacent locations on disk, reducing physical disk access.

An aggressive autovacuum configuration can generally prevent the
first two from coming back to haunt you, and the third may not be a
big problem (depending on your OS and file system), but that last
one is a benefit which will degrade over time in most use cases --
the order in the heap is set by the cluster, but not maintained
after that. If this ordering is a significant part of the
performance improvement you're seeing, you may want to schedule some
regular CLUSTER run. It's hard to say what frequency would make
sense, but if performance gradually deteriorates and a CLUSTER fixes
it, you'll get a sense of how often it pays to do it.

-Kevin

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-03 16:54:19
Message-ID:	CAFrxt0iLcvy0XVY7mTD34h=FeM_ryuQx+vq1GQDqpW6_FVnLBQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Thanks a lot Kevin !

This email has deepened my understanding on the clustering concept.

Keeping this in mind, I have recommended a new disk layout at the OS level
for our production servers so that IOs will be balanced on the disks as
well.

Currently, we do not have mount points divided according to the type of IOs.

I will share my recommended plan in an different email thread.

Thanks again for this detailed explanation.

Regards,
VB

On Mon, Oct 3, 2011 at 9:45 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov>wrote:

> Venkat Balaji <venkat(dot)balaji(at)verse(dot)in> wrote:
>
> > We CLUSTERED a table using mostly used Index. Application is
> > performing better now.
>
> CLUSTER can help in at least four ways:
>
> (1) It eliminates bloat in the table heap.
>
> (2) It eliminates bloat in the indexes.
>
> (3) It can correct fragmentation in the underlying disk files.
>
> (4) It can put tuples which are accessed by the same query into
> adjacent locations on disk, reducing physical disk access.
>
> An aggressive autovacuum configuration can generally prevent the
> first two from coming back to haunt you, and the third may not be a
> big problem (depending on your OS and file system), but that last
> one is a benefit which will degrade over time in most use cases --
> the order in the heap is set by the cluster, but not maintained
> after that. If this ordering is a significant part of the
> performance improvement you're seeing, you may want to schedule some
> regular CLUSTER run. It's hard to say what frequency would make
> sense, but if performance gradually deteriorates and a CLUSTER fixes
> it, you'll get a sense of how often it pays to do it.
>
> -Kevin
>

From:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
To:	sthomas(at)peak6(dot)com, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 09:08:43
Message-ID:	CAFrxt0jLu9rtVTSkojRJn4ezehfmAyyShoVZdPKSUKKrcLT6nw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hello,

I was attempting to calculate the actual occupied space by a Table.

Below is what i did -

I summed up the avg_width of each column of a table from pg_stats, which
gives me the average size of a row (277 bytes).

select* sum(avg_width) as average_row_size from pg_stats *where
tablename='tablename'

average_row_size
---------------------------
277

(1 row)

Calculated the actual occupied space by rows in the table as below -

*Took the average_row_size * number_of_rows from pg_class*

select 277*reltuples/1024 as occupied_space from pg_class where
relname='tablename'; == 552 KB

occupied_space
-------------------------
552.6474609375

Calculated the actual Table size (600 kb)

select pg_size_pretty(pg_relation_size('tablename'));

pg_size_pretty
----------------
600 KB

(1 row)

Calculated the free space with in the table (by scanning the pages - as
suggested by Shaun Thomas) -- 14 KB

SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');

mb_free
---------
14 KB

(1 row)

600 KB is the size of the table (taken through pg_size_pretty)
14 KB is the free space (taken through contrib modules)
600+14 = 586 KB -- is the occupied space by normal calculation through
contrib modules. This is based on number of pages allocated to the table.
552 KB is the actual occupied size by the rows (taken by calculating avg row
size ). This is based on number of rows with in the pages.
586-552 = 34 KB -- is still free some where with in the occupied pages (
calculated through pg_stats and pg_class )
34 KB is still free within the pages ( each 8K ) which is basically taken as
occupied space.

This is similar concept which i successfully applied in an other RDBMS
Technology to calculate space usage metrics on production.
This is all calculated after considering Vacuum and Analyze jobs are
executed.

Please comment !

Sorry if this is too confusing and too long.

Thanks
VB

On Wed, Sep 21, 2011 at 6:33 PM, Shaun Thomas <sthomas(at)peak6(dot)com> wrote:

> On 09/20/2011 11:22 AM, Venkat Balaji wrote:
>
> Please help me understand how to calculate free space in Tables and
>> Indexes even after vacuuming and analyzing is performed.
>>
>
> Besides the query Mark gave you using freespacemap, there's also the
> pgstattuple contrib module. You'd use it like this:
>
> SELECT pg_size_pretty(free_space) AS mb_free
> FROM pgstattuple('some_table');
>
> Query must be run as a super-user, and I wouldn't recommend running it on
> huge tables, since it scans the actual data files to get its information.
> There's a lot of other useful information in that function, such as the
> number of dead rows.
>
>
> What i understand is that, even if we perform VACUUM ANALYZE
>> regularly, the free space generated is not filled up.
>>
>
> VACUUM does not actually generate free space. It locates and marks reusable
> tuples. Any future updates or inserts on that table will be put in those
> newly reclaimed spots, instead of being bolted onto the end of the table.
>
>
> I see lot of free spaces or free pages in Tables and Indexes. But, I
>> need to give an exact calculation on how much space will be reclaimed
>> after VACUUM FULL and RE-INDEXING.
>>
>
> Why? If your database is so desperate for space, VACUUM and REINDEX won't
> really help you. A properly maintained database will still have a certain
> amount of "bloat" equal to the number of rows that change between
> maintenance intervals. One way or another, that space is going to be used by
> *something*.
>
> It sounds more like you need to tweak your autovacuum settings to be more
> aggressive if you're seeing significant enough turnover that your tables are
> bloating significantly. One of our tables, for instance, gets vacuumed more
> than once per hour because it experiences 1,000% turnover daily.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
> 312-676-8870
> sthomas(at)peak6(dot)com
>
> ______________________________**________________
>
> See http://www.peak6.com/email-**disclaimer/<http://www.peak6.com/email-disclaimer/>for terms and conditions related to this email
>

From:	Raghavendra <raghavendra(dot)rao(at)enterprisedb(dot)com>
To:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
Cc:	sthomas(at)peak6(dot)com, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 09:30:59
Message-ID:	CA+h6AhhuQMoa61bO7K0P-PdnAOcxkZ2SddT5DYGrA4w9mqQ2sQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Wed, Oct 5, 2011 at 2:38 PM, Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:

> Hello,
>
> I was attempting to calculate the actual occupied space by a Table.
>
> Below is what i did -
>
> I summed up the avg_width of each column of a table from pg_stats, which
> gives me the average size of a row (277 bytes).
>
> select* sum(avg_width) as average_row_size from pg_stats *where
> tablename='tablename'
>
> average_row_size
> ---------------------------
> 277
>
> (1 row)
>
> Calculated the actual occupied space by rows in the table as below -
>
> *Took the average_row_size * number_of_rows from pg_class*
>
> select 277*reltuples/1024 as occupied_space from pg_class where
> relname='tablename'; == 552 KB
>
> occupied_space
> -------------------------
> 552.6474609375
>
> Calculated the actual Table size (600 kb)
>
> select pg_size_pretty(pg_relation_size('tablename'));
>
>
> pg_size_pretty
> ----------------
> 600 KB
>
> (1 row)
>
> Calculated the free space with in the table (by scanning the pages - as
> suggested by Shaun Thomas) -- 14 KB
>
> SELECT pg_size_pretty(free_space) AS mb_free FROM pgstattuple('tablename');
>
> mb_free
> ---------
> 14 KB
>
> (1 row)
>
> 600 KB is the size of the table (taken through pg_size_pretty)
> 14 KB is the free space (taken through contrib modules)
> 600+14 = 586 KB -- is the occupied space by normal calculation through
> contrib modules. This is based on number of pages allocated to the table.
>

Its typo 600 - 14 = 586 KB

552 KB is the actual occupied size by the rows (taken by calculating avg row
> size ). This is based on number of rows with in the pages.
> 586-552 = 34 KB -- is still free some where with in the occupied pages (
> calculated through pg_stats and pg_class )
> 34 KB is still free within the pages ( each 8K ) which is basically taken
> as occupied space.
>
>
One more point to add to this good discussion, each row header will occupy
24 bytes + 4 bytes pointer on page to tuple.

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

From:	Guillaume Cottenceau <gc(at)mnc(dot)ch>
To:	Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>
Cc:	PGSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 09:39:58
Message-ID:	m3zkhfhj4h.fsf@mnc.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Venkat Balaji <venkat.balaji 'at' verse.in> writes:

> Hello,
>
> I was attempting to calculate the actual occupied space by a Table.

SELECT relname, reltuples, pg_size_pretty(relpages*8*1024) as size FROM pg_class, pg_namespace WHERE pg_namespace.oid = pg_class.relnamespace AND relkind = 'r' AND nspname = 'public' ORDER BY relpages DESC;

relkind = 'i' for indexes.

--
Guillaume Cottenceau

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Raghavendra" <raghavendra(dot)rao(at)enterprisedb(dot)com>, "Venkat Balaji" <venkat(dot)balaji(at)verse(dot)in>
Cc:	<sthomas(at)peak6(dot)com>, "PGSQL Performance" <pgsql-performance(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 16:13:59
Message-ID:	4E8C3BF70200002500041AAF@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Raghavendra <raghavendra(dot)rao(at)enterprisedb(dot)com> wrote:
> Venkat Balaji <venkat(dot)balaji(at)verse(dot)in>wrote:

>> [attempt to calculate file space from row layout and number of
>> rows]

> One more point to add to this good discussion, each row header
> will occupy 24 bytes + 4 bytes pointer on page to tuple.

Not to mention:

http://www.postgresql.org/docs/9.1/interactive/storage-toast.html

-Kevin

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 17:09:45
Message-ID:	CAOR=d=1sfgPk0z2zwTqEMapu+5=VHWAB+o9azHs-LytXgsWNnQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Wed, Sep 21, 2011 at 11:57 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 09/21/2011 12:13 PM, Venkat Balaji wrote:
>>
>> I as a DBA, suggested to perform VACUUM FULL and RE-INDEXING + ANALYZE to
>> ensure that IO performance and Indexing performance would be good
>
>
> Read http://wiki.postgresql.org/wiki/VACUUM_FULL before you run VACUUM FULL.
> You probably don't want to do that. A multi-gigabyte table can easily be
> unavailable for several hours if you execute VACUUM FULL against it.
> CLUSTER is almost always faster.

It used to be that cluster on a very randomly ordered table was much
slower than doing something like select * into newtable from oldtable
order by col1, col2; Is that still the case in 9.0/9.1?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-05 18:15:54
Message-ID:	23412.1317838554@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> writes:
> It used to be that cluster on a very randomly ordered table was much
> slower than doing something like select * into newtable from oldtable
> order by col1, col2; Is that still the case in 9.0/9.1?

Fixed in 9.1, per release notes:

* Allow CLUSTER to sort the table rather than scanning the index when it seems likely to be cheaper (Leonardo Francalanci)

regards, tom lane

From:	Raghavendra <raghavendra(dot)rao(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-06 09:31:23
Message-ID:	CA+h6AhjSHZpriwb_TWTJBMpv=yN_z6mR+T8THkJYX0Lggs0TVw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Thanks Kevin.

Am in 9.1 and tested same scenario, how exactly storage metrics are
calculated. Please comment.

*No. of rows:*

postgres=# select relname,reltuples from pg_class where relname='test';
relname | reltuples
---------+-----------
test | 1001
(1 row)

*Average Row size:*

postgres=# select sum(avg_width) as average_row_size from pg_stats where
tablename='test';
average_row_size
------------------
17
(1 row)

*Occupied Space:*

postgres=# select 17*reltuples/1024 as "No.of.Row_size * No.of.Rows =
Occupied_Space" from pg_class where relname='test';
No.of.Row_size * No.of.Rows = Occupied_Space
----------------------------------------------
16.6181640625

*Actual Table Size:*

postgres=# select pg_size_pretty(pg_relation_size('test'));
pg_size_pretty
----------------
48 kB
(1 row)

postgres=# SELECT relname, reltuples, pg_size_pretty(relpages*8*1024) as
size FROM pg_class, pg_namespace WHERE pg_namespace.oid =
pg_class.relnamespace AND relkind = 'r' AND nspname = 'public' ORDER BY
relpages DESC;
relname | reltuples | size
---------+-----------+-------
test | 1001 | 48 kB
(1 row)

Its different here:

postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
88 kB
(1 row)

*Free Space:*

postgres=# SELECT pg_size_pretty(free_space) AS mb_free FROM
pgstattuple('test');
mb_free
-----------
936 bytes
(1 row)

postgres=# select * from pgstattuple('test');
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count |
dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
49152 | 1001 | 41041 | 83.5 | 0 |
0 | 0 | 936 | 1.9
(1 row)

*OS Level Storage:*

bash-4.1$ ll -h 16447*
-rw------- 1 postgres postgres 48K Oct 2 17:40 16447
-rw------- 1 postgres postgres 24K Oct 2 17:40 16447_fsm
-rw------- 1 postgres postgres 8.0K Oct 2 17:40 16447_vm

What has occupied in extra 8KB ?

postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
88 kB
(1 row)

Thanks in advance.

---
Regards,
Raghavendra
EnterpriseDB Corporation
Blog: http://raghavt.blogspot.com/

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-06 19:20:51
Message-ID:	CAOR=d=1HNVVnsADVngt+Kj7L0rpyCfyvL5oLkxOj7jsq9ZVcqg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Wed, Oct 5, 2011 at 12:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> writes:
>> It used to be that cluster on a very randomly ordered table was much
>> slower than doing something like select * into newtable from oldtable
>> order by col1, col2; Is that still the case in 9.0/9.1?
>
> Fixed in 9.1, per release notes:
>
> * Allow CLUSTER to sort the table rather than scanning the index when it seems likely to be cheaper (Leonardo Francalanci)

Looks like I owe Leonardo Francalanci a pizza.

From:	Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: : Performance Improvement Strategy
Date:	2011-10-10 08:04:37
Message-ID:	1318233877.13796.YahooMailNeo@web29020.mail.ird.yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

>> * Allow CLUSTER to sort the table rather than scanning the index

> when it seems likely to be cheaper (Leonardo Francalanci)
>
> Looks like I owe Leonardo Francalanci a pizza.

Well, the patch started from a work by Gregory Stark, and Tom fixed
a nasty bug; but I'll take a slice ;)

Leonardo