Re: pgstattuple extension for indexes

Lists: pgsql-hackerspgsql-patches
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: pgstattuple extension for indexes
Date: 2006-06-28 07:22:31
Message-ID: 20060628154451.FC33.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.

Suggestions welcome.

----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE: 0.36% fragmented
HINT: continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len | 2260992
tuple_count | 100996 -- 996 tuples are dead practically,
tuple_len | 1615936 but no LP_DELETE yet.
tuple_percent | 71.47
dead_tuple_count | 4
dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space | 208188
free_percent | 9.21
----

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
pgstattuple.patch application/octet-stream 22.5 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-06 02:18:20
Message-ID: 200607060218.k662IKq25483@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Patch applied. Thanks.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> This is an extension of pgstattuple to query information from indexes.
> It supports btree, hash and gist. Gin is not supported.
> It scans only index pages and does not read corresponding heap tuples.
> Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.
>
> Also, I added an experimental feature for btree indexes. It checks
> fragmentation factor of indexes. If an leaf has the right link on the next
> adjacent page in the file, it is assumed to be continuous (not fragmented).
> It will help us to decide when to REINDEX.
>
> Suggestions welcome.
>
> ----
> $ pgbench -i
> $ pgbench -n -t 100 -c 10
> # select * from pgstattuple('accounts_pkey');
> NOTICE: 0.36% fragmented
> HINT: continuous=273, forward=1, backward=0
> -[ RECORD 1 ]------+--------
> table_len | 2260992
> tuple_count | 100996 -- 996 tuples are dead practically,
> tuple_len | 1615936 but no LP_DELETE yet.
> tuple_percent | 71.47
> dead_tuple_count | 4
> dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
> dead_tuple_percent | 0
> free_space | 208188
> free_percent | 9.21
> ----
>
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-14 17:44:02
Message-ID: 2992.1152899042@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Patch applied. Thanks.

>> Also, I added an experimental feature for btree indexes. It checks
>> fragmentation factor of indexes. If an leaf has the right link on the next
>> adjacent page in the file, it is assumed to be continuous (not fragmented).
>> It will help us to decide when to REINDEX.

This was done in an entirely unacceptable way, to wit

ereport(NOTICE,
(errmsg("%.2f%% fragmented",
100.0 * (stat.forward + stat.backward) /
(stat.continuous + stat.forward + stat.backward)),
errhint("continuous=%llu, forward=%llu, backward=%llu",
stat.continuous, stat.forward, stat.backward)));

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.

The lesser problem that drew my attention is that %llu is unportable
(and in fact draws gcc warnings for me; did you ignore that?). But
using UINT64_FORMAT instead would create a headache for translation
because the string would vary across platforms.

I won't bother correcting the violations of message style guidelines,
because this code simply has to go away.

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-24 06:54:44
Message-ID: 20060724144315.699D.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> >> Also, I added an experimental feature for btree indexes. It checks
> >> fragmentation factor of indexes.

> The really serious problem with reporting this info via NOTICE is that
> there's no way for a program to get its hands on the info. The output
> tuple format needs to be extended instead.

Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.

BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
pgstattuple-0724.patch application/octet-stream 11.7 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-24 14:26:28
Message-ID: 20060724142628.GE5223@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > >> Also, I added an experimental feature for btree indexes. It checks
> > >> fragmentation factor of indexes.
>
> > The really serious problem with reporting this info via NOTICE is that
> > there's no way for a program to get its hands on the info. The output
> > tuple format needs to be extended instead.
>
> Ok, I added 'fragmented_percent' field to the output tuple. This
> information will help us to decide when to do REINDEX.
> However, it is only avaliable for btree index presently. Other indexes
> should have equivalent information, but I don't know how to determine it.

BTW while you're handling this, why not change the function to use OUT
parameters instead of having to CREATE TYPE to handle the return type?
I think it is easier to handle ...

One question I had was: in the percentage of fragmentation, is higher
better or lower better? (I'm guessing lower is better, but this should
be mentioned in the docs)

> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
> Is this worth doing?

You mean having VACUUM VERBOSE return a result set?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-24 14:52:49
Message-ID: 10827.1153752769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> ITAGAKI Takahiro wrote:
>> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
>> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
>> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
>> Is this worth doing?

> You mean having VACUUM VERBOSE return a result set?

To me, the point of VACUUM VERBOSE is mostly to give you some
reassurance that it's making progress. If it were returning rows
instead of notice messages, you'd lose that functionality (at least
in libpq-based clients). In any case, autovacuum has other ways
of getting the information without needing a change in user-visible
behavior.

regards, tom lane


From: "Satoshi Nagayasu" <nagayasus(at)nttdata(dot)co(dot)jp>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-24 23:33:03
Message-ID: 1a53b89f0607241633m1804258ct9674a1132cbde8e4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

I'm working on an utility for b-tree index, called `pgstatindex`.

It reports b-tree index statistics like a pgstattuple as below.
----------------------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 17956864
root_block_no | 361
internal_pages | 8
leaf_pages | 2184
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.07
leaf_fragmentation | 0

pgbench=#
----------------------------------------------------------------

I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.

Do you think this is useful?

2006/7/24, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > >> Also, I added an experimental feature for btree indexes. It checks
> > >> fragmentation factor of indexes.
>
> > The really serious problem with reporting this info via NOTICE is that
> > there's no way for a program to get its hands on the info. The output
> > tuple format needs to be extended instead.
>
> Ok, I added 'fragmented_percent' field to the output tuple. This
> information will help us to decide when to do REINDEX.
> However, it is only avaliable for btree index presently. Other indexes
> should have equivalent information, but I don't know how to determine it.
>
>
> BTW, should we change VACUUM VERBOSE in the same way? If we do so,
> autovacuum can handle the reports of VACUUM VERBOSE and plan when to
> do VACUUM FULL, REINDEX and/or CLUSTER using the information.
> Is this worth doing?
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
>
>
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-25 01:49:33
Message-ID: 200607250149.k6P1nX000866@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Hi,
>
> I'm working on an utility for b-tree index, called `pgstatindex`.
>
> It reports b-tree index statistics like a pgstattuple as below.
> ----------------------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
> -[ RECORD 1 ]------+---------
> version | 2
> tree_level | 2
> index_size | 17956864
> root_block_no | 361
> internal_pages | 8
> leaf_pages | 2184
> empty_pages | 0
> deleted_pages | 0
> avg_leaf_density | 90.07
> leaf_fragmentation | 0
>
> pgbench=#
> ----------------------------------------------------------------
>
> I want to make this to contrib module like a pgstattuple,
> and to make this open to public in a few days.
>
> Do you think this is useful?

Yes, for performance debugging, I think.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-25 03:46:58
Message-ID: 20060725034658.GA12868@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Hi,
>
> I'm working on an utility for b-tree index, called `pgstatindex`.

Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-25 04:12:12
Message-ID: 20060725130246.5D05.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> Satoshi Nagayasu wrote:
> > I'm working on an utility for b-tree index, called `pgstatindex`.
>
> Does it make sense to merge the pgstatindex stuff with pgstattuple, and
> have the fragmentation report into pgstatindex instead of pgstattuple
> itself?

It sounds good. We will have two separate commands:
- pgstattuple: returns tuple-level information
- pgstatindex: returns page-level information

We can use tuple-level info to check LP_DELETE flags on index tuples,
and use page-level info to check needs for REINDEX.

Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-25 04:52:13
Message-ID: 5240.1153803133@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Do we add pgstatindex as a new contrib module,
> or merge it into contrib/pgstattuple?

I believe Alvaro was suggesting that you should add it as an additional
SQL function within contrib/pgstattuple. That'd be my advice too ---
I don't see a reason to break this out as a separate contrib module.

regards, tom lane


From: satoshi nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-28 06:00:18
Message-ID: 44C9A7F2.2010607@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

Attachment Content-Type Size
pgstatindex.tar.gz application/x-gzip 5.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: satoshi nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-07-28 13:31:02
Message-ID: 200607281331.k6SDV2P12212@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I thought these new functions were going to be merged into
/contrib/pgstattuple.

---------------------------------------------------------------------------

satoshi nagayasu wrote:
> Hi folks,
>
> As I said on -PATCHES, I've been working on an utility to get
> a b-tree index information. I'm happy to introduce
> my new functions to you.
>
> pgstattuple module provides a `pgstatindex()`, and other small
> functions, which allow you to get b-tree internal information.
> I believe this module will be helpful to know b-tree index deeply.
>
> So please try it, send comment to me, and have fun.
>
> Thanks,
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>
> -----------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> -[ RECORD 1 ]------+--------
> version | 2
> tree_level | 1
> index_size | 3588096
> root_block_no | 3
> internal_pages | 0
> leaf_pages | 437
> empty_pages | 0
> deleted_pages | 0
> avg_leaf_density | 59.5
> leaf_fragmentation | 49.89
> -----------------------------------------------------
>
>

[ application/x-gzip is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: satoshi nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: [HACKERS] pgstattuple extension for indexes
Date: 2006-07-28 14:03:31
Message-ID: 20060728140331.GD32007@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
>
> I thought these new functions were going to be merged into
> /contrib/pgstattuple.

Well, that's exactly what this patch seems to do ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: satoshi nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: [HACKERS] pgstattuple extension for indexes
Date: 2006-07-28 16:22:54
Message-ID: 200607281622.k6SGMs905735@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > I thought these new functions were going to be merged into
> > /contrib/pgstattuple.
>
> Well, that's exactly what this patch seems to do ...

Well, looking at the tarball it puts everything in pgstatindex, and the
Makefile is:

#-------------------------------------------------------------------------
#
# pgstatindex Makefile
#
# $PostgreSQL$
#
#-------------------------------------------------------------------------

SRCS = pgstatindex.c

MODULE_big = pgstatindex
OBJS = $(SRCS:.c=.o)
DOCS =
DATA_built = pgstatindex.sql

ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
else
subdir = contrib/pgstatindex
top_builddir = /home/snaga/pgsql/sources/postgresql-8.1.3
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: satoshi nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-10 01:56:49
Message-ID: 200608100156.k7A1unU21775@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:
> Hi folks,
>
> As I said on -PATCHES, I've been working on an utility to get
> a b-tree index information. I'm happy to introduce
> my new functions to you.
>
> pgstattuple module provides a `pgstatindex()`, and other small
> functions, which allow you to get b-tree internal information.
> I believe this module will be helpful to know b-tree index deeply.
>
> So please try it, send comment to me, and have fun.
>
> Thanks,
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>
> -----------------------------------------------------
> pgbench=# \x
> Expanded display is on.
> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> -[ RECORD 1 ]------+--------
> version | 2
> tree_level | 1
> index_size | 3588096
> root_block_no | 3
> internal_pages | 0
> leaf_pages | 437
> empty_pages | 0
> deleted_pages | 0
> avg_leaf_density | 59.5
> leaf_fragmentation | 49.89
> -----------------------------------------------------
>
>

[ application/x-gzip is not supported, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-10 02:14:18
Message-ID: 44DA967A.9010608@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce,

I'll fix it in this week. Please wait a few days.
Thanks.

Bruce Momjian wrote:
> nagayasu-san,
>
> This looks good, but we would like the code added to
> /contrib/pgstattuple, rather than it being its own /contrib module. Can
> you make that adjustment? Thanks.
>
> ---------------------------------------------------------------------------
>
> satoshi nagayasu wrote:
>> Hi folks,
>>
>> As I said on -PATCHES, I've been working on an utility to get
>> a b-tree index information. I'm happy to introduce
>> my new functions to you.
>>
>> pgstattuple module provides a `pgstatindex()`, and other small
>> functions, which allow you to get b-tree internal information.
>> I believe this module will be helpful to know b-tree index deeply.
>>
>> So please try it, send comment to me, and have fun.
>>
>> Thanks,
>> --
>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>>
>> -----------------------------------------------------
>> pgbench=# \x
>> Expanded display is on.
>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
>> -[ RECORD 1 ]------+--------
>> version | 2
>> tree_level | 1
>> index_size | 3588096
>> root_block_no | 3
>> internal_pages | 0
>> leaf_pages | 437
>> empty_pages | 0
>> deleted_pages | 0
>> avg_leaf_density | 59.5
>> leaf_fragmentation | 49.89
>> -----------------------------------------------------
>>
>>
>
> [ application/x-gzip is not supported, skipping... ]
>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-11 10:08:31
Message-ID: 44DC571F.3080109@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi all,

Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.

Satoshi Nagayasu wrote:
> Bruce,
>
> I'll fix it in this week. Please wait a few days.
> Thanks.
>
> Bruce Momjian wrote:
>> nagayasu-san,
>>
>> This looks good, but we would like the code added to
>> /contrib/pgstattuple, rather than it being its own /contrib module. Can
>> you make that adjustment? Thanks.
>>
>> ---------------------------------------------------------------------------
>>
>> satoshi nagayasu wrote:
>>> Hi folks,
>>>
>>> As I said on -PATCHES, I've been working on an utility to get
>>> a b-tree index information. I'm happy to introduce
>>> my new functions to you.
>>>
>>> pgstattuple module provides a `pgstatindex()`, and other small
>>> functions, which allow you to get b-tree internal information.
>>> I believe this module will be helpful to know b-tree index deeply.
>>>
>>> So please try it, send comment to me, and have fun.
>>>
>>> Thanks,
>>> --
>>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>>>
>>> -----------------------------------------------------
>>> pgbench=# \x
>>> Expanded display is on.
>>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
>>> -[ RECORD 1 ]------+--------
>>> version | 2
>>> tree_level | 1
>>> index_size | 3588096
>>> root_block_no | 3
>>> internal_pages | 0
>>> leaf_pages | 437
>>> empty_pages | 0
>>> deleted_pages | 0
>>> avg_leaf_density | 59.5
>>> leaf_fragmentation | 49.89
>>> -----------------------------------------------------
>>>
>>>
>> [ application/x-gzip is not supported, skipping... ]
>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>> http://archives.postgresql.org
>
>

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122

Attachment Content-Type Size
pgstatindex.diff text/plain 22.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-11 14:59:05
Message-ID: 200608111459.k7BEx5B15018@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I don't see any documentation, so I assume you want me to add something
to README.pgstattuple.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.
>
>
> Satoshi Nagayasu wrote:
> > Bruce,
> >
> > I'll fix it in this week. Please wait a few days.
> > Thanks.
> >
> > Bruce Momjian wrote:
> >> nagayasu-san,
> >>
> >> This looks good, but we would like the code added to
> >> /contrib/pgstattuple, rather than it being its own /contrib module. Can
> >> you make that adjustment? Thanks.
> >>
> >> ---------------------------------------------------------------------------
> >>
> >> satoshi nagayasu wrote:
> >>> Hi folks,
> >>>
> >>> As I said on -PATCHES, I've been working on an utility to get
> >>> a b-tree index information. I'm happy to introduce
> >>> my new functions to you.
> >>>
> >>> pgstattuple module provides a `pgstatindex()`, and other small
> >>> functions, which allow you to get b-tree internal information.
> >>> I believe this module will be helpful to know b-tree index deeply.
> >>>
> >>> So please try it, send comment to me, and have fun.
> >>>
> >>> Thanks,
> >>> --
> >>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> >>>
> >>> -----------------------------------------------------
> >>> pgbench=# \x
> >>> Expanded display is on.
> >>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> >>> -[ RECORD 1 ]------+--------
> >>> version | 2
> >>> tree_level | 1
> >>> index_size | 3588096
> >>> root_block_no | 3
> >>> internal_pages | 0
> >>> leaf_pages | 437
> >>> empty_pages | 0
> >>> deleted_pages | 0
> >>> avg_leaf_density | 59.5
> >>> leaf_fragmentation | 49.89
> >>> -----------------------------------------------------
> >>>
> >>>
> >> [ application/x-gzip is not supported, skipping... ]
> >>
> >>> ---------------------------(end of broadcast)---------------------------
> >>> TIP 4: Have you searched our list archives?
> >>>
> >>> http://archives.postgresql.org
> >
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile 2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/Makefile 2006-08-10 19:24:05.000000000 +0900
> @@ -6,7 +6,7 @@
> #
> #-------------------------------------------------------------------------
>
> -SRCS = pgstattuple.c
> +SRCS = pgstattuple.c pgstatindex.c
>
> MODULE_big = pgstattuple
> OBJS = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c 2006-08-11 17:51:26.000000000 +0900
> @@ -0,0 +1,714 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 12
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> + if ( !(FirstOffsetNumber<=(offset) && \
> + (offset)<=PageGetMaxOffsetNumber(page)) ) \
> + elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> + elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat {
> + uint32 blkno;
> + uint32 live_items;
> + uint32 dead_items;
> + uint32 page_size;
> + uint32 max_avail;
> + uint32 free_size;
> + uint32 avg_item_size;
> + uint32 fragments;
> + bool is_root;
> + bool is_internal;
> + bool is_leaf;
> + bool is_deleted;
> + bool is_empty;
> +} BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat {
> + uint32 version;
> + BlockNumber root_blkno;
> + uint32 level;
> +
> + uint32 live_items;
> + uint32 dead_items;
> +
> + uint32 root_pages;
> + uint32 internal_pages;
> + uint32 leaf_pages;
> + uint32 empty_pages;
> + uint32 deleted_pages;
> +
> + uint32 page_size;
> + uint32 avg_item_size;
> +
> + uint32 max_avail;
> + uint32 free_space;
> +
> + uint32 fragments;
> +} BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
> +{
> + Page page = BufferGetPage(buffer);
> + PageHeader phdr = (PageHeader) page;
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> + int item_size = 0;
> + int off;
> +
> + stat->blkno = blkno;
> +
> + stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
> +
> + stat->dead_items = stat->live_items = 0;
> +
> + stat->page_size = PageGetPageSize(page);
> +
> + /* page type */
> + stat->is_root = false;
> + stat->is_leaf = false;
> + stat->is_deleted = false;
> + stat->is_empty = false;
> +
> + if ( P_ISDELETED(opaque) )
> + {
> + stat->is_deleted = true;
> + return true;
> + }
> + else if ( P_IGNORE(opaque) )
> + stat->is_empty = true;
> + else if ( P_ISLEAF(opaque) )
> + stat->is_leaf = true;
> + else if ( P_ISROOT(opaque) )
> + stat->is_root = true;
> + else
> + stat->is_internal = true;
> +
> + /*----------------------------------------------
> + * If a next leaf is on the previous block,
> + * it means a fragmentation.
> + *----------------------------------------------
> + */
> + stat->fragments = 0;
> + if ( stat->is_leaf )
> + {
> + if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
> + stat->fragments++;
> + }
> +
> + /* count live and dead tuples, and free space */
> + for (off=FirstOffsetNumber ; off<=maxoff ; off++)
> + {
> + BTItem btitem;
> + IndexTuple itup;
> +
> + ItemId id = PageGetItemId(page, off);
> +
> + btitem = (BTItem)PageGetItem(page, id);
> +
> + itup = &(btitem->bti_itup);
> +
> + item_size += IndexTupleSize(itup);
> +
> + if ( !ItemIdDeleted(id) )
> + stat->live_items++;
> + else
> + stat->dead_items++;
> + }
> + stat->free_size = PageGetFreeSpace(page);
> +
> +#ifdef NOT_USED
> + elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
> + (float)stat->free_size/(float)stat->max_avail*100.0);
> +#endif
> +
> + if ( (stat->live_items + stat->dead_items) > 0 )
> + stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
> + else
> + stat->avg_item_size = 0;
> +
> + return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> + uint32 nblocks;
> + uint32 blkno;
> + BTIndexStat indexStat;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> + /*-------------------
> + * Read a metapage
> + *-------------------
> + */
> + {
> + Buffer buffer = ReadBuffer(rel, 0);
> + Page page = BufferGetPage(buffer);
> + BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +// snprintf(values[j++], 32, "%d", metad->btm_magic);
> + indexStat.version = metad->btm_version;
> + indexStat.root_blkno = metad->btm_root;
> + indexStat.level = metad->btm_level;
> +// snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +// snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + /* -- init stat -- */
> + indexStat.fragments = 0;
> +
> + indexStat.root_pages = 0;
> + indexStat.leaf_pages = 0;
> + indexStat.internal_pages = 0;
> + indexStat.empty_pages = 0;
> + indexStat.deleted_pages = 0;
> +
> + indexStat.max_avail = 0;
> + indexStat.free_space = 0;
> +
> + /*-----------------------
> + * Scan all blocks
> + *-----------------------
> + */
> + for (blkno=1 ; blkno<nblocks ; blkno++)
> + {
> + Buffer buffer = ReadBuffer(rel, blkno);
> + BTPageStat stat;
> +
> + /* scan one page */
> + stat.blkno = blkno;
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + /*---------------------
> + * page status (type)
> + *---------------------
> + */
> + if ( stat.is_deleted )
> + indexStat.deleted_pages++;
> + else if ( stat.is_leaf )
> + indexStat.leaf_pages++;
> + else if ( stat.is_internal )
> + indexStat.internal_pages++;
> + else if ( stat.is_empty )
> + indexStat.empty_pages++;
> + else if ( stat.is_root )
> + indexStat.root_pages++;
> + else
> + elog(ERROR, "unknown page status.");
> +
> + /* -- leaf fragmentation -- */
> + indexStat.fragments += stat.fragments;
> +
> + if ( stat.is_leaf )
> + {
> + indexStat.max_avail += stat.max_avail;
> + indexStat.free_space += stat.free_size;
> + }
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + relation_close(rel, AccessShareLock);
> +
> +#ifdef NOT_USED
> + elog(NOTICE, "[index information]");
> + elog(NOTICE, "version.................: %d", indexStat.version);
> + elog(NOTICE, "tree level..............: %d", indexStat.level);
> + elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
> + indexStat.leaf_pages +
> + indexStat.internal_pages +
> + indexStat.deleted_pages +
> + indexStat.empty_pages) * BLCKSZ);
> +
> + elog(NOTICE, "");
> + elog(NOTICE, "[page statistics]");
> + elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
> + elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
> + elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
> + elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
> + elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
> +
> + elog(NOTICE, "");
> + elog(NOTICE, "[leaf statistics]");
> + elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> + elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 );
> +#endif
> +
> + /*----------------------------
> + * Build a result tuple
> + *----------------------------
> + */
> + {
> + TupleDesc tupleDesc;
> + int j;
> + char *values[PGSTATINDEX_NCOLUMNS];
> +
> + HeapTupleData tupleData;
> + HeapTuple tuple = &tupleData;
> +
> + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> + indexStat.leaf_pages +
> + indexStat.internal_pages +
> + indexStat.deleted_pages +
> + indexStat.empty_pages) * BLCKSZ );
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + PG_RETURN_DATUM( result );
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> + buffer = ReadBuffer(rel, blkno);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> + if ( blkno==0 )
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + {
> + HeapTuple tuple;
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTPAGESTATS_NCOLUMNS];
> +
> + BTPageStat stat;
> +
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.blkno);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.live_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.dead_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.avg_item_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.page_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.free_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.is_deleted);
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_prev);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_next);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo.level);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_flags);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> + TupleDesc tupd;
> + Relation rel;
> + Buffer buffer;
> + Page page;
> + uint16 offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> +
> + RangeVar *relrv;
> + Datum result;
> + char *values[BTPAGEITEMS_NCOLUMNS];
> + BTPageOpaque opaque;
> + HeapTuple tuple;
> + ItemId id;
> +
> + FuncCallContext *fctx;
> + MemoryContext mctx;
> + struct user_args *uargs = NULL;
> +
> + if ( blkno==0 )
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + if ( SRF_IS_FIRSTCALL() )
> + {
> + fctx = SRF_FIRSTCALL_INIT();
> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> + uargs = palloc(sizeof(struct user_args));
> +
> + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> + uargs->offset = FirstOffsetNumber;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> +
> + uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> + if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
> + elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> + uargs->page = BufferGetPage(uargs->buffer);
> +
> + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> + if ( P_ISDELETED(opaque) )
> + elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> + fctx->user_fctx = uargs;
> +
> + MemoryContextSwitchTo(mctx);
> + }
> +
> + fctx = SRF_PERCALL_SETUP();
> + uargs = fctx->user_fctx;
> +
> + if ( fctx->call_cntr < fctx->max_calls )
> + {
> + BTItem btitem;
> + IndexTuple itup;
> +
> + id = PageGetItemId(uargs->page, uargs->offset);
> +
> + if ( !ItemIdIsValid(id) )
> + elog(ERROR, "Invalid ItemId.");
> +
> + btitem = (BTItem)PageGetItem(uargs->page, id);
> + itup = &(btitem->bti_itup);
> +
> + {
> + int j = 0;
> +
> + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", uargs->offset);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> + {
> + int off;
> + char *dump;
> + char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
> +
> + dump = palloc(IndexTupleSize(itup)*3);
> + memset(dump, 0, IndexTupleSize(itup)*3);
> +
> + for (off=0 ;
> + off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
> + off++)
> + {
> + if ( dump[0]=='\0' )
> + sprintf(dump, "%02x", *(ptr+off) & 0xff);
> + else
> + {
> + char buf[4];
> + sprintf(buf, " %02x", *(ptr+off) & 0xff);
> + strcat(dump, buf);
> + }
> + }
> + values[j] = dump;
> + }
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> + }
> +
> + uargs->offset = uargs->offset + 1;
> +
> + SRF_RETURN_NEXT(fctx, result);
> + }
> + else
> + {
> + ReleaseBuffer(uargs->buffer);
> + relation_close(uargs->rel, AccessShareLock);
> +
> + SRF_RETURN_DONE(fctx);
> + }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> + buffer = ReadBuffer(rel, 0);
> +
> + {
> + BTMetaPageData *metad;
> +
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTMETAP_NCOLUMNS];
> + HeapTuple tuple;
> +
> + Page page = BufferGetPage(buffer);
> +
> + metad = BTPageGetMeta(page);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_magic);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_root);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + * SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> +
> + Relation rel;
> + RangeVar *relrv;
> + int4 relpages;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + relpages = RelationGetNumberOfBlocks(rel);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_INT32(relpages);
> +}
> +
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in 2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in 2006-08-11 17:46:19.000000000 +0900
> @@ -22,3 +22,97 @@
> RETURNS pgstattuple_type
> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> LANGUAGE 'C' STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> + version int4,
> + tree_level int4,
> + index_size int4,
> + root_block_no int4,
> + internal_pages int4,
> + leaf_pages int4,
> + empty_pages int4,
> + deleted_pages int4,
> + avg_leaf_density float8,
> + leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> + magic int4,
> + version int4,
> + root int4,
> + level int4,
> + fastroot int4,
> + fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> + itemoffset int4,
> + ctid tid,
> + itemlen int4,
> + nulls bool,
> + vars bool,
> + data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> + blkno int4,
> + live_items int4,
> + dead_items int4,
> + total_items int4,
> + avg_item_size float,
> + page_size int4,
> + free_size int4,
> + is_deleted int4,
> + btpo_prev int4,
> + btpo_next int4,
> + btpo_level int4,
> + btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh 2006-08-11 17:47:05.000000000 +0900
> @@ -0,0 +1,22 @@
> +#!/bin/sh
> +
> +export PATH=/usr/local/pgsql814/bin:$PATH
> +
> +psql pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
> +
> +psql pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +EOF
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-12 01:42:58
Message-ID: 20060812014258.GA29973@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.

Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:

/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)

While you're at it, please consider removing C++ style comments and
unused code.

Formatting is way off as well, but I guess that is easily fixed with
pgindent.

Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-12 04:12:42
Message-ID: 44DD553A.6070301@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro,

Alvaro Herrera wrote:
> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>
> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>
>
> While you're at it, please consider removing C++ style comments and
> unused code.
>
> Formatting is way off as well, but I guess that is easily fixed with
> pgindent.

Thanks for comments. I'm going to fix my patch from now.

> Regarding the pg_relpages function, why do you think it's necessary?
> (It returns the true number of blocks of a given relation). It may
> belong into core given a reasonable use case, but otherwise it doesn't
> seem to belong into pgstatindex (or pgstattuple for that matter).

I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.

When we need to sample some pages from table/index, we need to know
true number of blocks.

I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.

Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.

Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)

Thanks.
--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-14 02:33:19
Message-ID: 44DFE0EF.7050600@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce,

Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.

Thanks.

Satoshi Nagayasu wrote:
> Alvaro,
>
> Alvaro Herrera wrote:
>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>>
>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>>
>>
>> While you're at it, please consider removing C++ style comments and
>> unused code.
>>
>> Formatting is way off as well, but I guess that is easily fixed with
>> pgindent.
>
> Thanks for comments. I'm going to fix my patch from now.
>
>> Regarding the pg_relpages function, why do you think it's necessary?
>> (It returns the true number of blocks of a given relation). It may
>> belong into core given a reasonable use case, but otherwise it doesn't
>> seem to belong into pgstatindex (or pgstattuple for that matter).
>
> I wanted to sample some pages from the table/index, and get their statistics
> to know table/index conditions. I know pgstattuple() reports table
> statistics, however, pgstattuple() generates heavy CPU and I/O load.
>
> When we need to sample some pages from table/index, we need to know
> true number of blocks.
>
> I have another function, called pgstatpage(), to get information inside
> a single block/page statistics of the table. pg_relpages() will be used
> with this.
>
> Sorry for not mentioned in previous post about pgstatpage(),
> but I've remembered about it just now.
>
> Many memories in my brain have already `paged-out` (too busy in last few months),
> and some of them got `out-of-memory`. :^)
>
> Thanks.

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122

Attachment Content-Type Size
pgstattuple.diff text/plain 21.5 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-14 04:51:40
Message-ID: 20060814045140.GA477@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Bruce,
>
> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.

I was thinking, isn't it a lot cleaner to define the functions to use
OUT parameters instead of having to define a custom type for each?

Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
DROP commands there, not in the install script.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-14 11:58:12
Message-ID: 22273.1155556692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> I was thinking, isn't it a lot cleaner to define the functions to use
> OUT parameters instead of having to define a custom type for each?

Not really --- it's a bit less notation maybe, but if he's got it
written like that already, I see no need to change it.

> Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
> DROP commands there, not in the install script.

Agreed.

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-17 03:55:28
Message-ID: 20060817123632.5F47.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi Nagayasu san and folks,
I have a question about pgstatindex.

Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp> wrote:

> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.

Index leaf pages are ordered just after REINDEX.
[1] [2] [3]
After full-split, they will be the following:
[1] [3] [5] [2] [4] [6]
because new pages are allocated at the end of the index file.

I think this condition should be regarded as full-fragmented,
but pgstatindex reports that the leaf_fragmentation is 50%.

Presently, fragmentation factor is computed as the code below:

if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
stat->fragments++;

But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.

if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;

Do you think which method is better? Or do you have other ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-17 12:23:48
Message-ID: 20060817122348.GB31061@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
> I think this condition should be regarded as full-fragmented,
> but pgstatindex reports that the leaf_fragmentation is 50%.
>
> Presently, fragmentation factor is computed as the code below:
>
> if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> stat->fragments++;
>
> But the method has the above problem. So I suggest to use whether
> the right link points to the next adjacent page or not.
>
> if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> stat->fragments++;
>
> Do you think which method is better? Or do you have other ideas?

If we do it your way, then every index will probably get a
fragmentation of nearly 100%. That's not very useful. I don't agree
that your example is fully fragmented, since on the first read the OS
will read the next four (or more) blocks so all the others are
zero-cost.

A more useful definition of fragmentation would be: if you're scanning
forward through an index, how often do you have to jump backwards
again. The current calculation seems to get that fairly right...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-17 19:54:20
Message-ID: 20060817195419.GD21363@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
> On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
> > I think this condition should be regarded as full-fragmented,
> > but pgstatindex reports that the leaf_fragmentation is 50%.
> >
> > Presently, fragmentation factor is computed as the code below:
> >
> > if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> > stat->fragments++;
> >
> > But the method has the above problem. So I suggest to use whether
> > the right link points to the next adjacent page or not.
> >
> > if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> > stat->fragments++;
> >
> > Do you think which method is better? Or do you have other ideas?
>
> If we do it your way, then every index will probably get a
> fragmentation of nearly 100%. That's not very useful. I don't agree
> that your example is fully fragmented, since on the first read the OS
> will read the next four (or more) blocks so all the others are
> zero-cost.

Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.

> A more useful definition of fragmentation would be: if you're scanning
> forward through an index, how often do you have to jump backwards
> again. The current calculation seems to get that fairly right...

Well, what really matters is how often you'll need to seek (either
intra-track or inter-track). Any time you need to seek you're hit with a
pretty serious penalty. And until we have an asyncronous prefetch
process, a forward seek will most likely be just as expensive as a
backwards seek, because by the time the data winds it's way from the
drive back to PostgreSQL and the next read request winds it's way back
down to the drive the data you wanted has probably flown past the head.
Granted, I'm ignoring OS read-ahead here, but in a heavily fragmented
index that you're actually reading off disk (ie: it's not trivially
small), that read-ahead isn't likely to help you too much.

Given all that, I'd argue that it's best to consider any page that isn't
in-order as another fragment.

On another note, now that scans happen at a per-page level, does that
make some kind of online index clustering command a possibility? Another
thought that comes to mind is putting enough brains in the indexes or
the FSM to request free pages that are in a specific region of the index
file. That would allow things to stay less fragmented. Of course a
similar method could be used to try and maintain a table heap in cluster
order, and I suspect that method would probably be a lot easier to
impliment.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-17 21:10:45
Message-ID: 20060817211045.GA5710@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Aug 17, 2006 at 02:54:20PM -0500, Jim C. Nasby wrote:
> On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
> > On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
> > > But the method has the above problem. So I suggest to use whether
> > > the right link points to the next adjacent page or not.
> > >
> > > if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> > > stat->fragments++;
> > >
> > > Do you think which method is better? Or do you have other ideas?
>
> Ok, fine... expand the example out to an index that's not trivial in
> size. Even with read-ahead, once you get to a few megs (which is
> obviously not that big), you're seeking.

Well, mostly I'm just saying that only matching on the next block
number is going to give unrealistically low numbers. We can't ignore OS
level caching, the way Postgres works relies on it in many ways.

I'd suggest something like: btpo_next between blkno and blkno + X, to
try to take into account caching. But I'm not sure that will give
numbers significantly different from what is already generated.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-18 00:15:59
Message-ID: 44E506BF.8090703@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> But the method has the above problem. So I suggest to use whether
> the right link points to the next adjacent page or not.
>
> if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> stat->fragments++;

Well, in that way, following two conditions,
[1] [x] [2] [y] [3]
and
[3] [x] [2] [y] [1]
will be calculated as same fragmentation ratio(100%), I can't agree
with that, because both will generate different costs while index scan
in the real world (I don't care about page splitting algorithm now).

If we think 'fragmentation' more strictly, the fragmentation ratio
should be calculated with 'distance' and 'direction' of the block
ordering and positions, because
[1] [x] [y] [z] [2]
and
[2] [x] [y] [1] [z]
have different costs each.

However, in such way, if I get '57.6%' as a fragmentation radio,
what does it mean? What can I do next? Two cases (forward ordered blocks
with some gaps, and backward ordered blocks with some gaps) are clearly
different, but will result same radios.

Understanding and estimating real cost of the index scan is difficult.
So I want to think 'fragmentation radio' simply,
"How many backward seeks will occur while your index scan?".

I guess, in some cases, people will want to know more detailed information,
but most people need a tool which is easy to use and easy to understand.
And I believe present calculation is good enough.

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-18 00:50:29
Message-ID: 20060818092409.5557.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp> wrote:

> Well, in that way, following two conditions,
> [1] [x] [2] [y] [3]
> and
> [3] [x] [2] [y] [1]
> will be calculated as same fragmentation ratio(100%), I can't agree
> with that, because both will generate different costs while index scan
> in the real world (I don't care about page splitting algorithm now).

I think the calculations (100%) are appropriate, because we should do
REINDEX in both case. Supposing to the sizes of [x], [y] are mega or giga
bytes, the order is not important; we have to do large seeks in both case.
In addition, the latter case rarely happens in real world, isn't it?

> However, in such way, if I get '57.6%' as a fragmentation radio,
> what does it mean? What can I do next?

I think the information of fragmentations are probably not
the most important; the information users want to know are
"When to do REINDEX?" and "How to set the fillfactor?".

I hope you to write how to interpret the framgentation (and other) info
in README. In my understanding, I'll write "You'd better do REINDEX when
you see the fragmentation is greater than 50%" under the present
calculation method.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-18 01:38:13
Message-ID: 44E51A05.10601@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp> wrote:
>
>> Well, in that way, following two conditions,
>> [1] [x] [2] [y] [3]
>> and
>> [3] [x] [2] [y] [1]
>> will be calculated as same fragmentation ratio(100%), I can't agree
>> with that, because both will generate different costs while index scan
>> in the real world (I don't care about page splitting algorithm now).
>
> I think the calculations (100%) are appropriate, because we should do
> REINDEX in both case. Supposing to the sizes of [x], [y] are mega or giga
> bytes, the order is not important; we have to do large seeks in both case.

I don't think so. A few blocks forward skip while scan can be reasonable
and acceptable (of course, it's case by case).

BTW, What does 'large seeks' mean? Seeking a few blocks, hundred of blocks
and millions of blocks are not same, I think. Are they same for you?

>> However, in such way, if I get '57.6%' as a fragmentation radio,
>> what does it mean? What can I do next?
>
> I think the information of fragmentations are probably not
> the most important; the information users want to know are
> "When to do REINDEX?" and "How to set the fillfactor?".

Agreed.
So, I'm just counting backward seeks simply for the fragmentation ratio.
It means 'the mismatch radio between logical order and physical order
of the blocks'.

> I hope you to write how to interpret the framgentation (and other) info
> in README. In my understanding, I'll write "You'd better do REINDEX when
> you see the fragmentation is greater than 50%" under the present
> calculation method.

I can't understand why you want to make such decision, because you're
thinking the fragmentation information is not the most important for
the users, aren't you?
--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-18 02:05:36
Message-ID: 20060818104047.555A.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp> wrote:

> > I hope you to write how to interpret the framgentation (and other) info
> > in README. In my understanding, I'll write "You'd better do REINDEX when
> > you see the fragmentation is greater than 50%" under the present
> > calculation method.
>
> I can't understand why you want to make such decision, because you're
> thinking the fragmentation information is not the most important for
> the users, aren't you?

Suppose a simple update case, for example, the accounts table in pgbench.
The default fillfactor of btree indexes is 90%, so the leaf pages are
fully split after we update 10-20% of tuples. But pgstatindex reports
the fragmentation is 50% in such condition, but I think we should do
REINDEX then. My decision came from this.

The setting fillfactor=50% is better than the case with high fillfactor
but all pages have split once, even if sizes of the indexes are same.
I worry that users will misunderstand the 50% of fragmentation -- if the
report says 100%, they'll consider to do REINDEX. But 50%, the necessity
is unclear.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-18 02:31:59
Message-ID: 44E5269F.8000004@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:
> Suppose a simple update case, for example, the accounts table in pgbench.
> The default fillfactor of btree indexes is 90%, so the leaf pages are
> fully split after we update 10-20% of tuples. But pgstatindex reports
> the fragmentation is 50% in such condition, but I think we should do
> REINDEX then. My decision came from this.
>
> The setting fillfactor=50% is better than the case with high fillfactor
> but all pages have split once, even if sizes of the indexes are same.
> I worry that users will misunderstand the 50% of fragmentation -- if the
> report says 100%, they'll consider to do REINDEX. But 50%, the necessity
> is unclear.

I think you should use 'average of page density' and 'number of leaf pages'
in such case. It is more useful to know filling condition of the leaves.

I've observed both while running pgbench, and the result is coming with
the WEB+DB PRESS magazine in next Wednesday. :)

Thanks.
--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-21 22:11:44
Message-ID: DC7D56B7-3AB0-4AD7-8555-799A9E3212A0@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Aug 17, 2006, at 4:10 PM, Martijn van Oosterhout wrote:
> On Thu, Aug 17, 2006 at 02:54:20PM -0500, Jim C. Nasby wrote:
>> On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout
>> wrote:
>>> On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
>>>> But the method has the above problem. So I suggest to use whether
>>>> the right link points to the next adjacent page or not.
>>>>
>>>> if (opaque->btpo_next != P_NONE && opaque->btpo_next !=
>>>> blkno + 1)
>>>> stat->fragments++;
>>>>
>>>> Do you think which method is better? Or do you have other ideas?
>>
>> Ok, fine... expand the example out to an index that's not trivial in
>> size. Even with read-ahead, once you get to a few megs (which is
>> obviously not that big), you're seeking.
>
> Well, mostly I'm just saying that only matching on the next block
> number is going to give unrealistically low numbers. We can't
> ignore OS
> level caching, the way Postgres works relies on it in many ways.

While I agree that *users* must take caching into account, I don't
think we should be fudging fragmentation numbers. For starters, we
have absolutely no idea how much caching is actually happening.

We should just report the raw numbers and let users draw their own
conclusions. Doing otherwise makes the stat far less useful.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-22 00:57:58
Message-ID: 200608220057.k7M0vww11163@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Hi all,
>
> Here is a patch to add pgstatindex functions to the pgstattuple module,
> which can work with 8.1.4. Please review and try it. Thanks.
>
>
> Satoshi Nagayasu wrote:
> > Bruce,
> >
> > I'll fix it in this week. Please wait a few days.
> > Thanks.
> >
> > Bruce Momjian wrote:
> >> nagayasu-san,
> >>
> >> This looks good, but we would like the code added to
> >> /contrib/pgstattuple, rather than it being its own /contrib module. Can
> >> you make that adjustment? Thanks.
> >>
> >> ---------------------------------------------------------------------------
> >>
> >> satoshi nagayasu wrote:
> >>> Hi folks,
> >>>
> >>> As I said on -PATCHES, I've been working on an utility to get
> >>> a b-tree index information. I'm happy to introduce
> >>> my new functions to you.
> >>>
> >>> pgstattuple module provides a `pgstatindex()`, and other small
> >>> functions, which allow you to get b-tree internal information.
> >>> I believe this module will be helpful to know b-tree index deeply.
> >>>
> >>> So please try it, send comment to me, and have fun.
> >>>
> >>> Thanks,
> >>> --
> >>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> >>>
> >>> -----------------------------------------------------
> >>> pgbench=# \x
> >>> Expanded display is on.
> >>> pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
> >>> -[ RECORD 1 ]------+--------
> >>> version | 2
> >>> tree_level | 1
> >>> index_size | 3588096
> >>> root_block_no | 3
> >>> internal_pages | 0
> >>> leaf_pages | 437
> >>> empty_pages | 0
> >>> deleted_pages | 0
> >>> avg_leaf_density | 59.5
> >>> leaf_fragmentation | 49.89
> >>> -----------------------------------------------------
> >>>
> >>>
> >> [ application/x-gzip is not supported, skipping... ]
> >>
> >>> ---------------------------(end of broadcast)---------------------------
> >>> TIP 4: Have you searched our list archives?
> >>>
> >>> http://archives.postgresql.org
> >
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile 2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/Makefile 2006-08-10 19:24:05.000000000 +0900
> @@ -6,7 +6,7 @@
> #
> #-------------------------------------------------------------------------
>
> -SRCS = pgstattuple.c
> +SRCS = pgstattuple.c pgstatindex.c
>
> MODULE_big = pgstattuple
> OBJS = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c 2006-08-11 17:51:26.000000000 +0900
> @@ -0,0 +1,714 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 12
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> + if ( !(FirstOffsetNumber<=(offset) && \
> + (offset)<=PageGetMaxOffsetNumber(page)) ) \
> + elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> + elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat {
> + uint32 blkno;
> + uint32 live_items;
> + uint32 dead_items;
> + uint32 page_size;
> + uint32 max_avail;
> + uint32 free_size;
> + uint32 avg_item_size;
> + uint32 fragments;
> + bool is_root;
> + bool is_internal;
> + bool is_leaf;
> + bool is_deleted;
> + bool is_empty;
> +} BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat {
> + uint32 version;
> + BlockNumber root_blkno;
> + uint32 level;
> +
> + uint32 live_items;
> + uint32 dead_items;
> +
> + uint32 root_pages;
> + uint32 internal_pages;
> + uint32 leaf_pages;
> + uint32 empty_pages;
> + uint32 deleted_pages;
> +
> + uint32 page_size;
> + uint32 avg_item_size;
> +
> + uint32 max_avail;
> + uint32 free_space;
> +
> + uint32 fragments;
> +} BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
> +{
> + Page page = BufferGetPage(buffer);
> + PageHeader phdr = (PageHeader) page;
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> + int item_size = 0;
> + int off;
> +
> + stat->blkno = blkno;
> +
> + stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
> +
> + stat->dead_items = stat->live_items = 0;
> +
> + stat->page_size = PageGetPageSize(page);
> +
> + /* page type */
> + stat->is_root = false;
> + stat->is_leaf = false;
> + stat->is_deleted = false;
> + stat->is_empty = false;
> +
> + if ( P_ISDELETED(opaque) )
> + {
> + stat->is_deleted = true;
> + return true;
> + }
> + else if ( P_IGNORE(opaque) )
> + stat->is_empty = true;
> + else if ( P_ISLEAF(opaque) )
> + stat->is_leaf = true;
> + else if ( P_ISROOT(opaque) )
> + stat->is_root = true;
> + else
> + stat->is_internal = true;
> +
> + /*----------------------------------------------
> + * If a next leaf is on the previous block,
> + * it means a fragmentation.
> + *----------------------------------------------
> + */
> + stat->fragments = 0;
> + if ( stat->is_leaf )
> + {
> + if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
> + stat->fragments++;
> + }
> +
> + /* count live and dead tuples, and free space */
> + for (off=FirstOffsetNumber ; off<=maxoff ; off++)
> + {
> + BTItem btitem;
> + IndexTuple itup;
> +
> + ItemId id = PageGetItemId(page, off);
> +
> + btitem = (BTItem)PageGetItem(page, id);
> +
> + itup = &(btitem->bti_itup);
> +
> + item_size += IndexTupleSize(itup);
> +
> + if ( !ItemIdDeleted(id) )
> + stat->live_items++;
> + else
> + stat->dead_items++;
> + }
> + stat->free_size = PageGetFreeSpace(page);
> +
> +#ifdef NOT_USED
> + elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
> + (float)stat->free_size/(float)stat->max_avail*100.0);
> +#endif
> +
> + if ( (stat->live_items + stat->dead_items) > 0 )
> + stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
> + else
> + stat->avg_item_size = 0;
> +
> + return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> + uint32 nblocks;
> + uint32 blkno;
> + BTIndexStat indexStat;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> + /*-------------------
> + * Read a metapage
> + *-------------------
> + */
> + {
> + Buffer buffer = ReadBuffer(rel, 0);
> + Page page = BufferGetPage(buffer);
> + BTMetaPageData *metad = BTPageGetMeta(page);
> +
> +// snprintf(values[j++], 32, "%d", metad->btm_magic);
> + indexStat.version = metad->btm_version;
> + indexStat.root_blkno = metad->btm_root;
> + indexStat.level = metad->btm_level;
> +// snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> +// snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + /* -- init stat -- */
> + indexStat.fragments = 0;
> +
> + indexStat.root_pages = 0;
> + indexStat.leaf_pages = 0;
> + indexStat.internal_pages = 0;
> + indexStat.empty_pages = 0;
> + indexStat.deleted_pages = 0;
> +
> + indexStat.max_avail = 0;
> + indexStat.free_space = 0;
> +
> + /*-----------------------
> + * Scan all blocks
> + *-----------------------
> + */
> + for (blkno=1 ; blkno<nblocks ; blkno++)
> + {
> + Buffer buffer = ReadBuffer(rel, blkno);
> + BTPageStat stat;
> +
> + /* scan one page */
> + stat.blkno = blkno;
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + /*---------------------
> + * page status (type)
> + *---------------------
> + */
> + if ( stat.is_deleted )
> + indexStat.deleted_pages++;
> + else if ( stat.is_leaf )
> + indexStat.leaf_pages++;
> + else if ( stat.is_internal )
> + indexStat.internal_pages++;
> + else if ( stat.is_empty )
> + indexStat.empty_pages++;
> + else if ( stat.is_root )
> + indexStat.root_pages++;
> + else
> + elog(ERROR, "unknown page status.");
> +
> + /* -- leaf fragmentation -- */
> + indexStat.fragments += stat.fragments;
> +
> + if ( stat.is_leaf )
> + {
> + indexStat.max_avail += stat.max_avail;
> + indexStat.free_space += stat.free_size;
> + }
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + relation_close(rel, AccessShareLock);
> +
> +#ifdef NOT_USED
> + elog(NOTICE, "[index information]");
> + elog(NOTICE, "version.................: %d", indexStat.version);
> + elog(NOTICE, "tree level..............: %d", indexStat.level);
> + elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
> + indexStat.leaf_pages +
> + indexStat.internal_pages +
> + indexStat.deleted_pages +
> + indexStat.empty_pages) * BLCKSZ);
> +
> + elog(NOTICE, "");
> + elog(NOTICE, "[page statistics]");
> + elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
> + elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
> + elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
> + elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
> + elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
> +
> + elog(NOTICE, "");
> + elog(NOTICE, "[leaf statistics]");
> + elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> + elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 );
> +#endif
> +
> + /*----------------------------
> + * Build a result tuple
> + *----------------------------
> + */
> + {
> + TupleDesc tupleDesc;
> + int j;
> + char *values[PGSTATINDEX_NCOLUMNS];
> +
> + HeapTupleData tupleData;
> + HeapTuple tuple = &tupleData;
> +
> + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> + indexStat.leaf_pages +
> + indexStat.internal_pages +
> + indexStat.deleted_pages +
> + indexStat.empty_pages) * BLCKSZ );
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + PG_RETURN_DATUM( result );
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> + buffer = ReadBuffer(rel, blkno);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> + if ( blkno==0 )
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + {
> + HeapTuple tuple;
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTPAGESTATS_NCOLUMNS];
> +
> + BTPageStat stat;
> +
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.blkno);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.live_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.dead_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.avg_item_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.page_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.free_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.is_deleted);
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_prev);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_next);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo.level);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> + values[j] = palloc(32);
> +// snprintf(values[j++], 32, "%d", opaque->btpo_flags);
> + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> + TupleDesc tupd;
> + Relation rel;
> + Buffer buffer;
> + Page page;
> + uint16 offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> +
> + RangeVar *relrv;
> + Datum result;
> + char *values[BTPAGEITEMS_NCOLUMNS];
> + BTPageOpaque opaque;
> + HeapTuple tuple;
> + ItemId id;
> +
> + FuncCallContext *fctx;
> + MemoryContext mctx;
> + struct user_args *uargs = NULL;
> +
> + if ( blkno==0 )
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + if ( SRF_IS_FIRSTCALL() )
> + {
> + fctx = SRF_FIRSTCALL_INIT();
> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> + uargs = palloc(sizeof(struct user_args));
> +
> + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> + uargs->offset = FirstOffsetNumber;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> +
> + uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> + if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
> + elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> + uargs->page = BufferGetPage(uargs->buffer);
> +
> + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> + if ( P_ISDELETED(opaque) )
> + elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> + fctx->user_fctx = uargs;
> +
> + MemoryContextSwitchTo(mctx);
> + }
> +
> + fctx = SRF_PERCALL_SETUP();
> + uargs = fctx->user_fctx;
> +
> + if ( fctx->call_cntr < fctx->max_calls )
> + {
> + BTItem btitem;
> + IndexTuple itup;
> +
> + id = PageGetItemId(uargs->page, uargs->offset);
> +
> + if ( !ItemIdIsValid(id) )
> + elog(ERROR, "Invalid ItemId.");
> +
> + btitem = (BTItem)PageGetItem(uargs->page, id);
> + itup = &(btitem->bti_itup);
> +
> + {
> + int j = 0;
> +
> + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", uargs->offset);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> + {
> + int off;
> + char *dump;
> + char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
> +
> + dump = palloc(IndexTupleSize(itup)*3);
> + memset(dump, 0, IndexTupleSize(itup)*3);
> +
> + for (off=0 ;
> + off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
> + off++)
> + {
> + if ( dump[0]=='\0' )
> + sprintf(dump, "%02x", *(ptr+off) & 0xff);
> + else
> + {
> + char buf[4];
> + sprintf(buf, " %02x", *(ptr+off) & 0xff);
> + strcat(dump, buf);
> + }
> + }
> + values[j] = dump;
> + }
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> + }
> +
> + uargs->offset = uargs->offset + 1;
> +
> + SRF_RETURN_NEXT(fctx, result);
> + }
> + else
> + {
> + ReleaseBuffer(uargs->buffer);
> + relation_close(uargs->rel, AccessShareLock);
> +
> + SRF_RETURN_DONE(fctx);
> + }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
> + elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> + buffer = ReadBuffer(rel, 0);
> +
> + {
> + BTMetaPageData *metad;
> +
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTMETAP_NCOLUMNS];
> + HeapTuple tuple;
> +
> + Page page = BufferGetPage(buffer);
> +
> + metad = BTPageGetMeta(page);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_magic);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_root);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + * SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> +
> + Relation rel;
> + RangeVar *relrv;
> + int4 relpages;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + relpages = RelationGetNumberOfBlocks(rel);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_INT32(relpages);
> +}
> +
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in 2006-08-10 19:22:47.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in 2006-08-11 17:46:19.000000000 +0900
> @@ -22,3 +22,97 @@
> RETURNS pgstattuple_type
> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> LANGUAGE 'C' STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> + version int4,
> + tree_level int4,
> + index_size int4,
> + root_block_no int4,
> + internal_pages int4,
> + leaf_pages int4,
> + empty_pages int4,
> + deleted_pages int4,
> + avg_leaf_density float8,
> + leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> + magic int4,
> + version int4,
> + root int4,
> + level int4,
> + fastroot int4,
> + fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> + itemoffset int4,
> + ctid tid,
> + itemlen int4,
> + nulls bool,
> + vars bool,
> + data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> + blkno int4,
> + live_items int4,
> + dead_items int4,
> + total_items int4,
> + avg_item_size float,
> + page_size int4,
> + free_size int4,
> + is_deleted int4,
> + btpo_prev int4,
> + btpo_next int4,
> + btpo_level int4,
> + btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh 2006-08-11 17:47:05.000000000 +0900
> @@ -0,0 +1,22 @@
> +#!/bin/sh
> +
> +export PATH=/usr/local/pgsql814/bin:$PATH
> +
> +psql pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
> +
> +psql pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +EOF
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-22 01:09:34
Message-ID: 44EA594E.10303@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce,

Bruce Momjian wrote:
> BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
> your patch for CVS HEAD. Thanks.

I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-22 01:55:23
Message-ID: 200608220155.k7M1tNu14848@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Bruce,
>
> Bruce Momjian wrote:
> > BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
> > your patch for CVS HEAD. Thanks.
>
> I've posted CVS HEAD workable version on Aug.14.
> Please check it out. Thanks.

OK, I found it, but it has no updates to README.pgstattuple to describe
the new functionality. Should I write it?

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-22 02:05:15
Message-ID: 44EA665B.3090106@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sorry, I'll write README (and uninstall.sql?) by tomorrow.

Bruce Momjian wrote:
> Satoshi Nagayasu wrote:
>> Bruce,
>>
>> Bruce Momjian wrote:
>>> BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
>>> your patch for CVS HEAD. Thanks.
>> I've posted CVS HEAD workable version on Aug.14.
>> Please check it out. Thanks.
>
> OK, I found it, but it has no updates to README.pgstattuple to describe
> the new functionality. Should I write it?
>

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-22 02:54:47
Message-ID: 200608220254.k7M2sl519691@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu wrote:
> Sorry, I'll write README (and uninstall.sql?) by tomorrow.

Thanks. Yea, you need to update the uninstall too.

---------------------------------------------------------------------------

>
> Bruce Momjian wrote:
> > Satoshi Nagayasu wrote:
> >> Bruce,
> >>
> >> Bruce Momjian wrote:
> >>> BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
> >>> your patch for CVS HEAD. Thanks.
> >> I've posted CVS HEAD workable version on Aug.14.
> >> Please check it out. Thanks.
> >
> > OK, I found it, but it has no updates to README.pgstattuple to describe
> > the new functionality. Should I write it?
> >
>
>
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> Phone: +81-3-3523-8122

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgstattuple extension for indexes
Date: 2006-08-23 07:14:19
Message-ID: 20060823071419.GI88878@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Aug 18, 2006 at 09:15:59AM +0900, Satoshi Nagayasu wrote:
> ITAGAKI Takahiro wrote:
> > But the method has the above problem. So I suggest to use whether
> > the right link points to the next adjacent page or not.
> >
> > if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> > stat->fragments++;
>
> Well, in that way, following two conditions,
> [1] [x] [2] [y] [3]
> and
> [3] [x] [2] [y] [1]
> will be calculated as same fragmentation ratio(100%), I can't agree
> with that, because both will generate different costs while index scan
> in the real world (I don't care about page splitting algorithm now).

What about just reporting the correlation of pages in the index, as well
as fragmentation?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-02 17:06:28
Message-ID: 200609021706.k82H6Sr29059@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Patch applied. Thanks.

I updated the README documentation for the new functions, attached. I
could not update the Japanese version of the README.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Bruce,
>
> Attached patch has been cleaned up,
> and modified to be able to work with CVS HEAD.
>
> Thanks.
>
> Satoshi Nagayasu wrote:
> > Alvaro,
> >
> > Alvaro Herrera wrote:
> >> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
> >>
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> >> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
> >>
> >>
> >> While you're at it, please consider removing C++ style comments and
> >> unused code.
> >>
> >> Formatting is way off as well, but I guess that is easily fixed with
> >> pgindent.
> >
> > Thanks for comments. I'm going to fix my patch from now.
> >
> >> Regarding the pg_relpages function, why do you think it's necessary?
> >> (It returns the true number of blocks of a given relation). It may
> >> belong into core given a reasonable use case, but otherwise it doesn't
> >> seem to belong into pgstatindex (or pgstattuple for that matter).
> >
> > I wanted to sample some pages from the table/index, and get their statistics
> > to know table/index conditions. I know pgstattuple() reports table
> > statistics, however, pgstattuple() generates heavy CPU and I/O load.
> >
> > When we need to sample some pages from table/index, we need to know
> > true number of blocks.
> >
> > I have another function, called pgstatpage(), to get information inside
> > a single block/page statistics of the table. pg_relpages() will be used
> > with this.
> >
> > Sorry for not mentioned in previous post about pgstatpage(),
> > but I've remembered about it just now.
> >
> > Many memories in my brain have already `paged-out` (too busy in last few months),
> > and some of them got `out-of-memory`. :^)
> >
> > Thanks.
>
>
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> Phone: +81-3-3523-8122

> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900
> +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900
> @@ -6,7 +6,7 @@
> #
> #-------------------------------------------------------------------------
>
> -SRCS = pgstattuple.c
> +SRCS = pgstattuple.c pgstatindex.c
>
> MODULE_big = pgstattuple
> OBJS = $(SRCS:.c=.o)
> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900
> @@ -0,0 +1,706 @@
> +/*
> + * pgstatindex
> + *
> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
> + *
> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */
> +
> +#include "postgres.h"
> +
> +#include "fmgr.h"
> +#include "funcapi.h"
> +#include "access/heapam.h"
> +#include "access/itup.h"
> +#include "access/nbtree.h"
> +#include "access/transam.h"
> +#include "catalog/namespace.h"
> +#include "catalog/pg_type.h"
> +#include "utils/builtins.h"
> +#include "utils/inval.h"
> +
> +PG_FUNCTION_INFO_V1(pgstatindex);
> +PG_FUNCTION_INFO_V1(bt_metap);
> +PG_FUNCTION_INFO_V1(bt_page_items);
> +PG_FUNCTION_INFO_V1(bt_page_stats);
> +PG_FUNCTION_INFO_V1(pg_relpages);
> +
> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> +
> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> +#define PGSTATINDEX_NCOLUMNS 10
> +
> +#define BTMETAP_TYPE "public.bt_metap_type"
> +#define BTMETAP_NCOLUMNS 6
> +
> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> +#define BTPAGEITEMS_NCOLUMNS 6
> +
> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> +#define BTPAGESTATS_NCOLUMNS 11
> +
> +
> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> +
> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> + if ( !(FirstOffsetNumber<=(offset) && \
> + (offset)<=PageGetMaxOffsetNumber(page)) ) \
> + elog(ERROR, "Page offset number out of range."); }
> +
> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> + elog(ERROR, "Block number out of range."); }
> +
> +/* ------------------------------------------------
> + * structure for single btree page statistics
> + * ------------------------------------------------
> + */
> +typedef struct BTPageStat
> +{
> + uint32 blkno;
> + uint32 live_items;
> + uint32 dead_items;
> + uint32 page_size;
> + uint32 max_avail;
> + uint32 free_size;
> + uint32 avg_item_size;
> + uint32 fragments;
> + char type;
> +
> + /* opaque data */
> + BlockNumber btpo_prev;
> + BlockNumber btpo_next;
> + union
> + {
> + uint32 level;
> + TransactionId xact;
> + } btpo;
> + uint16 btpo_flags;
> + BTCycleId btpo_cycleid;
> +} BTPageStat;
> +
> +/* ------------------------------------------------
> + * A structure for a whole btree index statistics
> + * used by pgstatindex().
> + * ------------------------------------------------
> + */
> +typedef struct BTIndexStat
> +{
> + uint32 magic;
> + uint32 version;
> + BlockNumber root_blkno;
> + uint32 level;
> +
> + BlockNumber fastroot;
> + uint32 fastlevel;
> +
> + uint32 live_items;
> + uint32 dead_items;
> +
> + uint32 root_pages;
> + uint32 internal_pages;
> + uint32 leaf_pages;
> + uint32 empty_pages;
> + uint32 deleted_pages;
> +
> + uint32 page_size;
> + uint32 avg_item_size;
> +
> + uint32 max_avail;
> + uint32 free_space;
> +
> + uint32 fragments;
> +} BTIndexStat;
> +
> +/* -------------------------------------------------
> + * GetBTPageStatistics()
> + *
> + * Collect statistics of single b-tree leaf page
> + * -------------------------------------------------
> + */
> +static bool
> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
> +{
> + Page page = BufferGetPage(buffer);
> + PageHeader phdr = (PageHeader) page;
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> + int item_size = 0;
> + int off;
> +
> + stat->blkno = blkno;
> +
> + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
> +
> + stat->dead_items = stat->live_items = 0;
> +
> + stat->page_size = PageGetPageSize(page);
> +
> + /* page type (flags) */
> + if (P_ISDELETED(opaque))
> + {
> + stat->type = 'd';
> + return true;
> + }
> + else if (P_IGNORE(opaque))
> + stat->type = 'e';
> + else if (P_ISLEAF(opaque))
> + stat->type = 'l';
> + else if (P_ISROOT(opaque))
> + stat->type = 'r';
> + else
> + stat->type = 'i';
> +
> + /* btpage opaque data */
> + stat->btpo_prev = opaque->btpo_prev;
> + stat->btpo_next = opaque->btpo_next;
> + if (P_ISDELETED(opaque))
> + stat->btpo.xact = opaque->btpo.xact;
> + else
> + stat->btpo.level = opaque->btpo.level;
> + stat->btpo_flags = opaque->btpo_flags;
> + stat->btpo_cycleid = opaque->btpo_cycleid;
> +
> + /*----------------------------------------------
> + * If a next leaf is on the previous block,
> + * it means a fragmentation.
> + *----------------------------------------------
> + */
> + stat->fragments = 0;
> + if (stat->type == 'l')
> + {
> + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> + stat->fragments++;
> + }
> +
> + /* count live and dead tuples, and free space */
> + for (off = FirstOffsetNumber; off <= maxoff; off++)
> + {
> + IndexTuple itup;
> +
> + ItemId id = PageGetItemId(page, off);
> +
> + itup = (IndexTuple) PageGetItem(page, id);
> +
> + item_size += IndexTupleSize(itup);
> +
> + if (!ItemIdDeleted(id))
> + stat->live_items++;
> + else
> + stat->dead_items++;
> + }
> + stat->free_size = PageGetFreeSpace(page);
> +
> + if ((stat->live_items + stat->dead_items) > 0)
> + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
> + else
> + stat->avg_item_size = 0;
> +
> + return true;
> +}
> +
> +
> +/* ------------------------------------------------------
> + * pgstatindex()
> + *
> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> + * ------------------------------------------------------
> + */
> +Datum
> +pgstatindex(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> + uint32 nblocks;
> + uint32 blkno;
> + BTIndexStat indexStat;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> + elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> +
> + /*-------------------
> + * Read a metapage
> + *-------------------
> + */
> + {
> + Buffer buffer = ReadBuffer(rel, 0);
> + Page page = BufferGetPage(buffer);
> + BTMetaPageData *metad = BTPageGetMeta(page);
> +
> + indexStat.magic = metad->btm_magic;
> + indexStat.version = metad->btm_version;
> + indexStat.root_blkno = metad->btm_root;
> + indexStat.level = metad->btm_level;
> + indexStat.fastroot = metad->btm_fastroot;
> + indexStat.fastlevel = metad->btm_fastlevel;
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + nblocks = RelationGetNumberOfBlocks(rel);
> +
> + /* -- init stat -- */
> + indexStat.fragments = 0;
> +
> + indexStat.root_pages = 0;
> + indexStat.leaf_pages = 0;
> + indexStat.internal_pages = 0;
> + indexStat.empty_pages = 0;
> + indexStat.deleted_pages = 0;
> +
> + indexStat.max_avail = 0;
> + indexStat.free_space = 0;
> +
> + /*-----------------------
> + * Scan all blocks
> + *-----------------------
> + */
> + for (blkno = 1; blkno < nblocks; blkno++)
> + {
> + Buffer buffer = ReadBuffer(rel, blkno);
> + BTPageStat stat;
> +
> + /* scan one page */
> + stat.blkno = blkno;
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + /*---------------------
> + * page status (type)
> + *---------------------
> + */
> + switch (stat.type)
> + {
> + case 'd':
> + indexStat.deleted_pages++;
> + break;
> + case 'l':
> + indexStat.leaf_pages++;
> + break;
> + case 'i':
> + indexStat.internal_pages++;
> + break;
> + case 'e':
> + indexStat.empty_pages++;
> + break;
> + case 'r':
> + indexStat.root_pages++;
> + break;
> + default:
> + elog(ERROR, "unknown page status.");
> + }
> +
> + /* -- leaf fragmentation -- */
> + indexStat.fragments += stat.fragments;
> +
> + if (stat.type == 'l')
> + {
> + indexStat.max_avail += stat.max_avail;
> + indexStat.free_space += stat.free_size;
> + }
> +
> + ReleaseBuffer(buffer);
> + }
> +
> + relation_close(rel, AccessShareLock);
> +
> + /*----------------------------
> + * Build a result tuple
> + *----------------------------
> + */
> + {
> + TupleDesc tupleDesc;
> + int j;
> + char *values[PGSTATINDEX_NCOLUMNS];
> +
> + HeapTupleData tupleData;
> + HeapTuple tuple = &tupleData;
> +
> + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> + indexStat.leaf_pages +
> + indexStat.internal_pages +
> + indexStat.deleted_pages +
> + indexStat.empty_pages) * BLCKSZ);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/* -----------------------------------------------
> + * bt_page()
> + *
> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> + * -----------------------------------------------
> + */
> +Datum
> +bt_page_stats(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> +
> + buffer = ReadBuffer(rel, blkno);
> +
> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> + elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> +
> + if (blkno == 0)
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + {
> + HeapTuple tuple;
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTPAGESTATS_NCOLUMNS];
> +
> + BTPageStat stat;
> +
> + GetBTPageStatistics(blkno, buffer, &stat);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.blkno);
> +
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", stat.type);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.live_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.dead_items);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.avg_item_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.page_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.free_size);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.btpo_prev);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.btpo_next);
> +
> + values[j] = palloc(32);
> + if (stat.type == 'd')
> + snprintf(values[j++], 32, "%d", stat.btpo.xact);
> + else
> + snprintf(values[j++], 32, "%d", stat.btpo.level);
> +
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", stat.btpo_flags);
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/*-------------------------------------------------------
> + * bt_page_items()
> + *
> + * Get IndexTupleData set in a leaf page
> + *
> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> + *-------------------------------------------------------
> + */
> +/* ---------------------------------------------------
> + * data structure for SRF to hold a scan information
> + * ---------------------------------------------------
> + */
> +struct user_args
> +{
> + TupleDesc tupd;
> + Relation rel;
> + Buffer buffer;
> + Page page;
> + uint16 offset;
> +};
> +
> +Datum
> +bt_page_items(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + uint32 blkno = PG_GETARG_UINT32(1);
> +
> + RangeVar *relrv;
> + Datum result;
> + char *values[BTPAGEITEMS_NCOLUMNS];
> + BTPageOpaque opaque;
> + HeapTuple tuple;
> + ItemId id;
> +
> + FuncCallContext *fctx;
> + MemoryContext mctx;
> + struct user_args *uargs = NULL;
> +
> + if (blkno == 0)
> + elog(ERROR, "Block 0 is a meta page.");
> +
> + if (SRF_IS_FIRSTCALL())
> + {
> + fctx = SRF_FIRSTCALL_INIT();
> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> +
> + uargs = palloc(sizeof(struct user_args));
> +
> + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> + uargs->offset = FirstOffsetNumber;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + uargs->rel = relation_openrv(relrv, AccessShareLock);
> +
> + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> +
> + uargs->buffer = ReadBuffer(uargs->rel, blkno);
> +
> + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
> + elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> +
> + uargs->page = BufferGetPage(uargs->buffer);
> +
> + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> +
> + if (P_ISDELETED(opaque))
> + elog(NOTICE, "bt_page_items(): this page is deleted.");
> +
> + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> + fctx->user_fctx = uargs;
> +
> + MemoryContextSwitchTo(mctx);
> + }
> +
> + fctx = SRF_PERCALL_SETUP();
> + uargs = fctx->user_fctx;
> +
> + if (fctx->call_cntr < fctx->max_calls)
> + {
> + IndexTuple itup;
> +
> + id = PageGetItemId(uargs->page, uargs->offset);
> +
> + if (!ItemIdIsValid(id))
> + elog(ERROR, "Invalid ItemId.");
> +
> + itup = (IndexTuple) PageGetItem(uargs->page, id);
> +
> + {
> + int j = 0;
> +
> + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> +
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", uargs->offset);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> +
> + {
> + int off;
> + char *dump;
> + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
> +
> + dump = palloc(IndexTupleSize(itup) * 3);
> + memset(dump, 0, IndexTupleSize(itup) * 3);
> +
> + for (off = 0;
> + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
> + off++)
> + {
> + if (dump[0] == '\0')
> + sprintf(dump, "%02x", *(ptr + off) & 0xff);
> + else
> + {
> + char buf[4];
> +
> + sprintf(buf, " %02x", *(ptr + off) & 0xff);
> + strcat(dump, buf);
> + }
> + }
> + values[j] = dump;
> + }
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> + }
> +
> + uargs->offset = uargs->offset + 1;
> +
> + SRF_RETURN_NEXT(fctx, result);
> + }
> + else
> + {
> + ReleaseBuffer(uargs->buffer);
> + relation_close(uargs->rel, AccessShareLock);
> +
> + SRF_RETURN_DONE(fctx);
> + }
> +}
> +
> +
> +/* ------------------------------------------------
> + * bt_metap()
> + *
> + * Get a btree meta-page information
> + *
> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> + * ------------------------------------------------
> + */
> +Datum
> +bt_metap(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> + Buffer buffer;
> +
> + Relation rel;
> + RangeVar *relrv;
> + Datum result;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> + elog(ERROR, "bt_metap() can be used only on b-tree index.");
> +
> + buffer = ReadBuffer(rel, 0);
> +
> + {
> + BTMetaPageData *metad;
> +
> + TupleDesc tupleDesc;
> + int j;
> + char *values[BTMETAP_NCOLUMNS];
> + HeapTuple tuple;
> +
> + Page page = BufferGetPage(buffer);
> +
> + metad = BTPageGetMeta(page);
> +
> + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> +
> + j = 0;
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_magic);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_version);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_root);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_level);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> + values[j] = palloc(32);
> + snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> +
> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> + values);
> +
> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> + }
> +
> + ReleaseBuffer(buffer);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_DATUM(result);
> +}
> +
> +/* --------------------------------------------------------
> + * pg_relpages()
> + *
> + * Get a number of pages of the table/index.
> + *
> + * Usage: SELECT pg_relpages('t1');
> + * SELECT pg_relpages('t1_pkey');
> + * --------------------------------------------------------
> + */
> +Datum
> +pg_relpages(PG_FUNCTION_ARGS)
> +{
> + text *relname = PG_GETARG_TEXT_P(0);
> +
> + Relation rel;
> + RangeVar *relrv;
> + int4 relpages;
> +
> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> + rel = relation_openrv(relrv, AccessShareLock);
> +
> + relpages = RelationGetNumberOfBlocks(rel);
> +
> + relation_close(rel, AccessShareLock);
> +
> + PG_RETURN_INT32(relpages);
> +}
> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900
> +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900
> @@ -22,3 +22,96 @@
> RETURNS pgstattuple_type
> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> LANGUAGE C STRICT;
> +
> +--
> +-- pgstatindex
> +--
> +DROP TYPE pgstatindex_type CASCADE;
> +CREATE TYPE pgstatindex_type AS (
> + version int4,
> + tree_level int4,
> + index_size int4,
> + root_block_no int4,
> + internal_pages int4,
> + leaf_pages int4,
> + empty_pages int4,
> + deleted_pages int4,
> + avg_leaf_density float8,
> + leaf_fragmentation float8
> +);
> +
> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> +RETURNS pgstatindex_type
> +AS 'MODULE_PATHNAME', 'pgstatindex'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_metap()
> +--
> +DROP TYPE bt_metap_type CASCADE;
> +CREATE TYPE bt_metap_type AS (
> + magic int4,
> + version int4,
> + root int4,
> + level int4,
> + fastroot int4,
> + fastlevel int4
> +);
> +
> +CREATE OR REPLACE FUNCTION bt_metap(text)
> +RETURNS bt_metap_type
> +AS 'MODULE_PATHNAME', 'bt_metap'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_items()
> +--
> +DROP TYPE bt_page_items_type CASCADE;
> +CREATE TYPE bt_page_items_type AS (
> + itemoffset int4,
> + ctid tid,
> + itemlen int4,
> + nulls bool,
> + vars bool,
> + data text
> +);
> +
> +DROP FUNCTION bt_page_items(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> +RETURNS SETOF bt_page_items_type
> +AS 'MODULE_PATHNAME', 'bt_page_items'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- bt_page_stats()
> +--
> +DROP TYPE bt_page_stats_type CASCADE;
> +CREATE TYPE bt_page_stats_type AS (
> + blkno int4,
> + type char,
> + live_items int4,
> + dead_items int4,
> + avg_item_size float,
> + page_size int4,
> + free_size int4,
> + btpo_prev int4,
> + btpo_next int4,
> + btpo int4,
> + btpo_flags int4
> +);
> +
> +DROP FUNCTION bt_page_stats(text, int4);
> +
> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> +RETURNS bt_page_stats_type
> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> +LANGUAGE 'C' STRICT;
> +
> +--
> +-- pg_relpages()
> +--
> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> +RETURNS int
> +AS 'MODULE_PATHNAME', 'pg_relpages'
> +LANGUAGE 'C' STRICT;
> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
> +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +
> +PGHOME=/home/snaga/pgsql20060814
> +export PATH=${PGHOME}/bin:$PATH
> +
> +psql -p 9999 pgbench<<EOF
> +DROP FUNCTION pgstatindex(text);
> +EOF
> +
> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
> +
> +psql -p 9999 pgbench<<EOF
> +SELECT * FROM pg_relpages('accounts_pkey');
> +\x
> +SELECT * FROM pgstatindex('accounts_pkey');
> +SELECT * FROM bt_metap('accounts_pkey');
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> +
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +\x
> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
> +\x
> +SELECT * FROM bt_page_items('accounts_pkey', 361);
> +EOF
>
--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 4.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-03 23:00:29
Message-ID: 1876.1157324429@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Patch applied. Thanks.

For some reason I expected this patch to correct the portability errors
and design problems identified here:
http://archives.postgresql.org/pgsql-patches/2006-07/msg00100.php

Not only has it not fixed anything, it's made things worse:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstattuple.o pgstattuple.c
pgstattuple.c: In function 'pgstat_btree':
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstatindex.o pgstatindex.c
pgstatindex.c: In function 'bt_page_items':
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
ar crs libpgstattuple.a pgstattuple.o pgstatindex.o

The only reason the buildfarm isn't crashing on this contrib module is
that it lacks any regression test to crash on.

regards, tom lane


From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-03 23:13:33
Message-ID: 44FB619D.2010905@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Thanks Bruce,

Here are updated Japanese README, and uninstall_pgstattuple.sql.

Bruce Momjian wrote:
> Patch applied. Thanks.
>
> I updated the README documentation for the new functions, attached. I
> could not update the Japanese version of the README.
>
> ---------------------------------------------------------------------------
>
>
> Satoshi Nagayasu wrote:
>> Bruce,
>>
>> Attached patch has been cleaned up,
>> and modified to be able to work with CVS HEAD.
>>
>> Thanks.
>>
>> Satoshi Nagayasu wrote:
>>> Alvaro,
>>>
>>> Alvaro Herrera wrote:
>>>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>>>>
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>>>>
>>>>
>>>> While you're at it, please consider removing C++ style comments and
>>>> unused code.
>>>>
>>>> Formatting is way off as well, but I guess that is easily fixed with
>>>> pgindent.
>>> Thanks for comments. I'm going to fix my patch from now.
>>>
>>>> Regarding the pg_relpages function, why do you think it's necessary?
>>>> (It returns the true number of blocks of a given relation). It may
>>>> belong into core given a reasonable use case, but otherwise it doesn't
>>>> seem to belong into pgstatindex (or pgstattuple for that matter).
>>> I wanted to sample some pages from the table/index, and get their statistics
>>> to know table/index conditions. I know pgstattuple() reports table
>>> statistics, however, pgstattuple() generates heavy CPU and I/O load.
>>>
>>> When we need to sample some pages from table/index, we need to know
>>> true number of blocks.
>>>
>>> I have another function, called pgstatpage(), to get information inside
>>> a single block/page statistics of the table. pg_relpages() will be used
>>> with this.
>>>
>>> Sorry for not mentioned in previous post about pgstatpage(),
>>> but I've remembered about it just now.
>>>
>>> Many memories in my brain have already `paged-out` (too busy in last few months),
>>> and some of them got `out-of-memory`. :^)
>>>
>>> Thanks.
>>
>> --
>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>> Phone: +81-3-3523-8122
>
>> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
>> --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900
>> +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900
>> @@ -6,7 +6,7 @@
>> #
>> #-------------------------------------------------------------------------
>>
>> -SRCS = pgstattuple.c
>> +SRCS = pgstattuple.c pgstatindex.c
>>
>> MODULE_big = pgstattuple
>> OBJS = $(SRCS:.c=.o)
>> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
>> --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900
>> @@ -0,0 +1,706 @@
>> +/*
>> + * pgstatindex
>> + *
>> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
>> + *
>> + * Permission to use, copy, modify, and distribute this software and
>> + * its documentation for any purpose, without fee, and without a
>> + * written agreement is hereby granted, provided that the above
>> + * copyright notice and this paragraph and the following two
>> + * paragraphs appear in all copies.
>> + *
>> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
>> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
>> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
>> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
>> + * OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
>> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
>> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
>> + */
>> +
>> +#include "postgres.h"
>> +
>> +#include "fmgr.h"
>> +#include "funcapi.h"
>> +#include "access/heapam.h"
>> +#include "access/itup.h"
>> +#include "access/nbtree.h"
>> +#include "access/transam.h"
>> +#include "catalog/namespace.h"
>> +#include "catalog/pg_type.h"
>> +#include "utils/builtins.h"
>> +#include "utils/inval.h"
>> +
>> +PG_FUNCTION_INFO_V1(pgstatindex);
>> +PG_FUNCTION_INFO_V1(bt_metap);
>> +PG_FUNCTION_INFO_V1(bt_page_items);
>> +PG_FUNCTION_INFO_V1(bt_page_stats);
>> +PG_FUNCTION_INFO_V1(pg_relpages);
>> +
>> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
>> +extern Datum bt_metap(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
>> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
>> +
>> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
>> +#define PGSTATINDEX_NCOLUMNS 10
>> +
>> +#define BTMETAP_TYPE "public.bt_metap_type"
>> +#define BTMETAP_NCOLUMNS 6
>> +
>> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
>> +#define BTPAGEITEMS_NCOLUMNS 6
>> +
>> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
>> +#define BTPAGESTATS_NCOLUMNS 11
>> +
>> +
>> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
>> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
>> +
>> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
>> + if ( !(FirstOffsetNumber<=(offset) && \
>> + (offset)<=PageGetMaxOffsetNumber(page)) ) \
>> + elog(ERROR, "Page offset number out of range."); }
>> +
>> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
>> + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
>> + elog(ERROR, "Block number out of range."); }
>> +
>> +/* ------------------------------------------------
>> + * structure for single btree page statistics
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTPageStat
>> +{
>> + uint32 blkno;
>> + uint32 live_items;
>> + uint32 dead_items;
>> + uint32 page_size;
>> + uint32 max_avail;
>> + uint32 free_size;
>> + uint32 avg_item_size;
>> + uint32 fragments;
>> + char type;
>> +
>> + /* opaque data */
>> + BlockNumber btpo_prev;
>> + BlockNumber btpo_next;
>> + union
>> + {
>> + uint32 level;
>> + TransactionId xact;
>> + } btpo;
>> + uint16 btpo_flags;
>> + BTCycleId btpo_cycleid;
>> +} BTPageStat;
>> +
>> +/* ------------------------------------------------
>> + * A structure for a whole btree index statistics
>> + * used by pgstatindex().
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTIndexStat
>> +{
>> + uint32 magic;
>> + uint32 version;
>> + BlockNumber root_blkno;
>> + uint32 level;
>> +
>> + BlockNumber fastroot;
>> + uint32 fastlevel;
>> +
>> + uint32 live_items;
>> + uint32 dead_items;
>> +
>> + uint32 root_pages;
>> + uint32 internal_pages;
>> + uint32 leaf_pages;
>> + uint32 empty_pages;
>> + uint32 deleted_pages;
>> +
>> + uint32 page_size;
>> + uint32 avg_item_size;
>> +
>> + uint32 max_avail;
>> + uint32 free_space;
>> +
>> + uint32 fragments;
>> +} BTIndexStat;
>> +
>> +/* -------------------------------------------------
>> + * GetBTPageStatistics()
>> + *
>> + * Collect statistics of single b-tree leaf page
>> + * -------------------------------------------------
>> + */
>> +static bool
>> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
>> +{
>> + Page page = BufferGetPage(buffer);
>> + PageHeader phdr = (PageHeader) page;
>> + OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
>> + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
>> + int item_size = 0;
>> + int off;
>> +
>> + stat->blkno = blkno;
>> +
>> + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
>> +
>> + stat->dead_items = stat->live_items = 0;
>> +
>> + stat->page_size = PageGetPageSize(page);
>> +
>> + /* page type (flags) */
>> + if (P_ISDELETED(opaque))
>> + {
>> + stat->type = 'd';
>> + return true;
>> + }
>> + else if (P_IGNORE(opaque))
>> + stat->type = 'e';
>> + else if (P_ISLEAF(opaque))
>> + stat->type = 'l';
>> + else if (P_ISROOT(opaque))
>> + stat->type = 'r';
>> + else
>> + stat->type = 'i';
>> +
>> + /* btpage opaque data */
>> + stat->btpo_prev = opaque->btpo_prev;
>> + stat->btpo_next = opaque->btpo_next;
>> + if (P_ISDELETED(opaque))
>> + stat->btpo.xact = opaque->btpo.xact;
>> + else
>> + stat->btpo.level = opaque->btpo.level;
>> + stat->btpo_flags = opaque->btpo_flags;
>> + stat->btpo_cycleid = opaque->btpo_cycleid;
>> +
>> + /*----------------------------------------------
>> + * If a next leaf is on the previous block,
>> + * it means a fragmentation.
>> + *----------------------------------------------
>> + */
>> + stat->fragments = 0;
>> + if (stat->type == 'l')
>> + {
>> + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
>> + stat->fragments++;
>> + }
>> +
>> + /* count live and dead tuples, and free space */
>> + for (off = FirstOffsetNumber; off <= maxoff; off++)
>> + {
>> + IndexTuple itup;
>> +
>> + ItemId id = PageGetItemId(page, off);
>> +
>> + itup = (IndexTuple) PageGetItem(page, id);
>> +
>> + item_size += IndexTupleSize(itup);
>> +
>> + if (!ItemIdDeleted(id))
>> + stat->live_items++;
>> + else
>> + stat->dead_items++;
>> + }
>> + stat->free_size = PageGetFreeSpace(page);
>> +
>> + if ((stat->live_items + stat->dead_items) > 0)
>> + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
>> + else
>> + stat->avg_item_size = 0;
>> +
>> + return true;
>> +}
>> +
>> +
>> +/* ------------------------------------------------------
>> + * pgstatindex()
>> + *
>> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
>> + * ------------------------------------------------------
>> + */
>> +Datum
>> +pgstatindex(PG_FUNCTION_ARGS)
>> +{
>> + text *relname = PG_GETARG_TEXT_P(0);
>> + Relation rel;
>> + RangeVar *relrv;
>> + Datum result;
>> + uint32 nblocks;
>> + uint32 blkno;
>> + BTIndexStat indexStat;
>> +
>> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> + rel = relation_openrv(relrv, AccessShareLock);
>> +
>> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> + elog(ERROR, "pgstatindex() can be used only on b-tree index.");
>> +
>> + /*-------------------
>> + * Read a metapage
>> + *-------------------
>> + */
>> + {
>> + Buffer buffer = ReadBuffer(rel, 0);
>> + Page page = BufferGetPage(buffer);
>> + BTMetaPageData *metad = BTPageGetMeta(page);
>> +
>> + indexStat.magic = metad->btm_magic;
>> + indexStat.version = metad->btm_version;
>> + indexStat.root_blkno = metad->btm_root;
>> + indexStat.level = metad->btm_level;
>> + indexStat.fastroot = metad->btm_fastroot;
>> + indexStat.fastlevel = metad->btm_fastlevel;
>> +
>> + ReleaseBuffer(buffer);
>> + }
>> +
>> + nblocks = RelationGetNumberOfBlocks(rel);
>> +
>> + /* -- init stat -- */
>> + indexStat.fragments = 0;
>> +
>> + indexStat.root_pages = 0;
>> + indexStat.leaf_pages = 0;
>> + indexStat.internal_pages = 0;
>> + indexStat.empty_pages = 0;
>> + indexStat.deleted_pages = 0;
>> +
>> + indexStat.max_avail = 0;
>> + indexStat.free_space = 0;
>> +
>> + /*-----------------------
>> + * Scan all blocks
>> + *-----------------------
>> + */
>> + for (blkno = 1; blkno < nblocks; blkno++)
>> + {
>> + Buffer buffer = ReadBuffer(rel, blkno);
>> + BTPageStat stat;
>> +
>> + /* scan one page */
>> + stat.blkno = blkno;
>> + GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> + /*---------------------
>> + * page status (type)
>> + *---------------------
>> + */
>> + switch (stat.type)
>> + {
>> + case 'd':
>> + indexStat.deleted_pages++;
>> + break;
>> + case 'l':
>> + indexStat.leaf_pages++;
>> + break;
>> + case 'i':
>> + indexStat.internal_pages++;
>> + break;
>> + case 'e':
>> + indexStat.empty_pages++;
>> + break;
>> + case 'r':
>> + indexStat.root_pages++;
>> + break;
>> + default:
>> + elog(ERROR, "unknown page status.");
>> + }
>> +
>> + /* -- leaf fragmentation -- */
>> + indexStat.fragments += stat.fragments;
>> +
>> + if (stat.type == 'l')
>> + {
>> + indexStat.max_avail += stat.max_avail;
>> + indexStat.free_space += stat.free_size;
>> + }
>> +
>> + ReleaseBuffer(buffer);
>> + }
>> +
>> + relation_close(rel, AccessShareLock);
>> +
>> + /*----------------------------
>> + * Build a result tuple
>> + *----------------------------
>> + */
>> + {
>> + TupleDesc tupleDesc;
>> + int j;
>> + char *values[PGSTATINDEX_NCOLUMNS];
>> +
>> + HeapTupleData tupleData;
>> + HeapTuple tuple = &tupleData;
>> +
>> + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
>> +
>> + j = 0;
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.version);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.level);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", (indexStat.root_pages +
>> + indexStat.leaf_pages +
>> + indexStat.internal_pages +
>> + indexStat.deleted_pages +
>> + indexStat.empty_pages) * BLCKSZ);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.root_blkno);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.internal_pages);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.empty_pages);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
>> +
>> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> + values);
>> +
>> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> + }
>> +
>> + PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* -----------------------------------------------
>> + * bt_page()
>> + *
>> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
>> + * -----------------------------------------------
>> + */
>> +Datum
>> +bt_page_stats(PG_FUNCTION_ARGS)
>> +{
>> + text *relname = PG_GETARG_TEXT_P(0);
>> + uint32 blkno = PG_GETARG_UINT32(1);
>> + Buffer buffer;
>> +
>> + Relation rel;
>> + RangeVar *relrv;
>> + Datum result;
>> +
>> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> + rel = relation_openrv(relrv, AccessShareLock);
>> +
>> + CHECK_RELATION_BLOCK_RANGE(rel, blkno);
>> +
>> + buffer = ReadBuffer(rel, blkno);
>> +
>> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> + elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
>> +
>> + if (blkno == 0)
>> + elog(ERROR, "Block 0 is a meta page.");
>> +
>> + {
>> + HeapTuple tuple;
>> + TupleDesc tupleDesc;
>> + int j;
>> + char *values[BTPAGESTATS_NCOLUMNS];
>> +
>> + BTPageStat stat;
>> +
>> + GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
>> +
>> + j = 0;
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.blkno);
>> +
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%c", stat.type);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.live_items);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.dead_items);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.avg_item_size);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.page_size);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.free_size);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.btpo_prev);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.btpo_next);
>> +
>> + values[j] = palloc(32);
>> + if (stat.type == 'd')
>> + snprintf(values[j++], 32, "%d", stat.btpo.xact);
>> + else
>> + snprintf(values[j++], 32, "%d", stat.btpo.level);
>> +
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", stat.btpo_flags);
>> +
>> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> + values);
>> +
>> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> + }
>> +
>> + ReleaseBuffer(buffer);
>> +
>> + relation_close(rel, AccessShareLock);
>> +
>> + PG_RETURN_DATUM(result);
>> +}
>> +
>> +/*-------------------------------------------------------
>> + * bt_page_items()
>> + *
>> + * Get IndexTupleData set in a leaf page
>> + *
>> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
>> + *-------------------------------------------------------
>> + */
>> +/* ---------------------------------------------------
>> + * data structure for SRF to hold a scan information
>> + * ---------------------------------------------------
>> + */
>> +struct user_args
>> +{
>> + TupleDesc tupd;
>> + Relation rel;
>> + Buffer buffer;
>> + Page page;
>> + uint16 offset;
>> +};
>> +
>> +Datum
>> +bt_page_items(PG_FUNCTION_ARGS)
>> +{
>> + text *relname = PG_GETARG_TEXT_P(0);
>> + uint32 blkno = PG_GETARG_UINT32(1);
>> +
>> + RangeVar *relrv;
>> + Datum result;
>> + char *values[BTPAGEITEMS_NCOLUMNS];
>> + BTPageOpaque opaque;
>> + HeapTuple tuple;
>> + ItemId id;
>> +
>> + FuncCallContext *fctx;
>> + MemoryContext mctx;
>> + struct user_args *uargs = NULL;
>> +
>> + if (blkno == 0)
>> + elog(ERROR, "Block 0 is a meta page.");
>> +
>> + if (SRF_IS_FIRSTCALL())
>> + {
>> + fctx = SRF_FIRSTCALL_INIT();
>> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
>> +
>> + uargs = palloc(sizeof(struct user_args));
>> +
>> + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
>> + uargs->offset = FirstOffsetNumber;
>> +
>> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> + uargs->rel = relation_openrv(relrv, AccessShareLock);
>> +
>> + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
>> +
>> + uargs->buffer = ReadBuffer(uargs->rel, blkno);
>> +
>> + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
>> + elog(ERROR, "bt_page_items() can be used only on b-tree index.");
>> +
>> + uargs->page = BufferGetPage(uargs->buffer);
>> +
>> + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
>> +
>> + if (P_ISDELETED(opaque))
>> + elog(NOTICE, "bt_page_items(): this page is deleted.");
>> +
>> + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
>> + fctx->user_fctx = uargs;
>> +
>> + MemoryContextSwitchTo(mctx);
>> + }
>> +
>> + fctx = SRF_PERCALL_SETUP();
>> + uargs = fctx->user_fctx;
>> +
>> + if (fctx->call_cntr < fctx->max_calls)
>> + {
>> + IndexTuple itup;
>> +
>> + id = PageGetItemId(uargs->page, uargs->offset);
>> +
>> + if (!ItemIdIsValid(id))
>> + elog(ERROR, "Invalid ItemId.");
>> +
>> + itup = (IndexTuple) PageGetItem(uargs->page, id);
>> +
>> + {
>> + int j = 0;
>> +
>> + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
>> +
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", uargs->offset);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
>> +
>> + {
>> + int off;
>> + char *dump;
>> + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
>> +
>> + dump = palloc(IndexTupleSize(itup) * 3);
>> + memset(dump, 0, IndexTupleSize(itup) * 3);
>> +
>> + for (off = 0;
>> + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
>> + off++)
>> + {
>> + if (dump[0] == '\0')
>> + sprintf(dump, "%02x", *(ptr + off) & 0xff);
>> + else
>> + {
>> + char buf[4];
>> +
>> + sprintf(buf, " %02x", *(ptr + off) & 0xff);
>> + strcat(dump, buf);
>> + }
>> + }
>> + values[j] = dump;
>> + }
>> +
>> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
>> + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
>> + }
>> +
>> + uargs->offset = uargs->offset + 1;
>> +
>> + SRF_RETURN_NEXT(fctx, result);
>> + }
>> + else
>> + {
>> + ReleaseBuffer(uargs->buffer);
>> + relation_close(uargs->rel, AccessShareLock);
>> +
>> + SRF_RETURN_DONE(fctx);
>> + }
>> +}
>> +
>> +
>> +/* ------------------------------------------------
>> + * bt_metap()
>> + *
>> + * Get a btree meta-page information
>> + *
>> + * Usage: SELECT * FROM bt_metap('t1_pkey')
>> + * ------------------------------------------------
>> + */
>> +Datum
>> +bt_metap(PG_FUNCTION_ARGS)
>> +{
>> + text *relname = PG_GETARG_TEXT_P(0);
>> + Buffer buffer;
>> +
>> + Relation rel;
>> + RangeVar *relrv;
>> + Datum result;
>> +
>> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> + rel = relation_openrv(relrv, AccessShareLock);
>> +
>> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> + elog(ERROR, "bt_metap() can be used only on b-tree index.");
>> +
>> + buffer = ReadBuffer(rel, 0);
>> +
>> + {
>> + BTMetaPageData *metad;
>> +
>> + TupleDesc tupleDesc;
>> + int j;
>> + char *values[BTMETAP_NCOLUMNS];
>> + HeapTuple tuple;
>> +
>> + Page page = BufferGetPage(buffer);
>> +
>> + metad = BTPageGetMeta(page);
>> +
>> + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
>> +
>> + j = 0;
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_magic);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_version);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_root);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_level);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_fastroot);
>> + values[j] = palloc(32);
>> + snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
>> +
>> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> + values);
>> +
>> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> + }
>> +
>> + ReleaseBuffer(buffer);
>> +
>> + relation_close(rel, AccessShareLock);
>> +
>> + PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* --------------------------------------------------------
>> + * pg_relpages()
>> + *
>> + * Get a number of pages of the table/index.
>> + *
>> + * Usage: SELECT pg_relpages('t1');
>> + * SELECT pg_relpages('t1_pkey');
>> + * --------------------------------------------------------
>> + */
>> +Datum
>> +pg_relpages(PG_FUNCTION_ARGS)
>> +{
>> + text *relname = PG_GETARG_TEXT_P(0);
>> +
>> + Relation rel;
>> + RangeVar *relrv;
>> + int4 relpages;
>> +
>> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> + rel = relation_openrv(relrv, AccessShareLock);
>> +
>> + relpages = RelationGetNumberOfBlocks(rel);
>> +
>> + relation_close(rel, AccessShareLock);
>> +
>> + PG_RETURN_INT32(relpages);
>> +}
>> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
>> --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900
>> +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900
>> @@ -22,3 +22,96 @@
>> RETURNS pgstattuple_type
>> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>> LANGUAGE C STRICT;
>> +
>> +--
>> +-- pgstatindex
>> +--
>> +DROP TYPE pgstatindex_type CASCADE;
>> +CREATE TYPE pgstatindex_type AS (
>> + version int4,
>> + tree_level int4,
>> + index_size int4,
>> + root_block_no int4,
>> + internal_pages int4,
>> + leaf_pages int4,
>> + empty_pages int4,
>> + deleted_pages int4,
>> + avg_leaf_density float8,
>> + leaf_fragmentation float8
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION pgstatindex(text)
>> +RETURNS pgstatindex_type
>> +AS 'MODULE_PATHNAME', 'pgstatindex'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_metap()
>> +--
>> +DROP TYPE bt_metap_type CASCADE;
>> +CREATE TYPE bt_metap_type AS (
>> + magic int4,
>> + version int4,
>> + root int4,
>> + level int4,
>> + fastroot int4,
>> + fastlevel int4
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION bt_metap(text)
>> +RETURNS bt_metap_type
>> +AS 'MODULE_PATHNAME', 'bt_metap'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_items()
>> +--
>> +DROP TYPE bt_page_items_type CASCADE;
>> +CREATE TYPE bt_page_items_type AS (
>> + itemoffset int4,
>> + ctid tid,
>> + itemlen int4,
>> + nulls bool,
>> + vars bool,
>> + data text
>> +);
>> +
>> +DROP FUNCTION bt_page_items(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
>> +RETURNS SETOF bt_page_items_type
>> +AS 'MODULE_PATHNAME', 'bt_page_items'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_stats()
>> +--
>> +DROP TYPE bt_page_stats_type CASCADE;
>> +CREATE TYPE bt_page_stats_type AS (
>> + blkno int4,
>> + type char,
>> + live_items int4,
>> + dead_items int4,
>> + avg_item_size float,
>> + page_size int4,
>> + free_size int4,
>> + btpo_prev int4,
>> + btpo_next int4,
>> + btpo int4,
>> + btpo_flags int4
>> +);
>> +
>> +DROP FUNCTION bt_page_stats(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
>> +RETURNS bt_page_stats_type
>> +AS 'MODULE_PATHNAME', 'bt_page_stats'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- pg_relpages()
>> +--
>> +CREATE OR REPLACE FUNCTION pg_relpages(text)
>> +RETURNS int
>> +AS 'MODULE_PATHNAME', 'pg_relpages'
>> +LANGUAGE 'C' STRICT;
>> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
>> --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +
>> +PGHOME=/home/snaga/pgsql20060814
>> +export PATH=${PGHOME}/bin:$PATH
>> +
>> +psql -p 9999 pgbench<<EOF
>> +DROP FUNCTION pgstatindex(text);
>> +EOF
>> +
>> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
>> +
>> +psql -p 9999 pgbench<<EOF
>> +SELECT * FROM pg_relpages('accounts_pkey');
>> +\x
>> +SELECT * FROM pgstatindex('accounts_pkey');
>> +SELECT * FROM bt_metap('accounts_pkey');
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 0);
>> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
>> +
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +\x
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 361);
>> +EOF
>>
>>
>> ------------------------------------------------------------------------
>>
>> pgstattuple README 2002/08/29 Tatsuo Ishii
>>
>> 1. Functions supported:
>>
>> pgstattuple
>> -----------
>> pgstattuple() returns the relation length, percentage of the "dead"
>> tuples of a relation and other info. This may help users to determine
>> whether vacuum is necessary or not. Here is an example session:
>>
>> test=> \x
>> Expanded display is on.
>> test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
>> -[ RECORD 1 ]------+-------
>> table_len | 458752
>> tuple_count | 1470
>> tuple_len | 438896
>> tuple_percent | 95.67
>> dead_tuple_count | 11
>> dead_tuple_len | 3157
>> dead_tuple_percent | 0.69
>> free_space | 8932
>> free_percent | 1.95
>>
>> Here are explanations for each column:
>>
>> table_len -- physical relation length in bytes
>> tuple_count -- number of live tuples
>> tuple_len -- total tuples length in bytes
>> tuple_percent -- live tuples in %
>> dead_tuple_len -- total dead tuples length in bytes
>> dead_tuple_percent -- dead tuples in %
>> free_space -- free space in bytes
>> free_percent -- free space in %
>>
>> pg_relpages
>> -----------
>> pg_relpages() returns the number of pages in the relation.
>>
>> pgstatindex
>> -----------
>> pgstatindex() returns an array showing the information about an index:
>>
>> test=> \x
>> Expanded display is on.
>> test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
>> -[ RECORD 1 ]------+------
>> version | 2
>> tree_level | 0
>> index_size | 8192
>> root_block_no | 1
>> internal_pages | 0
>> leaf_pages | 1
>> empty_pages | 0
>> deleted_pages | 0
>> avg_leaf_density | 50.27
>> leaf_fragmentation | 0
>>
>> bt_metap
>> --------
>> bt_metap() returns information about the btree index metapage:
>>
>> test=> SELECT * FROM bt_metap('pg_cast_oid_index');
>> -[ RECORD 1 ]-----
>> magic | 340322
>> version | 2
>> root | 1
>> level | 0
>> fastroot | 1
>> fastlevel | 0
>>
>> bt_page_stats
>> -------------
>> bt_page_stats() shows information about single btree pages:
>>
>> test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
>> -[ RECORD 1 ]-+-----
>> blkno | 1
>> type | l
>> live_items | 256
>> dead_items | 0
>> avg_item_size | 12
>> page_size | 8192
>> free_size | 4056
>> btpo_prev | 0
>> btpo_next | 0
>> btpo | 0
>> btpo_flags | 3
>>
>> bt_page_items
>> -------------
>> bt_page_items() returns information about specific items on btree pages:
>>
>> test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
>> itemoffset | ctid | itemlen | nulls | vars | data
>> ------------+---------+---------+-------+------+-------------
>> 1 | (0,1) | 12 | f | f | 23 27 00 00
>> 2 | (0,2) | 12 | f | f | 24 27 00 00
>> 3 | (0,3) | 12 | f | f | 25 27 00 00
>> 4 | (0,4) | 12 | f | f | 26 27 00 00
>> 5 | (0,5) | 12 | f | f | 27 27 00 00
>> 6 | (0,6) | 12 | f | f | 28 27 00 00
>> 7 | (0,7) | 12 | f | f | 29 27 00 00
>> 8 | (0,8) | 12 | f | f | 2a 27 00 00
>>
>>
>> 2. Installing pgstattuple
>>
>> $ make
>> $ make install
>> $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
>>
>>
>> 3. Using pgstattuple
>>
>> pgstattuple may be called as a relation function and is
>> defined as follows:
>>
>> CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
>> AS 'MODULE_PATHNAME', 'pgstattuple'
>> LANGUAGE C STRICT;
>>
>> CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
>> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>> LANGUAGE C STRICT;
>>
>> The argument is the relation name (optionally it may be qualified)
>> or the OID of the relation. Note that pgstattuple only returns
>> one row.
>>
>>
>> 4. Notes
>>
>> pgstattuple acquires only a read lock on the relation. So concurrent
>> update may affect the result.
>>
>> pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
>> returns false.
>>
>>
>> 5. History
>>
>> 2006/06/28
>>
>> Extended to work against indexes.

--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122

Attachment Content-Type Size
README.pgstattuple.euc_jp text/plain 5.5 KB
uninstall_pgstattuple.sql text/plain 480 bytes

From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-03 23:25:31
Message-ID: 44FB646B.8030808@nttdata.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Tom Lane wrote:
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstatindex.o pgstatindex.c
> pgstatindex.c: In function 'bt_page_items':
> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'

I guess my '%d' should be '%zd', right?
--
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122

Attachment Content-Type Size
pgstatindex.c.diff text/plain 811 bytes

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Cc: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-04 01:39:18
Message-ID: 20060904100418.5CCA.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Patch applied. Thanks.

The two attached patches fix contrib/pgstattuple.

pgstattuple.c.diff removes the fragmemtation reporting in pgstattuple().
It is no longer needed, because pgstatindex() has upward functionality now.
Also, the report using elog was judged as improper in earlier discusses.

pgstattuple.sql.in.diff removes DROP statements in the installer. The DROP
statements make some unpleasant ERROR logs during install. According to
other contrib modules, DROPs should be in the uninstaller and should not
be in the installer.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
pgstattuple.sql.in.diff application/octet-stream 1.3 KB
pgstattuple.c.diff application/octet-stream 7.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-04 02:05:54
Message-ID: 8271.1157335554@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> The two attached patches fix contrib/pgstattuple.

Good, applied. I made some additional changes to get install/uninstall/
reinstall to work cleanly after the latest additions, and to get it to
compile without warnings on a 64-bit Fedora machine. (It seems to
actually work there, too.)

I notice that the original pgstattuple() function comes in two flavors,
one with OID input and one with text-relation-name input. Shouldn't all
the others be likewise?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-04 02:13:41
Message-ID: 8453.1157336021@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> pgstatindex.c: In function 'bt_page_items':
>> pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'

> I guess my '%d' should be '%zd', right?

No, that sounds even less portable :-(

Given the expected range of IndexTupleSize(), it seems sufficient to
cast its result to int and then use %d formatting. I've done that
in the latest commit.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-04 03:49:13
Message-ID: 200609040349.k843nDv09463@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Applied.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:
> Thanks Bruce,
>
> Here are updated Japanese README, and uninstall_pgstattuple.sql.
>
> Bruce Momjian wrote:
> > Patch applied. Thanks.
> >
> > I updated the README documentation for the new functions, attached. I
> > could not update the Japanese version of the README.
> >
> > ---------------------------------------------------------------------------
> >
> >
> > Satoshi Nagayasu wrote:
> >> Bruce,
> >>
> >> Attached patch has been cleaned up,
> >> and modified to be able to work with CVS HEAD.
> >>
> >> Thanks.
> >>
> >> Satoshi Nagayasu wrote:
> >>> Alvaro,
> >>>
> >>> Alvaro Herrera wrote:
> >>>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
> >>>>
> >>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
> >>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
> >>>>
> >>>>
> >>>> While you're at it, please consider removing C++ style comments and
> >>>> unused code.
> >>>>
> >>>> Formatting is way off as well, but I guess that is easily fixed with
> >>>> pgindent.
> >>> Thanks for comments. I'm going to fix my patch from now.
> >>>
> >>>> Regarding the pg_relpages function, why do you think it's necessary?
> >>>> (It returns the true number of blocks of a given relation). It may
> >>>> belong into core given a reasonable use case, but otherwise it doesn't
> >>>> seem to belong into pgstatindex (or pgstattuple for that matter).
> >>> I wanted to sample some pages from the table/index, and get their statistics
> >>> to know table/index conditions. I know pgstattuple() reports table
> >>> statistics, however, pgstattuple() generates heavy CPU and I/O load.
> >>>
> >>> When we need to sample some pages from table/index, we need to know
> >>> true number of blocks.
> >>>
> >>> I have another function, called pgstatpage(), to get information inside
> >>> a single block/page statistics of the table. pg_relpages() will be used
> >>> with this.
> >>>
> >>> Sorry for not mentioned in previous post about pgstatpage(),
> >>> but I've remembered about it just now.
> >>>
> >>> Many memories in my brain have already `paged-out` (too busy in last few months),
> >>> and some of them got `out-of-memory`. :^)
> >>>
> >>> Thanks.
> >>
> >> --
> >> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> >> Phone: +81-3-3523-8122
> >
> >> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
> >> --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900
> >> +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900
> >> @@ -6,7 +6,7 @@
> >> #
> >> #-------------------------------------------------------------------------
> >>
> >> -SRCS = pgstattuple.c
> >> +SRCS = pgstattuple.c pgstatindex.c
> >>
> >> MODULE_big = pgstattuple
> >> OBJS = $(SRCS:.c=.o)
> >> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
> >> --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
> >> +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900
> >> @@ -0,0 +1,706 @@
> >> +/*
> >> + * pgstatindex
> >> + *
> >> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
> >> + *
> >> + * Permission to use, copy, modify, and distribute this software and
> >> + * its documentation for any purpose, without fee, and without a
> >> + * written agreement is hereby granted, provided that the above
> >> + * copyright notice and this paragraph and the following two
> >> + * paragraphs appear in all copies.
> >> + *
> >> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> >> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> >> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> >> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> >> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> >> + *
> >> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> >> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> >> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> >> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> >> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> >> + */
> >> +
> >> +#include "postgres.h"
> >> +
> >> +#include "fmgr.h"
> >> +#include "funcapi.h"
> >> +#include "access/heapam.h"
> >> +#include "access/itup.h"
> >> +#include "access/nbtree.h"
> >> +#include "access/transam.h"
> >> +#include "catalog/namespace.h"
> >> +#include "catalog/pg_type.h"
> >> +#include "utils/builtins.h"
> >> +#include "utils/inval.h"
> >> +
> >> +PG_FUNCTION_INFO_V1(pgstatindex);
> >> +PG_FUNCTION_INFO_V1(bt_metap);
> >> +PG_FUNCTION_INFO_V1(bt_page_items);
> >> +PG_FUNCTION_INFO_V1(bt_page_stats);
> >> +PG_FUNCTION_INFO_V1(pg_relpages);
> >> +
> >> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
> >> +extern Datum bt_metap(PG_FUNCTION_ARGS);
> >> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
> >> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
> >> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
> >> +
> >> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
> >> +#define PGSTATINDEX_NCOLUMNS 10
> >> +
> >> +#define BTMETAP_TYPE "public.bt_metap_type"
> >> +#define BTMETAP_NCOLUMNS 6
> >> +
> >> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
> >> +#define BTPAGEITEMS_NCOLUMNS 6
> >> +
> >> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
> >> +#define BTPAGESTATS_NCOLUMNS 11
> >> +
> >> +
> >> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
> >> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
> >> +
> >> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
> >> + if ( !(FirstOffsetNumber<=(offset) && \
> >> + (offset)<=PageGetMaxOffsetNumber(page)) ) \
> >> + elog(ERROR, "Page offset number out of range."); }
> >> +
> >> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
> >> + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
> >> + elog(ERROR, "Block number out of range."); }
> >> +
> >> +/* ------------------------------------------------
> >> + * structure for single btree page statistics
> >> + * ------------------------------------------------
> >> + */
> >> +typedef struct BTPageStat
> >> +{
> >> + uint32 blkno;
> >> + uint32 live_items;
> >> + uint32 dead_items;
> >> + uint32 page_size;
> >> + uint32 max_avail;
> >> + uint32 free_size;
> >> + uint32 avg_item_size;
> >> + uint32 fragments;
> >> + char type;
> >> +
> >> + /* opaque data */
> >> + BlockNumber btpo_prev;
> >> + BlockNumber btpo_next;
> >> + union
> >> + {
> >> + uint32 level;
> >> + TransactionId xact;
> >> + } btpo;
> >> + uint16 btpo_flags;
> >> + BTCycleId btpo_cycleid;
> >> +} BTPageStat;
> >> +
> >> +/* ------------------------------------------------
> >> + * A structure for a whole btree index statistics
> >> + * used by pgstatindex().
> >> + * ------------------------------------------------
> >> + */
> >> +typedef struct BTIndexStat
> >> +{
> >> + uint32 magic;
> >> + uint32 version;
> >> + BlockNumber root_blkno;
> >> + uint32 level;
> >> +
> >> + BlockNumber fastroot;
> >> + uint32 fastlevel;
> >> +
> >> + uint32 live_items;
> >> + uint32 dead_items;
> >> +
> >> + uint32 root_pages;
> >> + uint32 internal_pages;
> >> + uint32 leaf_pages;
> >> + uint32 empty_pages;
> >> + uint32 deleted_pages;
> >> +
> >> + uint32 page_size;
> >> + uint32 avg_item_size;
> >> +
> >> + uint32 max_avail;
> >> + uint32 free_space;
> >> +
> >> + uint32 fragments;
> >> +} BTIndexStat;
> >> +
> >> +/* -------------------------------------------------
> >> + * GetBTPageStatistics()
> >> + *
> >> + * Collect statistics of single b-tree leaf page
> >> + * -------------------------------------------------
> >> + */
> >> +static bool
> >> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
> >> +{
> >> + Page page = BufferGetPage(buffer);
> >> + PageHeader phdr = (PageHeader) page;
> >> + OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
> >> + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> >> + int item_size = 0;
> >> + int off;
> >> +
> >> + stat->blkno = blkno;
> >> +
> >> + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
> >> +
> >> + stat->dead_items = stat->live_items = 0;
> >> +
> >> + stat->page_size = PageGetPageSize(page);
> >> +
> >> + /* page type (flags) */
> >> + if (P_ISDELETED(opaque))
> >> + {
> >> + stat->type = 'd';
> >> + return true;
> >> + }
> >> + else if (P_IGNORE(opaque))
> >> + stat->type = 'e';
> >> + else if (P_ISLEAF(opaque))
> >> + stat->type = 'l';
> >> + else if (P_ISROOT(opaque))
> >> + stat->type = 'r';
> >> + else
> >> + stat->type = 'i';
> >> +
> >> + /* btpage opaque data */
> >> + stat->btpo_prev = opaque->btpo_prev;
> >> + stat->btpo_next = opaque->btpo_next;
> >> + if (P_ISDELETED(opaque))
> >> + stat->btpo.xact = opaque->btpo.xact;
> >> + else
> >> + stat->btpo.level = opaque->btpo.level;
> >> + stat->btpo_flags = opaque->btpo_flags;
> >> + stat->btpo_cycleid = opaque->btpo_cycleid;
> >> +
> >> + /*----------------------------------------------
> >> + * If a next leaf is on the previous block,
> >> + * it means a fragmentation.
> >> + *----------------------------------------------
> >> + */
> >> + stat->fragments = 0;
> >> + if (stat->type == 'l')
> >> + {
> >> + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> >> + stat->fragments++;
> >> + }
> >> +
> >> + /* count live and dead tuples, and free space */
> >> + for (off = FirstOffsetNumber; off <= maxoff; off++)
> >> + {
> >> + IndexTuple itup;
> >> +
> >> + ItemId id = PageGetItemId(page, off);
> >> +
> >> + itup = (IndexTuple) PageGetItem(page, id);
> >> +
> >> + item_size += IndexTupleSize(itup);
> >> +
> >> + if (!ItemIdDeleted(id))
> >> + stat->live_items++;
> >> + else
> >> + stat->dead_items++;
> >> + }
> >> + stat->free_size = PageGetFreeSpace(page);
> >> +
> >> + if ((stat->live_items + stat->dead_items) > 0)
> >> + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
> >> + else
> >> + stat->avg_item_size = 0;
> >> +
> >> + return true;
> >> +}
> >> +
> >> +
> >> +/* ------------------------------------------------------
> >> + * pgstatindex()
> >> + *
> >> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
> >> + * ------------------------------------------------------
> >> + */
> >> +Datum
> >> +pgstatindex(PG_FUNCTION_ARGS)
> >> +{
> >> + text *relname = PG_GETARG_TEXT_P(0);
> >> + Relation rel;
> >> + RangeVar *relrv;
> >> + Datum result;
> >> + uint32 nblocks;
> >> + uint32 blkno;
> >> + BTIndexStat indexStat;
> >> +
> >> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> + rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> + elog(ERROR, "pgstatindex() can be used only on b-tree index.");
> >> +
> >> + /*-------------------
> >> + * Read a metapage
> >> + *-------------------
> >> + */
> >> + {
> >> + Buffer buffer = ReadBuffer(rel, 0);
> >> + Page page = BufferGetPage(buffer);
> >> + BTMetaPageData *metad = BTPageGetMeta(page);
> >> +
> >> + indexStat.magic = metad->btm_magic;
> >> + indexStat.version = metad->btm_version;
> >> + indexStat.root_blkno = metad->btm_root;
> >> + indexStat.level = metad->btm_level;
> >> + indexStat.fastroot = metad->btm_fastroot;
> >> + indexStat.fastlevel = metad->btm_fastlevel;
> >> +
> >> + ReleaseBuffer(buffer);
> >> + }
> >> +
> >> + nblocks = RelationGetNumberOfBlocks(rel);
> >> +
> >> + /* -- init stat -- */
> >> + indexStat.fragments = 0;
> >> +
> >> + indexStat.root_pages = 0;
> >> + indexStat.leaf_pages = 0;
> >> + indexStat.internal_pages = 0;
> >> + indexStat.empty_pages = 0;
> >> + indexStat.deleted_pages = 0;
> >> +
> >> + indexStat.max_avail = 0;
> >> + indexStat.free_space = 0;
> >> +
> >> + /*-----------------------
> >> + * Scan all blocks
> >> + *-----------------------
> >> + */
> >> + for (blkno = 1; blkno < nblocks; blkno++)
> >> + {
> >> + Buffer buffer = ReadBuffer(rel, blkno);
> >> + BTPageStat stat;
> >> +
> >> + /* scan one page */
> >> + stat.blkno = blkno;
> >> + GetBTPageStatistics(blkno, buffer, &stat);
> >> +
> >> + /*---------------------
> >> + * page status (type)
> >> + *---------------------
> >> + */
> >> + switch (stat.type)
> >> + {
> >> + case 'd':
> >> + indexStat.deleted_pages++;
> >> + break;
> >> + case 'l':
> >> + indexStat.leaf_pages++;
> >> + break;
> >> + case 'i':
> >> + indexStat.internal_pages++;
> >> + break;
> >> + case 'e':
> >> + indexStat.empty_pages++;
> >> + break;
> >> + case 'r':
> >> + indexStat.root_pages++;
> >> + break;
> >> + default:
> >> + elog(ERROR, "unknown page status.");
> >> + }
> >> +
> >> + /* -- leaf fragmentation -- */
> >> + indexStat.fragments += stat.fragments;
> >> +
> >> + if (stat.type == 'l')
> >> + {
> >> + indexStat.max_avail += stat.max_avail;
> >> + indexStat.free_space += stat.free_size;
> >> + }
> >> +
> >> + ReleaseBuffer(buffer);
> >> + }
> >> +
> >> + relation_close(rel, AccessShareLock);
> >> +
> >> + /*----------------------------
> >> + * Build a result tuple
> >> + *----------------------------
> >> + */
> >> + {
> >> + TupleDesc tupleDesc;
> >> + int j;
> >> + char *values[PGSTATINDEX_NCOLUMNS];
> >> +
> >> + HeapTupleData tupleData;
> >> + HeapTuple tuple = &tupleData;
> >> +
> >> + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
> >> +
> >> + j = 0;
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.version);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.level);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", (indexStat.root_pages +
> >> + indexStat.leaf_pages +
> >> + indexStat.internal_pages +
> >> + indexStat.deleted_pages +
> >> + indexStat.empty_pages) * BLCKSZ);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.root_blkno);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.internal_pages);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.empty_pages);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
> >> +
> >> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> + values);
> >> +
> >> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> + }
> >> +
> >> + PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/* -----------------------------------------------
> >> + * bt_page()
> >> + *
> >> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
> >> + * -----------------------------------------------
> >> + */
> >> +Datum
> >> +bt_page_stats(PG_FUNCTION_ARGS)
> >> +{
> >> + text *relname = PG_GETARG_TEXT_P(0);
> >> + uint32 blkno = PG_GETARG_UINT32(1);
> >> + Buffer buffer;
> >> +
> >> + Relation rel;
> >> + RangeVar *relrv;
> >> + Datum result;
> >> +
> >> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> + rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> + CHECK_RELATION_BLOCK_RANGE(rel, blkno);
> >> +
> >> + buffer = ReadBuffer(rel, blkno);
> >> +
> >> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> + elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
> >> +
> >> + if (blkno == 0)
> >> + elog(ERROR, "Block 0 is a meta page.");
> >> +
> >> + {
> >> + HeapTuple tuple;
> >> + TupleDesc tupleDesc;
> >> + int j;
> >> + char *values[BTPAGESTATS_NCOLUMNS];
> >> +
> >> + BTPageStat stat;
> >> +
> >> + GetBTPageStatistics(blkno, buffer, &stat);
> >> +
> >> + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
> >> +
> >> + j = 0;
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.blkno);
> >> +
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%c", stat.type);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.live_items);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.dead_items);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.avg_item_size);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.page_size);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.free_size);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.btpo_prev);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.btpo_next);
> >> +
> >> + values[j] = palloc(32);
> >> + if (stat.type == 'd')
> >> + snprintf(values[j++], 32, "%d", stat.btpo.xact);
> >> + else
> >> + snprintf(values[j++], 32, "%d", stat.btpo.level);
> >> +
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", stat.btpo_flags);
> >> +
> >> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> + values);
> >> +
> >> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> + }
> >> +
> >> + ReleaseBuffer(buffer);
> >> +
> >> + relation_close(rel, AccessShareLock);
> >> +
> >> + PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/*-------------------------------------------------------
> >> + * bt_page_items()
> >> + *
> >> + * Get IndexTupleData set in a leaf page
> >> + *
> >> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
> >> + *-------------------------------------------------------
> >> + */
> >> +/* ---------------------------------------------------
> >> + * data structure for SRF to hold a scan information
> >> + * ---------------------------------------------------
> >> + */
> >> +struct user_args
> >> +{
> >> + TupleDesc tupd;
> >> + Relation rel;
> >> + Buffer buffer;
> >> + Page page;
> >> + uint16 offset;
> >> +};
> >> +
> >> +Datum
> >> +bt_page_items(PG_FUNCTION_ARGS)
> >> +{
> >> + text *relname = PG_GETARG_TEXT_P(0);
> >> + uint32 blkno = PG_GETARG_UINT32(1);
> >> +
> >> + RangeVar *relrv;
> >> + Datum result;
> >> + char *values[BTPAGEITEMS_NCOLUMNS];
> >> + BTPageOpaque opaque;
> >> + HeapTuple tuple;
> >> + ItemId id;
> >> +
> >> + FuncCallContext *fctx;
> >> + MemoryContext mctx;
> >> + struct user_args *uargs = NULL;
> >> +
> >> + if (blkno == 0)
> >> + elog(ERROR, "Block 0 is a meta page.");
> >> +
> >> + if (SRF_IS_FIRSTCALL())
> >> + {
> >> + fctx = SRF_FIRSTCALL_INIT();
> >> + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
> >> +
> >> + uargs = palloc(sizeof(struct user_args));
> >> +
> >> + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
> >> + uargs->offset = FirstOffsetNumber;
> >> +
> >> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> + uargs->rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
> >> +
> >> + uargs->buffer = ReadBuffer(uargs->rel, blkno);
> >> +
> >> + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
> >> + elog(ERROR, "bt_page_items() can be used only on b-tree index.");
> >> +
> >> + uargs->page = BufferGetPage(uargs->buffer);
> >> +
> >> + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
> >> +
> >> + if (P_ISDELETED(opaque))
> >> + elog(NOTICE, "bt_page_items(): this page is deleted.");
> >> +
> >> + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
> >> + fctx->user_fctx = uargs;
> >> +
> >> + MemoryContextSwitchTo(mctx);
> >> + }
> >> +
> >> + fctx = SRF_PERCALL_SETUP();
> >> + uargs = fctx->user_fctx;
> >> +
> >> + if (fctx->call_cntr < fctx->max_calls)
> >> + {
> >> + IndexTuple itup;
> >> +
> >> + id = PageGetItemId(uargs->page, uargs->offset);
> >> +
> >> + if (!ItemIdIsValid(id))
> >> + elog(ERROR, "Invalid ItemId.");
> >> +
> >> + itup = (IndexTuple) PageGetItem(uargs->page, id);
> >> +
> >> + {
> >> + int j = 0;
> >> +
> >> + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
> >> +
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", uargs->offset);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
> >> +
> >> + {
> >> + int off;
> >> + char *dump;
> >> + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
> >> +
> >> + dump = palloc(IndexTupleSize(itup) * 3);
> >> + memset(dump, 0, IndexTupleSize(itup) * 3);
> >> +
> >> + for (off = 0;
> >> + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
> >> + off++)
> >> + {
> >> + if (dump[0] == '\0')
> >> + sprintf(dump, "%02x", *(ptr + off) & 0xff);
> >> + else
> >> + {
> >> + char buf[4];
> >> +
> >> + sprintf(buf, " %02x", *(ptr + off) & 0xff);
> >> + strcat(dump, buf);
> >> + }
> >> + }
> >> + values[j] = dump;
> >> + }
> >> +
> >> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
> >> + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
> >> + }
> >> +
> >> + uargs->offset = uargs->offset + 1;
> >> +
> >> + SRF_RETURN_NEXT(fctx, result);
> >> + }
> >> + else
> >> + {
> >> + ReleaseBuffer(uargs->buffer);
> >> + relation_close(uargs->rel, AccessShareLock);
> >> +
> >> + SRF_RETURN_DONE(fctx);
> >> + }
> >> +}
> >> +
> >> +
> >> +/* ------------------------------------------------
> >> + * bt_metap()
> >> + *
> >> + * Get a btree meta-page information
> >> + *
> >> + * Usage: SELECT * FROM bt_metap('t1_pkey')
> >> + * ------------------------------------------------
> >> + */
> >> +Datum
> >> +bt_metap(PG_FUNCTION_ARGS)
> >> +{
> >> + text *relname = PG_GETARG_TEXT_P(0);
> >> + Buffer buffer;
> >> +
> >> + Relation rel;
> >> + RangeVar *relrv;
> >> + Datum result;
> >> +
> >> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> + rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> + if (!IS_INDEX(rel) || !IS_BTREE(rel))
> >> + elog(ERROR, "bt_metap() can be used only on b-tree index.");
> >> +
> >> + buffer = ReadBuffer(rel, 0);
> >> +
> >> + {
> >> + BTMetaPageData *metad;
> >> +
> >> + TupleDesc tupleDesc;
> >> + int j;
> >> + char *values[BTMETAP_NCOLUMNS];
> >> + HeapTuple tuple;
> >> +
> >> + Page page = BufferGetPage(buffer);
> >> +
> >> + metad = BTPageGetMeta(page);
> >> +
> >> + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
> >> +
> >> + j = 0;
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_magic);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_version);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_root);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_level);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_fastroot);
> >> + values[j] = palloc(32);
> >> + snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
> >> +
> >> + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
> >> + values);
> >> +
> >> + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
> >> + }
> >> +
> >> + ReleaseBuffer(buffer);
> >> +
> >> + relation_close(rel, AccessShareLock);
> >> +
> >> + PG_RETURN_DATUM(result);
> >> +}
> >> +
> >> +/* --------------------------------------------------------
> >> + * pg_relpages()
> >> + *
> >> + * Get a number of pages of the table/index.
> >> + *
> >> + * Usage: SELECT pg_relpages('t1');
> >> + * SELECT pg_relpages('t1_pkey');
> >> + * --------------------------------------------------------
> >> + */
> >> +Datum
> >> +pg_relpages(PG_FUNCTION_ARGS)
> >> +{
> >> + text *relname = PG_GETARG_TEXT_P(0);
> >> +
> >> + Relation rel;
> >> + RangeVar *relrv;
> >> + int4 relpages;
> >> +
> >> + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
> >> + rel = relation_openrv(relrv, AccessShareLock);
> >> +
> >> + relpages = RelationGetNumberOfBlocks(rel);
> >> +
> >> + relation_close(rel, AccessShareLock);
> >> +
> >> + PG_RETURN_INT32(relpages);
> >> +}
> >> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
> >> --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900
> >> +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900
> >> @@ -22,3 +22,96 @@
> >> RETURNS pgstattuple_type
> >> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> >> LANGUAGE C STRICT;
> >> +
> >> +--
> >> +-- pgstatindex
> >> +--
> >> +DROP TYPE pgstatindex_type CASCADE;
> >> +CREATE TYPE pgstatindex_type AS (
> >> + version int4,
> >> + tree_level int4,
> >> + index_size int4,
> >> + root_block_no int4,
> >> + internal_pages int4,
> >> + leaf_pages int4,
> >> + empty_pages int4,
> >> + deleted_pages int4,
> >> + avg_leaf_density float8,
> >> + leaf_fragmentation float8
> >> +);
> >> +
> >> +CREATE OR REPLACE FUNCTION pgstatindex(text)
> >> +RETURNS pgstatindex_type
> >> +AS 'MODULE_PATHNAME', 'pgstatindex'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_metap()
> >> +--
> >> +DROP TYPE bt_metap_type CASCADE;
> >> +CREATE TYPE bt_metap_type AS (
> >> + magic int4,
> >> + version int4,
> >> + root int4,
> >> + level int4,
> >> + fastroot int4,
> >> + fastlevel int4
> >> +);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_metap(text)
> >> +RETURNS bt_metap_type
> >> +AS 'MODULE_PATHNAME', 'bt_metap'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_page_items()
> >> +--
> >> +DROP TYPE bt_page_items_type CASCADE;
> >> +CREATE TYPE bt_page_items_type AS (
> >> + itemoffset int4,
> >> + ctid tid,
> >> + itemlen int4,
> >> + nulls bool,
> >> + vars bool,
> >> + data text
> >> +);
> >> +
> >> +DROP FUNCTION bt_page_items(text, int4);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
> >> +RETURNS SETOF bt_page_items_type
> >> +AS 'MODULE_PATHNAME', 'bt_page_items'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- bt_page_stats()
> >> +--
> >> +DROP TYPE bt_page_stats_type CASCADE;
> >> +CREATE TYPE bt_page_stats_type AS (
> >> + blkno int4,
> >> + type char,
> >> + live_items int4,
> >> + dead_items int4,
> >> + avg_item_size float,
> >> + page_size int4,
> >> + free_size int4,
> >> + btpo_prev int4,
> >> + btpo_next int4,
> >> + btpo int4,
> >> + btpo_flags int4
> >> +);
> >> +
> >> +DROP FUNCTION bt_page_stats(text, int4);
> >> +
> >> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
> >> +RETURNS bt_page_stats_type
> >> +AS 'MODULE_PATHNAME', 'bt_page_stats'
> >> +LANGUAGE 'C' STRICT;
> >> +
> >> +--
> >> +-- pg_relpages()
> >> +--
> >> +CREATE OR REPLACE FUNCTION pg_relpages(text)
> >> +RETURNS int
> >> +AS 'MODULE_PATHNAME', 'pg_relpages'
> >> +LANGUAGE 'C' STRICT;
> >> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
> >> --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
> >> +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900
> >> @@ -0,0 +1,27 @@
> >> +#!/bin/sh
> >> +
> >> +PGHOME=/home/snaga/pgsql20060814
> >> +export PATH=${PGHOME}/bin:$PATH
> >> +
> >> +psql -p 9999 pgbench<<EOF
> >> +DROP FUNCTION pgstatindex(text);
> >> +EOF
> >> +
> >> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
> >> +
> >> +psql -p 9999 pgbench<<EOF
> >> +SELECT * FROM pg_relpages('accounts_pkey');
> >> +\x
> >> +SELECT * FROM pgstatindex('accounts_pkey');
> >> +SELECT * FROM bt_metap('accounts_pkey');
> >> +\x
> >> +SELECT * FROM bt_page_items('accounts_pkey', 0);
> >> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
> >> +
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> >> +\x
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
> >> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
> >> +\x
> >> +SELECT * FROM bt_page_items('accounts_pkey', 361);
> >> +EOF
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> pgstattuple README 2002/08/29 Tatsuo Ishii
> >>
> >> 1. Functions supported:
> >>
> >> pgstattuple
> >> -----------
> >> pgstattuple() returns the relation length, percentage of the "dead"
> >> tuples of a relation and other info. This may help users to determine
> >> whether vacuum is necessary or not. Here is an example session:
> >>
> >> test=> \x
> >> Expanded display is on.
> >> test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
> >> -[ RECORD 1 ]------+-------
> >> table_len | 458752
> >> tuple_count | 1470
> >> tuple_len | 438896
> >> tuple_percent | 95.67
> >> dead_tuple_count | 11
> >> dead_tuple_len | 3157
> >> dead_tuple_percent | 0.69
> >> free_space | 8932
> >> free_percent | 1.95
> >>
> >> Here are explanations for each column:
> >>
> >> table_len -- physical relation length in bytes
> >> tuple_count -- number of live tuples
> >> tuple_len -- total tuples length in bytes
> >> tuple_percent -- live tuples in %
> >> dead_tuple_len -- total dead tuples length in bytes
> >> dead_tuple_percent -- dead tuples in %
> >> free_space -- free space in bytes
> >> free_percent -- free space in %
> >>
> >> pg_relpages
> >> -----------
> >> pg_relpages() returns the number of pages in the relation.
> >>
> >> pgstatindex
> >> -----------
> >> pgstatindex() returns an array showing the information about an index:
> >>
> >> test=> \x
> >> Expanded display is on.
> >> test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
> >> -[ RECORD 1 ]------+------
> >> version | 2
> >> tree_level | 0
> >> index_size | 8192
> >> root_block_no | 1
> >> internal_pages | 0
> >> leaf_pages | 1
> >> empty_pages | 0
> >> deleted_pages | 0
> >> avg_leaf_density | 50.27
> >> leaf_fragmentation | 0
> >>
> >> bt_metap
> >> --------
> >> bt_metap() returns information about the btree index metapage:
> >>
> >> test=> SELECT * FROM bt_metap('pg_cast_oid_index');
> >> -[ RECORD 1 ]-----
> >> magic | 340322
> >> version | 2
> >> root | 1
> >> level | 0
> >> fastroot | 1
> >> fastlevel | 0
> >>
> >> bt_page_stats
> >> -------------
> >> bt_page_stats() shows information about single btree pages:
> >>
> >> test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
> >> -[ RECORD 1 ]-+-----
> >> blkno | 1
> >> type | l
> >> live_items | 256
> >> dead_items | 0
> >> avg_item_size | 12
> >> page_size | 8192
> >> free_size | 4056
> >> btpo_prev | 0
> >> btpo_next | 0
> >> btpo | 0
> >> btpo_flags | 3
> >>
> >> bt_page_items
> >> -------------
> >> bt_page_items() returns information about specific items on btree pages:
> >>
> >> test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
> >> itemoffset | ctid | itemlen | nulls | vars | data
> >> ------------+---------+---------+-------+------+-------------
> >> 1 | (0,1) | 12 | f | f | 23 27 00 00
> >> 2 | (0,2) | 12 | f | f | 24 27 00 00
> >> 3 | (0,3) | 12 | f | f | 25 27 00 00
> >> 4 | (0,4) | 12 | f | f | 26 27 00 00
> >> 5 | (0,5) | 12 | f | f | 27 27 00 00
> >> 6 | (0,6) | 12 | f | f | 28 27 00 00
> >> 7 | (0,7) | 12 | f | f | 29 27 00 00
> >> 8 | (0,8) | 12 | f | f | 2a 27 00 00
> >>
> >>
> >> 2. Installing pgstattuple
> >>
> >> $ make
> >> $ make install
> >> $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
> >>
> >>
> >> 3. Using pgstattuple
> >>
> >> pgstattuple may be called as a relation function and is
> >> defined as follows:
> >>
> >> CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
> >> AS 'MODULE_PATHNAME', 'pgstattuple'
> >> LANGUAGE C STRICT;
> >>
> >> CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
> >> AS 'MODULE_PATHNAME', 'pgstattuplebyid'
> >> LANGUAGE C STRICT;
> >>
> >> The argument is the relation name (optionally it may be qualified)
> >> or the OID of the relation. Note that pgstattuple only returns
> >> one row.
> >>
> >>
> >> 4. Notes
> >>
> >> pgstattuple acquires only a read lock on the relation. So concurrent
> >> update may affect the result.
> >>
> >> pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
> >> returns false.
> >>
> >>
> >> 5. History
> >>
> >> 2006/06/28
> >>
> >> Extended to work against indexes.
>
>
> --
> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
> Phone: +81-3-3523-8122

> $PostgreSQL: pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp,v 1.7 2006/07/06 02:18:07 momjian Exp $
>
> pgstattuple README 2002/08/22
>
> 1.
>
> pgstattuple
> -----------
> pgstattupleUPDATEDELETE
>
> vacuum
>
>
> pgstattuple() ""
> vacuum
>
>
> test=> \x
> Expanded display is on.
> test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
> -[ RECORD 1 ]------+-------
> table_len | 458752
> tuple_count | 1470
> tuple_len | 438896
> tuple_percent | 95.67
> dead_tuple_count | 11
> dead_tuple_len | 3157
> dead_tuple_percent | 0.69
> free_space | 8932
> free_percent | 1.95
>
>
>
> table_len -- ()
> tuple_count --
> tuple_len -- ()
> tuple_percent -- table_lentuple_len
> dead_tuple_len --
> dead_tuple_percent -- table_lentuple_len
> free_space -- ()
> free_percent -- table_lenfree_space
>
> pg_relpages
> -----------
> pg_relpages()
>
> pgstatindex
> -----------
> pgstatindex()
>
> test=> \x
> Expanded display is on.
> test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
> -[ RECORD 1 ]------+------
> version | 2
> tree_level | 0
> index_size | 8192
> root_block_no | 1
> internal_pages | 0
> leaf_pages | 1
> empty_pages | 0
> deleted_pages | 0
> avg_leaf_density | 50.27
> leaf_fragmentation | 0
>
> bt_metap
> --------
> bt_metap() btree
>
> test=> SELECT * FROM bt_metap('pg_cast_oid_index');
> -[ RECORD 1 ]-----
> magic | 340322
> version | 2
> root | 1
> level | 0
> fastroot | 1
> fastlevel | 0
>
> bt_page_stats
> -------------
> bt_page_stats() btree
>
> test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
> -[ RECORD 1 ]-+-----
> blkno | 1
> type | l
> live_items | 256
> dead_items | 0
> avg_item_size | 12
> page_size | 8192
> free_size | 4056
> btpo_prev | 0
> btpo_next | 0
> btpo | 0
> btpo_flags | 3
>
> bt_page_items
> -------------
> bt_page_items() btree
>
> test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
> itemoffset | ctid | itemlen | nulls | vars | data
> ------------+---------+---------+-------+------+-------------
> 1 | (0,1) | 12 | f | f | 23 27 00 00
> 2 | (0,2) | 12 | f | f | 24 27 00 00
> 3 | (0,3) | 12 | f | f | 25 27 00 00
> 4 | (0,4) | 12 | f | f | 26 27 00 00
> 5 | (0,5) | 12 | f | f | 27 27 00 00
> 6 | (0,6) | 12 | f | f | 28 27 00 00
> 7 | (0,7) | 12 | f | f | 29 27 00 00
> 8 | (0,8) | 12 | f | f | 2a 27 00 00
>
>
> 2. pgstattuple
>
> PostgreSQL/usr/local/pgsqltest
> pgstattuple
>
> $ make
> $ make install
>
>
>
> $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
>
>
> 3. pgstattuple
>
> pgstattuple
>
> CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
> AS 'MODULE_PATHNAME', 'pgstattuple'
> LANGUAGE C STRICT;
>
> :
>
> pgstattuple_type
>
> pgstattupleAccessShareLock
> pgstattuple
>
>
> pgstattuple
> HeapTupleSatisfiesNow()
>
> 4. pgstattuple
>
> pgstattuple.cpgstattuple
> pgstattuple
>
>
> 5.
>
> 2006/06/28
>
>
>
> 2002/09/04
>
> SRFTom Lane
> README
>
> 2002/08/23
>
> SRF(Set Returning Function)7.3
>
> 2001/12/20 Tom Lane
>
> Fix pgstattuple to acquire a read lock on the target table. This
> prevents embarassments such as having the table dropped or truncated
> partway through the scan. Also, fix free space calculation to include
> pages that currently contain no tuples.
>
> 2001/10/01 PostgreSQL 7.2 contrib module
>
> 2001/08/30 pgstattuple 0.1
>
> -- Adjust this setting to control where the objects get created.
> SET search_path = public;
>
> DROP FUNCTION pgstattuple(oid);
>
> DROP FUNCTION pgstattuple(text);
>
> DROP TYPE pgstattuple_type;
>
> DROP FUNCTION pgstatindex(text);
> DROP FUNCTION bt_metap(text);
> DROP FUNCTION bt_page_items(text, int4);
> DROP FUNCTION bt_page_stats(text, int4);
> DROP FUNCTION pg_relpages(text);
>
> DROP TYPE pgstatindex_type;
> DROP TYPE bt_metap_type;
> DROP TYPE bt_page_items_type;
> DROP TYPE bt_page_stats_type;

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +