Re: Subject: bool / vacuum full bug followup part 2

Lists: pgsql-general
From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: bool / vacuum full bug followup
Date: 2002-05-02 22:30:15
Message-ID: Pine.LNX.4.33.0205021624280.15253-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

OK, this is just beyond weird. I swear to GOD that my table size was
growing. I'm on a workstation that doesn't accept connections from
anybody but me.

I was running 7.2 for weeks.
I created a simple table:
create table test (buf text, yn bool);
I run an explain:
explain select * from test where yn=true;
I run an analyze:
analyze;
I update the table:
update test set yn=true where yn=true;
update test set yn=true where yn=true;
update test set yn=true where yn=true;
I do a REGULAR vacuum:
vacuum;

And after that, the table started growing after all vacuum fulls. I can't
reproduce it now. I blew away the table, recreated it, and now it doesn't
do that. I blew away the database, and installed 7.2.1 and it won't do
it. I reinstalled 7.2 and blew away the database and it still won't do
it.

So, I don't know why it WAS doing it, but it isn't now. And of course,
it's my personal workstation, so no log files. :-( I'll see if I can
reproduce this though, as I've seen similar problems pop up once in the
past with a seldom vacuumed database that reached a point where it stopped
freeing dead tuples. Very bothersome.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 00:44:27
Message-ID: 2327.1020386667@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> And after that, the table started growing after all vacuum fulls. I can't
> reproduce it now.

Perhaps you had an open transaction laying about? That'd prevent vacuum
from recovering space ...

regards, tom lane


From: Jeffrey Baker <jwbaker(at)acm(dot)org>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 03:58:40
Message-ID: 20020503035840.GB368@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, May 02, 2002 at 04:30:15PM -0600, Scott Marlowe wrote:
> OK, this is just beyond weird. I swear to GOD that my table size was
> growing. I'm on a workstation that doesn't accept connections from
> anybody but me.
>
> I was running 7.2 for weeks.
> I created a simple table:
> create table test (buf text, yn bool);
> I run an explain:
> explain select * from test where yn=true;
> I run an analyze:
> analyze;
> I update the table:
> update test set yn=true where yn=true;
> update test set yn=true where yn=true;
> update test set yn=true where yn=true;
> I do a REGULAR vacuum:
> vacuum;
>
> And after that, the table started growing after all vacuum fulls. I can't
> reproduce it now. I blew away the table, recreated it, and now it doesn't
> do that. I blew away the database, and installed 7.2.1 and it won't do
> it. I reinstalled 7.2 and blew away the database and it still won't do
> it.
>
> So, I don't know why it WAS doing it, but it isn't now. And of course,
> it's my personal workstation, so no log files. :-( I'll see if I can
> reproduce this though, as I've seen similar problems pop up once in the
> past with a seldom vacuumed database that reached a point where it stopped
> freeing dead tuples. Very bothersome.

Well, if the table has an associated TOAST table, it is possible
that it will suddenly start growing out of control, and nothing you
can do with VACUUM will help. The TOAST system has a free space map
(FSM) with some default number of pages it can track. If you exceed
this number, it will become completely unable to reclaim space. If
this happens you will have to dump and reload the database or take
other drastic action.

Check the VACUUM VERBOSE output to see the pages in your TOAST
tables.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeffrey Baker <jwbaker(at)acm(dot)org>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 04:20:44
Message-ID: 4028.1020399644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeffrey Baker <jwbaker(at)acm(dot)org> writes:
> Well, if the table has an associated TOAST table, it is possible
> that it will suddenly start growing out of control, and nothing you
> can do with VACUUM will help. The TOAST system has a free space map
> (FSM) with some default number of pages it can track. If you exceed
> this number, it will become completely unable to reclaim space.

That's a gross misstatement.

If you have more pages with free space than the FSM can track, it will
forget about the ones with the least free space (at least until the next
vacuum tries to reload the info). This will very possibly lead to table
growth over time, but it's hardly the same as "completely unable to
reclaim space".

BTW, TOAST has nothing to do with this. Space in main tables and
toast tables is tracked alike.

regards, tom lane


From: Jeffrey Baker <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 04:46:42
Message-ID: 20020503044642.GF368@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, May 03, 2002 at 12:20:44AM -0400, Tom Lane wrote:
> Jeffrey Baker <jwbaker(at)acm(dot)org> writes:
> > Well, if the table has an associated TOAST table, it is possible
> > that it will suddenly start growing out of control, and nothing you
> > can do with VACUUM will help. The TOAST system has a free space map
> > (FSM) with some default number of pages it can track. If you exceed
> > this number, it will become completely unable to reclaim space.
>
> That's a gross misstatement.
>
> If you have more pages with free space than the FSM can track, it will
> forget about the ones with the least free space (at least until the next
> vacuum tries to reload the info). This will very possibly lead to table
> growth over time, but it's hardly the same as "completely unable to
> reclaim space".

I expected someone to come out with this reply, but my own
monitoring in productions shows otherwise. I'll forward my data
along once I have completed my study.

The short of my observations is that once you have more free pages
than the FSM can track, the table grows linearly over time. This is
for a table with heavy insert and delete activity. This is perhaps
not the design intent but it is what we are seeing.

As I said I intend to send along unbearably detailed info once I
think I have mastered the problem.

> BTW, TOAST has nothing to do with this. Space in main tables and
> toast tables is tracked alike.

In my observation only TOAST tables have this behavior.

Regards
jwb


From: Holger Marzen <holger(at)marzen(dot)de>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 08:41:51
Message-ID: Pine.LNX.4.44.0205031040250.20585-100000@bluebell.marzen.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 2 May 2002, Scott Marlowe wrote:

> And after that, the table started growing after all vacuum fulls. I can't
> reproduce it now. I blew away the table, recreated it, and now it doesn't
> do that. I blew away the database, and installed 7.2.1 and it won't do
> it. I reinstalled 7.2 and blew away the database and it still won't do
> it.
>
> So, I don't know why it WAS doing it, but it isn't now. And of course,
> it's my personal workstation, so no log files. :-( I'll see if I can
> reproduce this though, as I've seen similar problems pop up once in the
> past with a seldom vacuumed database that reached a point where it stopped
> freeing dead tuples. Very bothersome.

Looks like something that happens only if counters or OIDs are big.
Possible integer overflow in the vacuum code?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 15:37:36
Message-ID: Pine.LNX.4.33.0205030936250.1594-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 2 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > And after that, the table started growing after all vacuum fulls. I can't
> > reproduce it now.
>
> Perhaps you had an open transaction laying about? That'd prevent vacuum
> from recovering space ...

I thought of that, but no, I had just rebooted my box, and was using top
to view all the postgres user activities, and my psql session was the only
thing attached to it. If I can get it to do it, I'll pg_ctl stop/pg_ctl
start the database and see if it keeps doing it.


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Jeffrey Baker <jwbaker(at)acm(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 15:43:12
Message-ID: Pine.LNX.4.33.0205030942080.1594-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 2 May 2002, Jeffrey Baker wrote:

> On Thu, May 02, 2002 at 04:30:15PM -0600, Scott Marlowe wrote:
> > OK, this is just beyond weird. I swear to GOD that my table size was
> > growing. I'm on a workstation that doesn't accept connections from
> > anybody but me.

>> SNIPPAGE

> > And after that, the table started growing after all vacuum fulls. I can't
> > reproduce it now. I blew away the table, recreated it, and now it doesn't
> > do that. I blew away the database, and installed 7.2.1 and it won't do
> > it. I reinstalled 7.2 and blew away the database and it still won't do
> > it.

> Well, if the table has an associated TOAST table, it is possible
> that it will suddenly start growing out of control, and nothing you
> can do with VACUUM will help. The TOAST system has a free space map
> (FSM) with some default number of pages it can track. If you exceed
> this number, it will become completely unable to reclaim space. If
> this happens you will have to dump and reload the database or take
> other drastic action.
>
> Check the VACUUM VERBOSE output to see the pages in your TOAST
> tables.

Thanks, I'll do that. The tuples themselves were quite small, a single
ascii character in a text field and a single boole. So I wouldn't expect
them to be toasted.


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Holger Marzen <holger(at)marzen(dot)de>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-05-03 15:46:56
Message-ID: Pine.LNX.4.33.0205030946190.1594-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 3 May 2002, Holger Marzen wrote:

> On Thu, 2 May 2002, Scott Marlowe wrote:
>
> > And after that, the table started growing after all vacuum fulls. I can't
> > reproduce it now. I blew away the table, recreated it, and now it doesn't
> > do that. I blew away the database, and installed 7.2.1 and it won't do
> > it. I reinstalled 7.2 and blew away the database and it still won't do
> > it.
> >
> > So, I don't know why it WAS doing it, but it isn't now. And of course,
> > it's my personal workstation, so no log files. :-( I'll see if I can
> > reproduce this though, as I've seen similar problems pop up once in the
> > past with a seldom vacuumed database that reached a point where it stopped
> > freeing dead tuples. Very bothersome.
>
> Looks like something that happens only if counters or OIDs are big.
> Possible integer overflow in the vacuum code?

Prior to the test I had been running 100,000 transaction pgbench tests
repeatedly. I'll try running a few million transactions first and see if
the problem surfaces.

Thanks for the suggestions.


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 16:49:15
Message-ID: Pine.LNX.4.33.0205031039280.2391-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

OK, I remembered what was different. I had an index the first time. So,
it was the index I was watchin grow and never shrink.

Now this is reproduceable for me.

I dropped and recreated my test index

drop index test_yn_dx ;
create index test_yn_dx on test (yn);
vacuum full;

[postgres(at)css120] oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
16557 = test
126563 = test_yn_dx

cd into the base/oid of my database, then:

du -s 16557 126563
11128 16557
1772 126563

update test set yn=true where yn=true;
UPDATE 50080

[postgres(at)css120 16556]$ du -s 16557 126563
16704 16557
2948 126563

vacuum;

[postgres(at)css120 16556]$ du -s 16557 126563
16704 16557
2948 126563

vacuum full;

[postgres(at)css120 16556]$ du -s 16557 126563
11128 16557
4100 126563

I tried:

vacuum test_yn_dx;
NOTICE: VACUUM: can not process indexes, views or special system tables
NOTICE: VACUUM: table not found
VACUUM

so, then I tried:

reindex index test_yn_dx;
REINDEX

oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
16557 = test
126564 = test_yn_dx

[postgres(at)css120 16556]$ du -s 16557 126564
11128 16557
1772 126564

And reclaimed the space. Is that the official way, short of dropping and
recreating an index to reclaim its space? Is there a plan to make vacuum
reclaim unused space in indexes? Just wondering.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 19:47:54
Message-ID: 15296.1020455274@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> And reclaimed the space. Is that the official way, short of dropping and
> recreating an index to reclaim its space? Is there a plan to make vacuum
> reclaim unused space in indexes?

Yes, and yes, but don't hold your breath on the latter part --- that
TODO item has been around for awhile. And it's gotten harder now that
we have lazy VACUUM; that means we need to be able to condense indexes
concurrently with other index operations.

AFAIK there's not a big problem with index growth if the range of index
keys remains reasonably static. The problem comes in if you have a
range of values that keeps growing (eg, you are indexing a SERIAL or
timestamp column). The right end of the btree keeps growing, but
there's no mechanism to collapse out no-longer-used space at the left
end.

regards, tom lane


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 21:19:45
Message-ID: Pine.LNX.4.33.0205031512120.2841-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Well, my keys aren't changing and the index is growing like they are.
I.e. the update statement is:

update table set field1=y where field1=y;

I'm not adding rows, I'm not deleting rows, and I'm not even changing the
value really. I'm sure this is the problem the earlier poster was
complaining about making him dump and restore his whole database every few
weeks.

I'd say the indexes keep growing whether the data they are indexing
changes or not from my observations. Do your own tests if you want, but
so far this looks like a serious issue for joe average DBA. He expects
vacuum [full] to reclaim all his unused space, but indexes, which are
constantly growing, are not reclaimed, and his store fills up. Leading
him to dump / restore instead of just reindexing.

We should at least add something to the administrator docs to say you need
to reindex heavily updated indexes to save space.

Here's my quick and dirty php script to reindex all indexes in all
databases on a given server. It's ugly, but it works.

#!/usr/local/bin/php -q
<?php
$conn0 = pg_connect("dbname=template1 user=postgres");
$res0 = pg_exec($conn0,"select datname from pg_database where datname not
like 'template%'");
$drows = pg_numrows($res0);
for ($j=0;$j<$drows;$j++){
$datname = pg_result($res0,$j,'datname');
print 'database: '.$datname."n";
$conn1 = pg_connect("dbname=$datname user=postgres");
$res1 = pg_exec($conn1,"select indexname from pg_indexes where
indexname not like 'pg_%'");
$rows = pg_numrows($res1);
for ($i=0;$i<$rows;$i++){
$indexname = pg_result($res1,$i,indexname);
$query = "reindex index $indexname";
print 'table: '.$indexname."\n";
pg_exec($conn1,$query);
}
}
?>

On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > And reclaimed the space. Is that the official way, short of dropping and
> > recreating an index to reclaim its space? Is there a plan to make vacuum
> > reclaim unused space in indexes?
>
> Yes, and yes, but don't hold your breath on the latter part --- that
> TODO item has been around for awhile. And it's gotten harder now that
> we have lazy VACUUM; that means we need to be able to condense indexes
> concurrently with other index operations.
>
> AFAIK there's not a big problem with index growth if the range of index
> keys remains reasonably static. The problem comes in if you have a
> range of values that keeps growing (eg, you are indexing a SERIAL or
> timestamp column). The right end of the btree keeps growing, but
> there's no mechanism to collapse out no-longer-used space at the left
> end.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 22:36:31
Message-ID: 22775.1020465391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> Well, my keys aren't changing and the index is growing like they are.

Could we see the exact details of your test case?

regards, tom lane


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-03 23:30:47
Message-ID: Pine.LNX.4.33.0205031702200.1471-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
>
> Could we see the exact details of your test case?

Sure. I think I posted most of it here already...

Here's my table:

scott.marlowe=# \d test
Table "test"
Column | Type | Modifiers
--------+---------+-----------
buf | text |
yn | boolean |
id | integer |
Indexes: test_id_dx,
test_yn_dx

Here's the indexes:

scott.marlowe=# \di test_id_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_id_dx | index | scott.marlowe

scott.marlowe=# \di test_yn_dx
List of relations
Name | Type | Owner
------------+-------+---------------
test_yn_dx | index | scott.marlowe
(1 row)

Here's the php script that creates the test data, it makes $count number
of rows and sets the bool to true or false randomly:

set_time_limit(3600);
$conn = pg_connect("dbname=scott.marlowe host=css120.ihs.com");
$count = 100000;
pg_exec($conn,"begin");
$flag[0]="false";
$flag[1]="true";
for ($i=0;$i<$count;$i++){
if ($i%1000==0) {
pg_exec($conn,"end");
pg_exec($conn,"begin");
}
$letter = chr(rand(65,91));
$tf = rand(0,1);
$query = "insert into test (buf,yn) values ('";
$query.= $letter."',".$flag[$tf].")";
pg_exec($conn,$query);
}
pg_exec($conn,"end");

Here's the files in my database directory, and their size by du in
kbytes after vacuum full;
vacuum;
reindex index test_id_dx;
reindex index test_yn_dx;

[postgres(at)css120 16556]$ oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
126572 = accounts
126574 = accounts_pkey
126566 = branches
126568 = branches_pkey
126575 = history
126569 = tellers
126571 = tellers_pkey
16557 = test
1126687 = test_id_dx
1126688 = test_yn_dx

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557 (test)
1772 1126687 (test_id_dx)
1772 1126688 (test_yn_dx)

WHAT I DID:

scott.marlowe=# update test set yn=true where yn=true;
UPDATE 50080

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
17176 16557
3516 1126687
2924 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557
3516 1126687
4052 1126688 <-- Notice that the index here just GREW

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
68744 16557
13980 1126687
15660 1126688

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
68736 16557
13964 1126687
15652 1126688

scott.marlowe=# vacuum full;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
11448 16557
13964 1126687
16808 1126688

***************
So, now thinking the problem might be just vacuum full, I try plain old
vacuums
***************

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
20088 1126688

scott.marlowe=# update test set yn=true where yn=true;
scott.marlowe=# update test set yn=true where yn=true;

scott.marlowe=# vacuum;
VACUUM

[postgres(at)css120 16556]$ du -s 16557 1126687 1126688
22908 16557
13964 1126687
22380 1126688

*****************************
Nope, the index on the bools just keeps growing and growing.
Given a few million updates and it will be bigger than the data it is
supposed to index.

scott.marlowe=# reindex index test_yn_dx;
REINDEX
scott.marlowe=# reindex index test_id_dx;
REINDEX

oid2name -d scott.marlowe |grep dx
1126690 = test_id_dx
1126689 = test_yn_dx
[postgres(at)css120 16556]$ du -s 16557 1126690 1126689
22908 16557
1772 1126690
1772 1126689

and now they're small again.

It would at least be nice if reindex was operational in a transaction so
it would be safe to use on a live database, since it appears to not be
intended for this purpose really, but for fixing broken indexes. Til then
I'll write a script that asks pg_indexes that drops the index and
recreates it in a transaction to keep my data store svelt and clean.

Thanks for the attention to this. Scott.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-04 00:34:27
Message-ID: 23684.1020472467@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> On Fri, 3 May 2002, Tom Lane wrote:
>> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> Well, my keys aren't changing and the index is growing like they are.
>>
>> Could we see the exact details of your test case?

> Sure. I think I posted most of it here already...

Okay, what I see is that the index on the integer column behaves like I
would expect: you can update, vacuum, update, vacuum, and it doesn't get
bigger. But the index on the boolean column does grow. I believe the
problem is that there are so many equal keys. The reinserted index
entries are always inserted at the end of the range of matching keys,
and so there's no opportunity to re-use space within other pages of the
index. There are only two leaf pages getting the insertions, and so
nothing to do but split them over and over.

What this really points up, of course, is that making a btree index on
a boolean column is a pretty foolish thing to do. I'm not particularly
unhappy about the performance being bad with respect to space usage,
because the fact of the matter is that performance is going to be bad
by any measure.

regards, tom lane


From: Jeffrey Baker <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-04 17:48:47
Message-ID: 20020504174847.GA370@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:
> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > And reclaimed the space. Is that the official way, short of dropping and
> > recreating an index to reclaim its space? Is there a plan to make vacuum
> > reclaim unused space in indexes?
>
> Yes, and yes, but don't hold your breath on the latter part --- that
> TODO item has been around for awhile. And it's gotten harder now that
> we have lazy VACUUM; that means we need to be able to condense indexes
> concurrently with other index operations.
>
> AFAIK there's not a big problem with index growth if the range of index
> keys remains reasonably static. The problem comes in if you have a
> range of values that keeps growing (eg, you are indexing a SERIAL or
> timestamp column). The right end of the btree keeps growing, but
> there's no mechanism to collapse out no-longer-used space at the left
> end.

Wouldn't that explain the complaints I have about my toast tables
always growing? Because each toast table has an index, and the
above paragraph makes it sound like indexes on serial values grow
all the time, that would imply that table that where tuples live for
windows of time will always be growing.

Or did I read that incorrectly?

-jwb


From: Jeffrey Baker <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-04 19:52:17
Message-ID: 20020504195217.GC370@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, May 04, 2002 at 10:48:47AM -0700, Jeffrey Baker wrote:
> On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:
> > Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > > And reclaimed the space. Is that the official way, short of dropping and
> > > recreating an index to reclaim its space? Is there a plan to make vacuum
> > > reclaim unused space in indexes?
> >
> > Yes, and yes, but don't hold your breath on the latter part --- that
> > TODO item has been around for awhile. And it's gotten harder now that
> > we have lazy VACUUM; that means we need to be able to condense indexes
> > concurrently with other index operations.
> >
> > AFAIK there's not a big problem with index growth if the range of index
> > keys remains reasonably static. The problem comes in if you have a
> > range of values that keeps growing (eg, you are indexing a SERIAL or
> > timestamp column). The right end of the btree keeps growing, but
> > there's no mechanism to collapse out no-longer-used space at the left
> > end.
>
> Wouldn't that explain the complaints I have about my toast tables
> always growing? Because each toast table has an index, and the
> above paragraph makes it sound like indexes on serial values grow
> all the time, that would imply that table that where tuples live for
> windows of time will always be growing.
>
> Or did I read that incorrectly?

Indeed, I did not. Part of the space leak I am seeing is from this:

Start.
Insert 20,000 tuples of 13KB each.
Delete 20,000 tuples.
Vacuum full.
Goto Start.

Toast index grows by ~535 pages or 4.2MB[1] per cycle, even though
vacuum is able to truncate the main relation and the toast relation
to zero pages.

-jwb

1: This implies a page size of 16KB in the index. I expected it to
be smaller.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeffrey Baker <jwbaker(at)acm(dot)org>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-04 22:06:38
Message-ID: 19070.1020549998@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeffrey Baker <jwbaker(at)acm(dot)org> writes:
> On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:
>> AFAIK there's not a big problem with index growth if the range of index
>> keys remains reasonably static. The problem comes in if you have a
>> range of values that keeps growing (eg, you are indexing a SERIAL or
>> timestamp column). The right end of the btree keeps growing, but
>> there's no mechanism to collapse out no-longer-used space at the left
>> end.

> Wouldn't that explain the complaints I have about my toast tables
> always growing?

It'd explain the indexes growing --- the index key is an OID, which will
keep increasing as you store new toasted values. I thought you'd been
complaining about the tables themselves, though.

regards, tom lane


From: Jeffrey Baker <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-04 22:17:57
Message-ID: 20020504221757.GD370@noodles
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, May 04, 2002 at 06:06:38PM -0400, Tom Lane wrote:
> Jeffrey Baker <jwbaker(at)acm(dot)org> writes:
> > On Fri, May 03, 2002 at 03:47:54PM -0400, Tom Lane wrote:
> >> AFAIK there's not a big problem with index growth if the range of index
> >> keys remains reasonably static. The problem comes in if you have a
> >> range of values that keeps growing (eg, you are indexing a SERIAL or
> >> timestamp column). The right end of the btree keeps growing, but
> >> there's no mechanism to collapse out no-longer-used space at the left
> >> end.
>
> > Wouldn't that explain the complaints I have about my toast tables
> > always growing?
>
> It'd explain the indexes growing --- the index key is an OID, which will
> keep increasing as you store new toasted values. I thought you'd been
> complaining about the tables themselves, though.

You're right, I am. But in my quest to operate Pg properly I am
trying to nail down everything that causes its disk usage to
increase. I just had a look at my prod. database and the toast
tables are much larger than their indices, so it is probably
irrelevant.

-jwb


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 16:23:53
Message-ID: Pine.LNX.4.33.0205061007040.13079-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > On Fri, 3 May 2002, Tom Lane wrote:
> >> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
> >>
> >> Could we see the exact details of your test case?
>
> > Sure. I think I posted most of it here already...
>
> Okay, what I see is that the index on the integer column behaves like I
> would expect: you can update, vacuum, update, vacuum, and it doesn't get
> bigger.

Yes, it does get bigger, but only with use, not vacuum full.

It doesn't look like the index on the text column is getting reused
either. Is that because I'm update a lot of rows with a single update
statement? would it be reused if I was changing one row at a time and
commiting it?

I'll test that theory, but I'm gonna bet right now that it won't.

> But the index on the boolean column does grow.

Not only that, but it grows from a vacuum full. I would expect it to at
least stay the same.

> I believe the
> problem is that there are so many equal keys. The reinserted index
> entries are always inserted at the end of the range of matching keys,
> and so there's no opportunity to re-use space within other pages of the
> index. There are only two leaf pages getting the insertions, and so
> nothing to do but split them over and over.
>
> What this really points up, of course, is that making a btree index on
> a boolean column is a pretty foolish thing to do.

Since postgresql 7.2.1 refuses to make an index of any kind other than
btree, what is the answer? no indexes? While single column indexes may
seem wasteful, remember that the boole may be stored in a table that has
very large tuples, and a sequential scan of such a table could be quite
slow, or there may be a situation where a tiny percentage of the booles
are one setting while most are the other, like an approval system for
online content. Either way, creating an index shouldn't result in a
database directory that grows to 100X it's original size, gets slower, and
doesn't give back space to vacuum;

scott.marlowe=# create index test_yn_dx on test using hash (yn);
ERROR: data type boolean has no default operator class for access method
"hash"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using rtree (yn);
ERROR: data type boolean has no default operator class for access method
"rtree"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using gist (yn);
ERROR: data type boolean has no default operator class for access method
"gist"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using btree (yn);
CREATE
scott.marlowe=# drop index tesT_yn_dx;
DROP

> I'm not particularly
> unhappy about the performance being bad with respect to space usage,
> because the fact of the matter is that performance is going to be bad
> by any measure.

If the database gets slow, that's forgiveable. If it runs out of space in
the middle of the day because indexes just grow and grow, that's not. It
makes postgresql look like a toy database.

A growing index that vacuum doesn't shrink is a serious issue for people
who expect to reclaim lost space with vacuum. We at least need to let
people know of this behavior in the admin docs, as right now they (the
docs) seem to imply that vacuum frees up all unused space. For indexes,
this isn't true, and people who are getting started don't need this kind
of gotcha waiting to kill a production database 2 or 6 months into use.

Is it maybe at least possible to make reindex either transaction safe or
have an option that pretty much drops and recreates the index in a
transactionally safe mode or something?

I never knew about this problem until now, and I've found that I had
indexes that were 180Megs that reindexed to 48k in size. My database had
been getting slower and slower, and now it flies again. This was true of
ALL types of indexes, on ints, text, everything. The indexes were HUGE.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 16:44:03
Message-ID: 15976.1020703443@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> Since postgresql 7.2.1 refuses to make an index of any kind other than
> btree, what is the answer? no indexes? While single column indexes may
> seem wasteful, remember that the boole may be stored in a table that has
> very large tuples, and a sequential scan of such a table could be quite
> slow, or there may be a situation where a tiny percentage of the booles
> are one setting while most are the other, like an approval system for
> online content.

You could use a partial index for the latter case. If the column is
about fifty-fifty then any kind of index is a waste of space...

regards, tom lane


From: Steve Lane <slane(at)fmpro(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 19:16:35
Message-ID: B8FC40C3.CF76%slane@fmpro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'd like to interject a short novice question, because I'd like to check
this:

What's the best way to see how much space my indexes take up?

-- sgl
=======================================================
Steve Lane

Vice President
Chris Moyer Consulting, Inc.
833 West Chicago Ave Suite 203

Voice: (312) 433-2421 Email: slane(at)fmpro(dot)com
Fax: (312) 850-3930 Web: http://www.fmpro.com
=======================================================


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 19:34:56
Message-ID: Pine.LNX.4.33.0205061334300.15514-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 6 May 2002, Scott Marlowe wrote:

> On Fri, 3 May 2002, Tom Lane wrote:
>
> > Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > > On Fri, 3 May 2002, Tom Lane wrote:
> > >> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > > Well, my keys aren't changing and the index is growing like they are.
> > >>
> > >> Could we see the exact details of your test case?
> >
> > > Sure. I think I posted most of it here already...
> >
> > Okay, what I see is that the index on the integer column behaves like I
> > would expect: you can update, vacuum, update, vacuum, and it doesn't get
> > bigger.
>
> Yes, it does get bigger, but only with use, not vacuum full.
>
> It doesn't look like the index on the text column is getting reused
> either. Is that because I'm update a lot of rows with a single update
> statement? would it be reused if I was changing one row at a time and
> commiting it?

Correction, that should be "index on the int4 column"


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Steve Lane <slane(at)fmpro(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 20:39:08
Message-ID: Pine.LNX.4.33.0205061423001.15633-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 6 May 2002, Steve Lane wrote:

> I'd like to interject a short novice question, because I'd like to check
> this:
>
> What's the best way to see how much space my indexes take up?

You'll need a program called oid2name, found in the contrib directory of
the postgresql source distribution. Many Linux distros include it
automagically, but if not, it's a simple install.

On my machine, I build postgresql from source, and I build it in the
/usr/local/src/postgresql-x.y.z directory, where x.y.z is the version
number. so, for 7.2.1, I would do this:

su -
(enter root password)
cd /usr/local/src/postgresql-7.2.1/contrib/oid2name
make
make install
exit (back to being joe regular user)

then try entering oid2name. Assuming you have /usr/local/pgsql/bin in
your path, it should run and tell you the name of all your databases.

On my local test box, I get this:
All databases:
---------------------------------
16556 = scott.marlowe
1126697 = test
1 = template1
16555 = template0

Using the -d switch, you can get a list of all the oids used by a given
database, like so:

oid2name -d scott.marlowe
All tables from database "scott.marlowe":
---------------------------------
126572 = accounts
1126708 = accounts_pkey
126566 = branches
1126706 = branches_pkey
126575 = history
126569 = tellers
1126707 = tellers_pkey
16557 = test
1126709 = test_id_dx

Now, lastly, you need to be the postgres super user to do this, replacing
db with the name of the database you wanna see, and dx with the name of
the index you want to know about: (note this should be on one line, but it
ran pretty long, so I have a \ continuation character in there)

du -s $PGDATA/base/`oid2name |grep db|cut -d ' ' -f 1`/`oid2name -d \
db|grep dx|cut -d ' ' -f 1`

or scriptify it by putting that command into a file called
/usr/local/pgsql/bin/tsize that looks like this:

#!/bin/bash
du -s $PGDATA/base/`oid2name |grep $1|cut -d ' ' -f 1`/`oid2name -d \
$1|grep $2|cut -d ' ' -f 1`

(don't forget to chmod 755 it so it's executable)
and call it like so:

tsize dbname tablename

Hope that helps!

explanation:

What the above script does is kind like this:

cd $PGDATA/base
oid2name
(find oid of your database in list)
cd oid_of_your_database
oid2name -d your_database
(find oid of your index)
du -s oid_of_your_index


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 23:54:43
Message-ID: 20020507095443.A14434@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, May 06, 2002 at 10:23:53AM -0600, Scott Marlowe wrote:
> A growing index that vacuum doesn't shrink is a serious issue for people
> who expect to reclaim lost space with vacuum. We at least need to let
> people know of this behavior in the admin docs, as right now they (the
> docs) seem to imply that vacuum frees up all unused space. For indexes,
> this isn't true, and people who are getting started don't need this kind
> of gotcha waiting to kill a production database 2 or 6 months into use.

PostgreSQL has never shrunk indexes, not now not never. The only option is
to reindex or recreate them. We use a script here to automatically rebuild
all the indexes each month.

> Is it maybe at least possible to make reindex either transaction safe or
> have an option that pretty much drops and recreates the index in a
> transactionally safe mode or something?

It is safe to drop and create the index within a transaction.

Feed output of this to psql. Totally transaction safe :)
======================
#!/usr/bin/perl -w

my $DB = "database";

open( FH, "pg_dump -s $DB |grep INDEX |" ) || die "Can't pg_dump ($!)\n";

while(<FH>)
{
chomp;
/"(\w+)"/ or die "Couldn't extract index name from [$_]\n";

my $index = $1;
s/$index/${index}_reindex/;
s/ "\w+"(,| \))/$1/g;
print "begin;\n";
print "$_\n";
print "drop index $index;\n";
print "alter table ${index}_reindex rename to $index;\n";
print "commit;\n";
}
======================
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America


From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-07 15:01:47
Message-ID: Pine.LNX.4.33.0205070859240.16461-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 7 May 2002, Martijn van Oosterhout wrote:

> On Mon, May 06, 2002 at 10:23:53AM -0600, Scott Marlowe wrote:
> > A growing index that vacuum doesn't shrink is a serious issue for people
> > who expect to reclaim lost space with vacuum. We at least need to let
> > people know of this behavior in the admin docs, as right now they (the
> > docs) seem to imply that vacuum frees up all unused space. For indexes,
> > this isn't true, and people who are getting started don't need this kind
> > of gotcha waiting to kill a production database 2 or 6 months into use.
>
> PostgreSQL has never shrunk indexes, not now not never. The only option is
> to reindex or recreate them. We use a script here to automatically rebuild
> all the indexes each month.

That wouldn't be so bothersome if it was in the admin docs, or if the
space in them got reused, but so far, it looks like the btrees don't
reuse space much, if at all, and this is the first I've heard about
indexes just growing and growing and growing (insert bunny with drum here
:-)

Think I'll write a section for the postgresql admin docs and submit it.

Thanks for the sript, I already had one in php, and was about to write one
in bash to make it more portable.


From: Steve Lane <slane(at)fmpro(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-07 23:15:48
Message-ID: B8FDCA54.D048%slane@fmpro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 5/6/02 3:39 PM, "Scott Marlowe" <scott(dot)marlowe(at)ihs(dot)com> wrote:

> On Mon, 6 May 2002, Steve Lane wrote:
>
>> I'd like to interject a short novice question, because I'd like to check
>> this:
>>
>> What's the best way to see how much space my indexes take up?
>
> You'll need a program called oid2name, found in the contrib directory of
> the postgresql source distribution. Many Linux distros include it
> automagically, but if not, it's a simple install.
>
> On my machine, I build postgresql from source, and I build it in the
> /usr/local/src/postgresql-x.y.z directory, where x.y.z is the version
> number. so, for 7.2.1, I would do this:

That's wonderfully precise and helpful. Thanks! I'll go grab it and give it
a try ...

-- sgl


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-10 10:46:35
Message-ID: 011501c1f80f$f52e61a0$9ba1d53e@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


----- Original Message -----
From: "Tom Lane"

> keys remains reasonably static. The problem comes in if you have a
> range of values that keeps growing (eg, you are indexing a SERIAL or
> timestamp column). The right end of the btree keeps growing, but
> there's no mechanism to collapse out no-longer-used space at the left
> end.
>

related to this: it seems that there is no handy way to rebuild the index,
so that the it's storage is optimised (i.e. remove fragmentation).

Are there any plans for implementing ALTER INDEX ... REBUILD (or ALTER INDEX
... COALESCE) that could rebuild the index based only on its data, without
accessing the related table (which will be the case with DROP + CREATE for
the index)?

thanx,

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-10 15:18:09
Message-ID: 28457.1021043889@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg> writes:
> related to this: it seems that there is no handy way to rebuild the index,
> so that the it's storage is optimised (i.e. remove fragmentation).

See REINDEX.

regards, tom lane


From: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-10 15:58:31
Message-ID: 004901c1f83b$88774000$9ba1d53e@sirma.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


----- Original Message -----
From: "Tom Lane"

> "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg> writes:
> > related to this: it seems that there is no handy way to rebuild the
index,
> > so that the it's storage is optimised (i.e. remove fragmentation).
>
> See REINDEX.
>

will REINDEX use only the index and never read from the table?

because it doesn't seem so - the docs say: "REINDEX is used to rebuild
corrupted indexes....indexes may become corrupted due to software bugs or
hardware failures." , which implies the table is accessed (otherwise I don't
see a way of rebuilding the index based on its corrupted data)

the idea of ALTER INDEX ...REBUILD/COALESCE is that the index is optimised
without accessing the table, so it should be faster

Marin

----
"...what you brought from your past, is of no use in your present. When
you must choose a new path, do not bring old experiences with you.
Those who strike out afresh, but who attempt to retain a little of the
old life, end up torn apart by their own memories. "


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-10 16:39:56
Message-ID: 29140.1021048796@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Marin Dimitrov" <marin(dot)dimitrov(at)sirma(dot)bg> writes:
> the idea of ALTER INDEX ...REBUILD/COALESCE is that the index is optimised
> without accessing the table, so it should be faster

If you feel a compelling need to have a variant of REINDEX that works
that way, feel free to write it. It seems like an extremely
low-priority concern to me, though. (No, I'm not convinced there's
much performance gain to be had there, either...)

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeffrey Baker <jwbaker(at)acm(dot)org>, Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup
Date: 2002-06-02 21:02:17
Message-ID: 200206022102.g52L2Hp28149@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> Jeffrey Baker <jwbaker(at)acm(dot)org> writes:
> > Well, if the table has an associated TOAST table, it is possible
> > that it will suddenly start growing out of control, and nothing you
> > can do with VACUUM will help. The TOAST system has a free space map
> > (FSM) with some default number of pages it can track. If you exceed
> > this number, it will become completely unable to reclaim space.
>
> That's a gross misstatement.
>
> If you have more pages with free space than the FSM can track, it will
> forget about the ones with the least free space (at least until the next
> vacuum tries to reload the info). This will very possibly lead to table
> growth over time, but it's hardly the same as "completely unable to
> reclaim space".
>
> BTW, TOAST has nothing to do with this. Space in main tables and
> toast tables is tracked alike.

One subtle point is that TOAST space and HEAP space used by a table are
not shared, meaning if you have TOAST space free, that can't be used for
HEAP storage, and via versa. Of course, a FULL vacuum frees all unused
space.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-06-03 02:24:01
Message-ID: 200206030224.g532O1e29501@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > On Fri, 3 May 2002, Tom Lane wrote:
> >> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
> >>
> >> Could we see the exact details of your test case?
>
> > Sure. I think I posted most of it here already...
>
> Okay, what I see is that the index on the integer column behaves like I
> would expect: you can update, vacuum, update, vacuum, and it doesn't get
> bigger. But the index on the boolean column does grow. I believe the
> problem is that there are so many equal keys. The reinserted index
> entries are always inserted at the end of the range of matching keys,
> and so there's no opportunity to re-use space within other pages of the
> index. There are only two leaf pages getting the insertions, and so
> nothing to do but split them over and over.

TODO updated:

* Certain indexes will not shrink, e.g. indexes on ever-increasing
columns and indexes with many duplicate keys

> What this really points up, of course, is that making a btree index on
> a boolean column is a pretty foolish thing to do. I'm not particularly
> unhappy about the performance being bad with respect to space usage,
> because the fact of the matter is that performance is going to be bad
> by any measure.

Yes, but we can't expect people to know to use a partial index when they
are indexing a column like bool. (Our partial index code didn't even
work for several years.) (And there are valid reasons to index bool,
like if there are only a few true or false values, as was pointed out.)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-06-03 02:56:52
Message-ID: 200206030256.g532ur604192@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > And reclaimed the space. Is that the official way, short of dropping and
> > recreating an index to reclaim its space? Is there a plan to make vacuum
> > reclaim unused space in indexes?
>
> Yes, and yes, but don't hold your breath on the latter part --- that
> TODO item has been around for awhile. And it's gotten harder now that
> we have lazy VACUUM; that means we need to be able to condense indexes
> concurrently with other index operations.

Can you remind me why it is so hard to fix this. I do not expect lazy
vacuum to handle index shrinking, but it should be possible with full
vacuum.

> AFAIK there's not a big problem with index growth if the range of index
> keys remains reasonably static. The problem comes in if you have a
> range of values that keeps growing (eg, you are indexing a SERIAL or
> timestamp column). The right end of the btree keeps growing, but
> there's no mechanism to collapse out no-longer-used space at the left
> end.

I must say I am embarrased by our failure to shrink all indexes. I
haven't said this before because no one has complained.

I am sure lots of people are affected by this, but few realize it
because few analyze did space.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-06-03 23:43:41
Message-ID: 13052.1023147821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> Yes, and yes, but don't hold your breath on the latter part --- that
>> TODO item has been around for awhile. And it's gotten harder now that
>> we have lazy VACUUM; that means we need to be able to condense indexes
>> concurrently with other index operations.

> Can you remind me why it is so hard to fix this. I do not expect lazy
> vacuum to handle index shrinking, but it should be possible with full
> vacuum.

If you make that restriction then it might be less painful to do. I
have not thought about doing it that way; I'm of the opinion that only
a solution that lets lazy vacuum do it will be a real solution.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-06-04 01:42:58
Message-ID: 200206040142.g541gwW19774@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> Yes, and yes, but don't hold your breath on the latter part --- that
> >> TODO item has been around for awhile. And it's gotten harder now that
> >> we have lazy VACUUM; that means we need to be able to condense indexes
> >> concurrently with other index operations.
>
> > Can you remind me why it is so hard to fix this. I do not expect lazy
> > vacuum to handle index shrinking, but it should be possible with full
> > vacuum.
>
> If you make that restriction then it might be less painful to do. I
> have not thought about doing it that way; I'm of the opinion that only
> a solution that lets lazy vacuum do it will be a real solution.

Let me give you my logic on this --- to me it seems lazy vacuum already
allows index reuse, but there is a restriction that the reuse can only
be used by new keys that are similar to expired keys. It is hard to
imagine how you would implement btree index page _reuse_ --- I guess you
could have the page in some kind of free page map and use that when you
need a new index page. Condensing index entries across pages into a single
page seems almost impossible under concurrent access, and without that
capability, it seems pretty useless.

Now, with full vacuum, people expect the disk space to be returned, and
we have a full lock on the relation, so it seems a much simpler problem
and one people would expect to free.

I am saying that concurrent freeing vs. exclusive lock freeing seem like
different problems to me, and that we should at least be able to get
exclusive lock freeing working, somehow, even if we have to punt and
just automatically reindex the index when a certain percentage of the
index is dead entries. I honestly think auto-reindex is going to be the
easiest and fastest solution in most cases. Heck, I think reindex is
faster than updating the index entries with the new post-vacuum location
in cases with a large percentage of dead tuples. (Worst case is
deleting 50% of the tuples because vacuum moves the other 50% and has to
update the index.) I think this was reported about a year ago. We even
have a TODO item:

* Improve speed with indexes (perhaps recreate index instead) [vacuum]

I guess what I am saying is that we have two index/vacuum problems, and
some type of auto-reindexing could fix all of them in one shot.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
Cc: Steve Lane <slane(at)fmpro(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-06-12 21:18:59
Message-ID: 200206122118.g5CLIx912074@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


I added a -q/quiet option to oid2name, and have added a section to the
oid2name README showing how to use the utility, with an example using
du:

$ du * | while read SIZE OID
> do
> echo "$SIZE `oid2name -q -d test -o $OID`"
> done |
> sort -rn
2048 19324 = bigtable
1950 23903 = customers

---------------------------------------------------------------------------

Scott Marlowe wrote:
> On Mon, 6 May 2002, Steve Lane wrote:
>
> > I'd like to interject a short novice question, because I'd like to check
> > this:
> >
> > What's the best way to see how much space my indexes take up?
>
> You'll need a program called oid2name, found in the contrib directory of
> the postgresql source distribution. Many Linux distros include it
> automagically, but if not, it's a simple install.
>
> On my machine, I build postgresql from source, and I build it in the
> /usr/local/src/postgresql-x.y.z directory, where x.y.z is the version
> number. so, for 7.2.1, I would do this:
>
> su -
> (enter root password)
> cd /usr/local/src/postgresql-7.2.1/contrib/oid2name
> make
> make install
> exit (back to being joe regular user)
>
> then try entering oid2name. Assuming you have /usr/local/pgsql/bin in
> your path, it should run and tell you the name of all your databases.
>
> On my local test box, I get this:
> All databases:
> ---------------------------------
> 16556 = scott.marlowe
> 1126697 = test
> 1 = template1
> 16555 = template0
>
> Using the -d switch, you can get a list of all the oids used by a given
> database, like so:
>
> oid2name -d scott.marlowe
> All tables from database "scott.marlowe":
> ---------------------------------
> 126572 = accounts
> 1126708 = accounts_pkey
> 126566 = branches
> 1126706 = branches_pkey
> 126575 = history
> 126569 = tellers
> 1126707 = tellers_pkey
> 16557 = test
> 1126709 = test_id_dx
>
> Now, lastly, you need to be the postgres super user to do this, replacing
> db with the name of the database you wanna see, and dx with the name of
> the index you want to know about: (note this should be on one line, but it
> ran pretty long, so I have a \ continuation character in there)
>
> du -s $PGDATA/base/`oid2name |grep db|cut -d ' ' -f 1`/`oid2name -d \
> db|grep dx|cut -d ' ' -f 1`
>
> or scriptify it by putting that command into a file called
> /usr/local/pgsql/bin/tsize that looks like this:
>
> #!/bin/bash
> du -s $PGDATA/base/`oid2name |grep $1|cut -d ' ' -f 1`/`oid2name -d \
> $1|grep $2|cut -d ' ' -f 1`
>
> (don't forget to chmod 755 it so it's executable)
> and call it like so:
>
> tsize dbname tablename
>
> Hope that helps!
>
> explanation:
>
> What the above script does is kind like this:
>
> cd $PGDATA/base
> oid2name
> (find oid of your database in list)
> cd oid_of_your_database
> oid2name -d your_database
> (find oid of your index)
> du -s oid_of_your_index
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026