Re: vacuum does not reclaim rows

Lists: pgsql-hackers
From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: vacuum does not reclaim rows
Date: 2003-07-05 15:00:39
Message-ID: 20030706.000039.78704633.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Does anybody know why vacuum full does not relcaim deleted rows if a
open transaction which started before the deletion happens is running
even on a different database?

Here is an example:

T1: psql db1
T2: psql db2
T3: psql db2

T2: create table t1(i int); insert into t1 values(1);
T1: begin;
T2: delete from t1;
T3: vacuum full t1;

here vacuum will not reclaim deleted rows.

While tracking this, I found that GetOldestXmin(false) checks the
database id correctly:

if (allDbs || proc->databaseId == MyDatabaseId)

but after that it checks proc->xmin, where xmin may not be running on
the same database. I wonder if this is correct or not. Maybe we should
make sure that xmin is running on the same database if GetOldestXmin()
is called with its arg being set false? This is PostgreSQL 7.3.3.
--
Tatsuo Ishii


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-05 18:44:50
Message-ID: 20030705184450.GA10950@ping.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 06, 2003 at 12:00:39AM +0900, Tatsuo Ishii wrote:
> Hi,
>
> Does anybody know why vacuum full does not relcaim deleted rows if a
> open transaction which started before the deletion happens is running
> even on a different database?

On what do you base that it doesn't reclaim those rows?

Kurt


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: Q(at)ping(dot)be
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-05 23:38:18
Message-ID: 20030706.083818.112625481.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > Does anybody know why vacuum full does not relcaim deleted rows if a
> > open transaction which started before the deletion happens is running
> > even on a different database?
>
> On what do you base that it doesn't reclaim those rows?

test=# vacuum full verbose t1;
INFO: --Relation public.t1--
INFO: Pages 1: Changed 1, reaped 0, Empty 0, New 0; Tup 1: Vac 0, Keep/VTL 1/0, UnUsed 0, MinLen 32, MaxLen 32; Re-using: Free/Avail. Space 8136/8136; EndEmpty/Avail. Pages 0/1.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Rel t1: Pages: 1 --> 1; Tuple(s) moved: 0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Here you can see "Pages: 1 --> 1; Tuple(s) moved: 0", it indicates
vacuum actually skips the table and does not reclaim those rows.

Here is a case when vacuum does reclaim.

test=# vacuum full verbose t1;
INFO: --Relation public.t1--
INFO: Pages 1: Changed 0, reaped 1, Empty 0, New 0; Tup 0: Vac 1, Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 8168/0; EndEmpty/Avail. Pages 1/0.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: Rel t1: Pages: 1 --> 0.
VACUUM
--
Tatsuo Ishii


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-06 01:51:16
Message-ID: 6976.1057456276@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> but after that it checks proc->xmin, where xmin may not be running on
> the same database. I wonder if this is correct or not. Maybe we should
> make sure that xmin is running on the same database

How would you know? (At the time you are looking, it's quite possible
the other guy's xmin doesn't exist anymore.) In any case you can't just
arbitrarily ignore the other guy's xmin, since it's a proxy for
subsequent transaction IDs as well, and those might be in any database.

It might be possible to do something by having each proc store both
a "local" and a "global" xmin computed as of its current xid start,
but I haven't really thought through the details. In any case, that
would be extra bookkeeping needed during every transaction start,
so I'd want to see proof of a generally-useful improvement in return.

On the whole I'm against changing this logic ... I think the odds
of breaking something are high, and the odds of making a useful
improvement low ...

regards, tom lane


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-08 04:39:32
Message-ID: 20030708043932.GC11964@libertyrms.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 05, 2003 at 09:51:16PM -0400, Tom Lane wrote:

> but I haven't really thought through the details. In any case, that
> would be extra bookkeeping needed during every transaction start,
> so I'd want to see proof of a generally-useful improvement in return.

For what it's worth, we have redesigned around this very problem,
because we had a table, vacuumed every 5 minutes, which was always >
50% dead tuples. Of course, we _were_ able to redesign around it,
but I'm not sure whether we just moved the problem to a new location.
We'll see in the upcoming weeks. (Our testing says no, but I always
figure there's _some_ strange client case I never thought of.)

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-21 04:14:55
Message-ID: 200307210414.h6L4Et626100@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I think the big issue is that people think that if they have no one in a
specific database, that VACUUM FULL will completely remove unused
space, while this is not true if there are other backends connected to
other databases.

This might be a stupid question, but why does one backend have to care
about the global xmin at all? Isn't the local xmin the only important
value?

---------------------------------------------------------------------------

Tom Lane wrote:
> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > but after that it checks proc->xmin, where xmin may not be running on
> > the same database. I wonder if this is correct or not. Maybe we should
> > make sure that xmin is running on the same database
>
> How would you know? (At the time you are looking, it's quite possible
> the other guy's xmin doesn't exist anymore.) In any case you can't just
> arbitrarily ignore the other guy's xmin, since it's a proxy for
> subsequent transaction IDs as well, and those might be in any database.
>
> It might be possible to do something by having each proc store both
> a "local" and a "global" xmin computed as of its current xid start,
> but I haven't really thought through the details. In any case, that
> would be extra bookkeeping needed during every transaction start,
> so I'd want to see proof of a generally-useful improvement in return.
>
> On the whole I'm against changing this logic ... I think the odds
> of breaking something are high, and the odds of making a useful
> improvement low ...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-21 04:45:32
Message-ID: 25076.1058762732@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> This might be a stupid question, but why does one backend have to care
> about the global xmin at all? Isn't the local xmin the only important
> value?

Not when it's looking at shared tables.

Do you want to try to make the tqual code aware of whether it's looking
at a shared or nonshared table? I don't honestly think it's worth it...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum does not reclaim rows
Date: 2003-07-21 05:03:42
Message-ID: 200307210503.h6L53gE00844@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Oh, shared tables --- I forgot about those.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > This might be a stupid question, but why does one backend have to care
> > about the global xmin at all? Isn't the local xmin the only important
> > value?
>
> Not when it's looking at shared tables.
>
> Do you want to try to make the tqual code aware of whether it's looking
> at a shared or nonshared table? I don't honestly think it's worth it...
>
> regards, tom lane
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073