Re: The vacuum-ignore-vacuum patch

Lists: pgsql-hackers pgsql-patches
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: The vacuum-ignore-vacuum patch
Date: 2006-07-11 21:01:27
Message-ID: 20060711210127.GA8463@surnet.cl

Hi,

Hannu Krosing asked me about his patch to ignore transactions running
VACUUM LAZY in other vacuum transactions. I attach a version of the
patch updated to the current sources.

As a reminder of what this is about: the point of the patch is to be able
to run more than one VACUUM LAZY simultaneously and not have them
interfere with each other. For example, assume you have a database
with two tables, one very big and another very small but with a high
update rate. One usually wants to vacuum the small one very frequently
in order to keep the number of dead tuples low. But if one starts to
vacuum the big table, it will take a long time, during which the vacuums
applied to the smaller table won't be able to recover any tuple because
that transaction will think the other transaction may want to read some
of the tuples that the small transaction is trying to remove.

We know this is not so -- a VACUUM can only be run in a standalone
transaction, and it only checks the one table it's vacuuming. Thus we
can optimize the vacuuming so that if the only thing that's holding the
tuples undeletable is another big vacuum operation, ignore it and delete
the tuples anyway.
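
To make the idea concrete, here is a toy model in Python (not PostgreSQL
code). The Backend record, its field names, and oldest_xmin are all
hypothetical, and XID wraparound is ignored. It only sketches how skipping
lazy-vacuum backends would move the removal cutoff forward:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    xid: Optional[int]            # XID of the running transaction, if any
    xmin: Optional[int]           # oldest XID its snapshot treats as running
    in_lazy_vacuum: bool = False  # hypothetical "I am a lazy vacuum" flag


def oldest_xmin(backends: List[Backend], next_xid: int,
                ignore_lazy_vacuum: bool) -> int:
    """Return the oldest XID that some backend may still need to see."""
    result = next_xid
    for b in backends:
        if ignore_lazy_vacuum and b.in_lazy_vacuum:
            continue  # the proposed optimization: skip lazy vacuums
        for x in (b.xid, b.xmin):
            if x is not None and x < result:
                result = x
    return result


backends = [
    Backend(xid=100, xmin=100, in_lazy_vacuum=True),  # long vacuum, big table
    Backend(xid=250, xmin=250),                       # ordinary transaction
]
print(oldest_xmin(backends, 300, ignore_lazy_vacuum=False))
print(oldest_xmin(backends, 300, ignore_lazy_vacuum=True))
```

In this model the second call returns a later cutoff, so a vacuum of the
small table could remove tuples deleted after the big vacuum started.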

One exception is that we can't do that with full vacuums. The reason is
that full vacuum may want to run user-defined functions to be able to
index the tuples it moves. This isn't a problem normally, except in the
case where the function tries to scan some other table: if we ignored
that transaction, then another lazy vacuum might delete tuples from that
table that we need to see.

In a previous version of the patch, there was a note somewhere that made
the code not ignore lazy vacuums in the case where we were running
database-wide vacuums. The reason was that the value we computed was
also used as truncate point for pg_clog; thus if we ignored that
transaction, the truncate point could be further ahead than the vacuum,
so the clog page for the vacuum transaction could be gone and it
wouldn't be able to commit. This is no longer the case, because with
the patch I committed yesterday, the clog truncation point is calculated
differently and thus we don't need to take special care about this.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"One fights when it is necessary... not when one is in the mood!
Mood is for cattle, or for making love, or for playing the
baliset. Not for fighting." (Gurney Halleck)

Attachment Content-Type Size
ignore-vacuum-2006-07-11.patch text/plain 13.8 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-patches(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-12 13:22:45
Message-ID: 200607121522.45771.peter_e@gmx.net

On Tuesday, 11 July 2006 23:01, Alvaro Herrera wrote:
> One exception is that we can't do that with full vacuums. The reason is
> that full vacuum may want to run user-defined functions to be able to
> index the tuples it moves. This isn't a problem normally, except in the
> case where the function tries to scan some other table: if we ignored
> that transaction, then another lazy vacuum might delete tuples from that
> table that we need to see.

Functions in the index expression must be immutable, so I don't think that is
a real concern.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-24 18:00:48
Message-ID: 12532.1153764048@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Hannu Krosing asked me about his patch to ignore transactions running
> VACUUM LAZY in other vacuum transactions. I attach a version of the
> patch updated to the current sources.

nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
version of the computation?

In general, it seems to me that a transaction running lazy vacuum could
be ignored for every purpose except truncating clog/subtrans. Since it
will never insert its own XID into the database (note: VACUUM ANALYZE is
run as two separate transactions, hence the pg_statistic rows inserted
by ANALYZE are not a counterexample), there's no need for anyone to
include it as running in their snapshots. So unless I'm missing
something, this is a safe change for lazy vacuum, but perhaps not for
full vacuum, which *does* put its XID into the database.

A possible objection to this is that it would foreclose running VACUUM
and ANALYZE as a single transaction, exactly because of the point that
we couldn't insert pg_statistic rows using a lazy vacuum's XID. I think
there was some discussion of doing that in connection with enlarging
ANALYZE's sample greatly --- if ANALYZE goes back to being a full scan
or nearly so, it'd sure be nice to combine it with the VACUUM scan.
However maybe we should just accept that as the price of not having
multiple vacuums interfere with each other.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-24 18:14:08
Message-ID: 20060724181408.GP5223@surnet.cl

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > Hannu Krosing asked me about his patch to ignore transactions running
> > VACUUM LAZY in other vacuum transactions. I attach a version of the
> > patch updated to the current sources.
>
> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> version of the computation?

Hmm ... I remember removing a now-useless variable somewhere, but maybe
this one escaped me. I don't have the code handy -- will check.

> In general, it seems to me that a transaction running lazy vacuum could
> be ignored for every purpose except truncating clog/subtrans. Since it
> will never insert its own XID into the database (note: VACUUM ANALYZE is
> run as two separate transactions, hence the pg_statistic rows inserted
> by ANALYZE are not a counterexample), there's no need for anyone to
> include it as running in their snapshots. So unless I'm missing
> something, this is a safe change for lazy vacuum, but perhaps not for
> full vacuum, which *does* put its XID into the database.

But keep in mind that in the current code, clog truncation takes
relminxid (actually datminxid) into account, not running transactions,
so AFAICS this shouldn't affect anything.

Subtrans truncation is different and it certainly should consider lazy
vacuum's Xids.

> A possible objection to this is that it would foreclose running VACUUM
> and ANALYZE as a single transaction, exactly because of the point that
> we couldn't insert pg_statistic rows using a lazy vacuum's XID. I think
> there was some discussion of doing that in connection with enlarging
> ANALYZE's sample greatly --- if ANALYZE goes back to being a full scan
> or nearly so, it'd sure be nice to combine it with the VACUUM scan.
> However maybe we should just accept that as the price of not having
> multiple vacuums interfere with each other.

Hmm, what about having a single scan for both, and then starting a
normal transaction just for the sake of inserting the pg_statistic
tuple?

I think the interactions of Xids and vacuum and other stuff are starting
to get complex; IMHO it warrants having a README.vacuum, or something.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-24 18:39:12
Message-ID: 12906.1153766352@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> A possible objection to this is that it would foreclose running VACUUM
>> and ANALYZE as a single transaction, exactly because of the point that
>> we couldn't insert pg_statistic rows using a lazy vacuum's XID.

> Hmm, what about having a single scan for both, and then starting a
> normal transaction just for the sake of inserting the pg_statistic
> tuple?

We could, but I think memory consumption would be the issue. VACUUM
wants a lotta memory for the dead-TIDs array, ANALYZE wants a lot for
its statistics gathering ... even more if it's trying to take a larger
sample than before. (This is probably why we kept them separate in
the last rewrite.)

> I think the interactions of Xids and vacuum and other stuff are starting
> to get complex; IMHO it warrants having a README.vacuum, or something.

Go for it ...

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-24 22:27:29
Message-ID: 20060724222729.GA11023@surnet.cl

Cc: pgsql-hackers removed, as this mail contains a patch.

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > Hannu Krosing asked me about his patch to ignore transactions running
> > VACUUM LAZY in other vacuum transactions. I attach a version of the
> > patch updated to the current sources.
>
> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> version of the computation?

Yup -- I checked that code and found out that nonInVacuumXmin can be
taken out as it's not used anywhere. One upside of this is that taking
it out means we can remove all diffs to GetSnapshotData. New patch
attached; it's a bit smaller than the last one.

I'm currently testing it. Since it appears there are no further
objections, I intend to commit it.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment Content-Type Size
ignore-vacuum-2006-07-24.patch text/plain 12.2 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-27 23:29:14
Message-ID: 20060727232914.GC21610@surnet.cl

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > Hannu Krosing asked me about his patch to ignore transactions running
> > VACUUM LAZY in other vacuum transactions. I attach a version of the
> > patch updated to the current sources.
>
> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> version of the computation?

Hmm, not useless at all really -- only a bug of mine. Turns out the
notInVacuumXmin stuff is essential, so I put it back.

I noticed something however -- in calculating the OldestXmin we always
consider all DBs, even though there is a parameter for skipping backends
not in the current DB -- this is because the Xmin we store in PGPROC is
always computed using all backends. The allDbs parameter only allows us
to skip the Xid of a transaction running elsewhere, but this is not very
helpful because the Xmin of transactions running in the local DB will
include those foreign Xids.

In case I'm not explaining myself, the problem is that if I open a
transaction in database A and then vacuum a table in database B, those
tuples deleted after the transaction in database A started cannot be
removed.

To solve this problem, one idea is to change the new member of PGPROC to
"current database's not in vacuum Xmin", which is the minimum of Xmins
of backends running in my database which are not executing a lazy
vacuum. This can be used to vacuum non-shared relations.

We could either add it anew, beside nonInVacuumXmin, or replace
nonInVacuumXmin. The difference will be whether we will have something
to be used to vacuum shared relations or not. I think in general,
shared relations are not vacuumed much so it shouldn't be too much of a
problem if we leave them to be vacuumed with the regular, all-databases,
include-vacuum Xmin.

The other POV is that we don't really care about long-running
transactions in other databases unless they are lazy vacuums, a case which
is appropriately covered by the patch as it currently stands. This seems
to be the POV that Hannu takes: the only long-running transactions he
cares about are lazy vacuums.

Thoughts?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 00:08:08
Message-ID: 1154045288.2908.26.camel@localhost.localdomain

One fine day, Thu, 2006-07-27 at 19:29, Alvaro Herrera wrote:

>
> We could either add it anew, beside nonInVacuumXmin, or replace
> nonInVacuumXmin. The difference will be whether we will have something
> to be used to vacuum shared relations or not. I think in general,
> shared relations are not vacuumed much so it shouldn't be too much of a
> problem if we leave them to be vacuumed with the regular, all-databases,
> include-vacuum Xmin.

Yes. I don't think that vacuuming shared relations will ever be a
significant performance concern.

> The other POV is that we don't really care about long-running
> transactions in other databases unless they are lazy vacuums, a case which
> is appropriately covered by the patch as it currently stands. This seems
> to be the POV that Hannu takes: the only long-running transactions he
> cares about are lazy vacuums.

Yes. The original target audience of this patch is users running 24/7
OLTP databases with big, slow-changing tables and small, fast-changing
tables which need to stay small even while the big ones are being
vacuumed.

The other possible transactions which _could_ possibly be ignored while
VACUUMING are those from ANALYSE and non-lazy VACUUMs.

I don't care about them as:

ANALYSE is relatively fast, even on huge tables, and thus can be
ignored.

If you do run VACUUM FULL on anything bigger than a few thousand
lines then you are not running a 24/7 OLTP database anyway.

I also can't see a use case for an OLTP database where VACUUM FREEZE is
required.

Maybe we could also start ignoring the transactions that are running the
new CONCURRENT CREATE INDEX command, as it also runs inside its own
transaction(s) which can't possibly touch the tuples in the table being
vacuumed as it locks out VACUUM on the indexed table.

That would probably be quite easy to do by just having CONCURRENT CREATE
INDEX also mark its transactions as ignorable by VACUUM. Maybe the
variable name for that (proc->inVacuum) needs to be changed to something
like trxSafeToIgnoreByVacuum.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 00:34:20
Message-ID: 14676.1154046860@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
>> version of the computation?

> Hmm, not useless at all really -- only a bug of mine. Turns out the
> notInVacuumXmin stuff is essential, so I put it back.

Uh, why?

> I noticed something however -- in calculating the OldestXmin we always
> consider all DBs, even though there is a parameter for skipping backends
> not in the current DB -- this is because the Xmin we store in PGPROC is
> always computed using all backends. The allDbs parameter only allows us
> to skip the Xid of a transaction running elsewhere, but this is not very
> helpful because the Xmin of transactions running in the local DB will
> include those foreign Xids.

Yeah, this has been recognized for some time. However the overhead of
calculating local and global xmins in *every* transaction start is a
significant reason not to do it.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 02:05:02
Message-ID: 200607280205.k6S252m20133@momjian.us


Another idea Jan had today was whether we could vacuum more rows if a
long-running backend is in serializable mode, like pg_dump.

---------------------------------------------------------------------------

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Tom Lane wrote:
> >> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> >> version of the computation?
>
> > Hmm, not useless at all really -- only a bug of mine. Turns out the
> > notInVacuumXmin stuff is essential, so I put it back.
>
> Uh, why?
>
> > I noticed something however -- in calculating the OldestXmin we always
> > consider all DBs, even though there is a parameter for skipping backends
> > not in the current DB -- this is because the Xmin we store in PGPROC is
> > always computed using all backends. The allDbs parameter only allows us
> > to skip the Xid of a transaction running elsewhere, but this is not very
> > helpful because the Xmin of transactions running in the local DB will
> > include those foreign Xids.
>
> Yeah, this has been recognized for some time. However the overhead of
> calculating local and global xmins in *every* transaction start is a
> significant reason not to do it.
>
> regards, tom lane

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 04:05:22
Message-ID: 20060728040522.GD21610@surnet.cl

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Tom Lane wrote:
> >> nonInVacuumXmin seems useless ... perhaps a vestige of some earlier
> >> version of the computation?
>
> > Hmm, not useless at all really -- only a bug of mine. Turns out the
> > notInVacuumXmin stuff is essential, so I put it back.
>
> Uh, why?

Because it's used to determine the Xmin that our vacuum will use. If
there is a transaction whose Xmin calculation included the Xid of a
transaction running vacuum, we have gained nothing from directly
excluding said vacuum's Xid, because it will affect us anyway indirectly
via that transaction's Xmin.
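
A small Python sketch of that indirect effect follows. It is a toy model,
not the patch: snapshot_xmins and the not_in_vacuum_xmin name are
hypothetical stand-ins for the extra PGPROC member discussed in this
thread, and XID wraparound is ignored.

```python
def snapshot_xmins(own_xid, running):
    """Return (xmin, not_in_vacuum_xmin) for a backend taking a snapshot.

    running: list of (xid, in_lazy_vacuum) pairs for transactions that are
    already in progress when the snapshot is taken.
    """
    xmin = min([own_xid] + [xid for xid, _ in running])
    not_in_vacuum_xmin = min(
        [own_xid] + [xid for xid, in_vacuum in running if not in_vacuum]
    )
    return xmin, not_in_vacuum_xmin


# A lazy vacuum with XID 100 is running when ordinary transaction 200 starts.
xmin, not_in_vacuum_xmin = snapshot_xmins(200, [(100, True)])

# A second vacuum that only skips the first vacuum's own entry still reads
# transaction 200's xmin, which the first vacuum dragged back to 100.
cutoff_using_xmin = xmin
# Reading the separately tracked value instead avoids that indirect effect.
cutoff_using_not_in_vacuum_xmin = not_in_vacuum_xmin

print(cutoff_using_xmin, cutoff_using_not_in_vacuum_xmin)
```

The point of the sketch is only that the exclusion has to happen when each
backend computes its own value, not just when the vacuum scans the others.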

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 07:36:04
Message-ID: 1154072165.2908.35.camel@localhost.localdomain

One fine day, Thu, 2006-07-27 at 22:05, Bruce Momjian wrote:
> Another idea Jan had today was whether we could vacuum more rows if a
> long-running backend is in serializable mode, like pg_dump.

I don't see how this gives us ability to vacuum more rows, as the
snapshot of a serializable transaction is the oldest one.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 13:34:27
Message-ID: 200607281334.k6SDYR012849@momjian.us

Hannu Krosing wrote:
> One fine day, Thu, 2006-07-27 at 22:05, Bruce Momjian wrote:
> > Another idea Jan had today was whether we could vacuum more rows if a
> > long-running backend is in serializable mode, like pg_dump.
>
> I don't see how this gives us ability to vacuum more rows, as the
> snapshot of a serializable transaction is the oldest one.

Good question. Imagine you have a serializable transaction like
pg_dump, and then you have lots of newer transactions. If pg_dump is
xid=12, and all the new transactions start at xid=30, any row created
and expired between 12 and 30 can be removed because they are not
visible. For a use case, imagine an UPDATE chain where a row was
created by xid=15 and expired by xid=19. Right now, we don't remove that
row, though we could.
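
The visibility arithmetic in that example can be sketched in Python as
below. This is a deliberately simplified model: the visible function is
hypothetical, it assumes every transaction involved has committed, and it
assumes a snapshot sees exactly the effects of XIDs lower than its own. It
shows only that no such snapshot sees the row version; it says nothing
about whether removing the version is safe.

```python
def visible(created_xid, expired_xid, snapshot_xid):
    """True if a row version is visible to a snapshot taken at snapshot_xid.

    Simplification: a snapshot sees the effects of all XIDs below its own
    and nothing else; all transactions are assumed committed.
    """
    created_seen = created_xid < snapshot_xid
    expired_seen = expired_xid is not None and expired_xid < snapshot_xid
    return created_seen and not expired_seen


# pg_dump at xid=12, all newer transactions at xid=30 or later.
snapshots = [12, 30, 31, 45]
seen_by = [s for s in snapshots if visible(15, 19, s)]
print(seen_by)
```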

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 13:47:38
Message-ID: 19986.1154094458@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Good question. Imagine you have a serializable transaction like
> pg_dump, and then you have lots of newer transactions. If pg_dump is
> xid=12, and all the new transactions start at xid=30, any row created
> and expired between 12 and 30 can be removed because they are not
> visible.

This reasoning is bogus.

It would probably be safe for pg_dump because it's a read-only
operation, but it fails badly if the serializable transaction is trying
to do updates. An update needs to chase the chain of newer versions of
the row forward from the version that's visible to the xact's
serializable snapshot, to see if anyone has committed a newer version.
Your proposal would remove elements of that chain, thereby possibly
allowing the serializable xact to conclude it may update the tuple
when it should have given an error.
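
As a rough illustration of the chain-chasing problem, here is a toy Python
model. The Version record, the chase function, and the way a dangling
forward link is treated (the walk simply stops) are assumptions made for
this sketch; PostgreSQL's actual tuple-chain handling is more involved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    xmin: int                # XID that created this version
    xmax: Optional[int]      # XID that replaced it, if any
    next_id: Optional[str]   # forward link to the newer version


def chase(heap: Dict[str, Version], start: str) -> str:
    """Follow forward links from start; return the newest version reached."""
    current = start
    while True:
        nxt = heap[current].next_id
        if nxt is None or nxt not in heap:  # end of chain, or broken link
            return current
        current = nxt


heap = {
    "v0": Version(xmin=5, xmax=15, next_id="v1"),     # seen by xid=12 snapshot
    "v1": Version(xmin=15, xmax=19, next_id="v2"),    # created, expired between
    "v2": Version(xmin=19, xmax=None, next_id=None),  # current version
}

before = chase(heap, "v0")
pruned = {key: v for key, v in heap.items() if key != "v1"}
after = chase(pruned, "v0")
print(before, after)
```

With the middle version gone, the walk from v0 never reaches v2, so in this
model the updater would not learn that a newer version was committed.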

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 13:56:28
Message-ID: 20070.1154094988@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> Uh, why?

> Because it's used to determine the Xmin that our vacuum will use. If
> there is a transaction whose Xmin calculation included the Xid of a
> transaction running vacuum, we have gained nothing from directly
> excluding said vacuum's Xid, because it will affect us anyway indirectly
> via that transaction's Xmin.

But the patch changes things so that *everyone* excludes the vacuum from
their xmin. Or at least I thought that was the plan.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 14:49:31
Message-ID: 20060728144931.GB731@surnet.cl

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Tom Lane wrote:
> >> Uh, why?
>
> > Because it's used to determine the Xmin that our vacuum will use. If
> > there is a transaction whose Xmin calculation included the Xid of a
> > transaction running vacuum, we have gained nothing from directly
> > excluding said vacuum's Xid, because it will affect us anyway indirectly
> > via that transaction's Xmin.
>
> But the patch changes things so that *everyone* excludes the vacuum from
> their xmin. Or at least I thought that was the plan.

We shouldn't do that, because that Xmin is also used to truncate
SUBTRANS. Unless we are prepared to say that vacuum does not use
subtransactions so it doesn't matter. This is true currently, so we
could go ahead and do it (unless I'm missing something) -- but it means
lazy vacuum will never be able to use subtransactions.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 15:28:20
Message-ID: 200607281528.k6SFSKA18098@momjian.us

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Good question. Imagine you have a serializable transaction like
> > pg_dump, and then you have lots of newer transactions. If pg_dump is
> > xid=12, and all the new transactions start at xid=30, any row created
> > and expired between 12 and 30 can be removed because they are not
> > visible.
>
> This reasoning is bogus.
>
> It would probably be safe for pg_dump because it's a read-only
> operation, but it fails badly if the serializable transaction is trying
> to do updates. An update needs to chase the chain of newer versions of
> the row forward from the version that's visible to the xact's
> serializable snapshot, to see if anyone has committed a newer version.
> Your proposal would remove elements of that chain, thereby possibly
> allowing the serializable xact to conclude it may update the tuple
> when it should have given an error.

So in fact members of the chain are not visible, but vacuum doesn't have
a strong enough lock to remove parts of the chain. What seems strange
is that vacuum can trim the chain, but only if you do members starting
from the head. I assume this is because you don't need to rejoin the
chain around the expired tuples.

("bogus" seems a little strong.)

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 16:03:20
Message-ID: 24952.1154102600@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> But the patch changes things so that *everyone* excludes the vacuum from
>> their xmin. Or at least I thought that was the plan.

> We shouldn't do that, because that Xmin is also used to truncate
> SUBTRANS.

Yeah, but you were going to change that, no? Truncating SUBTRANS will
need to include the vacuum xact's xmin, but we don't need it for any
other purpose.
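
In other words, two different cutoffs would be derived from the same set of
backends. A minimal Python sketch, with a hypothetical horizons function
and XID wraparound ignored:

```python
def horizons(backends, next_xid):
    """Return (subtrans_cutoff, removal_cutoff).

    backends: list of (xmin, in_lazy_vacuum) pairs.
    The SUBTRANS cutoff counts every backend, lazy vacuums included; the
    tuple-removal cutoff skips lazy vacuums.
    """
    subtrans_cutoff = min([next_xid] + [x for x, _ in backends])
    removal_cutoff = min([next_xid] + [x for x, vac in backends if not vac])
    return subtrans_cutoff, removal_cutoff


# One lazy vacuum with xmin 100, one ordinary transaction with xmin 250.
print(horizons([(100, True), (250, False)], next_xid=300))
```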

> but it means
> lazy vacuum will never be able to use subtransactions.

This patch already depends on the assumption that lazy vacuum will never
do any transactional updates, so I don't see what it would need
subtransactions for.

regards, tom lane


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 17:38:41
Message-ID: 20060728173840.GS66525@pervasive.com

On Fri, Jul 28, 2006 at 03:08:08AM +0300, Hannu Krosing wrote:
> > The other POV is that we don't really care about long-running
> > transactions in other databases unless they are lazy vacuums, a case which
> > is appropriately covered by the patch as it currently stands. This seems
> > to be the POV that Hannu takes: the only long-running transactions he
> > cares about are lazy vacuums.
>
> Yes. The original target audience of this patch is users running 24/7
> OLTP databases with big, slow-changing tables and small, fast-changing
> tables which need to stay small even while the big ones are being
> vacuumed.
>
> The other possible transactions which _could_ possibly be ignored while
> VACUUMING are those from ANALYSE and non-lazy VACUUMs.

There are other transactions to consider: user transactions that will
run a long time, but only hit a limited number of relations. These are
as big a problem in an OLTP environment as vacuum is.

Rather than coming up with machinery that will special-case vacuum or
pg_dump, etc., I'd suggest thinking about a generic framework that would
work for any long-running transaction. One possibility:

Transaction flags itself as 'long-running' and provides a list of
exactly what relations it will be touching.

That list is stored someplace a future vacuum can get at.

The transaction runs, with additional checks that ensure it will not
touch any relations that aren't in the list it provided.

Any vacuums that start will take into account these lists of relations
from long-running transactions and build a list of XIDs that have
provided a list, and the minimum XID for every relation that was listed.
If vacuum wants to vacuum a relation that has been listed as part of a
long-running transaction, it will use the oldest XID in the
database/cluster or the oldest XID listed for that relation, whichever
is older. If it wants to vacuum a relation that is not listed, it will
use the oldest XID in the database/cluster, excluding those XIDs that
have listed exactly what relations they will be looking at.
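
As a rough illustration of the cutoff rule described above, here is a minimal Python sketch. It is not PostgreSQL code: the Xact record, the vacuum_cutoff_xid function and the relation names are all hypothetical, XIDs are treated as plain integers (no wraparound handling), and reading "whichever is older" as a simple minimum over the transactions that could touch the relation is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Xact:
    """A running transaction (hypothetical model, not a PostgreSQL struct)."""
    xid: int
    # None: the transaction declared nothing and may touch any relation.
    # A frozenset: it flagged itself 'long-running' and listed its relations.
    relations: Optional[FrozenSet[str]] = None


def vacuum_cutoff_xid(rel: str, running: Iterable[Xact], next_xid: int) -> int:
    """Return the oldest XID a vacuum of `rel` must still respect.

    A transaction holds back the cutoff if it declared no list at all,
    or if its declared list contains `rel`. Transactions that listed
    other relations only are ignored.
    """
    blocking = [
        t.xid for t in running
        if t.relations is None or rel in t.relations
    ]
    # With nothing blocking, everything older than the next XID is fair game.
    return min(blocking, default=next_xid)


running = [
    Xact(100, frozenset({"big_table"})),   # long-running, declared its list
    Xact(250),                             # ordinary transaction
    Xact(300, frozenset({"small_table"})),
]
print(vacuum_cutoff_xid("big_table", running, 400))
print(vacuum_cutoff_xid("small_table", running, 400))
```

In this model, a vacuum of small_table is no longer held back by XID 100, which only declared big_table.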

That scheme won't help pg_dump... in order to do so, you'd need to allow
transactions to drop relations from their list.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 18:25:23
Message-ID: 60odv98u3w.fsf@dba2.int.libertyrms.com
Lists: pgsql-hackers pgsql-patches

jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:
> There are other transactions to consider: user transactions that will
> run a long time, but only hit a limited number of relations. These are
> as big a problem in an OLTP environment as vacuum is.
>
> Rather than coming up with machinery that will special-case vacuum or
> pg_dump, etc., I'd suggest thinking about a generic framework that would
> work for any long-running transaction. One possibility:
>
> Transaction flags itself as 'long-running' and provides a list of
> exactly what relations it will be touching.
>
> That list is stored someplace a future vacuum can get at.
>
> The transaction runs, with additional checks that ensure it will not
> touch any relations that aren't in the list it provided.

One thought that's a bit different...

How about we mark transactions that are in serializable mode? That
would merely be a flag...

We would know that, for each such transaction, we could treat all
tuples "deadified" after those transactions as being dead and
cleanable.

That doesn't require any knowledge of relations that are
touched/locked...
--
"cbbrowne","@","cbbrowne.com"
http://www.ntlug.org/~cbbrowne/nonrdbms.html
To err is human, to moo bovine.


From: Hannu Krosing <hannu(at)skype(dot)net>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 22:05:03
Message-ID: 1154124303.2961.18.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On Fri, 2006-07-28 at 12:38, Jim C. Nasby wrote:
> On Fri, Jul 28, 2006 at 03:08:08AM +0300, Hannu Krosing wrote:
> > > The other POV is that we don't really care about long-running
> > > transactions in other databases unless they are lazy vacuums, a case which
> > > is appropriately covered by the patch as it currently stands. This seems
> > > to be the POV that Hannu takes: the only long-running transactions he
> > > cares about are lazy vacuums.
> >
> > Yes. The original target audience of this patch are users running 24/7
> > OLTP databases with big slow-changing tables and small fast-changing
> > tables which need to stay small even at the time when the big ones are
> > vacuumed.
> >
> > The other possible transactions which _could_ possibly be ignored while
> > VACUUMING are those from ANALYSE and non-lazy VACUUMs.
>
> There are other transactions to consider: user transactions that will
> run a long time, but only hit a limited number of relations. These are
> as big a problem in an OLTP environment as vacuum is.

These transactions are better kept out of an OLTP database; by their
nature they belong in an OLAP db :)

The reason I addressed VACUUM first is that you can't avoid VACUUM on
an OLTP db.

> Rather than coming up with machinery that will special-case vacuum or
> pg_dump, etc., I'd suggest thinking about a generic framework that would
> work for any long-running transaction.

So instead of actually *solving* one problem you suggest *thinking*
about solving the general case ?

We have been *thinking* about dead-space-map for at least three years by
now.

> One possibility:
>
> Transaction flags itself as 'long-running' and provides a list of
> exactly what relations it will be touching.
>
> That list is stored someplace a future vacuum can get at.
>
> The transaction runs, with additional checks that ensure it will not
> touch any relations that aren't in the list it provided.

I have thought about that too, but checking on each data change seemed
too expensive to me, at least for the first cut.

There seem to be some ways to avoid actually checking for table-in-list,
but you still have to check whether you have to check.

> Any vacuums that start will take into account these lists of relations
> from long-running transactions and build a list of XIDs that have
> provided a list, and the minimum XID for every relation that was listed.
> If vacuum wants to vacuum a relation that has been listed as part of a
> long-running transaction, it will use the oldest XID in the
> database/cluster or the oldest XID listed for that relation, whichever
> is older. If it wants to vacuum a relation that is not listed, it will
> use the oldest XID in the database/cluster, excluding those XIDs that
> have listed exactly what relations they will be looking at.
>
> That scheme won't help pg_dump... in order to do so, you'd need to allow
> transactions to drop relations from their list.

The whole thing is probably doable, but I doubt it will be done before
8.2 (or even 8.5, considering that I had the first vacuum-ignore-vacuum
patch ready by 8.0, I think).

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-28 22:43:03
Message-ID: CC87FE0E-F181-474C-9221-CC2E94A96E15@pervasive.com
Lists: pgsql-hackers pgsql-patches

On Jul 28, 2006, at 5:05 PM, Hannu Krosing wrote:
> On Fri, 2006-07-28 at 12:38, Jim C. Nasby wrote:
>> There are other transactions to consider: user transactions that will
>> run a long time, but only hit a limited number of relations. These are
>> as big a problem in an OLTP environment as vacuum is.
>
> These transactions are better kept out of an OLTP database; by their
> nature they belong in an OLAP db :)

Sure, but that's not always possible/practical.

>> Rather than coming up with machinery that will special-case vacuum or
>> pg_dump, etc., I'd suggest thinking about a generic framework that
>> would work for any long-running transaction.
>
> So instead of actually *solving* one problem you suggest *thinking*
> about solving the general case ?
>
> We have been *thinking* about dead-space-map for at least three
> years by now.

No, I just wanted anyone who was actually going to work on this to
think about a more general fix. If the vacuum-only fix has a chance
of getting into core a version before the general case, I'll happily
take what I can get.

>> One possibility:
>>
>> Transaction flags itself as 'long-running' and provides a list of
>> exactly what relations it will be touching.
>>
>> That list is stored someplace a future vacuum can get at.
>>
>> The transaction runs, with additional checks that ensure it will not
>> touch any relations that aren't in the list it provided.
>
> I have thought about that too, but checking on each data change seemed
> too expensive to me, at least for the first cut.
>
> There seem to be some ways to avoid actually checking for
> table-in-list, but you still have to check whether you have to check.

Well, presumably the check to see if you have to check would be
extremely cheap. As for checking that only approved relations are
touched, you can do that by analyzing the rules/triggers/etc that are
on all the tables involved. Or for a start, just disallow this on
tables with rules or triggers (well, we'd probably have to allow for
RI).
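
A minimal sketch of the "check whether you have to check" idea, under stated assumptions: the Transaction class, its fields and the check_access method are hypothetical names invented for illustration, and the sketch covers only the runtime test, not the analysis of rules and triggers mentioned above.

```python
from typing import Iterable, Optional


class Transaction:
    """Toy transaction that may declare the relations it will touch."""

    def __init__(self, declared: Optional[Iterable[str]] = None) -> None:
        # The flag is computed once, so the common case costs one test.
        self.long_running = declared is not None
        self.declared = frozenset(declared) if declared is not None else frozenset()

    def check_access(self, rel: str) -> None:
        """Raise if a long-running transaction strays outside its list."""
        if not self.long_running:
            # Cheap path: ordinary transactions never consult the list.
            return
        if rel not in self.declared:
            raise PermissionError(
                f"relation {rel!r} was not declared by this transaction"
            )


ordinary = Transaction()
ordinary.check_access("any_table")        # no list lookup happens

batch = Transaction(["orders", "order_lines"])
batch.check_access("orders")              # declared, so allowed
try:
    batch.check_access("customers")       # not declared
except PermissionError as exc:
    print(exc)
```

Ordinary transactions pay only for the boolean test; the set lookup is reached only by transactions that opted in.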
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-29 02:16:39
Message-ID: 20060729021639.GA6616@surnet.cl
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Tom Lane wrote:
> >> But the patch changes things so that *everyone* excludes the vacuum from
> >> their xmin. Or at least I thought that was the plan.
>
> > We shouldn't do that, because that Xmin is also used to truncate
> > SUBTRANS.
>
> Yeah, but you were going to change that, no? Truncating SUBTRANS will
> need to include the vacuum xact's xmin, but we don't need it for any
> other purpose.

That's correct.

> > but it means
> > lazy vacuum will never be able to use subtransactions.
>
> This patch already depends on the assumption that lazy vacuum will never
> do any transactional updates, so I don't see what it would need
> subtransactions for.

Here is a patch pursuant to these ideas. The main change is that in
GetSnapshotData, a backend is skipped entirely if inVacuum is found to
be true.

I've been trying to update my SSH CVS several times today but I can't
reach the server. Maybe it's the DoS attack that it's been under, I
don't know.
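
To illustrate the idea (this is a toy model, not the patch), the sketch below computes the oldest XID of interest while skipping backends that are running a lazy vacuum. The Backend record and the oldest_xid function are hypothetical; in_vacuum stands in for the inVacuum flag named above, XIDs are plain integers with no wraparound, and the real GetSnapshotData does considerably more, such as collecting the list of running XIDs. The ignore_vacuum=False case models the point made earlier in the thread that SUBTRANS truncation still needs to account for the vacuum transaction.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Backend:
    """Toy stand-in for a backend's shared state (not a PostgreSQL struct)."""
    xid: Optional[int]          # None: no transaction in progress
    in_vacuum: bool = False     # True while running a lazy VACUUM


def oldest_xid(backends: Iterable[Backend], next_xid: int,
               ignore_vacuum: bool = True) -> int:
    """Return the oldest running XID, optionally skipping lazy vacuums."""
    candidates = [
        b.xid for b in backends
        if b.xid is not None and not (ignore_vacuum and b.in_vacuum)
    ]
    # No running transactions: nothing older than next_xid is still needed.
    return min(candidates, default=next_xid)


backends = [
    Backend(100, in_vacuum=True),   # long lazy vacuum of a big table
    Backend(250),                   # ordinary transaction
    Backend(None),                  # idle backend
]
print(oldest_xid(backends, 400))                        # vacuum skipped
print(oldest_xid(backends, 400, ignore_vacuum=False))   # vacuum counted
```

With the vacuum skipped, the horizon in this model moves from 100 to 250, so tuples deleted in between become removable by other vacuums.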

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment Content-Type Size
ignore-vacuum-8.patch text/plain 16.6 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jim Nasby <jnasby(at)pervasive(dot)com>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-29 02:21:01
Message-ID: 20060729022101.GB6616@surnet.cl
Lists: pgsql-hackers pgsql-patches

Jim Nasby wrote:
> On Jul 28, 2006, at 5:05 PM, Hannu Krosing wrote:

> >So instead of actually *solving* one problem you suggest *thinking*
> >about solving the general case ?
> >
> >We have been *thinking* about dead-space-map for at least three
> >years by now.
>
> No, I just wanted anyone who was actually going to work on this to
> think about a more general fix. If the vacuum-only fix has a chance
> of getting into core a version before the general case, I'll happily
> take what I can get.

Well, the vacuum-only fix has the advantage that the patch has already
been written, tested, discussed, beaten to death, resurrected,
rewritten, and is ready to be committed, while the "general solution" is
not even past the handwaving phase, let alone *thinking*.

And we have only three days before feature freeze, so if you want the
general solution for 8.2 you should start *thinking* really fast :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)skype(dot)net>
Subject: Re: The vacuum-ignore-vacuum patch
Date: 2006-07-30 02:13:10
Message-ID: 20060730021310.GA19036@surnet.cl
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:

> Here is a patch pursuant to these ideas. The main change is that in
> GetSnapshotData, a backend is skipped entirely if inVacuum is found to
> be true.

Patch applied.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.