More vacuum.c refactoring

Lists: pgsql-hackers
From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Subject: More vacuum.c refactoring
Date: 2004-06-10 20:08:22
Message-ID: f5fhc09r8a5if2iodf0m72shi5m7j62cn3@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Near the end of repair_frag() in vacuum.c -- under the comment /* clean
moved tuples from last page in Nvacpagelist list */ -- there is code
that marks itemids as unused. Itemids affected are those referring to
tuples that have been moved off the last page.

This code is very similar to vacuum_page(). The major difference is
that vacuum_page() uses vacpage->offsets while the code in repair_frag()
looks for MOVED_OFF bits in tuple headers. AFAICS the tuples with the
MOVED_OFF bit set are exactly those referenced by vacpage->offsets.

The attached patch passes make check and make installcheck. Please
apply unless I'm missing something.

Servus
Manfred

Attachment Content-Type Size
unknown_filename text/plain 2.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-10 21:19:22
Message-ID: 4018.1086902362@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> This code is very similar to vacuum_page(). The major difference is
> that vacuum_page() uses vacpage->offsets while the code in repair_frag()
> looks for MOVED_OFF bits in tuple headers. AFAICS the tuples with the
> MOVED_OFF bit set are exactly those referenced by vacpage->offsets.

This does not make me comfortable. You *think* that two different bits
of code are doing the same thing, so you want to hack up vacuum.c? This
module is delicate code --- we've had tons of bugs there in the past
--- and no I have zero confidence that passing the regression tests
proves anything, because all those prior bugs passed the regression
tests.

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-11 00:00:07
Message-ID: 58nhc0tcojhg57anm1iu5cgnjcrqk6o625@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 10 Jun 2004 17:19:22 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>This does not make me comfortable.

I understand you, honestly. Do I read between your lines that you
didn't review my previous vacuum.c refactoring patch? Please do. It'd
make *me* more comfortable.

> You *think* that two different bits of code are doing the same thing,

I see three significant differences between the code in repair_frag()
and vacuum_page().

1) vacuum_page() has
Assert(vacpage->offsets_used == 0);

vacpage is the last VacPage that has been inserted into Nvacpagelist.
It is allocated in line 1566, offsets_used is immediately set to 0 and
never changed. So this Assert(...) doesn't hurt.

2) In vacuum_page() the lp_flags are set inside a critical section.

This is no problem because the clear-used-flags loop does not
elog(ERROR, ...). Please correct me if I'm wrong.

3) vacuum_page() uses vacpage->offsets to locate the itemids that are to
be updated.

If we can show that these are the same itemids that belong to the tuples
that are found by inspecting the tuple headers, then the two code
snippets are equivalent. The first hint that this is the case is
Assert(vacpage->offsets_free == num_tuples);

So both spots expect to update the same number of itemids. What about
the contents of the offsets[] array? Offset numbers are entered into
this array by statements like

vacpage->offsets[vacpage->offsets_free++] = offnum;

in or immediately after the walk-along-page loop. These assignments are
preceded either by code that sets the appropriate infomask bits or by
assertions that the bits are already set appropriately.

The rest (from PageRepairFragmentation to END_CRIT_SECTION) is
identical.

> so you want to hack up vacuum.c? This
>module is delicate code --- we've had tons of bugs there in the past

But why is it so delicate? Not only because it handles difficult
problems, but also because it is written in a not very
maintenance-friendly way. Before I started refactoring the code
repair_frag() had more than 1100 lines and (almost) all variables used
anywhere in the function were declared at function level.

We cannot declare a code freeze for a module just because it is so badly
written that every change is likely to break it. Someday someone will
*have* to change it.

Better we break it today in an effort to make the code clearer.

>--- and no I have zero confidence that passing the regression tests
>proves anything

Unfortunately you are right :-(

Servus
Manfred


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-11 00:24:50
Message-ID: 20040611002450.GA5581@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 11, 2004 at 02:00:07AM +0200, Manfred Koizar wrote:

If I may ...

> > so you want to hack up vacuum.c? This
> >module is delicate code --- we've had tons of bugs there in the past
>
> But why is it so delicate? Not only because it handles difficult
> problems, but also because it is written in a not very
> maintenance-friendly way. Before I started refactoring the code
> repair_frag() had more than 1100 lines and (almost) all variables used
> anywhere in the function were declared at function level.

I agree. This code is horrid. I also agree with Tom in that this
should be done with extreme caution, but it is a needed task.

Maybe we could establish heavier testing for this kind of change so
potential patches can be tested extensively. Concurrent vacuums with
all kinds of imaginable operations (insert, updates, deletes), in tight
loops, could be a start.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No es bueno caminar con un hombre muerto"


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-11 05:25:15
Message-ID: 19745.1086931515@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> I understand you, honestly. Do I read between your lines that you
> didn't review my previous vacuum.c refactoring patch? Please do. It'd
> make *me* more comfortable.

I did not yet, but I will get to it. I encourage everyone else to
take a look too. I agree with Alvaro that fooling with this code
merits extreme caution.

BTW, I do not at all mean to suggest that vacuum.c contains no bugs
at the moment ;-). I suspect for example that it is a bit random
about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or
only by the next transaction that chances to visit the tuple. The
next-transaction-fixup behavior has to be there in case the VACUUM
transaction crashes, but that doesn't mean that VACUUM should
deliberately leave work undone.

> I see three significant differences between the code in repair_frag()
> and vacuum_page().

Will study these comments later, but it's too late at night here...
again, the more eyeballs on this the better...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-11 05:28:35
Message-ID: 19783.1086931715@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> Maybe we could establish heavier testing for this kind of change so
> potential patches can be tested extensively. Concurrent vacuums with
> all kinds of imaginable operations (insert, updates, deletes), in tight
> loops, could be a start.

VACUUM FULL takes an exclusive lock, so it should not have to worry
about concurrent operations on the table. What we have to think about
is the initial states it can see.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-06-19 03:05:29
Message-ID: 200406190305.i5J35TI15948@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:
> Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> > I understand you, honestly. Do I read between your lines that you
> > didn't review my previous vacuum.c refactoring patch? Please do. It'd
> > make *me* more comfortable.
>
> I did not yet, but I will get to it. I encourage everyone else to
> take a look too. I agree with Alvaro that fooling with this code
> merits extreme caution.
>
> BTW, I do not at all mean to suggest that vacuum.c contains no bugs
> at the moment ;-). I suspect for example that it is a bit random
> about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or
> only by the next transaction that chances to visit the tuple. The
> next-transaction-fixup behavior has to be there in case the VACUUM
> transaction crashes, but that doesn't mean that VACUUM should
> deliberately leave work undone.
>
> > I see three significant differences between the code in repair_frag()
> > and vacuum_page().
>
> Will study these comments later, but it's too late at night here...
> again, the more eyeballs on this the better...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-07-10 01:29:52
Message-ID: 200407100129.i6A1Tqt16607@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Where are we on this, 2x. :-)

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> Where are we on this?
>
> ---------------------------------------------------------------------------
>
> Tom Lane wrote:
> > Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> > > I understand you, honestly. Do I read between your lines that you
> > > didn't review my previous vacuum.c refactoring patch? Please do. It'd
> > > make *me* more comfortable.
> >
> > I did not yet, but I will get to it. I encourage everyone else to
> > take a look too. I agree with Alvaro that fooling with this code
> > merits extreme caution.
> >
> > BTW, I do not at all mean to suggest that vacuum.c contains no bugs
> > at the moment ;-). I suspect for example that it is a bit random
> > about whether MOVED_OFF/MOVED_IN bits get cleared immediately, or
> > only by the next transaction that chances to visit the tuple. The
> > next-transaction-fixup behavior has to be there in case the VACUUM
> > transaction crashes, but that doesn't mean that VACUUM should
> > deliberately leave work undone.
> >
> > > I see three significant differences between the code in repair_frag()
> > > and vacuum_page().
> >
> > Will study these comments later, but it's too late at night here...
> > again, the more eyeballs on this the better...
> >
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-08-06 14:02:22
Message-ID: bh37h01lcs1vtogd8jh9b7e91893s57q8o@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[Sorry for the late reply. I'm still struggling to catch up after
vacation ...]

On Fri, 9 Jul 2004 21:29:52 -0400 (EDT), Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
>Where are we on this, 2x. :-)

Here:
>> Tom Lane wrote:
>> > Will study these comments later, but it's too late at night here...

Servus
Manfred


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More vacuum.c refactoring
Date: 2004-08-06 14:56:48
Message-ID: 11855.1091804208@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> On Fri, 9 Jul 2004 21:29:52 -0400 (EDT), Bruce Momjian
> <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>>
>> Where are we on this, 2x. :-)

> Here:
> Tom Lane wrote:
> Will study these comments later, but it's too late at night here...

I haven't had time to review the original, already-applied refactoring,
let alone the follow-up. Considering that there's been at least one bug
report against the original (unused variable IIRC), I think it does need
reviewing.

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Refactoring
Date: 2005-01-19 17:57:48
Message-ID: 283tu0pamefsnoo4dgqfqhh366l3g366ga@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[Sorry, Neil, for abusing your thread. Moving this discussion back to
where it belongs.]

On Tue, 18 Jan 2005 13:17:17 -0300, Alvaro Herrera
<alvherre(at)dcc(dot)uchile(dot)cl> wrote:
>Hmm. I think this is a good idea on principle, but what happens in case
>a previous vacuum was interrupted? Is there a possibility that tuples
>belonging to that vacuum are still marked MOVED_OFF but are not in
>vacpage->offsets, for example?

These bits are handled by an earlier phase of VACUUM, by
HeapTupleSatisfiesVacuum() in scan_heap(). My first vacuum.c
refactoring patch, rev 1.281 2004-06-08, added these comments in
repair_frag():

/*
* VACUUM FULL has an exclusive lock on the relation. So
* normally no other transaction can have pending INSERTs or
* DELETEs in this relation. A tuple is either
* (a) a tuple in a system catalog, inserted or deleted by
* a not yet committed transaction or
* (b) dead (XMIN_INVALID or XMAX_COMMITTED) or
* (c) inserted by a committed xact (XMIN_COMMITTED) or
* (d) moved by the currently running VACUUM.
* In case (a) we wouldn't be in repair_frag() at all.
* In case (b) we cannot be here, because scan_heap() has
* already marked the item as unused, see continue above.
* Case (c) is what normally is to be expected.
* Case (d) is only possible, if a whole tuple chain has been
* moved while processing this or a higher numbered block.
*/

/*
* There cannot be another concurrently running VACUUM. If
* the tuple had been moved in by a previous VACUUM, the
* visibility check would have set XMIN_COMMITTED. If the
* tuple had been moved in by the currently running VACUUM,
* the loop would have been terminated. We had
* elog(ERROR, ...) here, but as we are testing for a
* can't-happen condition, Assert() seems more appropriate.
*/

/*
* MOVED_OFF by another VACUUM would have caused the
* visibility check to set XMIN_COMMITTED or XMIN_INVALID.
*/

This should answer your question, unless the comments are wrong. (BTW
parts of that patch have been backed out by someone, so you won't find
the second comment in current sources.)

As for the problem whether the two code paths deal with the same set of
tuples, read
http://archives.postgresql.org/pgsql-hackers/2004-06/msg00410.php:

| [...] These assignments are
| preceded either by code that sets the appropriate infomask bits or by
| assertions that the bits are already set appropriately.

Servus
Manfred


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Refactoring
Date: 2005-01-31 11:19:18
Message-ID: vd4sv054si7sfq81oi79dkonrc23ulfk8m@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 19 Jan 2005 18:57:48 +0100, I wrote:
> My first vacuum.c
>refactoring patch, rev 1.281 2004-06-08, added these comments in
>repair_frag():
>
>/*
> * VACUUM FULL has an exclusive lock on the relation. So
> * normally no other transaction can have pending INSERTs or
> * DELETEs in this relation. A tuple is either
> * (a) a tuple in a system catalog, inserted or deleted by
> * a not yet committed transaction or
> * (b) dead (XMIN_INVALID or XMAX_COMMITTED) or
> * (c) inserted by a committed xact (XMIN_COMMITTED) or
> * (d) moved by the currently running VACUUM.
> * In case (a) we wouldn't be in repair_frag() at all.
> * In case (b) we cannot be here, because scan_heap() has
> * already marked the item as unused, see continue above.
> * Case (c) is what normally is to be expected.
> * Case (d) is only possible, if a whole tuple chain has been
> * moved while processing this or a higher numbered block.
> */

It turns out that this comment is not quite correct. It is incomplete.
Case (b) should be: known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax
is visible to all active transactions).

And there is a fifth possibility: (e) deleted (XMAX_COMMITTED) but at
least one active transaction does not see the deleting transaction.

The patch seems to imply that case (e) is a subcase of (b), but
effectively tuples in this state are treated more like (c).

Servus
Manfred


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Refactoring
Date: 2005-02-15 03:49:40
Message-ID: 200502150349.j1F3new04362@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Manfred Koizar wrote:
> On Wed, 19 Jan 2005 18:57:48 +0100, I wrote:
> > My first vacuum.c
> >refactoring patch, rev 1.281 2004-06-08, added these comments in
> >repair_frag():
> >
> >/*
> > * VACUUM FULL has an exclusive lock on the relation. So
> > * normally no other transaction can have pending INSERTs or
> > * DELETEs in this relation. A tuple is either
> > * (a) a tuple in a system catalog, inserted or deleted by
> > * a not yet committed transaction or
> > * (b) dead (XMIN_INVALID or XMAX_COMMITTED) or
> > * (c) inserted by a committed xact (XMIN_COMMITTED) or
> > * (d) moved by the currently running VACUUM.
> > * In case (a) we wouldn't be in repair_frag() at all.
> > * In case (b) we cannot be here, because scan_heap() has
> > * already marked the item as unused, see continue above.
> > * Case (c) is what normally is to be expected.
> > * Case (d) is only possible, if a whole tuple chain has been
> > * moved while processing this or a higher numbered block.
> > */
>
> It turns out that this comment is not quite correct. It is incomplete.
> Case (b) should be: known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax
> is visible to all active transactions).
>
> And there is a fifth possibility: (e) deleted (XMAX_COMMITTED) but at
> least one active transaction does not see the deleting transaction.
>
> The patch seems to imply that case (e) is a subcase of (b), but
> effectively tuples in this state are treated more like (c).

OK, comment updated to:

/* ---
* VACUUM FULL has an exclusive lock on the relation. So
* normally no other transaction can have pending INSERTs or
* DELETEs in this relation. A tuple is either:
* (a) a tuple in a system catalog, inserted or deleted
* by a not yet committed transaction
* (b) known dead (XMIN_INVALID, or XMAX_COMMITTED and xmax
* is visible to all active transactions)
* (c) inserted by a committed xact (XMIN_COMMITTED)
* (d) moved by the currently running VACUUM.
* (e) deleted (XMAX_COMMITTED) but at least one active
* transaction does not see the deleting transaction
* In case (a) we wouldn't be in repair_frag() at all.
* In case (b) we cannot be here, because scan_heap() has
* already marked the item as unused, see continue above. Case
* (c) is what normally is to be expected. Case (d) is only
* possible, if a whole tuple chain has been moved while
* processing this or a higher numbered block.
* ---
*/

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073