Single pass vacuum - take 2

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Single pass vacuum - take 2
Date: 2011-08-22 06:22:35
Message-ID: CABOikdPhAX5uGugB9RJNSj+zVEYTV8Sn4ctYfcMBc47r6_B2_g@mail.gmail.com

Hi All,

Here is a revised patch based on our earlier discussion. I implemented
Robert's idea of tracking the vacuum generation number in the line
pointer itself. For LP_DEAD line pointers, the lp_off/lp_len is unused
(and always set to 0 for heap tuples). We use those 30 bits to store
the generation number of the vacuum which would have potentially
removed the corresponding index pointers, if the vacuum finished
successfully. The pg_class information is used to know the status of
the vacuum, whether it failed or succeeded. 30-bit numbers are large
enough that we can ignore any wrap-around related issues. With this
change, we don't need any additional header or special space in the
page, which was one of the main objections to the previous version.
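The bit budget works out as follows. PostgreSQL's ItemIdData packs lp_off (15 bits), lp_flags (2 bits) and lp_len (15 bits) into 32 bits, so an LP_DEAD pointer has 30 spare bits. The sketch below only illustrates that packing and is not the patch's code. The helper names item_set_dead_vacuumed() and item_get_vacuum_generation(), and the choice of putting the high 15 bits in lp_off, are hypothetical. A stamp of 0 cannot be told apart from an ordinary LP_DEAD pointer (whose lp_off/lp_len are 0), so a scheme like this would presumably reserve 0.

```c
#include <assert.h>
#include <stdint.h>

/* Same field layout as PostgreSQL's ItemIdData (src/include/storage/itemid.h). */
typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* tuple offset; unused when LP_DEAD */
                lp_flags:2,     /* line pointer state */
                lp_len:15;      /* tuple length; unused when LP_DEAD */
} ItemIdData;

#define LP_UNUSED   0
#define LP_NORMAL   1
#define LP_REDIRECT 2
#define LP_DEAD     3

/* Hypothetical: 30-bit generation = 15 bits in lp_off + 15 bits in lp_len. */
#define VACGEN_BITS 30
#define VACGEN_MASK ((UINT32_C(1) << VACGEN_BITS) - 1)

/* Hypothetical helper: mark the item dead and stamp it with 'gen'. */
static void
item_set_dead_vacuumed(ItemIdData *item, uint32_t gen)
{
    assert(gen <= VACGEN_MASK);
    item->lp_flags = LP_DEAD;
    item->lp_off = (gen >> 15) & 0x7FFF;   /* high 15 bits */
    item->lp_len = gen & 0x7FFF;           /* low 15 bits */
}

/* Hypothetical helper: read back the generation stamped on an LP_DEAD item. */
static uint32_t
item_get_vacuum_generation(const ItemIdData *item)
{
    assert(item->lp_flags == LP_DEAD);
    return ((uint32_t) item->lp_off << 15) | (uint32_t) item->lp_len;
}
```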

Other than this major change, I have added code commentary at relevant
places and also fixed the item.h comments to reflect the change. I
think the patch is ready for a serious review now.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Attachment Content-Type Size
Single-Pass-Vacuum-v4.patch text/x-patch 35.1 KB

From: Jim Nasby <jim(at)nasby(dot)net>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 2
Date: 2011-08-22 21:17:26
Message-ID: 91A3B5A1-8909-41AE-B6A4-6E48F7D3516A@nasby.net
Lists: pgsql-hackers

On Aug 22, 2011, at 1:22 AM, Pavan Deolasee wrote:
> Hi All,
>
> Here is a revised patch based on our earlier discussion. I implemented
> Robert's idea of tracking the vacuum generation number in the line
> pointer itself. For LP_DEAD line pointers, the lp_off/lp_len is unused
> (and always set to 0 for heap tuples). We use those 30 bits to store
> the generation number of the vacuum which would have potentially
> removed the corresponding index pointers, if the vacuum finished
> successfully. The pg_class information is used to know the status of
> the vacuum, whether it failed or succeeded. 30-bit numbers are large
> enough that we can ignore any wrap-around related issues. With this

+ * Note: We don't worry about the wrap-around issues here since it would
+ * take 1 billion vacuums on the same relation for the vacuum generation
+ * to wrap-around. That would take ages to happen and even if it happens,
+ * the chances that we might have dead-vacuumed line pointers still
+ * stamped with the old (failed) vacuum are infinitely small since some
+ * other vacuum cycle would have taken care of them.

It would be good if some comment explained how we're safe in the case of an aborted vacuum. I'm guessing that when vacuum finds any line pointers that don't match the last successful vacuum exactly, it will go and re-examine them from scratch?

I'm thinking that there should be a single comment somewhere that explains exactly how the 2-pass algorithm works. The comment in vacuum_log_cleanup_info seems to have the most info, but there are a few pieces still missing.

Also, found a typo:

+ * pass anyways). But this gives us two lareg benefits:

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 2
Date: 2011-08-30 10:38:44
Message-ID: CABOikdPFO7U5rrEkk7z84kxkFuZ3DGRkQj4m=qmSuLfuH_jNEQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 23, 2011 at 2:47 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Aug 22, 2011, at 1:22 AM, Pavan Deolasee wrote:
>> Hi All,
>>
>> Here is a revised patch based on our earlier discussion. I implemented
>> Robert's idea of tracking the vacuum generation number in the line
>> pointer itself. For LP_DEAD line pointers, the lp_off/lp_len is unused
>> (and always set to 0 for heap tuples). We use those 30 bits to store
>> the generation number of the vacuum which would have potentially
>> removed the corresponding index pointers, if the vacuum finished
>> successfully. The pg_class information is used to know the status of
>> the vacuum, whether it failed or succeeded. 30-bit numbers are large
>> enough that we can ignore any wrap-around related issues. With this
>
> +        * Note: We don't worry about the wrap-around issues here since it would
> +        * take 1 billion vacuums on the same relation for the vacuum generation
> +        * to wrap-around. That would take ages to happen and even if it happens,
> +        * the chances that we might have dead-vacuumed line pointers still
> +        * stamped with the old (failed) vacuum are infinitely small since some
> +        * other vacuum cycle would have taken care of them.
>
> It would be good if some comment explained how we're safe in the case of an aborted vacuum. I'm guessing that when vacuum finds any line pointers that don't match the last successful vacuum exactly, it will go and re-examine them from scratch?
>

Yeah. If we don't know the status of the vacuum that collected the
line pointer and marked it vacuum-dead, the next vacuum will pick it
up again and stamp it with its own generation number.
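The rule described here can be sketched as a decision made for each dead-vacuumed line pointer during the next vacuum's heap scan. This assumes a single "last successful vacuum generation" read from pg_class. DeadLinePointer and visit_dead_lp() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one dead-vacuumed line pointer. */
typedef struct
{
    bool        unused;     /* true once the slot has been reclaimed */
    uint32_t    stamp;      /* vacuum generation stamped on the pointer */
} DeadLinePointer;

/*
 * Hypothetical: what vacuum 'current_gen' does when its heap scan meets a
 * dead-vacuumed line pointer. Returns true if the pointer must be remembered
 * for this vacuum's index pass.
 */
static bool
visit_dead_lp(DeadLinePointer *lp, uint32_t last_successful_gen,
              uint32_t current_gen)
{
    assert(!lp->unused);
    assert(current_gen != last_successful_gen);

    if (lp->stamp == last_successful_gen)
    {
        /* That vacuum finished its index pass: no index entries remain. */
        lp->unused = true;
        return false;
    }

    /*
     * Stamped by a vacuum that failed, or whose status we don't know: index
     * entries may still exist, so collect it again under our own generation.
     */
    lp->stamp = current_gen;
    return true;
}
```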

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


From: Andy Colson <andy(at)squeakycode(dot)net>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: REVIEW Single pass vacuum - take 2
Date: 2011-09-07 02:58:48
Message-ID: 4E66DDE8.7080007@squeakycode.net
Lists: pgsql-hackers

On 08/22/2011 01:22 AM, Pavan Deolasee wrote:
> Hi All,
>
> Here is a revised patch based on our earlier discussion. I implemented
> Robert's idea of tracking the vacuum generation number in the line
> pointer itself. For LP_DEAD line pointers, the lp_off/lp_len is unused
> (and always set to 0 for heap tuples). We use those 30 bits to store
> the generation number of the vacuum which would have potentially
> removed the corresponding index pointers, if the vacuum finished
> successfully. The pg_class information is used to know the status of
> the vacuum, whether it failed or succeeded. 30-bit numbers are large
> enough that we can ignore any wrap-around related issues. With this
> change, we don't need any additional header or special space in the
> page, which was one of the main objections to the previous version.
>
> Other than this major change, I have added code commentary at relevant
> places and also fixed the item.h comments to reflect the change. I
> think the patch is ready for a serious review now.
>
> Thanks,
> Pavan
>

Hi Pavan, I tried to apply your patch to git master (as of just now) and it failed. I assume that's what I should be checking out, right?

-Andy


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW Single pass vacuum - take 2
Date: 2011-09-07 07:05:07
Message-ID: CABOikdMyDd1-D_qUO-5uneq_ten13L723imek0fe7qcX2zCMPA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 7, 2011 at 8:28 AM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> On 08/22/2011 01:22 AM, Pavan Deolasee wrote:

>>
>
> Hi Pavan, I tried to apply your patch to git master (as of just now) and it
> failed.  I assume that's what I should be checking out, right?
>

Yeah, seems like it bit-rotted. Please try the attached patch. I also
fixed a typo and added some more comments as suggested by Jim.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Attachment Content-Type Size
Single-Pass-Vacuum-v5.patch text/x-patch 35.3 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 2
Date: 2011-09-23 16:37:18
Message-ID: CA+TgmobKRb5jwFOTZYYiDsvkY9dtaZPMqr_BzJ5JWHeBrTwHkQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 30, 2011 at 6:38 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Yeah. If we don't know the status of the vacuum that collected the
> line pointer and marked it vacuum-dead, the next vacuum will pick it
> up again and stamp it with its own generation number.

I'm still not really comfortable with the handling of vacuum
generation numbers. If we're going to say that 2^30 is large enough
that we don't need to worry about the counter wrapping around, then we
need some justification for that position. Why can't we have 2^30
consecutive failed vacuums on a single table? Sure, it would take a
long time, but we guard against many failure conditions that would
take a long time, and the result is that we have fewer corner-case
failures. I want an explanation of why it's *safe*, and what the
smallest number of vacuum generations that we must support to make it
safe is. If we blow the handling of this, we are going to eat the
user's data, so we had better have a really convincing argument as to
why what we're doing is OK.
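The concern comes down to modular arithmetic. A 30-bit counter returns to the same value after exactly 2^30 = 1,073,741,824 generations. At that point, a stamp left behind by a failed vacuum looks identical to one written by a current vacuum. This is a minimal sketch of that arithmetic, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VACGEN_BITS 30
#define VACGEN_MASK ((UINT32_C(1) << VACGEN_BITS) - 1)

/* Hypothetical: advance a 30-bit vacuum generation counter (wraps mod 2^30). */
static uint32_t
next_vacuum_generation(uint32_t gen)
{
    assert(gen <= VACGEN_MASK);
    return (gen + 1) & VACGEN_MASK;
}

/* Hypothetical: the generation handed out 'n' vacuums after 'gen'. */
static uint32_t
generation_after(uint32_t gen, uint64_t n)
{
    assert(gen <= VACGEN_MASK);
    return (uint32_t) ((gen + n) & VACGEN_MASK);
}
```

Suppose vacuum g failed and some pointer it stamped was never re-collected. Once vacuum g + 2^30 succeeds, an equality test such as stamp == last_successful would treat that pointer as reclaimable, even though index entries may still point to it.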

Here's a possible alternative implementation: we allow up to 32 vacuum
generations to exist at once. We keep a 64-bit integer indicating the
state of each vacuum generation: 00 = no line pointers with this
vacuum generation exist in the heap, 01 = some line pointers with this
vacuum generation may exist in the heap, but they are not removable,
11 = some line pointers with this vacuum generation exist in the heap,
and they are removable. Then, when we start a VACUUM, we look for a
vacuum generation with flags 01. If we find one, we adopt that as the
generation number for this vacuum. If not, we look for one with flags
00, and if we find one, we set its flags to 01 and adopt it as the
generation number for this vacuum. (If this too fails, then all
vacuums are in state 11. There are several ways that could be handled
- either we make a pass over the heap just to free dead line pointers,
or we randomly select a vacuum generation number and push it back to
state 01, or we make all line pointers encountered during the vacuum
merely dead rather than dead-vacuumed; I think I like that option
best.) When we complete the heap scan, we set the flags of any vacuum
generation numbers that were previously 11 back to 00 (assuming we've
visited all the not-all-visible pages). When we complete the index
pass, we set the flags of our chosen vacuum generation number to 11.
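The proposed bookkeeping can be sketched as operations on one 64-bit word that holds 32 two-bit states. Several details are illustrative assumptions rather than code from the patch: the function names, scanning generations in ascending order, and returning -1 when every generation is 11. The sketch also ignores how the word is stored and how it survives a crash.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VACGENS 32

/* Two-bit state per generation, as in the proposal. */
#define VG_NONE      UINT64_C(0)   /* 00: no line pointers with this generation */
#define VG_PENDING   UINT64_C(1)   /* 01: may exist, not removable */
#define VG_REMOVABLE UINT64_C(3)   /* 11: exist and are removable */

static unsigned
vg_get(uint64_t word, int gen)
{
    assert(gen >= 0 && gen < NUM_VACGENS);
    return (unsigned) ((word >> (2 * gen)) & 3);
}

static uint64_t
vg_set(uint64_t word, int gen, uint64_t state)
{
    assert(gen >= 0 && gen < NUM_VACGENS);
    word &= ~(UINT64_C(3) << (2 * gen));
    return word | (state << (2 * gen));
}

/*
 * Start of VACUUM: adopt a 01 generation if there is one, else claim a 00
 * generation and mark it 01. Returns -1 if every generation is 11; under the
 * option preferred in the proposal, that vacuum would then leave line
 * pointers merely dead rather than dead-vacuumed.
 */
static int
vg_begin_vacuum(uint64_t *word)
{
    for (int g = 0; g < NUM_VACGENS; g++)
        if (vg_get(*word, g) == VG_PENDING)
            return g;
    for (int g = 0; g < NUM_VACGENS; g++)
        if (vg_get(*word, g) == VG_NONE)
        {
            *word = vg_set(*word, g, VG_PENDING);
            return g;
        }
    return -1;
}

/*
 * End of heap scan: line pointers of every 11 generation were freed during
 * the scan (assuming all not-all-visible pages were visited), so 11 -> 00.
 */
static uint64_t
vg_end_heap_scan(uint64_t word)
{
    for (int g = 0; g < NUM_VACGENS; g++)
        if (vg_get(word, g) == VG_REMOVABLE)
            word = vg_set(word, g, VG_NONE);
    return word;
}

/* End of index pass: our generation's line pointers become removable. */
static uint64_t
vg_end_index_pass(uint64_t word, int gen)
{
    assert(vg_get(word, gen) == VG_PENDING);
    return vg_set(word, gen, VG_REMOVABLE);
}
```

Trace a mix of failed and successful vacuums through these functions by hand, with every vacuum scanning the whole heap. You never get more than one generation in state 01 or more than one in state 11 at the same time. This matches the observation that three values might be enough.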

There is clearly room for argument about the details here; for
example, as the algorithm is presented, it's hard to see how you would
end up with more than one vacuum generation number in each state, so
maybe you only need three values, not 32. I suppose it could be
useful to have more values if you want to sometimes vacuum only part
of the heap, because then you'd only get to mark vacuum generation
numbers as unused on those occasions when you actually did scan the
whole heap. But regardless of that detail, the thing I like about
what I'm proposing here is that it provides a closed loop around the
management of vacuum generation numbers - we always know the exact
state of each vacuum generation number, as opposed to just hoping that
by the billionth vacuum there won't be any leftovers. Of course, it
may also be that we can convince ourselves that your algorithm as
implemented is safe ... but I'm not convinced yet.

Another thing I'm not sure whether to worry about is the question of
where we store the vacuum generation information. I mean, if we store
it in pg_class, then what happens if the user does a manual update of
pg_class just as we're updating the vacuum generation information? We
had better make sure that there are no cases where we can accidentally
think that it's OK to reclaim dead line pointers that really still
have references, or we're going to end up with some awfully
difficult-to-find bugs... never mind the possibility of the
user manually updating the value and hosing themselves. Of course, we
already have some of those issues - relfrozenxid probably has the same
problems - and I'm not 100% sure whether this one is any worse. It
would be really nice to have those non-transactional tables that
Alvaro keeps mumbling about, though, or some other way to store this
information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 2
Date: 2011-10-01 19:20:49
Message-ID: 29E7E7C0-A4AE-4FB2-951A-2A54E6EFF80F@nasby.net
Lists: pgsql-hackers

On Sep 23, 2011, at 11:37 AM, Robert Haas wrote:
> Another thing I'm not sure whether to worry about is the question of
> where we store the vacuum generation information. I mean, if we store
> it in pg_class, then what happens if the user does a manual update of
> pg_class just as we're updating the vacuum generation information? We
> had better make sure that there are no cases where we can accidentally
> think that it's OK to reclaim dead line pointers that really still
> have references, or we're going to end up with some awfully
> difficult-to-find bugs... never mind the possibility of the
> user manually updating the value and hosing themselves. Of course, we
> already have some of those issues - relfrozenxid probably has the same
> problems - and I'm not 100% sure whether this one is any worse. It
> would be really nice to have those non-transactional tables that
> Alvaro keeps mumbling about, though, or some other way to store this
> information.

Whenever I'm doing data modeling that involves both user-modified data and system-modified data, I always try to separate the two. That way you know that everything in the user-modified table can be changed at any time, and you can also lock down the system-data table to prevent the possibility of any user-driven changes.

So, non-transactional tables or not, I think it would be a pretty good idea to build some separation into the catalog tables wherever there is a risk of conflict between user activities and system activities. Actually, assuming that all catalog tables keep using the internal access methods, it might be wise to go as far as separating data that is maintained by separate system activities, to avoid conflicts between different parts of the system.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 2
Date: 2011-11-02 18:20:54
Message-ID: CA+TgmoZmVt8Em3Y9g3_2FkuWDFGcH_TXojsHXr=W-d7b5o7pwg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 23, 2011 at 12:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm still not really comfortable with the handling of vacuum
> generation numbers.

Pavan and I spent a bit of time today talking about how many vacuum
generation numbers we need to have in order for this scheme to work.
Before my memory fades, here are some notes:

- In an ideal world, we'd only need two vacuum generation numbers.
Call them 1 and 2. We store the vacuum generation number of the last
successful vacuum somewhere. When the next vacuum starts, any
dead-vacuumed line pointers stamped with the generation number of the
last successful vacuum get marked unused. The new vacuum uses the
other generation number, stamping any new dead line pointers with that
generation and eventually, after the index vacuum is successfully
completed, storing that value as the last successful vacuum generation
number. The next vacuum will repeat the whole cycle with the roles of
the two available vacuum generation numbers reversed. If a vacuum
fails midway through, the last successful vacuum generation number
doesn't get updated; the next vacuum will reuse the same vacuum
generation number, which should be fine.
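The two-generation cycle, including the failure case, can be sketched as follows. RelVacState, two_gen_next(), two_gen_reclaimable() and two_gen_commit() are hypothetical names. Where the last successful generation is actually stored is left open, as it is in the note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-relation state; where it lives is left open. */
typedef struct
{
    uint32_t    last_successful_gen;    /* always 1 or 2 */
} RelVacState;

/* A new vacuum always uses the generation that is not the last successful one. */
static uint32_t
two_gen_next(const RelVacState *rel)
{
    assert(rel->last_successful_gen == 1 || rel->last_successful_gen == 2);
    return rel->last_successful_gen == 1 ? 2 : 1;
}

/* Pointers stamped by the last successful vacuum can be marked unused. */
static bool
two_gen_reclaimable(const RelVacState *rel, uint32_t stamp)
{
    return stamp == rel->last_successful_gen;
}

/*
 * Called only once the index pass has finished. A vacuum that fails never
 * gets here, so the next vacuum computes the same generation again.
 */
static void
two_gen_commit(RelVacState *rel, uint32_t gen)
{
    assert(gen == two_gen_next(rel));
    rel->last_successful_gen = gen;
}
```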

- However, making this work with HOT pruning is a bit stickier. If
the last successful vacuum generation number is stored in pg_class and
heap_page_prune() looks at the relcache entry to get it, the value
there might be out of date. If a HOT pruning operation sees that the
last successful vacuum generation was X, but in the meantime a new
vacuum has started that also uses generation number X, then the HOT
prune might mark a dead line pointer as unused while there are still
index entries pointing to it, which would be bad. So here, there's
value to having a large number of vacuum generations rather than just
two.
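The HOT pruning race can be replayed in a few lines. The sketch assumes the pruner compares a line pointer's stamp with a cached copy of the last successful generation and never refreshes that copy. Here, "wide" stands for a counter large enough that it cannot come back around while the stale cache entry exists. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pruning test: reclaim if the stamp matches the cached value. */
static bool
prune_would_reclaim(uint32_t stamp, uint32_t cached_last_successful)
{
    return stamp == cached_last_successful;
}

/* Two-generation scheme: alternate between 1 and 2. */
static uint32_t
two_gen_next(uint32_t last_successful)
{
    assert(last_successful == 1 || last_successful == 2);
    return last_successful == 1 ? 2 : 1;
}

/* Stand-in for a counter wide enough not to wrap during the race. */
static uint32_t
wide_next(uint32_t last_successful)
{
    return last_successful + 1;
}

/*
 * Replays the race: a backend caches last_successful = x and never refreshes
 * it; vacuum A (generation next(x)) then completes; vacuum B starts and
 * stamps a dead line pointer during its heap scan, before its index pass.
 * Returns true if the stale pruner would reclaim that pointer while index
 * entries still reference it.
 */
static bool
stale_prune_is_unsafe(uint32_t (*next)(uint32_t), uint32_t x)
{
    uint32_t    cached = x;                         /* stale relcache value */
    uint32_t    last_successful = next(x);          /* vacuum A finished */
    uint32_t    stamp = next(last_successful);      /* vacuum B's generation */

    return prune_would_reclaim(stamp, cached);
}
```

With two generations, the second vacuum reuses the value that the stale cache still holds. The pruner would therefore reclaim a pointer whose index entries have not been removed. With the wide counter, the values differ and the pruner simply leaves the pointer alone.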

- In particular, assuming we store the vacuum generation number in
pg_class, we'd like to have enough vacuum generation numbers that the
counter can't wrap around while there's still an old relcache entry
lying around. 2^31 seems like enough, because each vacuum consumes an
XID (but will that necessarily always be the case?) and if you've
eaten through 2^31 XIDs then any still-running transaction would be
suffering from wraparound problems anyway (but what if it's read-only
and keeps taking new snapshots without ever rebuilding the relcache
entry? can that happen?). However, if we store the vacuum generation
in the line pointer, we only have 30 bits available.
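Here is the bound from this point written out. It rests on two stated (and questioned) assumptions. First, every vacuum consumes at least one XID. Second, a backend whose relcache entry has survived 2^31 XIDs is already in wraparound trouble. N_vac counts the vacuums completed while a stale relcache entry is alive, and N_xid counts the XIDs consumed in that window. Both symbols are introduced here for illustration.

```latex
N_{\mathrm{vac}} \;\le\; N_{\mathrm{xid}} \;<\; 2^{31} = 2\,147\,483\,648
\qquad\text{whereas}\qquad
2^{30} = 1\,073\,741\,824
```

A counter with 2^31 values would therefore outlast any such window. The 30 bits available in the line pointer cover only half of it.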

- There's also a problem with having just two vacuum generation
numbers if someone does a transactional update to pg_class, even if
they don't touch the hypothetical field that stores the generation
number:

rhaas=# begin;
BEGIN
rhaas=# update pg_class set relname=relname where oid='test'::regclass;
UPDATE 1

Then, in another session:
rhaas=# vacuum test;
VACUUM

VACUUM is perfectly happy to do a non-transactional update on the
then-current version of the pg_class tuple even while an open
transaction has a pending update to that tuple that might get
committed just afterward. We can't risk getting confused about the
current vacuum generation number. Well, OK, technically we can: if
there are an infinite number of vacuum generation numbers available,
then the worst thing that happens is we forget that a bunch of dead
line pointers are reclaimable, and do a bunch of extra work that isn't
really necessary. But if there are just two, we're now going to get
confused about which line pointers can be safely reclaimed.
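The two-generation failure can be replayed step by step. The sketch models only the values involved: the open transaction's pending copy, VACUUM's in-place update, and the stamp written by a vacuum that is still in its index pass. It does not model real pg_class tuples, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-generation scheme: alternate between 1 and 2. */
static uint32_t
two_gen_next(uint32_t last_successful)
{
    assert(last_successful == 1 || last_successful == 2);
    return last_successful == 1 ? 2 : 1;
}

/* Stand-in for a counter that never wraps in practice. */
static uint32_t
wide_next(uint32_t last_successful)
{
    return last_successful + 1;
}

/*
 * Hypothetical replay of the lost in-place update:
 *  1. an open transaction updates the pg_class row; its pending version
 *     carries a copy of last_successful = x;
 *  2. vacuum A (generation next(x)) completes and overwrites the value in
 *     place on the old row version;
 *  3. vacuum B starts and stamps dead line pointers but has not finished its
 *     index pass;
 *  4. the open transaction commits, so its copy x becomes current again.
 * Returns true if B's line pointers now look reclaimable.
 */
static bool
lost_update_is_unsafe(uint32_t (*next)(uint32_t), uint32_t x)
{
    uint32_t    pending_copy = x;           /* step 1 */
    uint32_t    in_place = next(x);         /* step 2 */
    uint32_t    stamp_b = next(in_place);   /* step 3 */
    uint32_t    current = pending_copy;     /* step 4: in-place write lost */

    return stamp_b == current;
}
```

With two generations, the reverted value names the generation of a vacuum whose index pass is still running. With a wide counter, the reverted value is merely stale. The only cost then is re-collecting pointers that were already reclaimable.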

...

So, what do we do? Possible solutions appear to include:

- Find some more bit space, so that we can make the vacuum generation
number wider.
- Store the vacuum generation number someplace other than a system
catalog, where the effects that can make us see a stale value or lose
an update don't exist.
- Don't let HOT pruning reclaim dead-vacuumed line pointers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company