
Re: Free space management within heap page

Lists:	pgsql-hackers

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Free space management within heap page
Date:	2007-01-23 08:18:08
Message-ID:	2e78013d0701230018m7bebdd35t32bdc8ea786511e8@mail.gmail.com
Lists:	pgsql-hackers

I am thinking that maintaining fragmented free space within a heap page
might be a good idea. It would help us to reuse the free space ASAP without
waiting for a vacuum run on the page. This in turn will lead to less heap
bloat and also increase the probability of placing an updated tuple in the
same heap page as the original one.

So during a sequential or index scan, if a tuple is found to be dead, the
corresponding line pointer is marked "unused" and the space is returned to a
free list. This free list is maintained within the page. A linked list can
be used for this purpose, and the special area of the heap page can be used
to track the fragment list. We can also keep some additional information
about the fragmented space in the special area, such as total_free_space,
max_fragment_size, num_of_fragments, etc.
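A minimal sketch of the in-page free list described above. It assumes an 8 KB page modelled as a plain byte array, and every name in it (FreeFragment, FragmentSpecial, fragment_push, NO_FRAGMENT) is hypothetical rather than anything from PostgreSQL's page headers. Each freed fragment stores its own length and the offset of the next fragment in its first four bytes, so the list needs no storage beyond the summary kept in the special area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   8192
#define NO_FRAGMENT 0   /* offset 0 is the page header, so never a fragment */

/* Header written into the first bytes of every free fragment (hypothetical). */
typedef struct FreeFragment {
    uint16_t len;       /* size of this fragment in bytes */
    uint16_t next;      /* page offset of the next fragment, or NO_FRAGMENT */
} FreeFragment;

/* Summary kept in the page's special area (hypothetical). */
typedef struct FragmentSpecial {
    uint16_t first_fragment;    /* head of the in-page free list */
    uint16_t total_free_space;
    uint16_t max_fragment_size;
    uint16_t num_of_fragments;
} FragmentSpecial;

/* Return the space [off, off + len) of a dead tuple to the page's free list. */
static void fragment_push(uint8_t *page, FragmentSpecial *sp,
                          uint16_t off, uint16_t len)
{
    FreeFragment frag;

    /* The fragment must be able to hold its own list header. */
    assert(len >= sizeof(FreeFragment));
    assert(off != NO_FRAGMENT && (uint32_t) off + len <= PAGE_SIZE);

    frag.len = len;
    frag.next = sp->first_fragment;
    memcpy(page + off, &frag, sizeof frag);   /* memcpy: off may be unaligned */

    sp->first_fragment = off;
    sp->total_free_space += len;
    sp->num_of_fragments += 1;
    if (len > sp->max_fragment_size)
        sp->max_fragment_size = len;
}
```

Keeping the list nodes inside the freed space is what makes the scheme cheap; the price is that a fragment smaller than its own 4-byte header cannot be tracked at all.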

During UPDATEs, if we find that there is no free space in the block, the
fragment list is searched (either first-fit or best-fit), the required space
is consumed and the remaining space is returned to the free list.
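The UPDATE-time search can be sketched the same way. This is a hedged first-fit version over a hypothetical in-page list (the fragment header is redeclared so the block stands alone); best-fit would differ only in remembering the smallest adequate fragment instead of returning the first one. For brevity it does not update any special-area counters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   8192
#define NO_FRAGMENT 0   /* offset 0 is the page header, so never a fragment */

/* Header stored at the start of each free fragment (hypothetical). */
typedef struct FreeFragment {
    uint16_t len;
    uint16_t next;
} FreeFragment;

static FreeFragment read_frag(const uint8_t *page, uint16_t off)
{
    FreeFragment f;
    memcpy(&f, page + off, sizeof f);
    return f;
}

static void write_frag(uint8_t *page, uint16_t off, FreeFragment f)
{
    memcpy(page + off, &f, sizeof f);
}

/*
 * First-fit: take the first fragment with len >= need and return its offset,
 * or NO_FRAGMENT if nothing fits. The new tuple occupies the front of the
 * fragment; the remainder goes back on the list in the same position if it
 * can still hold a FreeFragment header, otherwise it is lost until vacuum.
 */
static uint16_t fragment_alloc_first_fit(uint8_t *page, uint16_t *head,
                                         uint16_t need)
{
    uint16_t prev = NO_FRAGMENT;
    uint16_t off = *head;

    assert(need > 0);
    while (off != NO_FRAGMENT) {
        FreeFragment f = read_frag(page, off);

        if (f.len >= need) {
            uint16_t rest = (uint16_t) (f.len - need);
            uint16_t replacement = f.next;

            if (rest >= sizeof(FreeFragment)) {
                FreeFragment tail = { rest, f.next };

                replacement = (uint16_t) (off + need);
                write_frag(page, replacement, tail);
            }
            /* Make the predecessor (or the list head) skip the used space. */
            if (prev == NO_FRAGMENT) {
                *head = replacement;
            } else {
                FreeFragment p = read_frag(page, prev);

                p.next = replacement;
                write_frag(page, prev, p);
            }
            return off;
        }
        prev = off;
        off = f.next;
    }
    return NO_FRAGMENT;
}
```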

We might not be able to reuse the line pointers because indexes may have
references to them. All such line pointers will be freed when the page is
vacuumed during the regular vacuum.

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 08:48:46
Message-ID:	20070123084846.GC19527@svana.org
Lists:	pgsql-hackers

On Tue, Jan 23, 2007 at 01:48:08PM +0530, Pavan Deolasee wrote:
> I am thinking that maintaining fragmented free space within a heap page
> might be a good idea. It would help us to reuse the free space ASAP without
> waiting for a vacuum run on the page. This in turn will lead to lesser heap
> bloats and also increase the probability of placing updated tuple in the
> same heap page as the original one.

<snip>

Nice idea but:

> We might not be able to reuse the line pointers because indexes may have
> references to it. All such line pointers will be freed when the page is
> vacuumed during the regular vacuum.

The overwhelming majority of tuples are going to be in one or more
indexes, which means nearly all tuples are going to fall into this
category. So where's the benefit?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 09:03:11
Message-ID:	2e78013d0701230103y1d587db9q6733adca0ca92ca0@mail.gmail.com
Lists:	pgsql-hackers

On 1/23/07, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>
> On Tue, Jan 23, 2007 at 01:48:08PM +0530, Pavan Deolasee wrote:
>
> > We might not be able to reuse the line pointers because indexes may have
> > references to it. All such line pointers will be freed when the page is
> > vacuumed during the regular vacuum.
>
> The overwhelming vast majority of tuples are going to be in one or more
> indexes. Which means nearly all tuples are going to fall into this
> category. So where's the benefit?

The line pointers cannot be reused, but the space consumed by the tuple can
be. So the benefit is in utilizing that space for newer tuples and thus
reducing the bloat.

One assumption I am making here is that it's sufficient to mark the line
pointer "unused" (reset the LP_USED flag) even though there is an index entry
pointing to the tuple. During an index scan, we check ItemIdIsUsed() anyway
before proceeding further. I know it might break the ctid chain, but does
that really matter? I don't see any reason why somebody would need to follow
the ctid chain past a dead tuple.
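For readers unfamiliar with the flags being discussed, here is a simplified model of a 4-byte heap line pointer with the LP_USED bit, and of the proposal to clear that bit while leaving the pointer (and any index entries aimed at it) in place. The 15/2/15-bit layout is assumed to mirror ItemIdData, but LinePointer, lp_is_used() and lp_mark_unused() are hypothetical names, not the real macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified 4-byte line pointer, assumed to mirror ItemIdData's layout. */
typedef struct LinePointer {
    unsigned lp_off:15;     /* offset of the tuple within the page */
    unsigned lp_flags:2;    /* LP_USED / LP_DELETE bits */
    unsigned lp_len:15;     /* length of the tuple in bytes */
} LinePointer;

#define LP_USED   0x01
#define LP_DELETE 0x02

/* The test an index scan makes before touching the tuple (cf. ItemIdIsUsed). */
static bool lp_is_used(const LinePointer *lp)
{
    return (lp->lp_flags & LP_USED) != 0;
}

/*
 * The proposal: a scan that finds the tuple dead clears LP_USED so the
 * tuple's storage can be recycled. The pointer itself is not reused,
 * because index entries may still reference it.
 */
static void lp_mark_unused(LinePointer *lp)
{
    assert(lp_is_used(lp));
    lp->lp_flags &= ~LP_USED;
}
```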

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 09:28:55
Message-ID:	20070123181802.55E1.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists:	pgsql-hackers

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> > The overwhelming vast majority of tuples are going to be in one or more
> > indexes. Which means nearly all tuples are going to fall into this
> > category. So where's the benefit?
>
> The line pointers can not reused, but the space consumed by the tuple can be.
> So the benefit is in utilizing that space for newer tuples and thus reduce the
> bloat.

I think your idea is the same as the following TODO item, which I suggested before.

* Consider shrinking expired tuples to just their headers.
http://archives.postgresql.org/pgsql-patches/2006-03/msg00142.php
http://archives.postgresql.org/pgsql-patches/2006-03/msg00166.php

> One assumption I am making here is that its sufficient to mark the line pointer
> "unused" (reset LP_USED flag) even though there is an index entry pointing to
> the tuple. During index scan, we anyways check for ItemIdIsUsed() before
> proceeding further. I know it might break the ctid chain, but does that really
> matter ? I don't see any reason why somebody would need to follow ctid chain
> past a dead tuple.

Keeping only the line pointers is not a problem in itself, but it might lead
to bloating of the line pointers. If a particular tuple in a page is replaced
repeatedly, the line pointer area can bloat up to 1/4 of the page.
We need to work around that problem.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 09:33:28
Message-ID:	45B5D668.5050406@enterprisedb.com
Lists:	pgsql-hackers

Pavan Deolasee wrote:
> One assumption I am making here is that its sufficient to mark the line
> pointer "unused" (reset LP_USED flag) even though there is an index entry
> pointing to the tuple. During index scan, we anyways check for
> ItemIdIsUsed() before proceeding further. I know it might break the ctid
> chain, but does that really matter ? I don't see any reason why somebody
> would need to follow ctid chain past a dead tuple.

You can't clear the LP_USED flag, but you could use the LP_DELETE flag
that's currently not used in heap pages.
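A sketch of that alternative under the same kind of simplified model: LP_USED stays set, so the slot is never handed out again while index entries may point at it, and LP_DELETE records that the tuple's storage has been given back. Zeroing lp_off and lp_len is an assumption made for the example, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified 4-byte line pointer (hypothetical stand-in for ItemIdData). */
typedef struct LinePointer {
    unsigned lp_off:15;
    unsigned lp_flags:2;
    unsigned lp_len:15;
} LinePointer;

#define LP_USED   0x01
#define LP_DELETE 0x02

/* Keep LP_USED (slot stays reserved), set LP_DELETE (storage reclaimed). */
static void lp_mark_reclaimed(LinePointer *lp)
{
    assert(lp->lp_flags & LP_USED);
    lp->lp_flags |= LP_DELETE;
    lp->lp_off = 0;     /* assumption: no storage behind this pointer any more */
    lp->lp_len = 0;
}

/* A fetch through an index must skip pointers that are used but reclaimed. */
static bool lp_has_tuple(const LinePointer *lp)
{
    return (lp->lp_flags & LP_USED) && !(lp->lp_flags & LP_DELETE);
}

/* Only pointers with LP_USED clear may be handed out to new tuples. */
static bool lp_is_free_slot(const LinePointer *lp)
{
    return !(lp->lp_flags & LP_USED);
}
```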

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 09:37:30
Message-ID:	45B5D75A.5040003@enterprisedb.com
Lists:	pgsql-hackers

Pavan Deolasee wrote:
> I am thinking that maintaining fragmented free space within a heap page
> might be a good idea. It would help us to reuse the free space ASAP without
> waiting for a vacuum run on the page. This in turn will lead to lesser heap
> bloats and also increase the probability of placing updated tuple in the
> same heap page as the original one.

Agreed.

> So during a sequential or index scan, if a tuple is found to be dead, the
> corresponding line pointer is marked "unused" and the space is returned to a
> free list. This free list is maintained within the page. A linked-list can
> be used for this purpose and the special area of the heap-page can be used
> to track the fragment list. We can maintain some additional information
> about the fragmented space such as, total_free_space, max_fragment_size,
> num_of_fragments etc in the special area.

Maintaining a list like that seems like a lot of hassle to me. Instead,
you could just scan the line pointers looking for a dead tuple of the
right size. We already have to scan the line pointers when inserting to
find a free line pointer.
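A rough sketch of that search, assuming each line pointer can be paired with its tuple length and a "dead to everyone" bit. SlotInfo and find_dead_space() are hypothetical; the scan here is best-fit, which costs the same single linear pass over the line pointer array as first-fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of one line pointer and its tuple (hypothetical). */
typedef struct SlotInfo {
    unsigned off;       /* tuple offset within the page */
    unsigned len;       /* tuple length in bytes */
    bool     dead;      /* tuple known dead to every transaction */
} SlotInfo;

/*
 * Return the index of the smallest dead tuple whose space can hold `need`
 * bytes, or -1 if there is none. Stops early on an exact fit.
 */
static int find_dead_space(const SlotInfo *slots, size_t nslots, unsigned need)
{
    int best = -1;

    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        if (!slots[i].dead || slots[i].len < need)
            continue;
        if (best < 0 || slots[i].len < slots[best].len)
            best = (int) i;
        if (slots[i].len == need)
            break;
    }
    return best;
}
```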

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 10:09:03
Message-ID:	45B5DEBF.3090200@enterprisedb.com
Lists:	pgsql-hackers

ITAGAKI Takahiro wrote:
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>>> The overwhelming vast majority of tuples are going to be in one or more
>>> indexes. Which means nearly all tuples are going to fall into this
>>> category. So where's the benefit?
>> The line pointers can not reused, but the space consumed by the tuple can be.
>> So the benefit is in utilizing that space for newer tuples and thus reduce the
>> bloat.
>
> I think your idea is same as the following TODO Item, that I suggested before.
>
> * Consider shrinking expired tuples to just their headers.
> http://archives.postgresql.org/pgsql-patches/2006-03/msg00142.php
> http://archives.postgresql.org/pgsql-patches/2006-03/msg00166.php

Yeah, same idea. You suggested in that thread that we should keep the
headers because of line pointer bloat, but I don't see how that's
better. You're still going to get some line pointer bloat, but not able
to reclaim as much free space.

In that thread, Tom mentioned that we may need to keep the header
because the dead tuple might be part of an update chain. Reading back
the discussion on the vacuum bug, I can't see how removing the header
would be a problem, but maybe I'm missing something.

>> One assumption I am making here is that its sufficient to mark the line pointer
>> "unused" (reset LP_USED flag) even though there is an index entry pointing to
>> the tuple. During index scan, we anyways check for ItemIdIsUsed() before
>> proceeding further. I know it might break the ctid chain, but does that really
>> matter ? I don't see any reason why somebody would need to follow ctid chain
>> past a dead tuple.
>
> Keeping only line pointers itself is not a problem, but it might lead
> bloating of line pointers. If a particular tuple in a page is replaced
> repeatedly, the line pointers area bloats up to 1/4 of the page.

Where does the 1/4 figure come from?

> We need to work around the problem.

If a row is updated many times before vacuum comes along, what currently
happens is that we end up with a bunch of pages full of dead tuples.
With the truncation scheme, we could fit many more dead tuples on each
page, reducing the need to vacuum. If a row is, for example, 40 bytes
long including the header (a quite narrow one), you could fit 10 line
pointers in the space of one row, which means that you could ideally
multiply your vacuum interval by a factor of 10. That's a huge benefit,
though indexes would still bloat unless selects marking index pointers
as dead keep the bloat under control.
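The arithmetic above can be checked with a few lines of C. This assumes the default 8192-byte block and a 4-byte line pointer, and ignores the page header, alignment padding and the live version itself, so it gives an upper bound rather than a real page capacity. The function names are hypothetical.

```c
#include <assert.h>

#define PAGE_BYTES 8192u    /* assumed default BLCKSZ; page header ignored */
#define LP_BYTES   4u       /* one line pointer */

/* Dead versions per page when each keeps its full tuple plus line pointer. */
static unsigned versions_untruncated(unsigned row_bytes)
{
    assert(row_bytes > 0);
    return PAGE_BYTES / (row_bytes + LP_BYTES);
}

/* Dead versions per page when each is truncated to just its line pointer. */
static unsigned versions_truncated(void)
{
    return PAGE_BYTES / LP_BYTES;
}
```

For the 40-byte row this gives 186 dead versions per page without truncation versus 2048 with it, roughly the order-of-magnitude gain estimated above.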

The problem is that if a tuple is updated say hundreds of times before
vacuum, but then it's not updated anymore, you'll have a page full of
useless line pointers that are not reclaimed. Clearly we should start
reclaiming line pointers, but we can only do that for unused line
pointers after the last used one.
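Only the tail of the line pointer array can be given back, because a line pointer's position is its offset number in the TID and cannot change. A minimal sketch, assuming flags[] holds each pointer's flag bits and that a pointer with LP_USED clear has no index entries left; trim_trailing_unused() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LP_USED 0x01

/*
 * Return how many line pointers must be kept. Pointers after the last used
 * one can be dropped (shrinking the array); unused pointers in the middle
 * must stay, since removing them would renumber the pointers after them.
 */
static size_t trim_trailing_unused(const uint8_t *flags, size_t nlp)
{
    assert(flags != NULL || nlp == 0);
    while (nlp > 0 && !(flags[nlp - 1] & LP_USED))
        nlp--;
    return nlp;
}
```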

Would it be enough to cap the number of dead line pointers with a simple
rule like "max 20% of line pointers can be dead"? I'd be happy with that.
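The proposed cap is cheap to check with integer arithmetic. MAX_DEAD_LP_PERCENT and may_add_dead_lp() are hypothetical names; the 20% figure is the one suggested above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEAD_LP_PERCENT 20u   /* the "max 20%" rule (hypothetical knob) */

/*
 * May one more tuple on a page with nlp line pointers, ndead of them
 * already dead stubs, be truncated to a dead line pointer?
 * ndead + 1 <= 20% of nlp  <=>  (ndead + 1) * 100 <= 20 * nlp
 */
static bool may_add_dead_lp(unsigned ndead, unsigned nlp)
{
    assert(ndead <= nlp);
    return (ndead + 1u) * 100u <= MAX_DEAD_LP_PERCENT * nlp;
}
```

A false result is also a natural point at which to treat the page as a vacuum candidate.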

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 10:30:23
Message-ID:	2e78013d0701230230n4643a30ao91f0001854154fc2@mail.gmail.com
Lists:	pgsql-hackers

On 1/23/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> ITAGAKI Takahiro wrote:
>
> > Keeping only line pointers itself is not a problem, but it might lead
> > bloating of line pointers. If a particular tuple in a page is replaced
> > repeatedly, the line pointers area bloats up to 1/4 of the page.
>
> Where does the 1/4 figure come from?
>
> > We need to work around the problem.
>
> If a row is updated many times until vacuum comes along, what currently
> happens is that we end up with a bunch of pages full of dead tuples.
> With the truncation scheme, we could fit way more dead tuples on each
> page, reducing the need to vacuum. If a row is for example 40 bytes
> long, including header (a quite narrow one), you could fit 10 line
> pointers to the space of one row, which means that you could ideally
> multiply your vacuum interval by a factor of 10x. That's a huge benefit,
> though indexes would still bloat unless selects marking index pointers
> as dead keep the bloat in control.
>
> The problem is that if a tuple is updated say hundreds of times before
> vacuum, but then it's not updated anymore, you'll have a page full of
> useless line pointers that are not reclaimed. Clearly we should start
> reclaiming line pointers, but we can only do that for unused line
> pointers after the last used one.
>
>
I thought that we cannot reclaim the line pointers unless we remove the
corresponding index entries as well. Isn't that the case? If so, how would
we reclaim the line pointers after the last used one?

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 10:35:39
Message-ID:	2e78013d0701230235k5c123bddu1a850c59cb5e5cb6@mail.gmail.com
Lists:	pgsql-hackers

On 1/23/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Pavan Deolasee wrote:
>
> > So during a sequential or index scan, if a tuple is found to be dead, the
> > corresponding line pointer is marked "unused" and the space is returned to a
> > free list. This free list is maintained within the page. A linked-list can
> > be used for this purpose and the special area of the heap-page can be used
> > to track the fragment list. We can maintain some additional information
> > about the fragmented space such as, total_free_space, max_fragment_size,
> > num_of_fragments etc in the special area.
>
> Maintaining a list like that seems like a lot of hassle to me. Instead,
> you could just scan the line pointers looking for a dead tuple of the
> right size. We already have to scan the line pointers when inserting to
> find a free line pointer.

That's a good suggestion. Just to avoid useless scans when there is no
fragment that can accommodate the new tuple, we may keep some bookkeeping
information in the special area, though.
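One way to keep such bookkeeping safe is to treat it as an upper bound on the largest reusable dead tuple: a stale-high value only costs a wasted scan, while a stale-low value would hide usable space. The struct and functions below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Special-area hint: never smaller than the largest reusable dead tuple. */
typedef struct DeadSpaceHint {
    uint16_t max_dead_len;
} DeadSpaceHint;

/* Skip the line pointer scan entirely when nothing can be big enough. */
static bool worth_scanning(const DeadSpaceHint *h, uint16_t need)
{
    return h->max_dead_len >= need;
}

/* Called whenever a tuple of dead_len bytes becomes reusable. */
static void hint_note_dead(DeadSpaceHint *h, uint16_t dead_len)
{
    if (dead_len > h->max_dead_len)
        h->max_dead_len = dead_len;
}

/* Called after a full scan has measured the true maximum. */
static void hint_reset_after_scan(DeadSpaceHint *h, uint16_t observed_max)
{
    assert(observed_max <= h->max_dead_len);
    h->max_dead_len = observed_max;
}
```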

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 10:47:50
Message-ID:	20070123191419.55E4.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:

> > * Consider shrinking expired tuples to just their headers.
>
> Yeah, same idea. You suggested in that thread that we should keep the
> headers because of line pointer bloat, but I don't see how that's
> better. You're still going to get some line pointer bloat, but not able
> to reclaim as much free space.

That is not a fundamental solution, as you are aware. I think it would be
better to combine tuple shrinking with other restrictions on the LP area.

> > Keeping only line pointers itself is not a problem, but it might lead
> > bloating of line pointers. If a particular tuple in a page is replaced
> > repeatedly, the line pointers area bloats up to 1/4 of the page.
>
> Where does the 1/4 figure come from?

BLCKSZ is typically 8192 bytes and sizeof(ItemPointerData) is 4 bytes.
1/4 comes from 8192 / 4 = 2048. If we allow zero-size tuples, the line
pointer area can bloat up to that ratio. Tuples are never smaller than
32 bytes, so the area is restricted to 256 bytes now.

> The problem is that if a tuple is updated say hundreds of times before
> vacuum, but then it's not updated anymore, you'll have a page full of
> useless line pointers that are not reclaimed. Clearly we should start
> reclaiming line pointers, but we can only do that for unused line
> pointers after the last used one.

We can recycle unused line pointers, but we cannot shrink the area unless
the line pointers at the tail end are removed, i.e., unusable free space will
remain in the middle of the LP area.

[used lp][***unusable free space***][used lp] [free space] [heap tuples]

> Would it be enough to cap the number of dead line pointers with a simple
> rule like "max 20% of line pointers can be dead"? I'd be happy with that.

Yeah, I think it is enough, too. It might also serve as a signal to vacuum.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 10:49:44
Message-ID:	45B5E848.4060102@enterprisedb.com
Lists:	pgsql-hackers

Pavan Deolasee wrote:
> I thought that we can not reclaim the line pointers unless we remove the
> corresponding index entries as well. Isn't that the case ? If so, how would
> we reclaim the line pointers after the last used one ?

There might be index pointers to dead line pointers in the proposed
truncation scheme, so those can't be reclaimed, but after the index
pointers are removed and the line pointers are unused like they are
today, they could be reclaimed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 13:18:55
Message-ID:	45B60B3F.2030707@enterprisedb.com
Lists:	pgsql-hackers

ITAGAKI Takahiro wrote:
> BLCKSZ is typically 8192 bytes and sizeof(ItemPointerData) is 4 bytes.
> 1/4 comes from 8192 / 4 = 2048. If we allow zero-size tuples, the line
> pointers area can bloat up to the ratio. We have tuples no less than
> 32 bytes-size, so the area is restricted 256 bytes now.

sizeof(ItemPointerData) == 6 bytes

> We can recycle unused line pointers, but we cannot shrink the area unless
> the tail end of line pointers are removed. i.e, unusable free space will
> remains at the middle of LP area.

Yeah, agreed. It'd still be a good idea to do it when possible.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 13:43:39
Message-ID:	2e78013d0701230543k54087265tce71bf74155dc8b1@mail.gmail.com
Lists:	pgsql-hackers

On 1/23/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> ITAGAKI Takahiro wrote:
> > BLCKSZ is typically 8192 bytes and sizeof(ItemPointerData) is 4 bytes.
> > 1/4 comes from 8192 / 4 = 2048. If we allow zero-size tuples, the line
> > pointers area can bloat up to the ratio. We have tuples no less than
> > 32 bytes-size, so the area is restricted 256 bytes now.
>
> sizeof(ItemPointerData) == 6 bytes
>
>
I guess ITAGAKI meant sizeof(ItemIdData), which is 4 bytes. That's the data
type for the line pointer.
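The two sizes being mixed up here can be made concrete with stand-in structs. These are hypothetical re-declarations that assume the usual layouts (a 15/2/15-bit line pointer, and a TID built from two 16-bit block-number halves plus a 16-bit offset number, flattened into one struct here); the static assertions hold on mainstream compilers such as GCC and Clang, but bit-field packing is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ItemIdData: the line pointer stored in the page itself. */
typedef struct LineId {
    unsigned lp_off:15, lp_flags:2, lp_len:15;
} LineId;

/* Stand-in for ItemPointerData: a TID (block number + line pointer number). */
typedef struct TupleId {
    uint16_t bi_hi;         /* high half of the block number */
    uint16_t bi_lo;         /* low half of the block number */
    uint16_t ip_posid;      /* offset number, i.e. which line pointer */
} TupleId;

_Static_assert(sizeof(LineId) == 4, "line pointer is expected to be 4 bytes");
_Static_assert(sizeof(TupleId) == 6, "TID is expected to be 6 bytes");

/* Upper bound on line pointers in a page if tuples took no space at all. */
static unsigned max_line_pointers(unsigned page_bytes)
{
    assert(page_bytes > 0);
    return (unsigned) (page_bytes / sizeof(LineId));
}
```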

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-23 14:37:59
Message-ID:	18259.1169563079@sss.pgh.pa.us
Lists:	pgsql-hackers

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> I know it might break the ctid chain, but does that really matter ?

Yes. You can't just decide that the tuple isn't needed anymore.
As per other followup, you could possibly shrink a known-dead tuple to
just the header.

The notion of keeping linked lists etc seems like gross overdesign to me.
Why not just compact out the free space?
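Compacting a page amounts to sliding the live tuples together at the end of the page and rewriting their line pointer offsets, similar in spirit to what vacuum already does in PageRepairFragmentation(). The sketch below is hypothetical (Slot, SortItem, compact_page() and MAX_SLOTS are made up) and ignores locking: in a real server other backends may hold pointers into the page, which is why such a step needs a vacuum-strength (cleanup) lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 2048   /* 8192-byte page / 4-byte line pointer */

typedef struct Slot {            /* simplified line pointer view */
    uint16_t off;
    uint16_t len;
    bool     live;
} Slot;

typedef struct SortItem {
    size_t   idx;                /* which slot */
    uint16_t off;                /* its current tuple offset */
} SortItem;

/* qsort comparator: highest tuple offset first. */
static int by_offset_desc(const void *a, const void *b)
{
    uint16_t oa = ((const SortItem *) a)->off;
    uint16_t ob = ((const SortItem *) b)->off;
    return (oa < ob) - (oa > ob);
}

/*
 * Move all live tuples so they end exactly at upper_end, with no gaps.
 * Dead slots keep their line pointer but lose their storage (len 0).
 * Returns the new lower edge of the tuple area.
 */
static uint16_t compact_page(uint8_t *page, Slot *slots, size_t nslots,
                             uint16_t upper_end)
{
    SortItem items[MAX_SLOTS];
    size_t nlive = 0;
    uint16_t upper = upper_end;

    assert(nslots <= MAX_SLOTS);
    for (size_t i = 0; i < nslots; i++) {
        if (slots[i].live) {
            items[nlive].idx = i;
            items[nlive].off = slots[i].off;
            nlive++;
        } else {
            slots[i].off = 0;
            slots[i].len = 0;
        }
    }
    qsort(items, nlive, sizeof items[0], by_offset_desc);

    for (size_t k = 0; k < nlive; k++) {
        Slot *s = &slots[items[k].idx];

        upper = (uint16_t) (upper - s->len);
        memmove(page + upper, page + s->off, s->len);
        s->off = upper;
    }
    return upper;
}
```

Processing tuples from the highest offset down means each tuple only moves toward the end of the page, so memmove() never overwrites a tuple that has not been moved yet.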

regards, tom lane

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-24 07:15:53
Message-ID:	2e78013d0701232315s50edcabet95ea44285498e996@mail.gmail.com
Lists:	pgsql-hackers

On 1/23/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I know it might break the ctid chain, but does that really matter ?
>
> Yes. You can't just decide that the tuple isn't needed anymore.
> As per other followup, you could possibly shrink a known-dead tuple to
> just the header.

My apologies if this has been discussed before. I went through the earlier
discussions, but it's still very fuzzy to me. I am not able to construct a
case where a tuple is DEAD (not RECENTLY_DEAD) and yet a transaction could
still need to follow the ctid pointer chain from its parent. Can somebody
help me construct this scenario?

> The notion of keeping linked lists etc seems like gross overdesign to me.
> Why not just compact out the free space?

That would require us to acquire a vacuum-strength lock on the page. For a
very large table, where the probability of two backends looking at the same
page is very low, we might still be able to do that in most cases. But
compacting a page would cause a lot of data movement, which might be
CPU-intensive. Just a thought though.

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-24 10:53:50
Message-ID:	20070124105350.GA20752@svana.org
Lists:	pgsql-hackers

On Wed, Jan 24, 2007 at 12:45:53PM +0530, Pavan Deolasee wrote:
> My apologies if this has been discussed before. I went through the earlier
> discussions, but its still very fuzzy to me. I am not able to construct a
> case where a tuple is DEAD (not RECENTLY_DEAD) and still there could be
> a transaction need to follow the ctid pointer chain from its parent. Can
> somebody help me to construct this scenario ?

I thought the classical example was a transaction that updated the same
tuple multiple times before committing. Then the version prior to the
transaction start isn't dead yet, but all but one of the versions
created by the transaction will be dead (they were never visible to
anybody else anyway).

I believe other such corner cases are transactions that have
subtransactions that aborted after updating.

But I'm not knowledgeable enough about MVCC to be sure about that.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-24 13:09:48
Message-ID:	2e78013d0701240509i4b373516s47f80194f73c6@mail.gmail.com
Lists:	pgsql-hackers

On 1/24/07, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>
> On Wed, Jan 24, 2007 at 12:45:53PM +0530, Pavan Deolasee wrote:
> > My apologies if this has been discussed before. I went through the earlier
> > discussions, but its still very fuzzy to me. I am not able to construct a
> > case where a tuple is DEAD (not RECENTLY_DEAD) and still there could be
> > a transaction need to follow the ctid pointer chain from its parent. Can
> > somebody help me to construct this scenario ?
>
> I thought the classical example was a transaction that updated the same
> tuple multiple times before committing. Then the version prior to the
> transaction start isn't dead yet, but all but one of the versions
> created by the transaction will be dead (they were never visible by
> anybody else anyway).

I believe that the calculation of oldestXmin would consider the running
transaction, if any, which can still see the original tuple. So the
intermediate tuples won't be declared DEAD (they will be declared
RECENTLY_DEAD) as long as the other transaction is running. Any newer
transactions would always see the committed copy and hence need not follow
ctid through the dead tuples.

I might be missing something very obvious, but that's what I am trying to
understand.

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Free space management within heap page
Date:	2007-01-24 14:08:51
Message-ID:	877ivcecuk.fsf@stark.xeocode.com
Lists:	pgsql-hackers

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:

> On 1/24/07, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>>
>> I thought the classical example was a transaction that updated the same
>> tuple multiple times before committing. Then the version prior to the
>> transaction start isn't dead yet, but all but one of the versions
>> created by the transaction will be dead (they were never visible by
>> anybody else anyway).
>
> I believe that calculation of oldestXmin would consider the running
> transaction, if any, which can still see the original tuple. So the
> intermediate tuples won't be declared DEAD (they will be declared
> RECENTLY_DEAD) as long as the other transaction is running. Any newer
> transactions would always see the committed copy and hence need not follow
> ctid through the dead tuples.

Martijn is correct that HeapTupleSatisfiesVacuum considers tuples dead if
they were created and deleted by the same transaction, even if that
transaction isn't past the oldestxmin horizon.

There's already been one bug in that area when it broke update chains, and to
fix it vacuum ignores tuples that were deleted by the same transaction in an
UPDATE statement.

This seems like such an unusual case, especially now that it's been narrowed
by that exception, that it's silly to optimize for it. Just treat these tuples
as live and they'll be vacuumed when their transaction commits and passes the
oldestxmin like normal.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Free space management within heap page
Date:	2007-01-24 14:48:33
Message-ID:	2e78013d0701240648s6a21af71v3457695432b94654@mail.gmail.com
Lists:	pgsql-hackers

On 1/24/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
>
> > On 1/24/07, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> >>
> >> I thought the classical example was a transaction that updated the same
> >> tuple multiple times before committing. Then the version prior to the
> >> transaction start isn't dead yet, but all but one of the versions
> >> created by the transaction will be dead (they were never visible by
> >> anybody else anyway).
> >
> > I believe that calculation of oldestXmin would consider the running
> > transaction, if any, which can still see the original tuple. So the
> > intermediate tuples won't be declared DEAD (they will be declared
> > RECENTLY_DEAD) as long as the other transaction is running. Any newer
> > transactions would always see the committed copy and hence need not follow
> > ctid through the dead tuples.
>
> Martijn is correct that HeapTupleSatisfiesVacuum considers tuples dead if
> there were created and deleted by the same transaction even if that
> transaction isn't past the oldestxmin horizon.

I agree. Here the tuple must have been created as an effect of INSERT and not
UPDATE, since if it's created because of an UPDATE, the HEAP_UPDATED bit is
set on the tuple and the tuple is not considered dead by
HeapTupleSatisfiesVacuum, even if its xmin and xmax are the same. So it must
have been created by INSERT. In that case there cannot be a parent linking to
this tuple via t_ctid.
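A toy model of the rule as described in this exchange, restricted to the case where the deleting transaction has committed. TupleModel and classify_committed_delete() are hypothetical, the updated flag stands in for HEAP_UPDATED, and plain integer comparison replaces PostgreSQL's wraparound-aware TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef enum { TUPLE_RECENTLY_DEAD, TUPLE_DEAD } TupleState;

typedef struct TupleModel {
    Xid  xmin;       /* inserting transaction */
    Xid  xmax;       /* deleting transaction, assumed committed */
    bool updated;    /* cf. HEAP_UPDATED: version was created by an UPDATE */
} TupleModel;

static TupleState classify_committed_delete(const TupleModel *t, Xid oldest_xmin)
{
    assert(t->xmax != 0);
    /*
     * Inserted and deleted by the same transaction: nobody else ever saw
     * it. Only safe to remove early if no parent version links to it via
     * t_ctid, i.e. it was not created by an UPDATE.
     */
    if (t->xmin == t->xmax && !t->updated)
        return TUPLE_DEAD;
    /* Deleter too recent: an older snapshot might still see the tuple. */
    if (t->xmax >= oldest_xmin)
        return TUPLE_RECENTLY_DEAD;
    return TUPLE_DEAD;
}
```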

> There's already been one bug in that area when it broke update chains, and
> to fix it vacuum ignores tuples that were deleted by the same transaction
> in an UPDATE statement.

Sounds logical.

> This seems like such an unusual case, especially now that it's been
> narrowed by that exception, that it's silly to optimize for it. Just treat
> these tuples as live and they'll be vacuumed when their transaction commits
> and passes the oldestxmin like normal.

I agree. Nevertheless, I don't see any problem with having that
optimization.

Now that I think more about it, there are places where the xmin of the next
tuple in the t_ctid chain is matched with the xmax of the previous tuple, to
detect cases where one of the intermediate DEAD tuples has been vacuumed away
and the slot has been reused by a completely unrelated tuple. So doesn't that
mean we have already made provision for scenarios where intermediate DEAD
tuples are vacuumed away?
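The check described above can be sketched on a single-page model: each hop along t_ctid is accepted only if the next version's xmin equals the previous version's xmax; otherwise the slot is treated as vacuumed away or reused by an unrelated tuple. Version and follow_chain() are hypothetical, and the whole chain is assumed to stay on one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID 0

/* Single-page view of one tuple version (hypothetical). */
typedef struct Version {
    bool present;   /* false if the slot is empty (vacuumed away) */
    Xid  xmin;      /* inserting transaction */
    Xid  xmax;      /* updating/deleting transaction, or INVALID_XID */
    int  next;      /* t_ctid: slot of the newer version, or own slot */
} Version;

/*
 * Follow the update chain from `start`. Returns the slot of the newest
 * reachable version, or -1 if the chain is broken (empty slot, or a tuple
 * whose xmin does not match the prior xmax) or longer than the page.
 */
static int follow_chain(const Version *v, int nslots, int start)
{
    int cur = start;

    assert(start >= 0 && start < nslots && v[start].present);
    for (int steps = 0; steps < nslots; steps++) {
        Xid prior_xmax = v[cur].xmax;
        int nxt = v[cur].next;

        if (prior_xmax == INVALID_XID || nxt == cur)
            return cur;                 /* newest version reached */
        if (nxt < 0 || nxt >= nslots || !v[nxt].present ||
            v[nxt].xmin != prior_xmax)
            return -1;                  /* slot gone or reused */
        cur = nxt;
    }
    return -1;                          /* cyclic or corrupt chain */
}
```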

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com