Re: CLUSTER can change t_len

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: CLUSTER can change t_len
Date: 2010-11-09 03:44:45
Message-ID: 1289274285.754.8.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I don't think that this is a bug exactly, but it seems inconsistent.

See case below. After the item length gets changed, then when reading
the tuple later you get a t_len that includes padding.

We should document in a comment that t_len can mean multiple things. Or,
we should fix raw_heap_insert() to be consistent with the rest of the
code, which doesn't MAXALIGN the t_len.

Regards,
Jeff Davis

create table foo(i int4 unique);
insert into foo values(1);
insert into foo values(2);
checkpoint;
select relfilenode from pg_class where relname = 'foo';

relfilenode
-------------
16413
(1 row)

--
-- Look at on-disk item lengths, which are 28 (0x38 >> 1)
-- on my machine
--

cluster foo using foo_i_key;
select relfilenode from pg_class where relname = 'foo';

relfilenode
-------------
16418
(1 row)

checkpoint;

--
-- Now look again. They are 32 (0x40 >> 1) on my machine.
--


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 09:11:59
Message-ID: AANLkTi=NyAK9S8wi2X5uUwzJ2znvt3ajsfVRKun4XiYD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 9, 2010 at 12:44 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> See case below. After the item length gets changed, then when reading
> the tuple later you get a t_len that includes padding.

We can easily find it with pageinspect:

\i pageinspect.sql
create table foo(i int4);
insert into foo values(1);
SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
lp | lp_len
----+--------
1 | 28
VACUUM FULL foo;
SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
lp | lp_len
----+--------
1 | 32

> We should document in a comment that t_len can mean multiple things. Or,
> we should fix raw_heap_insert() to be consistent with the rest of the
> code, which doesn't MAXALIGN the t_len.

We have a comment /* be conservative */ in the function, but I'm not sure
we actually need the MAXALIGN. However, there would be almost no benefits
to keep t_len in small value because we often treat memory in MAXALIGN unit.

diff --git a/src/backend/access/heap/rewriteheap.c
b/src/backend/access/heap/rewriteheap.c
index 0bd1865..0ed94ef 100644
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
*************** raw_heap_insert(RewriteState state, Heap
*** 586,592 ****
else
heaptup = tup;

! len = MAXALIGN(heaptup->t_len); /* be conservative */

/*
* If we're gonna fail for oversize tuple, do it right away
--- 586,592 ----
else
heaptup = tup;

! len = heaptup->t_len;

/*
* If we're gonna fail for oversize tuple, do it right away

--
Itagaki Takahiro


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 10:20:16
Message-ID: 4CD92060.1010801@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.11.2010 11:11, Itagaki Takahiro wrote:
> On Tue, Nov 9, 2010 at 12:44 PM, Jeff Davis<pgsql(at)j-davis(dot)com> wrote:
>> See case below. After the item length gets changed, then when reading
>> the tuple later you get a t_len that includes padding.
>
> We can easily find it with pageinspect:
>
> \i pageinspect.sql
> create table foo(i int4);
> insert into foo values(1);
> SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
> lp | lp_len
> ----+--------
> 1 | 28
> VACUUM FULL foo;
> SELECT lp, lp_len FROM heap_page_items(get_raw_page('foo', 0));
> lp | lp_len
> ----+--------
> 1 | 32
>
>> We should document in a comment that t_len can mean multiple things. Or,
>> we should fix raw_heap_insert() to be consistent with the rest of the
>> code, which doesn't MAXALIGN the t_len.
>
> We have a comment /* be conservative */ in the function, but I'm not sure
> we actually need the MAXALIGN. However, there would be almost no benefits
> to keep t_len in small value because we often treat memory in MAXALIGN unit.

Hmm, the conservatism at that point affects the free space calculations.
I'm not sure if it makes any difference in practice, but I'm also not
sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.

This would be more in line with what the main heap_insert code does:

--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -641,7 +641,7 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
}

/* And now we can insert the tuple into the page */
- newoff = PageAddItem(page, (Item) heaptup->t_data, len,
+ newoff = PageAddItem(page, (Item) heaptup->t_data, heaptup->t_len,
InvalidOffsetNumber, false, true);
if (newoff == InvalidOffsetNumber)
elog(ERROR, "failed to add tuple");

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 13:57:55
Message-ID: AANLkTin6DiCOeyAwPKk73+cs9Zm-XG4KxhxdW3hx2zN=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 9, 2010 at 10:20 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>> We have a comment /* be conservative */ in the function, but I'm not sure
>> we actually need the MAXALIGN. However, there would be almost no benefits
>> to keep t_len in small value because we often treat memory in MAXALIGN
>> unit.
>
> Hmm, the conservatism at that point affects the free space calculations. I'm
> not sure if it makes any difference in practice, but I'm also not sure it
> doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>
> This would be more in line with what the main heap_insert code does:
>

Doesn't this cause assertion failures in heap_fill_tuple when the data
size isn't what's expected? I guess we never actually use the t_len
for later tuple reconstructions, we just recompute the needed size?
--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 14:23:25
Message-ID: 4CD9595D.20103@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.11.2010 15:57, Greg Stark wrote:
> On Tue, Nov 9, 2010 at 10:20 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>
>>> We have a comment /* be conservative */ in the function, but I'm not sure
>>> we actually need the MAXALIGN. However, there would be almost no benefits
>>> to keep t_len in small value because we often treat memory in MAXALIGN
>>> unit.
>>
>> Hmm, the conservatism at that point affects the free space calculations. I'm
>> not sure if it makes any difference in practice, but I'm also not sure it
>> doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>>
>> This would be more in line with what the main heap_insert code does:
>
> Doesn't this cause assertion failures in heap_fill_tuple when the data
> size isn't what's expected? I guess we never actually use the t_len
> for later tuple reconstructions, we just recompute the needed size?

Right, the length from t_len or the item pointer is never passed to
heap_fill_tuple.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 15:14:09
Message-ID: 17846.1289315649@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 09.11.2010 11:11, Itagaki Takahiro wrote:
>> We have a comment /* be conservative */ in the function, but I'm not sure
>> we actually need the MAXALIGN. However, there would be almost no benefits
>> to keep t_len in small value because we often treat memory in MAXALIGN unit.

> Hmm, the conservatism at that point affects the free space calculations.
> I'm not sure if it makes any difference in practice, but I'm also not
> sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.

I tend to agree with Jeff's original point that the behavior should
match regular tuple insertion exactly. This isn't about saving space,
because it won't; it's about not confusing readers by doing the same
thing in randomly different ways. I will also note that the regular
path is FAR better tested than raw_heap_insert. If there are any bugs
here, it's 1000:1 they're in raw_heap_insert not the regular path.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: CLUSTER can change t_len
Date: 2010-11-09 15:53:31
Message-ID: 4CD96E7B.1080105@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.11.2010 17:14, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 09.11.2010 11:11, Itagaki Takahiro wrote:
>>> We have a comment /* be conservative */ in the function, but I'm not sure
>>> we actually need the MAXALIGN. However, there would be almost no benefits
>>> to keep t_len in small value because we often treat memory in MAXALIGN unit.
>
>> Hmm, the conservatism at that point affects the free space calculations.
>> I'm not sure if it makes any difference in practice, but I'm also not
>> sure it doesn't. pd_upper is always MAXALIGNed, but pd_lower is not.
>
> I tend to agree with Jeff's original point that the behavior should
> match regular tuple insertion exactly. This isn't about saving space,
> because it won't; it's about not confusing readers by doing the same
> thing in randomly different ways. I will also note that the regular
> path is FAR better tested than raw_heap_insert. If there are any bugs
> here, it's 1000:1 they're in raw_heap_insert not the regular path.

Agreed. I've committed my patch to make it behave like heap_insert.
Thank you, Itagaki, for the easy test case using pageinspect.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com