Re: Reading deleted records - PageHeader v3

Lists: pgsql-hackers
From: "Jonathan Bond-Caron" <jbondc(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Reading deleted records - PageHeader v3
Date: 2010-02-05 13:39:14
Message-ID: 002201caa668$9c5583a0$d5008ae0$@com

Hi,

First off, I'm a pgsql hacker newbie, and I've been reading up on the storage
structure:

http://www.postgresql.org/docs/8.2/interactive/storage-page-layout.html

I'm trying to recover deleted records from a page file (PostgreSQL 8.2),
i.e. base/dbId/20132

I am able to successfully read all the header data I need (PageHeaderData,
ItemIdData, HeapTupleHeaderData), but I hit a wall when I try to start
reading user data.

This has helped:

http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/postgres.h?rev=1.77;content-type=text%2Fplain

I've read and understood fairly well how varlena structures are stored
(plain, compressed, external/toast) but so far I can't seem to read a plain
inline value.

I think part of my problem is I haven't really understood what 'Then make
sure you have the right alignment' means.

My approach currently is:

After reading HeapTupleHeaderData (23 bytes), I advance another 4 bytes
(hoff) and try to read a 32 bit integer (first attribute).

I am expecting to get the integer value 1, but I get 512.

Am I doing this wrong?

Could someone point me to the pgsql code pieces I should be looking at?

If useful, this is the information I have before reading the 'user data':

object(PostgreSQL_HeapTupleHeaderData)#14 (7) {
  ["xmin"]=>
  string(5) "13824"
  ["xmax"]=>
  string(1) "0"
  ["cid"]=>
  string(1) "0"
  ["ctid"]=>
  object(PostgreSQL_ItemPointerData)#16 (2) {
    ["blockId"]=>
    string(1) "0"
    ["posId"]=>
    int(0)
  }
  ["infomask2"]=>
  int(0)
  ["infomask"]=>
  int(2)
  ["hoff"]=>
  int(4)
}

object(PostgreSQL_Attribute)#7 (6) {
  ["name"]=>
  string(7) "book_id"
  ["relid"]=>
  int(20132)
  ["len"]=>
  int(4)
  ["num"]=>
  int(1)
  ["ndims"]=>
  int(0)
  ["align"]=>
  string(1) "i"
}

array(1) {
  ["book_id"]=>
  int(512)
}


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonathan Bond-Caron" <jbondc(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reading deleted records - PageHeader v3
Date: 2010-02-06 06:20:53
Message-ID: 14890.1265437253@sss.pgh.pa.us

"Jonathan Bond-Caron" <jbondc(at)gmail(dot)com> writes:
> I think part of my problem is I haven't really understood what 'Then make
> sure you have the right alignment' means.

> My approach currently is:

> After reading HeapTupleHeaderData (23 bytes), I advance another 4 bytes
> (hoff) and try to read a 32 bit integer (first attribute).

No. First you start at the tuple beginning plus the number of bytes
indicated by hoff (which should be at least 24). The first field
will always be right there, because this position is always maximally
aligned. For subsequent fields you have to advance to a multiple of
the alignment requirement of the datatype. For example, assume the
table's first column is of type bool (1 byte) and the second column
is of type integer. The bool will be at offset hoff, but the integer
will be at offset hoff + 4 ... it can't immediately follow the bool,
at offset hoff + 1, because that position isn't correctly aligned.
It has to start at the next offset that's a multiple of 4.

regards, tom lane
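
The alignment rule described above can be sketched in a few lines of
Python. This is only an illustration, not PostgreSQL's own code: the helper
names align_up and field_offsets are made up for this example, the sketch
ignores nulls and variable-length columns, and the mapping from alignment
codes to byte counts (in particular 8 for 'd') is an assumption that holds
on typical platforms.

```python
# Alignment codes ('c', 's', 'i', 'd') mapped to byte multiples.
# ASSUMPTION: 8 bytes for 'd' is typical but platform-dependent.
ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def field_offsets(hoff, columns):
    """Return the start offset of each column, relative to the tuple start.

    columns is a list of (length_in_bytes, align_code) pairs.
    Hypothetical helper: fixed-length, non-null columns only.
    """
    offsets = []
    pos = hoff  # the first field starts exactly at hoff
    for length, align_code in columns:
        pos = align_up(pos, ALIGN_BYTES[align_code])
        offsets.append(pos)
        pos += length
    return offsets


# A bool (1 byte, 'c') followed by an integer (4 bytes, 'i'), with hoff = 24:
# the bool sits at hoff, the integer at hoff + 4.
print(field_offsets(24, [(1, "c"), (4, "i")]))  # [24, 28]
```

With hoff = 24 this reproduces the bool/integer example: the integer cannot
start at offset 25, so it is pushed to 28, the next multiple of 4.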


From: "Jonathan Bond-Caron" <jbondc(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reading deleted records - PageHeader v3
Date: 2010-02-07 14:21:52
Message-ID: 000c01caa800$e5e83ff0$b1b8bfd0$@com

On Sat Feb 6 01:20 AM, Tom Lane wrote:
> "Jonathan Bond-Caron" <jbondc(at)gmail(dot)com> writes:
> > I think part of my problem is I haven't really understood what 'Then
> > make sure you have the right alignment' means.
>
> > My approach currently is:
>
> > After reading HeapTupleHeaderData (23 bytes), I advance another 4
> > bytes
> > (hoff) and try to read a 32 bit integer (first attribute).
>
> No. First you start at the tuple beginning plus the number of bytes
> indicated by hoff (which should be at least 24).

Thanks, much appreciated!

I was reading HeapTupleHeaderData as 23 bytes but it's 27 bytes in
access/htup.h?rev=1.87.

The hoff now makes sense with a value of 28 bytes, and I can start to read
the user data.
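
For anyone following along, here is a minimal sketch of the corrected read.
It rests on assumptions that are not spelled out in this thread: that hoff
is stored in the last byte of the fixed-size header (so at index 26 for a
27-byte header), and that the file was written on a little-endian machine.
The function name read_first_int32 and the sample bytes are made up for
illustration.

```python
import struct


def read_first_int32(tuple_bytes, header_len=27, byte_order="<"):
    """Return (hoff, first attribute) for a tuple whose first column is int4.

    ASSUMPTIONS: hoff is the last byte of the fixed-size header, and the
    data is little-endian ("<"); pass ">" for a big-endian platform.
    """
    hoff = tuple_bytes[header_len - 1]
    # User data starts at tuple start + hoff, not at header end + hoff.
    (value,) = struct.unpack_from(byte_order + "i", tuple_bytes, hoff)
    return hoff, value


# Synthetic tuple: 26 zero bytes, hoff = 28, one padding byte, integer 1.
sample = bytes(26) + bytes([28]) + bytes(1) + struct.pack("<i", 1)
print(read_first_int32(sample))  # (28, 1)
```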