Re: Btree internal node data?

Lists: pgsql-hackers
From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Btree internal node data?
Date: 2014-08-28 02:08:24
Message-ID: 20140828.110824.1195843073079055852.t-ishii@sraoss.co.jp

While looking into a btree internal page using pg_filedump against an
int4 index generated by pgbench, I noticed that only item 2 has length 8,
which indicates that the index tuple has only a tuple header and no
index data. In my understanding this indicates that the item is used
to represent a down link to a page. The question is why this item is 2,
not 1. I thought the index tuple indicating a down link was always item 1.
Is this a sign that something has gone wrong?

Block 3 ********************************************************
<Header> -----
Block Offset: 0x00006000         Offsets: Lower    1164 (0x048c)
Block: Size 8192  Version    4            Upper    3624 (0x0e28)
LSN:  logid      2 recoff 0x1550a608      Special  8176 (0x1ff0)
Items:  285                      Free Space: 2460
Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
Length (including item array): 1164

<Data> ------
Item 1 -- Length: 16 Offset: 3624 (0x0e28) Flags: NORMAL
Item 2 -- Length: 8 Offset: 8168 (0x1fe8) Flags: NORMAL
Item 3 -- Length: 16 Offset: 8152 (0x1fd8) Flags: NORMAL
Item 4 -- Length: 16 Offset: 8136 (0x1fc8) Flags: NORMAL
Item 5 -- Length: 16 Offset: 8120 (0x1fb8) Flags: NORMAL
[snip]
Item 281 -- Length: 16 Offset: 3704 (0x0e78) Flags: NORMAL
Item 282 -- Length: 16 Offset: 3688 (0x0e68) Flags: NORMAL
Item 283 -- Length: 16 Offset: 3672 (0x0e58) Flags: NORMAL
Item 284 -- Length: 16 Offset: 3656 (0x0e48) Flags: NORMAL
Item 285 -- Length: 16 Offset: 3640 (0x0e38) Flags: NORMAL

<Special Section> -----
BTree Index Section:
Flags: 0x0000 ()
Blocks: Previous (0) Next (289) Level (1) CycleId (0)
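
The numbers in this dump can be cross-checked with a little arithmetic. The sketch below is only an illustration in Python, not part of pg_filedump or PostgreSQL; the byte sizes it uses (24-byte page header, 4-byte line pointers, 8-byte index tuple header, 16-byte btree special section, 8-byte alignment) are assumptions about a typical 64-bit build with 8 kB blocks, and the function names are made up for this example.

```python
# Sizes in bytes. These are assumptions about a typical 64-bit build with the
# default 8 kB block size; they are not read from the dump itself.
BLOCK_SIZE = 8192
PAGE_HEADER_SIZE = 24     # fixed page header before the line pointer array
LINE_POINTER_SIZE = 4     # one line pointer per item
INDEX_TUPLE_HEADER = 8    # index tuple header, present even without key data
BTREE_SPECIAL_SIZE = 16   # btree special section at the end of the page
MAXALIGN = 8              # tuples are padded up to this boundary


def maxalign(nbytes: int) -> int:
    """Round nbytes up to the next multiple of MAXALIGN."""
    return (nbytes + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def expected_lower(nitems: int) -> int:
    """End of the line pointer array for a page holding nitems items."""
    return PAGE_HEADER_SIZE + nitems * LINE_POINTER_SIZE


def tuple_length(key_bytes: int) -> int:
    """On-page length of an index tuple carrying key_bytes of key data."""
    return maxalign(INDEX_TUPLE_HEADER + key_bytes)


if __name__ == "__main__":
    print("lower for 285 items:", expected_lower(285))
    print("upper - lower:", 3624 - expected_lower(285))
    print("special offset:", BLOCK_SIZE - BTREE_SPECIAL_SIZE)
    print("int4 tuple length:", tuple_length(4))
    print("header-only tuple length:", tuple_length(0))
```

Under those assumptions the 285 items account for Lower 1164, Upper minus Lower gives the reported free space of 2460, an int4 key pads out to the 16-byte items, and a tuple with no key data is exactly the 8 bytes seen for item 2.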

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Btree internal node data?
Date: 2014-08-28 02:19:44
Message-ID: CAM3SWZT3Q+f3jRV9q_=TtZ_DXxf0O5KRiUe0+8TnM9gZ5cirAQ@mail.gmail.com

On Wed, Aug 27, 2014 at 7:08 PM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> While looking into a btree internal page using pg_filedump against an
> int4 index generated by pgbench, I noticed that only item 2 has length 8,
> which indicates that the index tuple has only a tuple header and no
> index data. In my understanding this indicates that the item is used
> to represent a down link to a page. The question is why this item is 2,
> not 1. I thought the index tuple indicating a down link was always item 1.
> Is this a sign that something has gone wrong?

No. On a non-rightmost page, the "high key" item is physically first
(which is a bit odd, because it serves as a high-bound invariant on
the items that the page stores, but it's convenient to do it that way
for other reasons). On an internal page (that is also non-rightmost),
the second item (which is the first "real" item - i.e. the item which
P_FIRSTDATAKEY() returns) still carries a valid down link, but its key
is just placeholder garbage (here there is no key data at all, hence
the length of 8). The reason for that is noted above _bt_compare():

* CRUCIAL NOTE: on a non-leaf page, the first data key is assumed to be
* "minus infinity": this routine will always claim it is less than the
* scankey. The actual key value stored (if any, which there probably isn't)
* does not matter. This convention allows us to implement the Lehman and
* Yao convention that the first down-link pointer is before the first key.
* See backend/access/nbtree/README for details.
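
To make the convention concrete, here is a minimal sketch in Python of how a descent through an internal page could treat the first data key as minus infinity. It is a simplified model, not the nbtree code: the Item and InternalPage classes, the function names, and the linear scan are all hypothetical (the real code is C and uses a binary search with more cases), and the keys and block numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    key: Optional[int]              # None models a header-only tuple (no key data)
    downlink: Optional[int] = None  # child block number; the high key has none


@dataclass
class InternalPage:
    items: List[Item]   # items[0] is offset 1, items[1] is offset 2, ...
    rightmost: bool


def first_data_key(page: InternalPage) -> int:
    """1-based offset of the first real item, modelled on P_FIRSTDATAKEY()."""
    return 1 if page.rightmost else 2   # offset 1 is the high key otherwise


def compare(page: InternalPage, scankey: int, offnum: int) -> int:
    """Return <0, 0 or >0 as scankey is less than, equal to or greater than the item."""
    if offnum == first_data_key(page):
        return 1   # "minus infinity": the stored key is never examined
    key = page.items[offnum - 1].key
    return (scankey > key) - (scankey < key)


def choose_downlink(page: InternalPage, scankey: int) -> Optional[int]:
    """Follow the last data item that does not compare greater than scankey."""
    chosen = first_data_key(page)
    for offnum in range(chosen + 1, len(page.items) + 1):
        if compare(page, scankey, offnum) < 0:
            break
        chosen = offnum
    return page.items[chosen - 1].downlink


# A non-rightmost internal page: high key first, then the keyless first data item.
page = InternalPage(
    items=[Item(key=300), Item(key=None, downlink=10),
           Item(key=100, downlink=11), Item(key=200, downlink=12)],
    rightmost=False,
)

if __name__ == "__main__":
    for scankey in (5, 150, 250):
        print(scankey, "->", choose_downlink(page, scankey))
```

In this model any scan key smaller than every stored key still finds a child to descend into (block 10 here), which is why the first data item needs a down link but no key.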

--
Peter Geoghegan