Re: GiST insert algorithm rewrite

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST insert algorithm rewrite
Date: 2010-11-17 17:46:22
Message-ID: 4CE414EE.1000501@sigaev.ru
Lists: pgsql-hackers

Sorry, I missed the beginning of the discussion on GiST, so I read it in the web
mail archive.

You wrote:
http://archives.postgresql.org/pgsql-hackers/2010-11/msg00939.php
[skip]
0. (the child page is locked)
1. The parent page is locked.
2. The child page is split. The original page becomes the left half, and new
buffers are allocated for the right halves.
3. The downlink is inserted on the parent page (and the original downlink is
updated to reflect only the keys that stayed on the left page). While keeping
the child pages locked, the NSN field on the children are updated with the new
LSN of the parent page.
...
The scan checks that by comparing the LSN it saw on the parent page with the NSN
on the child page. If parent LSN < NSN, we saw the parent before the downlink
was inserted.

Now, the problem with crash recovery is that the above algorithm depends on the
split to keep the parent and child locked until the downlink is inserted in the
parent. If you crash between steps 2 and 3, the locks are gone. If a later
insert then updates the parent page, because of a split on some unrelated child
page, that will bump the LSN of the parent above the NSN on the child. Scans
will see that the parent LSN > child NSN, and will no longer follow the rightlink.
[skip]
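
To make the quoted check concrete, here is a toy model in Python. It is not
PostgreSQL code: the names Page, must_follow_rightlink and
scan_misses_right_half are hypothetical, the LSN values are made up, and the
crash case assumes the child already carries an NSN from the split itself.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0  # LSN of the last WAL record that modified this page
    nsn: int = 0  # on a split child: the NSN stamped for that split


def must_follow_rightlink(parent_lsn_seen: int, child: Page) -> bool:
    """Scan-side rule from the quoted text: if the parent LSN the scan saw
    is below the child's NSN, the scan read the parent before the downlink
    was inserted and has to follow the rightlink."""
    return parent_lsn_seen < child.nsn


def scan_misses_right_half(parent_lsn_seen: int, child: Page,
                           downlink_in_parent: bool) -> bool:
    """The right half is lost to a scan when the parent has no downlink for
    it and the LSN/NSN rule also tells the scan to skip the rightlink."""
    return (not downlink_in_parent) and not must_follow_rightlink(
        parent_lsn_seen, child)
```

With these made-up numbers: a scan that read the parent at LSN 100 and then
finds a child with NSN 120 follows the rightlink. In the crash scenario the
child has NSN 110, the downlink was never inserted, and an unrelated update
moves the parent to LSN 130; scan_misses_right_half(130, Page(nsn=110), False)
is then true, which is the failure described in the quoted message.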

I disagree with that: if we crash between steps 2 and 3, why would anybody
update the parent before WAL replay? In that case the WAL replay process should
complete the child split by inserting an "invalid" pointer, and the tree becomes
correct again, although the "invalid" pointers still need to be repaired. It is
the same situation as with b-tree: WAL replay repairs an incomplete split before
any other processing.
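
Roughly what I mean, as a toy Python sketch rather than the real replay code.
The record shapes ("split", page) and ("downlink", page) and the function name
replay are invented for illustration only.

```python
def replay(records):
    """Replay a list of toy WAL records and return the parent's downlinks
    as a dict: right page id -> "valid" or "invalid".

    Assumed record shapes (hypothetical):
      ("split", right_page)     a child was split, right_page was created
      ("downlink", right_page)  the downlink for right_page was inserted
    """
    downlinks = {}
    incomplete = set()  # right pages whose downlink has not been seen yet

    for kind, right_page in records:
        if kind == "split":
            incomplete.add(right_page)
        elif kind == "downlink":
            downlinks[right_page] = "valid"
            incomplete.discard(right_page)

    # End of replay, before any new insert can touch the parent: finish
    # every split still pending by adding an "invalid" pointer for it.
    for right_page in sorted(incomplete):
        downlinks[right_page] = "invalid"

    return downlinks
```

For example, replay([("split", 2), ("downlink", 2), ("split", 4)]) leaves page 2
with a valid downlink and page 4 with an "invalid" one, so no right half is left
without a pointer in the parent once normal processing starts.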

Or am I missing something important?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
