Re: Gist Recovery testing

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Gist Recovery testing
Date: 2005-06-14 11:47:37
Message-ID: 42AEC3D9.5060107@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> btree manages to avoid calling the index support functions during WAL
> restore --- can't you do the same? Perhaps you need to be including
> more information in the initial xlog records, so that the cleanup step
> has everything it needs. Or resort to brute-force search (which is more
> or less what btree does). I don't think this operation needs to be very
> efficient, since it's a corner case that should only seldom be invoked.

I've just commited WALogging for GiST. It works for online backup (normal
recovery) and mostly after crash, but in this case it can't restore inclompleted
inserts although it can detected and say to log thet it's needed to reindex
GiST index.

The problem with incopmleted inserts is: when new entry is installed into leaf
page, all chain (in a worst case) of keys from root page to leaf should be
updated. Of course, case of updating all chain is rarely and usially it's
updated only part. Each key on inner pages contains union of keys (bounding box
in a case of rtree, for example) on page which it points. This union can formed
only with a help of user-defined function of opclass, because of GiST doesn't
know something about nature of keys. Returning to WAL, GiST core write xlog
entry with all nessary information for restoration before write page, but in
this moment it doesn't know it should update keys on parent page or key is
unchanged. So GiST's WAL restoration code should remember this page's update as
incompleted insert. When insert complited, GiST's core write to log that insert
is completed and restore code can clean up stored incompleted insert. If it was
crash, then sign of completed insert can be absent in log, and GiST's restore
code should continue it. While continue, it's know which page was changed and
should walk up to root. On each step of walk it should form union for page and
insert it to parent.

This is absent in btree, because of btree is sorted, so parent's key simply is
right on child page and xlog code doesn't need to call any support function.

Which ways I see:
1 Leave it as is. Incomplited inserts isn't often case, and code can detect it
to suggest reindex.
2 In principle, its possible to collect all information about changes per insert
in the tree in one WAL record. But itsn't good solution, because of it needs to
ugly coding of inserts: collect all pages and write it after pushing WAL record.
Second, this manner kills using LSN for page control for concurrency. But it
fully solve problem with incomplete insert.
3 Construct relation by XLogOpenRelation some more full than now. In fact,
restore code needs only 2 thing about index: index->rd_att and information for
working index_getprocinfo. I understand why idea to calling any support
functions during WAL recovery is bad.
4 Add one hook to finalyze WAL code. This hook should be called when system
catalog is maded accessible.
5 There is one unused bit in IndexTupleData. GiST code can use it as mark that
this tuple contains incorrect union key. In this case, GiST search algorithm
should keep it mind that such tuple has incorrect value and always should go to
it's child to check. During insertions in tree such tuples will be little by
little changed to correct. Defect of this scheme will be a possible performance
degradation after crash recovery. In other hand, it's possible to add recreation
such tuples to vacuum code (in gistbulkdelete).

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 2005-06-14 12:10:47 Re: The Contrib Roundup (long)
Previous Message Oleg Bartunov 2005-06-14 10:04:59 Re: Best way to scan on-disk bitmaps