Quick Links

Re: Gist Recovery testing

Lists:	pgsql-hackers

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Gist Recovery testing
Date:	2005-06-10 16:24:13
Message-ID:	42A9BEAD.6050806@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ok, I already realized WAL in GiST and tested with online backup. It works.
Now I need test it on recovery after crash (power loss etc) and there is a
problem in GiST with incompleted inserts, I hope, I resolved it, but this
problem, I think, has the same nature as incompleted splits in btree code. For
solving, btree xlog code uses stack of incompleted splits and if they leaves
incompleted, btree_xlog_cleanup function completes them. I make use similar
idea, but I can't reproduce situation when xlog_cleanup makes some useful work
except just looks at void stack. So question: how is it possible to reproduce
desired behaviour?

I tried to kill(getpid(), SIGKILL) in a middle of insertion (inside gist.c), but
postgres doesn't try ro restore aborted transaction....

Thank you.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-10 16:46:32
Message-ID:	23513.1118421992@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Ok, I already realized WAL in GiST and tested with online backup. It works.
> Now I need test it on recovery after crash (power loss etc) and there is a
> problem in GiST with incompleted inserts, I hope, I resolved it, but this
> problem, I think, has the same nature as incompleted splits in btree code. For
> solving, btree xlog code uses stack of incompleted splits and if they leaves
> incompleted, btree_xlog_cleanup function completes them. I make use similar
> idea, but I can't reproduce situation when xlog_cleanup makes some useful work
> except just looks at void stack. So question: how is it possible to reproduce
> desired behaviour?

It's not very easy; you have to arrange for the WAL history to end
between the two action records that you are interested in. What I'd
suggest is
- run test backend under GDB, set breakpoint between
making log entry for first split step and entry for second.
- let it reach the breakpoint
- in another backend, commit some unrelated transaction to
force XLOG buffers to disk
- in GDB, "kill -9" to prevent test backend from proceeding

If you want to trace through the recovery as well, the best bet might be
to temporarily add a "sleep(30)" near the beginning of StartupXlog so
that you have time to reattach to the recovery process.

regards, tom lane

From:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-12 17:37:03
Message-ID:	Pine.GSO.4.63.0506122131250.14456@ra.sai.msu.su
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

There is a problem with gist_xlog_cleanup function - we have no
information about user defined functions and indices, because postmaster
doesn't started yet and system catalog isn't available. For btree it's not
necessary, but for GiST we certainly need them. Is't possible to add
additional hook after clean up stage, so we could use it to complete
incompleted inserts ? Any suggestions ?

Oleg
On Fri, 10 Jun 2005, Tom Lane wrote:

> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> Ok, I already realized WAL in GiST and tested with online backup. It works.
>> Now I need test it on recovery after crash (power loss etc) and there is a
>> problem in GiST with incompleted inserts, I hope, I resolved it, but this
>> problem, I think, has the same nature as incompleted splits in btree code. For
>> solving, btree xlog code uses stack of incompleted splits and if they leaves
>> incompleted, btree_xlog_cleanup function completes them. I make use similar
>> idea, but I can't reproduce situation when xlog_cleanup makes some useful work
>> except just looks at void stack. So question: how is it possible to reproduce
>> desired behaviour?
>
> It's not very easy; you have to arrange for the WAL history to end
> between the two action records that you are interested in. What I'd
> suggest is
> - run test backend under GDB, set breakpoint between
> making log entry for first split step and entry for second.
> - let it reach the breakpoint
> - in another backend, commit some unrelated transaction to
> force XLOG buffers to disk
> - in GDB, "kill -9" to prevent test backend from proceeding
>
> If you want to trace through the recovery as well, the best bet might be
> to temporarily add a "sleep(30)" near the beginning of StartupXlog so
> that you have time to reattach to the recovery process.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-12 17:57:38
Message-ID:	25636.1118599058@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> There is a problem with gist_xlog_cleanup function - we have no
> information about user defined functions and indices, because postmaster
> doesn't started yet and system catalog isn't available. For btree it's not
> necessary, but for GiST we certainly need them.

btree manages to avoid calling the index support functions during WAL
restore --- can't you do the same? Perhaps you need to be including
more information in the initial xlog records, so that the cleanup step
has everything it needs. Or resort to brute-force search (which is more
or less what btree does). I don't think this operation needs to be very
efficient, since it's a corner case that should only seldom be invoked.

It's a really bad idea to be calling any support functions during WAL
recovery, since you won't have a transaction open for them to run in...

regards, tom lane

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-14 11:47:37
Message-ID:	42AEC3D9.5060107@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> btree manages to avoid calling the index support functions during WAL
> restore --- can't you do the same? Perhaps you need to be including
> more information in the initial xlog records, so that the cleanup step
> has everything it needs. Or resort to brute-force search (which is more
> or less what btree does). I don't think this operation needs to be very
> efficient, since it's a corner case that should only seldom be invoked.

I've just commited WALogging for GiST. It works for online backup (normal
recovery) and mostly after crash, but in this case it can't restore inclompleted
inserts although it can detected and say to log thet it's needed to reindex
GiST index.

The problem with incopmleted inserts is: when new entry is installed into leaf
page, all chain (in a worst case) of keys from root page to leaf should be
updated. Of course, case of updating all chain is rarely and usially it's
updated only part. Each key on inner pages contains union of keys (bounding box
in a case of rtree, for example) on page which it points. This union can formed
only with a help of user-defined function of opclass, because of GiST doesn't
know something about nature of keys. Returning to WAL, GiST core write xlog
entry with all nessary information for restoration before write page, but in
this moment it doesn't know it should update keys on parent page or key is
unchanged. So GiST's WAL restoration code should remember this page's update as
incompleted insert. When insert complited, GiST's core write to log that insert
is completed and restore code can clean up stored incompleted insert. If it was
crash, then sign of completed insert can be absent in log, and GiST's restore
code should continue it. While continue, it's know which page was changed and
should walk up to root. On each step of walk it should form union for page and
insert it to parent.

This is absent in btree, because of btree is sorted, so parent's key simply is
right on child page and xlog code doesn't need to call any support function.

Which ways I see:
1 Leave it as is. Incomplited inserts isn't often case, and code can detect it
to suggest reindex.
2 In principle, its possible to collect all information about changes per insert
in the tree in one WAL record. But itsn't good solution, because of it needs to
ugly coding of inserts: collect all pages and write it after pushing WAL record.
Second, this manner kills using LSN for page control for concurrency. But it
fully solve problem with incomplete insert.
3 Construct relation by XLogOpenRelation some more full than now. In fact,
restore code needs only 2 thing about index: index->rd_att and information for
working index_getprocinfo. I understand why idea to calling any support
functions during WAL recovery is bad.
4 Add one hook to finalyze WAL code. This hook should be called when system
catalog is maded accessible.
5 There is one unused bit in IndexTupleData. GiST code can use it as mark that
this tuple contains incorrect union key. In this case, GiST search algorithm
should keep it mind that such tuple has incorrect value and always should go to
it's child to check. During insertions in tree such tuples will be little by
little changed to correct. Defect of this scheme will be a possible performance
degradation after crash recovery. In other hand, it's possible to add recreation
such tuples to vacuum code (in gistbulkdelete).

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-14 13:26:43
Message-ID:	27733.1118755603@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> ... Which ways I see:
> 5 There is one unused bit in IndexTupleData. GiST code can use it as
> mark that this tuple contains incorrect union key. In this case, GiST
> search algorithm should keep it mind that such tuple has incorrect
> value and always should go to it's child to check. During insertions
> in tree such tuples will be little by little changed to
> correct. Defect of this scheme will be a possible performance
> degradation after crash recovery. In other hand, it's possible to add
> recreation such tuples to vacuum code (in gistbulkdelete).

This seems like a good answer --- if you fix up such things during
vacuum then the performance issue won't last too long.

Rather than eating up the extra flag bit, though, would it be possible
to change the tuple to appear to contain a NULL? AFAIK, GiST can't have
a NULL (at least not in the first column) so this should be an unused
state.

regards, tom lane

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-14 14:01:33
Message-ID:	42AEE33D.30806@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
> This seems like a good answer --- if you fix up such things during
> vacuum then the performance issue won't last too long.
>
> Rather than eating up the extra flag bit, though, would it be possible
> to change the tuple to appear to contain a NULL? AFAIK, GiST can't have
> a NULL (at least not in the first column) so this should be an unused
> state.

We would like to preserve NULL as non-special value because we hope to add
support of NULLs to GiST, although it's of low priority. In other side,
IndexTupleData contains ItemPointerData which contains block number and offset.
GiST doesn't use offset on inner pages (on leaf pages, ItemPointerData points to
heap tuple), it always equal FirstOffsetNumber. So I can use offset to indicate
correctness of tuple's key...

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-14 14:31:03
Message-ID:	28437.1118759463@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> Rather than eating up the extra flag bit, though, would it be possible
>> to change the tuple to appear to contain a NULL?

> We would like to preserve NULL as non-special value because we hope to add
> support of NULLs to GiST, although it's of low priority.

Fair enough.

> In other side, IndexTupleData contains ItemPointerData which contains
> block number and offset. GiST doesn't use offset on inner pages (on
> leaf pages, ItemPointerData points to heap tuple), it always equal
> FirstOffsetNumber. So I can use offset to indicate correctness of
> tuple's key...

OK, that's a tad klugy but I agree with the approach. I'd like to
preserve the spare flag bit in IndexTuple->t_info --- we might need
that someday.

regards, tom lane

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-15 18:08:32
Message-ID:	42B06EA0.40808@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev wrote:
>> btree manages to avoid calling the index support functions during WAL
>> restore --- can't you do the same? Perhaps you need to be including
>> more information in the initial xlog records, so that the cleanup step
>> has everything it needs. Or resort to brute-force search (which is more
>> or less what btree does). I don't think this operation needs to be very
>> efficient, since it's a corner case that should only seldom be invoked.
>
>
> I've just commited WALogging for GiST. It works for online backup
> (normal recovery) and mostly after crash, but in this case it can't
> restore inclompleted inserts although it can detected and say to log
> thet it's needed to reindex GiST index.

FYI: we now have at least 4 machines(otter,kingfisher,lionfish,corgi) on
the buildfarm crashing during testing of GIST-related things in contrib.

Any chance this could be related to this change ?

Stefan

From:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-15 19:16:34
Message-ID:	Pine.GSO.4.63.0506152312400.10138@ra.sai.msu.su
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 15 Jun 2005, Stefan Kaltenbrunner wrote:

> Teodor Sigaev wrote:
>>> btree manages to avoid calling the index support functions during WAL
>>> restore --- can't you do the same? Perhaps you need to be including
>>> more information in the initial xlog records, so that the cleanup step
>>> has everything it needs. Or resort to brute-force search (which is more
>>> or less what btree does). I don't think this operation needs to be very
>>> efficient, since it's a corner case that should only seldom be invoked.
>>
>>
>> I've just commited WALogging for GiST. It works for online backup
>> (normal recovery) and mostly after crash, but in this case it can't
>> restore inclompleted inserts although it can detected and say to log
>> thet it's needed to reindex GiST index.
>
>
> FYI: we now have at least 4 machines(otter,kingfisher,lionfish,corgi) on
> the buildfarm crashing during testing of GIST-related things in contrib.
>
> Any chance this could be related to this change ?

Most probably :) But, wait a little bit.
We have a patch currently tested and I see no problem with all GiST-based
contribs on my Slackware Linux 10.1 using it.

>
>
> Stefan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-19 19:34:06
Message-ID:	42B5C8AE.9050601@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Oleg Bartunov wrote:
> On Wed, 15 Jun 2005, Stefan Kaltenbrunner wrote:
>
>> Teodor Sigaev wrote:
>>
>>>> btree manages to avoid calling the index support functions during WAL
>>>> restore --- can't you do the same? Perhaps you need to be including
>>>> more information in the initial xlog records, so that the cleanup step
>>>> has everything it needs. Or resort to brute-force search (which is
>>>> more
>>>> or less what btree does). I don't think this operation needs to be
>>>> very
>>>> efficient, since it's a corner case that should only seldom be invoked.
>>>
>>>
>>>
>>> I've just commited WALogging for GiST. It works for online backup
>>> (normal recovery) and mostly after crash, but in this case it can't
>>> restore inclompleted inserts although it can detected and say to log
>>> thet it's needed to reindex GiST index.
>>
>>
>>
>> FYI: we now have at least 4 machines(otter,kingfisher,lionfish,corgi) on
>> the buildfarm crashing during testing of GIST-related things in contrib.
>>
>> Any chance this could be related to this change ?
>
>
> Most probably :) But, wait a little bit. We have a patch currently
> tested and I see no problem with all GiST-based contribs on my Slackware
> Linux 10.1 using it.

I played a little bit on lionfish(this is the result of a COPY of the
btree_gist testdata into an variant of the regressiontest tables) and
managed to get the following backtrace:

#0 gistmakedeal (state=0x0, giststate=0x7fff5128) at gist.c:597
#1 0x00436658 in gistdoinsert (r=0x2c0752e0, itup=0x100b4c10,
giststate=0x7fff5128) at gist.c:325
#2 0x00436444 in gistinsert (fcinfo=0x2b52eab0) at gist.c:288
#3 0x0073522c in FunctionCall6 (flinfo=0x2b52eab0, arg1=39, arg2=0,
arg3=810, arg4=5, arg5=39, arg6=5) at fmgr.c:1270
#4 0x0045aca8 in index_insert (indexRelation=0x2c0752e0,
values=0x7fff6920, isnull=0x7fff69a0 "", heap_t_ctid=0x100b2bec,
heapRelation=0x1, check_uniqueness=0 '\0')
at indexam.c:215
#5 0x00580074 in ExecInsertIndexTuples (slot=0x100ad710,
tupleid=0x100b2bec, estate=0x100ab5c0, is_vacuum=0 '\0') at execUtils.c:935
#6 0x00519800 in CopyFrom (rel=0x2c072a90, attnumlist=0x100ab200,
binary=0 '\0', oids=0 '\0', delim=0x7c277c "\t", null_print=0x7c2774
"\\N", csv_mode=0 '\0',
quote=0x0, escape=0x0, force_notnull_atts=0x0, header_line=0 '\0')
at copy.c:1955
#7 0x00515f08 in DoCopy (stmt=0x2b52eab0) at copy.c:1032
#8 0x00671868 in ProcessUtility (parsetree=0x10090f10, params=0x0,
dest=0x10090f78, completionTag=0x7fff6f78 "") at utility.c:608
#9 0x0066f5ec in PortalRunUtility (portal=0x100990c8, query=0x10090fc8,
dest=0x10090f78, completionTag=0x7fff6f78 "") at pquery.c:940
#10 0x0066fbb0 in PortalRunMulti (portal=0x100990c8, dest=0x10090f78,
altdest=0x10090f78, completionTag=0x7fff6f78 "") at pquery.c:1007
#11 0x0066eb30 in PortalRun (portal=0x100990c8, count=2147483647,
dest=0x10090f78, altdest=0x10090f78, completionTag=0x7fff6f78 "") at
pquery.c:617
#12 0x00666f60 in exec_simple_query (query_string=0x10090be8 "COPY
inettmp FROM STDIN") at postgres.c:1021
#13 0x0066be54 in PostgresMain (argc=4, argv=0x100513c0,
username=0x10051390 "pgbuild") at postgres.c:3186
#14 0x00621304 in BackendRun (port=0x10060300) at postmaster.c:2800
#15 0x006208bc in BackendStartup (port=0x10060300) at postmaster.c:2440
#16 0x0061d23c in ServerLoop () at postmaster.c:1221
#17 0x0061b6d0 in PostmasterMain (argc=3, argv=0x10050e40) at
postmaster.c:930
#18 0x005ab904 in main (argc=3, argv=0x10050e40) at main.c:268

Stefan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-20 00:54:23
Message-ID:	16445.1119228863@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> FYI: we now have at least 4 machines(otter,kingfisher,lionfish,corgi) on
> the buildfarm crashing during testing of GIST-related things in contrib.

I'm seeing some problems on Mac OS X, too. The tsearch regression test
crashed ... which we may not care about much since tsearch is presumably
going away ... but after that, the postmaster failed to restart:

LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/4980748
PANIC: gistRedoEntryUpdateRecord: uninitialized page
LOG: startup process (PID 1728) was terminated by signal 6
LOG: aborting startup due to startup process failure

Stack trace for that is

#0 0x9004a12c in kill ()
#1 0x90120954 in abort ()
#2 0x001cfcdc in errfinish (dummy=0) at elog.c:451
#3 0x001d06bc in elog_finish (elevel=-1073761600, fmt=0x2818dc "\002") at elog.c:932
#4 0x0000c7c4 in gistRedoEntryUpdateRecord (lsn={xlogid = 0, xrecoff = 77628316}, record=0x57, isnewroot=0 '\0') at gistxlog.c:186
#5 0x0000ca0c in gist_redo (lsn={xlogid = 0, xrecoff = 77628316}, record=0xa8d000) at gistxlog.c:399
#6 0x00042b4c in StartupXLOG () at xlog.c:4509
#7 0x0004c9d8 in BootstrapMain (argc=4, argv=0xbfffde28) at bootstrap.c:413
#8 0x00126278 in StartChildProcess (xlop=2) at postmaster.c:3484
#9 0x00126858 in reaper (postgres_signal_arg=0) at postmaster.c:2165
#10 <signal handler called>
#11 0x9001efe8 in select ()
#12 0x00126e70 in ServerLoop () at postmaster.c:1168
#13 0x00128c54 in PostmasterMain (argc=3, argv=0xd00600) at postmaster.c:930
#14 0x000e3050 in main (argc=3, argv=0xbfffe57c) at main.c:268

I checked that it was processing a type-0 (XLOG_GIST_ENTRY_UPDATE)
record, but am not sure what else to look at. I do think it's
questionable that this log record type doesn't appear to reference
any buffers in the list it passes to XLogInsert. I believe the
general rule is that an xlog record that is describing a change in
a buffered page ought to tell XLogInsert so --- and the redo routine
has to check to see if the change was replaced by a whole-page record.
I'm not entirely sure that violating that rule leads directly to this
class of failure, but it definitely leads to failure to recover from
partial page writes during a hardware crash.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-20 01:05:35
Message-ID:	16519.1119229535@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> I'm seeing some problems on Mac OS X, too. The tsearch regression test
> crashed ... which we may not care about much since tsearch is presumably
> going away ... but after that, the postmaster failed to restart:

I tried it again to check that it was reproducible. The tsearch test
crashed in apparently the same place, but the recovery failed in a
*different* place:

LOG: redo starts at 0/470054
PANIC: lock 207 is not held
LOG: startup process (PID 2077) was terminated by signal 6
LOG: aborting startup due to startup process failure

(gdb) bt
#0 0x9004a12c in kill ()
#1 0x90120954 in abort ()
#2 0x001cfcdc in errfinish (dummy=0) at elog.c:451
#3 0x001d06bc in elog_finish (elevel=-1073761552, fmt=0x2818dc "\002") at elog.c:932
#4 0x0014d718 in LWLockRelease (lockid=207) at lwlock.c:442
#5 0x0000cff8 in gist_redo (lsn={xlogid = 0, xrecoff = 5130896}, record=0x14) at gistxlog.c:315
#6 0x00042b4c in StartupXLOG () at xlog.c:4509
#7 0x0004c9d8 in BootstrapMain (argc=4, argv=0xbfffde28) at bootstrap.c:413
#8 0x00126278 in StartChildProcess (xlop=2) at postmaster.c:3484
#9 0x00126858 in reaper (postgres_signal_arg=0) at postmaster.c:2165
#10 <signal handler called>
#11 0x9001efe8 in select ()
#12 0x00126e70 in ServerLoop () at postmaster.c:1168
#13 0x00128c54 in PostmasterMain (argc=3, argv=0xd00600) at postmaster.c:930
#14 0x000e3050 in main (argc=3, argv=0xbfffe57c) at main.c:268

I find this code pretty unreadable, but possibly lines 315 and 316
should be manipulating newpage->buffer instead of buffer?

regards, tom lane

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gist Recovery testing
Date:	2005-06-20 08:28:14
Message-ID:	42B67E1E.5090508@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I find this code pretty unreadable, but possibly lines 315 and 316
> should be manipulating newpage->buffer instead of buffer?
>

You are right, I'll commit changes some later with a lot of additional work..

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/