Re: WAL format and API changes (9.5)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-06-27 06:12:05
Message-ID: CAB7nPqQO57-nEXgT2zkvmsYwXov9kzVzEBio05=SSH_ghe8+=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 15, 2014 at 6:11 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 06/14/2014 09:43 PM, Alvaro Herrera wrote:
>
>> Heikki Linnakangas wrote:
>>
>> Here's a new version, rebased against master. No other changes.
>>>
>>
>> This is missing xlogrecord.h ...
>>
>
> Oops, here you are.
>
Looking at this patch, it does not compile when assertions are enabled
because of this assertion in heapam.c:
Assert(recdata == data + datalen);
It seems like a thinko as removing it makes the code compile.

Here is a review of this patch:
- Compilation without warnings, regression tests pass
- That's not a small patch, but the changes mainly done are xlog record
insertion in accordance to the new API format.
$ git diff master --stat | tail -n1
52 files changed, 3601 insertions(+), 4371 deletions(-)

Some comments about the code:
1) In src/backend/access/transam/README:
's/no further action is no required/no further action is required/g'
2) Description of XLogBeginInsert, XLogHasBlockRef, XLogGetBlockRef and
XLogGetBlockRefIds are missing in src/backend/access/transam/README.
3) There are a couple of code blocks marked as FIXME and BROKEN. There are
as well some NOT_USED blocks.
4) Current patch crashes when running make check in contrib/test_decoding.
Looking closer, wal_level = logical is broken:
=# show wal_level;
wal_level
-----------
logical
(1 row)
=# create database foo;
The connection to the server was lost. Attempting reset: Failed.

Looking even closer, log_heap_new_cid is broken:
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8a3c2866
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
* frame #0: 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff9369db1a libsystem_c.dylib`abort + 125
frame #3: 0x0000000104428819
postgres`ExceptionalCondition(conditionName=0x00000001044a711c,
errorType=0x000000010448b19f, fileName=0x00000001044a2cfd, lineNumber=6879)
+ 137 at assert.c:54
frame #4: 0x0000000103f08dbe
postgres`log_heap_new_cid(relation=0x00007fae4482afb8,
tup=0x00007fae438062e8) + 126 at heapam.c:6879
frame #5: 0x0000000103f080cf
postgres`heap_insert(relation=0x00007fae4482afb8, tup=0x00007fae438062e8,
cid=0, options=0, bistate=0x0000000000000000) + 383 at heapam.c:2095
frame #6: 0x0000000103f09b6a
postgres`simple_heap_insert(relation=0x00007fae4482afb8,
tup=0x00007fae438062e8) + 74 at heapam.c:2533
frame #7: 0x000000010406aacf postgres`createdb(stmt=0x00007fae44843690)
+ 6335 at dbcommands.c:511
frame #8: 0x000000010429def8
postgres`standard_ProcessUtility(parsetree=0x00007fae44843690,
queryString=0x00007fae44842c38, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fae44843a18,
completionTag=0x00007fff5bd4ee30) + 1720 at utility.c:56

5) Yes,there are some WAL records that have only data related to buffers,
XLOG_GIN_VACUUM_DATA_LEAF_PAGE being one, with nothing registered with
buffer_id = -d, but using a dummy entry seems counter-productive:
+ rdata = rdata_head;
+ if (rdata == NULL)
+ {
+ /*
+ * the rest of this function assumes that there is at least
some
+ * rdata entries, so fake an empty rdata entry
+ */
+ dummy_rdata.data = NULL;
+ dummy_rdata.len = 0;
+ dummy_rdata.next = NULL;
+ rdata = &dummy_rdata;
+ }
Could this be removed and replaced by a couple of checks on rdata being
NULL in some cases? XLogInsert should be updated in consequence.
6) XLogRegisterData creates a XLogRecData entry and appends it in one of
the static XLogRecData entries if buffer_id >= 0 or to
rdata_head/rdata_tail if buffer_id = -1 (for permanent data related to the
WAL record). XLogRegisterXLogRecData does something similar, but with an
entry of XLogRecData already built. What do you think about removing
XLogRegisterXLogRecData and keeping only XLogRegisterData. This makes the
interface simpler, and XLogRecData could be kept private in xlog.c. This
would need more work especially on gin side where I am seeing for example
constructLeafRecompressWALData directly building a XLogRecData entry.
By doing so, we could as remove the while loop at the bottom of
XLogRegisterXLogRecData as we would insert in the tail only the latest
record created, aka replacing that:
while ((*tail)->next)
*tail = (*tail)->next;
By simply that:
*tail = (*tail)->next;
7) New structures at the top of xlog.c should have more comments about
their role, particularly rdata_head and rdata_tail that store main WAL data
(id of -1)
8) What do you think about adding in the README a description about how to
retrieve the block list modified by a WAL record, for a flow similar to
that:
record = XLogReadRecord(...);
nblocks = XLogGetBlockRefIds(record, blockids);
for (i = 0; i < nblocks; i++)
{
bkpb = XLogGetBlockRef(record, id, NULL);
// stuff
}
pg_xlogdump is a good example of what can be done as well.
9) In ginxlog.c, shouldn't this block of code use XLogGetPayload?
+ char *payload = XLogRecGetData(record) +
sizeof(ginxlogInsert);
+#ifdef NOT_USED
leftChildBlkno = BlockIdGetBlockNumber((BlockId) payload);
+#endif
10) Shouldn't the call to XLogBeginInsert come first here?
@@ -1646,7 +1647,16 @@ spgAddNodeAction(Relation index, SpGistState *state,
{
XLogRecPtr recptr;

- recptr = XLogInsert(RM_SPGIST_ID,
XLOG_SPGIST_ADD_NODE, rdata);
+ XLogRegisterBuffer(0, saveCurrent.buffer,
REGBUF_STANDARD); /* orig page */
+ XLogRegisterBuffer(1, current->buffer,
REGBUF_STANDARD); /* new page */
+ if (xlrec.parentBlk == 2)
+ XLogRegisterBuffer(2, parent->buffer,
REGBUF_STANDARD);
+
+ XLogBeginInsert();
+ XLogRegisterData(-1, (char *) &xlrec,
sizeof(xlrec));
+ XLogRegisterData(-1, (char *) newInnerTuple,
newInnerTuple->size);
+
+ recptr = XLogInsert(RM_SPGIST_ID,
XLOG_SPGIST_ADD_NODE);
11) Any reason for raising a PANIC here?
@@ -1126,7 +1126,7 @@ LWLockRelease(LWLock *l)
break;
}
if (i < 0)
- elog(ERROR, "lock %s %d is not held", T_NAME(l), T_ID(l));
+ elog(PANIC, "lock %s %d is not held", T_NAME(l), T_ID(l));
num_held_lwlocks--;
for (; i < num_held_lwlocks; i++)
14) SPgist redo is crashing (during make installcheck run on a master, redo
done on a standby at recovery).
(lldb) bt
* thread #1: tid = 0x0000, 0x00007fff8a3c2866
libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
* frame #0: 0x00007fff8a3c2866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8d37235c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff9369db1a libsystem_c.dylib`abort + 125
frame #3: 0x000000010ba8d819
postgres`ExceptionalCondition(conditionName=0x000000010bb21ea9,
errorType=0x000000010baf019f, fileName=0x000000010bb212dd, lineNumber=65) +
137 at assert.c:54
frame #4: 0x000000010b5c3a50
postgres`addOrReplaceTuple(page=0x000000010bfee980,
tuple=0x00007ffd41822902, size=770059, offset=18) + 688 at spgxlog.c:65
frame #5: 0x000000010b5c031f postgres`spgRedoMoveLeafs(lsn=71708608,
record=0x00007ffd41822800) + 719 at spgxlog.c:263
frame #6: 0x000000010b5bf492 postgres`spg_redo(lsn=71708608,
record=0x00007ffd41822800) + 402 at spgxlog.c:988
frame #7: 0x000000010b5e774a postgres`StartupXLOG + 8474 at xlog.c:6780
frame #8: 0x000000010b86a5fe postgres`StartupProcessMain + 430 at
startup.c:224
frame #9: 0x000000010b600409 postgres`AuxiliaryProcessMain(argc=2,
argv=0x00007fff546ea190) + 1897 at bootstrap.c:416
frame #10: 0x000000010b864998
postgres`StartChildProcess(type=StartupProcess) + 328 at postmaster.c:5090
frame #11: 0x000000010b862aa9 postgres`PostmasterMain(argc=3,
argv=0x00007ffd40c04470) + 5401 at postmaster.c:1212
frame #12: 0x000000010b7a9335 postgres`main(argc=3,
argv=0x00007ffd40c04470) + 773 at main.c:227

Also, I wanted to run the WAL replay facility to compare before and after
buffer images, but code crashed before... I am still planning to do so once
this code gets more stable though.
Regards,
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-06-27 06:33:38 Re: review: Built-in binning functions
Previous Message Michael Paquier 2014-06-27 05:57:14 Re: WAL replay bugs