From: | Amit kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
Cc: | "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: New WAL code dumps core trivially on replay of bad data |
Date: | 2012-08-18 05:52:01 |
Message-ID: | 6C0B27F7206C9E4CA54AE035729E9C38285258A8@szxeml509-mbx |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane Sent: Saturday, August 18, 2012 7:16 AM
> The startup process's stack trace is
> #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
> at xlog.c:3713
> 3713 COMP_CRC32(crc, XLogRecGetData(record), len);
> (gdb) bt
> #0 0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
> at xlog.c:3713
> #1 0x270690 in ReadRecord (RecPtr=0x7b03bad0, emode=15,
> fetching_ckpt=0 '\000') at xlog.c:4006
> The current WAL address is 80658424 == 0x04cebff8, that is just 8 bytes
> short of a page boundary, and what RecordIsValid thinks it is dealing
> with is
> so it merrily tries to compute a checksum on a gigabyte worth of data,
> and soon falls off the end of memory.
> In reality, inspection of the WAL file suggests that this is the end of
> valid data and what should have happened is that replay just stopped.
> The xl_len and so forth shown above are just garbage from off the end of
> what was actually read from the file (everything beyond offset 0xcebff8
> in file 4 is in fact zeroes).
> I'm not sure whether this is just a matter of having failed to
> sanity-check that xl_tot_len is at least SizeOfXLogRecord, or whether
> there is a deeper problem with the new design of continuation records
> that makes it impossible to validate records safely.
Earlier there was a check related to total length in ReadRecord, before it calls RecordIsValid()
if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))
I think that missing check of total length has caused this problem. However now this check will be different.
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2012-08-18 08:34:37 | Re: [PATCH] Docs: Make notes on sequences and rollback more obvious |
Previous Message | Jeff Janes | 2012-08-18 02:18:52 | "CLUSTER VERBOSE" tab completion |