Quick Links

Re: patch-3 (3-allow-wal-record-header-to-be-split.patch)WAL Format Changes

Lists:	pgsql-hackers

From:	Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "heikki(dot)linnakangas(at)enterprisedb(dot)com" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject:	patch-3 (3-allow-wal-record-header-to-be-split.patch)WAL Format Changes
Date:	2012-06-30 07:11:15
Message-ID:	6C0B27F7206C9E4CA54AE035729E9C3828513A55@szxeml509-mbx
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

While reading patch-3 (3-allow-wal-record-header-to-be-split.patch) of
WAL Format
Changes(http://archives.postgresql.org/message-id/4FDA5136.6080206@enterprisedb.com), I had few observations which are summarized below:

1.
ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
+ /*
+ * If we got the whole header already, validate it immediately. Otherwise
+ * we validate it after reading the rest of the header from the next page.
+ */
+ if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
+ {
+ if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess))
+ goto next_record_is_invalid;
+ gotheader = true;
+ }
+ else
+ gotheader = false;
+

Shouldn't the record header validation be done before the check for allocating a bigger record buffer based
on total length. Otherwise it may lead to allocation of bigger buffer which may not be required if record header is invalid.
In cases where record header is not split, validation can be done before otherwise it can be done later.

3. General observation, not related to your changes
XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
.
.
if (freespace == 0)
{
updrqst = AdvanceXLInsertBuffer(false);

In the code, AdvanceXLInsertBuffer(false); is called after starting critical section and acquiring
WALInsertLock, now if any error occurs inside AdvanceXLInsertBuffer()
(in one of the paths it calls XLogWrite(), from which it can call XLogFileInit() where error can occur);
how will it release WALInsertLock or end critical section.

With Regards,

Amit Kapila.

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: patch-3 (3-allow-wal-record-header-to-be-split.patch)WAL Format Changes
Date:	2012-06-30 20:24:17
Message-ID:	4FEF6071.3080107@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 30.06.2012 10:11, Amit kapila wrote:
> ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
> + /*
> + * If we got the whole header already, validate it immediately. Otherwise
> + * we validate it after reading the rest of the header from the next page.
> + */
> + if (targetRecOff<= XLOG_BLCKSZ - SizeOfXLogRecord)
> + {
> + if (!ValidXLogRecordHeader(RecPtr, record, emode, randAccess))
> + goto next_record_is_invalid;
> + gotheader = true;
> + }
> + else
> + gotheader = false;
> +
>
> Shouldn't the record header validation be done before the check for allocating a bigger record buffer based
> on total length. Otherwise it may lead to allocation of bigger buffer which may not be required if record header is invalid.

Hmm, doing an unnecessary memory allocation just before giving up isn't
really a problem. And we treat out-of-memory the same as a corrupt
record, so this isn't a correctness issue. But I agree it would still be
better to change the order, if only because you're more likely to get a
better error message than "out of memory".

> In cases where record header is not split, validation can be done before otherwise it can be done later.

Committed that way. We could also delay enlarging the buffer until after
we read the next page and get the whole header, but it's probably fine
as it is.

> 3. General observation, not related to your changes
> XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
> .
> .
> if (freespace == 0)
> {
> updrqst = AdvanceXLInsertBuffer(false);
>
> In the code, AdvanceXLInsertBuffer(false); is called after starting critical section and acquiring
> WALInsertLock, now if any error occurs inside AdvanceXLInsertBuffer()
> (in one of the paths it calls XLogWrite(), from which it can call XLogFileInit() where error can occur);
> how will it release WALInsertLock or end critical section.

Yep, if an I/O error happens while writing a WAL record - running out of
disk space is the typical example - we PANIC. Nope, it's not ideal.

Even if we somehow managed to avoid doing I/O in the critical section in
XLogInsert(), most callers of XLogInsert() surround the call in a
critical section anyway. For example, when a new tuple is inserted to
heap, once you've modified the page to add the new tuple, it's too late
to back out. The WAL record is written while holding the lock on the
page, and if something goes wrong with writing the WAL record, we have
no choice but PANIC.

It would be nice to avoid that, at least for the common
out-of-disk-space case. Perhaps we could somehow pre-reserve some WAL
space before starting to modify the page. But that's a pretty big project..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: patch-3 (3-allow-wal-record-header-to-be-split.patch)WAL Format Changes
Date:	2012-07-01 06:27:42
Message-ID:	6C0B27F7206C9E4CA54AE035729E9C38285171CC@szxeml509-mbs
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

From: Heikki Linnakangas [heikki(dot)linnakangas(at)enterprisedb(dot)com]
Sent: Sunday, July 01, 2012 1:54 AM
On 30.06.2012 10:11, Amit kapila wrote:
>> 3. General observation, not related to your changes
>> XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
>>.
>>.
>> if (freespace == 0)
>> {
>> updrqst = AdvanceXLInsertBuffer(false);
>>
>> In the code, AdvanceXLInsertBuffer(false); is called after starting critical section and acquiring
>> WALInsertLock, now if any error occurs inside AdvanceXLInsertBuffer()
>> (in one of the paths it calls XLogWrite(), from which it can call XLogFileInit() where error can occur);
>> how will it release WALInsertLock or end critical section.

> Yep, if an I/O error happens while writing a WAL record - running out of
> disk space is the typical example - we PANIC. Nope, it's not ideal.

> Even if we somehow managed to avoid doing I/O in the critical section in
> XLogInsert(), most callers of XLogInsert() surround the call in a
> critical section anyway. For example, when a new tuple is inserted to
> heap, once you've modified the page to add the new tuple, it's too late
> to back out. The WAL record is written while holding the lock on the
> page, and if something goes wrong with writing the WAL record, we have
> no choice but PANIC.

PANIC is understandable as after this user cannot perform operation without restart.
However if the level is ERROR, then there might be other problems as user can perform operations after that through same session
The case which I am highlighting is of ERROR, please refer the code of
XLogFileInit().

For Example:
fd = BasicOpenFile(path, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
S_IRUSR | S_IWUSR);
if (fd < 0)
{
if (errno != ENOENT)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
path, log, seg)));
}