Re: WAL replay bugs

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL replay bugs
Date: 2014-07-04 06:29:51
Message-ID: CAB7nPqT_fs_3jLNHYWC6nzej4sBL6DGsLFVCg0JBUkgjeP9Tfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

OK, I have been working more on this patch, improving on-the-fly the
following things on top of what Horiguchi-san has reported:
- Moved sequence page opaque data to sequence.h, renaming it at the same time.
- Improvement of page type identification, particularly for sequences
using a correct opaque data structure. For gin the process is not that
cool, but I guess that there is nothing much to do as it has no proper
page identifier :(

On Thu, Jul 3, 2014 at 7:34 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> ===== bufcapt.c:
>
> - buffer_capture_remember() or so.
>
> Pages for unlogged tables are avoided to be written taking
> advantage that the lsn for such pages stays 0/0. I'd like to see
> a comment mentioning for, say, buffer_capture_is_changed? or
> buffer_capture_forget or somewhere.
Yes, it is worth mentioning and a comment in bufcapt.h seems enough.

> - buffer_capture_forget()
>
> However this error is likely not to occur, in the error message
> "could not find image...", the buffer id seems to bring no
> information. LSN, or quadraplet of LSN, rnode, forknum and
> blockno seems to be more informative.
Yesh, this seems informative.

> - buffer_capture_is_changed()
>
> The name for the function seems to be misleading. What this
> function does is comparing LSNs between stored page image and
> current page. lsn_is_changed(BufferImage) or something like
> would be clearer.
Hm, yes. This name looks better fine as it remains static within bufcapt.c.

> ====== bufmgr.c:
>
> - ConditionalLockBuffer()
>
> Sorry for a trivial comment, but the variable 'res' conceales
> the meaning. "acquired" seems to be more preferable, isn't it?
Fixed.

> - LockBuffer()
>
> There is a 'XXX' comment. The discussion written there seems to
> be right, for me. If you mind that peeking into there is a bad
> behavior, adding a macro into lwlock.h would help:p
>
> lwlock.h: #define LWLockHoldedExclusive(l) ((l)->exclusive > 0)
> lwlock.h: #define LWLockHoldedShared(l) ((l)->shared > 0)
I don't think that there is much to gain with such macros as of now
LWLock->exclusive is only used in the code this patch introduces.

> # I don't see this usable so much..
>
> bufmgr.c: if (LWLockHoldedExclusive(buf->content_lock))
>
> If there isn't any particular concern, 'XXX:' should be removed.
Well yes.

> ===== bufpage.c:
>
> - #include "storage/bufcapt.h"
>
> The include seems to be needless.
Yep.

> ===== bufcapt.h:
>
> - header comment
>
> The file name is misspelled as 'bufcaptr.h'.
Nicely spotted.

> - The includes in this header except for buf.h seem not to be
> necessary.
Yep.

> ===== init_env.sh:
>
> - pg_get_test_port()
> It determines server port using PG_VERSION_NUM, but is it
> necessary? Addition to it, the theoretical maximum initial
> PGPORT seems to be 65535 but this script search for free port
> up to the number larger by 16 from the start, which would be
> beyond the edge.
Hm, no. As of now, there is still some margin:
PG_VERSION_NUM = 90500
PG_VERSION_NUM % 16384 + 49152 = 57732

> - pg_get_test_port()
>
> It stops with error after 16 times of failure, but the message
> complains only about the final attempt. If you want to mention
> the port numbers, it might should be 'port $PGPORTSTART-$PGPORT
> ..' or would be 'All 16 ports attempted failed' or something..
Hm. You could complain about pg_upgrade as well now for the same
thing. But I guess that it doesn't hurt to complain back to caller
about the range of ports already in use, so changed this way.

Regards,
--
Michael

Attachment Content-Type Size
0001-Move-SEQ_MAGIC-and-sequence-page-opaque-data-to-sequ.patch text/plain 4.0 KB
0002-Extract-generic-bash-initialization-process-from-pg_.patch text/plain 4.7 KB
0003-Buffer-capture-facility-check-WAL-replay-consistency.patch text/plain 40.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2014-07-04 06:30:05 Re: Can simplify 'limit 1' with slow function?
Previous Message Ashutosh Bapat 2014-07-04 06:19:04 Re: Issue while calling new PostgreSQL command from a Java Application