Quick Links

Re: incorrect handling of the timeout in pg_receivexlog

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 09:35:23
Message-ID:	CABUevEyNa7jr6hNT6e5A5-=3Qkhztiqn+3h3se3PLRpL-6f6zQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 07.02.2012 09:03, Fujii Masao wrote:
>>
>> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> When I compiled HEAD with --disable-integer-datetimes and tested
>>>
>>> pg_receivexlog, I encountered unexpected replication timeout. As
>>> far as I read the pg_receivexlog code, the cause of this problem is
>>> that pg_receivexlog handles the standby message timeout incorrectly
>>> in --disable-integer-datetimes. The attached patch fixes this problem.
>>> Comments?
>>
>>
>> receivelog.c
>> -------
>> timeout.tv_sec = last_status + standby_message_timeout - now - 1;
>> if (timeout.tv_sec<= 0)
>> -------
>>
>> Umm.. the above code also handles the timestamp incorrectly. ISTM that the
>> root cause of these problems is that receivelog.c uses TimestampTz.
>
>
> Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles
> float timestamps correctly, the caller just assigns the result to a int64
> variable, assuming --enable-integer-datetimes.

Ugh. Indeed.

>> What about changing receivelog.c so that it uses time_t instead of
>> TimestampTz? Which would make the code simpler, I think.
>
>
> Hmm, that would reduce granularity to seconds. The --statusint option is
> given in seconds, but it would be good to have more precision in the
> calculations to avoid rounding errors.
>
> But actually, if the purpose of the --statusint option is to avoid
> disconnection because of exceeding the server's replication_timeout, one
> second granularity just isn't enough to be begin with. replication_timeout
> is given in milliseconds, so if you set replication_timeout=900ms in the
> server, there is no way to make pg_basebackup/pg_receivexlog to reply in
> time.
>
> So, --statusint needs to be in milliseconds. And while we're at it, how
> difficult would be to ask the server for the current value of
> replication_timeout, and set --statusint automatically based on that? Or
> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
> depending on a server setting, you need to pass an option in the client or
> it will just silently fail with no indication of what the problem is.

We can't really ask for it easily, since we're on a replication
connection. Unless we add that to the walsender grammar, but that
would make it impossible to use the receivexlog stuff against a 9.1
server (which I think still works, though I haven't tested it in a
while).

Do we have a facility to make it a GUC_REPORT but only for walsender
connections? It seems like a very unnecessary thing to have it sent
out over every single connection, since it would only be useful in a
very small subset of them.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

In response to

Re: incorrect handling of the timeout in pg_receivexlog at 2012-02-07 09:31:19 from Heikki Linnakangas

Responses

Re: incorrect handling of the timeout in pg_receivexlog at 2012-02-07 09:55:14 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Heikki Linnakangas	2012-02-07 09:55:14	Re: incorrect handling of the timeout in pg_receivexlog
Previous Message	Heikki Linnakangas	2012-02-07 09:31:19	Re: incorrect handling of the timeout in pg_receivexlog